You’ve all been here before: how do you get enough data to understand what’s going on in your Siebel application and perform Root Cause Analyses (RCAs), while limiting the impact of higher log levels on performance and disk space?
At Germain, we’ve had this conversation hundreds of times with Siebel customers when implementing our APM solution. The problem is simple and can be summarized by the expression “I want to have my cake and eat it too”: you cannot get the benefits of higher log levels with no impact whatsoever on your production environment. The good news, however, is that the performance degradation can be kept barely noticeable, or even unnoticeable, while still giving your APM tool enough information for proactive and reactive monitoring and for deep RCA. That, in turn, improves stability and performance and reduces the risk of incidents.
Here are our 5 golden rules for balancing Siebel log levels for efficient Siebel Application Performance Monitoring:
- Take advantage of Siebel’s granularity when setting up log levels. There are 185 Siebel Log Parameters you can adjust. Don’t increase all 185 log parameters to 4 or 5; take advantage of the granularity and be specific about which individual log parameters to increase, and to what value. With the Germain UX tool, we usually get to what we consider a normal monitoring level with a small increase of 3 log parameters out of 185. We can achieve deep monitoring with just a small increase in the value of 10 or so log parameters. In our experience, that means little to no measurable impact on production.
- Don’t let the log files get in the way. We often hear that the problem is not the performance impact of the log level increase, but that the number and size of log files become unmanageable. That should be easy to fix. You can create your own archiving and purging strategy to keep the log files under control on your Siebel servers. Better, you can let Germain UX software archive and purge log files once their content has been analyzed and extracted. You can then “set it and forget it”, knowing that log files won’t clutter your disks.
- Log levels are not static. Define your base level of logs for regular production operations, and temporarily raise it when dealing with a major issue. Once you have captured the issue, turn the log levels back to their “normal” settings and use the additional information from the deeper log level to complete your Root Cause Analysis. To do this easily, use the “Set Log Level” functionality in our APM tool; a single button click will change Siebel Log Levels from “normal” to “deep”, and vice versa. Note that this functionality is reserved for Siebel administrators, so you are not going to be opening a security hole…
- Safety is key. If you are worried that disk space may run out due to unexpected Siebel log file size, enable the Germain UX safety mechanism that will automatically lower the Siebel Log Level if the Siebel log file size exceeds an acceptable level.
- Stay away from SARM, but use it efficiently when necessary. We never recommend leaving SARM (Siebel Application Response Measurement) turned on permanently on production servers. However, there are issues like slow response time and memory leaks that will be so much easier and faster to resolve if you can use SARM information. The best way to deal with this is to turn on SARM for a short period of time (say 15 minutes) in production on a subset of your Siebel Object Managers, on a subset of your Siebel Application Servers. If the problem happens frequently, this should give you just enough information to get to the root cause quickly. If it does not show up in that window, repeat the exercise or leave SARM on for a little bit longer. If you find SARM data hard to interpret, consider an APM solution like Germain UX that will automatically analyze and interpret SARM information and display only the relevant information as part of your Root Cause Analysis process for any given transaction.
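If you prefer to build your own archiving and purging strategy, along with a basic disk-space safety check, the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Germain UX does it: the function names, the `*.log` file pattern, and the age and size thresholds are all hypothetical, and the script only touches files on disk. It does not call any Siebel API, and what you do when the size budget is exceeded (for example, lowering the log level) is left to you.

```python
import gzip
import os
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, min_age_seconds, now=None):
    """Gzip *.log files untouched for min_age_seconds, then delete the originals."""
    now = time.time() if now is None else now
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = []
    for path in sorted(Path(log_dir).glob("*.log")):
        stat = path.stat()
        if now - stat.st_mtime < min_age_seconds:
            continue  # recently written, possibly still in use: leave it alone
        target = archive_dir / (path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Carry over the log's timestamps so purging is based on its last write.
        os.utime(target, (stat.st_atime, stat.st_mtime))
        path.unlink()
        archived.append(target)
    return archived


def purge_old_archives(archive_dir, max_age_seconds, now=None):
    """Delete *.gz archives whose timestamp is older than max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(archive_dir).glob("*.gz")):
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


def total_log_bytes(log_dir):
    """Total size in bytes of the *.log files directly inside log_dir."""
    return sum(p.stat().st_size for p in Path(log_dir).glob("*.log"))


def exceeds_budget(log_dir, max_bytes):
    """True when the live log files use more space than the allowed budget."""
    return total_log_bytes(log_dir) > max_bytes
```

Run from a scheduler, something like `archive_old_logs(log_dir, archive_dir, 3600)` followed by `purge_old_archives(archive_dir, 14 * 86400)` keeps the live log directory small, and `exceeds_budget(log_dir, max_bytes)` can act as the trigger for going back to your “normal” log levels. Make sure nothing is archived or purged before your monitoring tool has read it.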
We hope this will help you get past the “we can’t increase log levels” roadblocks that development and operations teams often face from production support teams. Bottom line, it’s not an “all or nothing” decision, and with a solid strategy it is easy to find the balance that achieves your APM goals while keeping production smooth.
What are your golden rules? Please comment to share your experience!