Maximize Your Monitoring Investments

I visited another large company last week that owns multiple enterprise technology monitors. As in many large IT organizations, senior IT management is confident that, with the right focus, they can make their existing, “mature” enterprise monitors do just about anything they need, from infrastructure monitoring to event correlation to Application Performance Monitoring (APM) to user experience monitoring and more. After all, the IT team has a long history of working with these monitors, so the plan now is to focus on the least expensive way to extend what they already have, which oddly enough sounds a bit like “keep doing what you’ve been doing for years and expect a different result.”
 
Over the years, many large IT organizations like this have sunk millions of dollars into these “mature” enterprise monitors. Meanwhile, numerous point monitoring tools have sprouted up like weeds in various corners of the IT organization, either to fill management gaps or to satisfy teams that couldn’t wait six months for a formal project to extend one of the “enterprise” monitors. (Interestingly, I heard this particular justification from a departmental IT team at one of the large “enterprise” monitor vendors, which purchased Foglight instead of rolling out its own competitive solution.) So what is the best go-forward plan when your company already owns numerous monitors?
 
  1. Define a simple, overarching monitoring strategy that includes both:
     a. Component level infrastructure monitoring for all of your critical technology silo/platform teams
        • Note that you probably have multiple (i.e., 5-10+) existing monitors in this category already in use
     b. Comprehensive end-to-end Service and Application Performance Monitoring for NOC and Application Support teams that correlates data across the underlying technology silos with user and business impact data, which:
        • Helps quickly prioritize the component level alerts and data from 1a, and
        • Helps quickly determine the appropriate technology silo team to engage for a given incident
        • Note that you probably do not already have this, regardless of what some of your large platform vendors may tell you
  2. Go through each of your existing monitoring solutions one by one and determine which, if any, can help you with point 1 above (a sketch of this kind of inventory assessment follows this list), keeping in mind the following:
     • It is critical to apply a heavy dose of real-world, practical reality to this assessment so that you don’t end up trying to use a particular monitor for something that few other companies use it for. There is a reason why certain “mature” enterprise monitoring products that tout capabilities for every monitoring discipline are, in practice, used for only a subset of those capabilities.
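To make step 2 concrete, here is a minimal sketch of that kind of inventory assessment in Python. Everything in it is hypothetical and invented for illustration: the category labels, the tool names in the inventory, and the find_gaps helper. The point is the discipline, not the code: record only the capabilities each tool is proven to deliver in your environment, then see which strategy categories remain uncovered.

    # Illustrative only: map each monitor you already own to the two
    # strategy categories from step 1, then flag whatever no tool covers.
    CATEGORIES = [
        "component_infrastructure",  # 1a: silo/platform-level monitoring
        "end_to_end_apm",            # 1b: cross-silo service/app correlation
    ]

    # Hypothetical inventory: tool -> capabilities it is proven to deliver
    # in practice at your company, not what the vendor brochure promises.
    inventory = {
        "network_monitor_a": {"component_infrastructure"},
        "server_monitor_b": {"component_infrastructure"},
        "database_monitor_c": {"component_infrastructure"},
    }

    def find_gaps(inventory):
        """Return the strategy categories no existing tool covers."""
        covered = set().union(*inventory.values()) if inventory else set()
        return [c for c in CATEGORIES if c not in covered]

    for gap in find_gaps(inventory):
        print("Gap to fill with a complementary solution:", gap)
    # With the inventory above, this prints the end_to_end_apm gap --
    # the capability most shops turn out not to own.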
 
So what are the “mature”, enterprise monitors really good for in practice, when complemented by a comprehensive, modern APM solution such as Foglight (including real user experience monitoring capabilities)? Below is my assessment from many years of real-world engagement:
  • Traditional monitoring frameworks (e.g., IBM Tivoli, HP OpenView, CA Spectrum) commonly serve well as component level infrastructure monitors, but they do have gaps around certain modern technologies; those gaps can be filled in a complementary fashion by other solutions, including Foglight.
  • Event Management solutions (e.g., IBM Netcool, EMC Smarts, CA Spectrum) commonly serve well as:
     • A central rollup point for alerts from all monitors, where rules/logic for notifications, escalations, etc. are centrally defined and managed in a standard way
     • A central, standard place to connect to the Service Desk solution (e.g., BMC Remedy, HP Service Desk/Peregrine)
     • An interim, limited, pre-APM way to associate individual component level alerts with an app or service for basic time-based correlation; this should be replaced app by app and service by service as more apps and services are configured into the APM solution over time (see the correlation sketch after this list)
  • Synthetic Transaction solutions (e.g., HP LoadRunner scripts, Keynote hosted synthetic services) for user experience commonly serve well as:
     • An availability/performance heartbeat for apps/services, providing early warning about availability and performance issues when no real users are online (a minimal heartbeat sketch follows this list)
     • A consistent, repeatable measurement for select, critical, multi-step transactions (real-user performance measurements for multi-step transactions can vary widely due to think time and the like, which makes it difficult to rely on real-user measurements alone for multi-step transaction timings)
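For the interim, time-based correlation role described under Event Management above, a sketch like the following captures the idea. The Alert shape, the five-minute window, and the component-to-service mapping are all assumptions made for illustration, not any particular product’s API:

    # Illustrative sketch of basic time-based correlation: roll component
    # alerts up to a named app/service when they land in the same window.
    from collections import defaultdict
    from dataclasses import dataclass

    WINDOW_SECONDS = 300  # assumed correlation window; tune per service

    @dataclass
    class Alert:
        source: str       # which component monitor raised it
        service: str      # app/service the component is mapped to
        timestamp: float  # epoch seconds

    def correlate(alerts):
        """Group alerts by service, then bucket each service's alerts
        into time windows so one incident escalates once, not N times."""
        by_service = defaultdict(list)
        for a in sorted(alerts, key=lambda a: a.timestamp):
            by_service[a.service].append(a)

        incidents = []
        for service, svc_alerts in by_service.items():
            bucket = [svc_alerts[0]]
            for a in svc_alerts[1:]:
                if a.timestamp - bucket[0].timestamp <= WINDOW_SECONDS:
                    bucket.append(a)   # same window -> same incident
                else:
                    incidents.append((service, bucket))
                    bucket = [a]       # new window -> new incident
            incidents.append((service, bucket))
        return incidents

As the APM solution takes over app by app, logic like this becomes unnecessary for those apps, which is exactly the replacement path described above.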
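And for the synthetic heartbeat role, a minimal probe loop might look like this. The URL, latency threshold, and interval are hypothetical placeholders; in practice the ALERT/WARN output would be forwarded into the central event management rollup rather than printed:

    # Illustrative synthetic heartbeat: probe an app on a fixed interval
    # so you get availability/latency signal even with no real users online.
    import time
    import urllib.request

    URL = "https://example.com/login"  # hypothetical transaction endpoint
    LATENCY_THRESHOLD_S = 2.0          # assumed SLA-style threshold
    INTERVAL_S = 60                    # probe once a minute

    def probe(url):
        """Return (ok, elapsed_seconds) for a single synthetic check."""
        start = time.monotonic()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                ok = 200 <= resp.status < 400
        except OSError:
            ok = False  # URLError/HTTPError are OSError subclasses
        return ok, time.monotonic() - start

    if __name__ == "__main__":
        while True:
            ok, elapsed = probe(URL)
            if not ok:
                print(f"ALERT: {URL} unavailable")
            elif elapsed > LATENCY_THRESHOLD_S:
                print(f"WARN: {URL} slow ({elapsed:.2f}s)")
            time.sleep(INTERVAL_S)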
 
The companies I see achieve the greatest success define a solid overall monitoring strategy that encompasses both the newer “standard” capabilities, such as APM and User Experience Monitoring, and the now fairly commoditized, historical “standard” capabilities, and then methodically evaluate what they already own against that plan. Of course, long-standing, entrenched processes and individual experience with particular monitors are very real factors as well, and they sometimes bring an almost fanatical aspect to the process. Often a blend of new and old monitoring technology is the best answer: leverage what you already own for what it is really good at, and complement it with the modern capabilities needed to evolve to the next level of visibility.
 
I’m curious to hear whether anyone has had similar experiences with unrealistic expectations around any of the “mature” monitors, or whether you have successfully augmented them under an overarching strategy with clear definitions of which teams use which interface.