Summary
This is one of those special Enterprise Monitoring use cases that I mentioned in my last blog post, The Not So Obvious Enterprise System Monitoring Essentials. While this is not the only way to do Foglight alert integration, it is what we felt was the best way forward for this special use case. The case illustrates the need for flexibility when it comes to monitoring an Enterprise. These are the types of cases that get overlooked when considering an Enterprise System Monitor.
In large Enterprises there are often multiple monitoring systems and many ways to create events that require some IT attention. Most of these Enterprises have one or more systems that receive alerts from monitors like Foglight and deal with those alerts outside of Foglight. Last month I ran into a use case where we needed to forward all events from a specific set of systems through a command line script. The final outcome is that we were able to forward all critical and fatal alerts through a command line script with one new Foglight rule. The steps we used are described below.
Step 1 – Create a service and include all of the systems in Scope
In this case we had a large Enterprise that centralized monitoring, which meant they had multiple groups running multiple apps and had to deal with multiple ways of handling alerts. By segmenting the systems for the group that wanted alerts forwarded through the command line, we could control the scope of the integration. Additionally, we could add new systems to the defined service and their alerts would automatically get handled. In this example I’ll use the service called “PSO Test”, as shown below in the Foglight Service Builder.
Step 2 – Define a time driven rule
We are going to use a time-driven rule to check whether there are new alarms in our target service and, if there are, run a server-side script. In this case we’ll scope the rule to our “PSO Test” service so that it only forwards alarms from items contained in that service, as shown below.
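Separately from the rule definition itself, the idea behind this rule is a simple high-water mark: remember when the rule last ran and only act on alarms created after that. Here is a minimal sketch of that idea in plain Groovy, outside Foglight. The names state, newAlarmsSince, and the created field are hypothetical stand-ins for illustration; in the real rule, Foglight supplies the persisted state and the alarm objects.

```groovy
// Hypothetical stand-in for the state a rule keeps between runs
def state = [:]

// Return the alarms created after the previous run, then advance the mark.
// On the very first run, look back a fixed window instead.
def newAlarmsSince = { Map st, List alarms, long now, long firstRunLookbackMs ->
    long last = (st.lastRun != null) ? st.lastRun : (now - firstRunLookbackMs)
    st.lastRun = now
    alarms.findAll { it.created > last }
}

// Two made-up alarms with creation times in milliseconds
def alarms = [[id: 1, created: 1000L], [id: 2, created: 5000L]]

def firstRun  = newAlarmsSince(state, alarms, 6000L, 2000L)
def secondRun = newAlarmsSince(state, alarms, 7000L, 2000L)
```

The first call picks up only the second alarm, and the second call returns nothing because no alarm is newer than the stored mark. That is what keeps the same alarm from being forwarded twice.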
Step 3 – Create a rule action script and test it
Next we’ll need to create a Groovy script that extracts information from each alarm and passes that information to our command line script. This is the script I used. I made a few notes in it so you can understand each of the sections.
    // This tells us the last time the rule ran,
    // so that we don't forward the same alarms over again.
    lastFireTime = @rulette_data["last_true"]
    @rulette_data["last_true"] = System.currentTimeMillis()

    // Use the last 5 minutes when the rulette starts for the first time
    if (lastFireTime == null) lastFireTime = @rulette_data["last_true"] - (5 * 60 * 1000)

    // This is a debug setting that we can use to override lastFireTime to catch old alerts
    //lastFireTime = System.currentTimeMillis() - (1 * 60 * 60 * 1000)

    aS = server.AlarmService
    tS = server.TopologyService
    subject = ""
    msgbody = ""
    outmsg = ""

    // Map Foglight alarm severity codes to their names
    fglSev = [:]
    fglSev[-1] = "UNDEFINED"
    fglSev[0] = "NORMAL"
    fglSev[1] = "FIRE"
    fglSev[2] = "WARNING"
    fglSev[3] = "CRITICAL"
    fglSev[4] = "FATAL"

    // Loop through each alarm for the service
    scope.aggregateAlarms.each() {
        a = aS.getAlarm(it.alarmId)
        // -1: UNDEFINED, 0: NORMAL, 1: FIRE, 2: WARNING, 3: CRITICAL, 4: FATAL
        // Check if we should take action based on the last fire time
        if (a.getCreatedTime().getTime() > lastFireTime) {
            // Create the message body and subject to pass to our script
            msgbody = " ** " + fglSev[a.getSeverity()] + " ** " + it.message + "\n"
            subject = scope.name + " - " + it.sourceName

            // This stands in for your command line and echoes it during testing; comment out after the test
            outmsg += "\n\n[my commandline script] " + msgbody + " ***** " + subject

            // Uncomment this when you get the script set up.
            // Note: You have to pass in the command line as a list, not individual values
            // Change the line starting with outmsg below and uncomment to run.
            // It is the command line that will be called.
            // Note this runs on the Foglight Management Server
            //outmsg = "[full path to Script] -body msgbody -subject subject".execute().text
        }
    }
    return outmsg
This is the output from the script above:
Step 4 – Schedule your rule to run
Now that I have the script working I can run it. The first thing you have to do is comment out the line that reads:
outmsg += "\n\n[my commandline script] " + msgbody + " ***** " + subject
That line is only there to prove the output works. If you don’t get any output, you may need to create some alerts to test it. If there are old alerts, you can uncomment the debug line near the top of the script that overrides lastFireTime to search back in time.
Next you need to create your command line call by filling in the commented-out line. For my test case I’ll just use the echo command.
Original from test script above:
//outmsg = "[full path to Script] -body msgbody -subject subject".execute().text
Changed to:
outmsg = ['echo', msgbody, subject].execute().text
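The switch from a string to a list matters for two reasons. Inside the quoted string, msgbody and subject are just literal words rather than the variables' values, and Groovy's String.execute() splits the command on whitespace, so a message containing spaces would arrive as many separate arguments. The list form passes each element as exactly one argument and involves no shell. Here is a small sketch of that behavior, assuming a Unix-like host with a printf executable on the path; the sample message and the host name host01 are made up for illustration.

```groovy
// Values containing spaces and asterisks, as alarm messages usually do
def msgbody = " ** CRITICAL ** CPU utilization is high"
def subject = "PSO Test - host01"

// List form: each element reaches the command as a single argument.
// printf repeats its format once per argument, so every argument
// is printed on its own line inside square brackets.
def out = ['printf', '[%s]\n', msgbody, subject].execute().text
def argsSeen = out.readLines()
```

Each value comes back as one bracketed line with its spaces intact, and the asterisks are not expanded because no shell is involved.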
Final Result of test:
Finally, paste your script into the fire condition:
That should do it!
This is the final script that I ran:
    // This tells us the last time the rule ran,
    // so that we don't forward the same alarms over again.
    lastFireTime = @rulette_data["last_true"]
    @rulette_data["last_true"] = System.currentTimeMillis()

    // Use the last 5 minutes when the rulette starts for the first time
    if (lastFireTime == null) lastFireTime = @rulette_data["last_true"] - (5 * 60 * 1000)

    // This is a debug setting that we can use to override lastFireTime to catch old alerts
    //lastFireTime = System.currentTimeMillis() - (1 * 60 * 60 * 1000)

    aS = server.AlarmService
    tS = server.TopologyService
    subject = ""
    msgbody = ""
    outmsg = ""

    // Map Foglight alarm severity codes to their names
    fglSev = [:]
    fglSev[-1] = "UNDEFINED"
    fglSev[0] = "NORMAL"
    fglSev[1] = "FIRE"
    fglSev[2] = "WARNING"
    fglSev[3] = "CRITICAL"
    fglSev[4] = "FATAL"

    // Loop through each alarm for the service
    scope.aggregateAlarms.each() {
        a = aS.getAlarm(it.alarmId)
        // -1: UNDEFINED, 0: NORMAL, 1: FIRE, 2: WARNING, 3: CRITICAL, 4: FATAL
        // Check if we should take action based on the last fire time
        if (a.getCreatedTime().getTime() > lastFireTime) {
            // Create the message body and subject to pass to our script
            msgbody = " ** " + fglSev[a.getSeverity()] + " ** " + it.message + "\n"
            subject = scope.name + " - " + it.sourceName

            // Test-only echo of the command line, now commented out
            // outmsg += "\n\n[my commandline script] " + msgbody + " ***** " + subject

            // Note: You have to pass in the command line as a list, not individual values
            // This is the command line that will be called.
            // Note this runs on the Foglight Management Server
            outmsg = ['echo', msgbody, subject].execute().text
        }
    }
    return outmsg
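As written, the script does not check severity itself, so it forwards every new alarm in the service. If only critical and fatal alarms should go through the command line, a severity check along these lines could be added. This is a sketch in plain Groovy using the severity codes listed in the script; minForwardSeverity and shouldForward are hypothetical names, and the sample values stand in for what a.getSeverity() would return.

```groovy
// Severity codes as listed in the script: 3 = CRITICAL, 4 = FATAL
int minForwardSeverity = 3

// Hypothetical helper: should an alarm with this severity be forwarded?
def shouldForward = { int severity -> severity >= minForwardSeverity }

// Made-up severities standing in for a.getSeverity() values
def seen = [0, 2, 3, 4, -1]
def forwarded = seen.findAll { shouldForward(it) }
```

In the rule script, the same test could be folded into the existing condition, for example by also requiring a.getSeverity() >= 3 next to the lastFireTime comparison.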