Time to wrap up automation of alarm blackouts in Foglight.
In this post, we looked at using the command line to implement alarm blackouts. We used topology queries to match a host name or pattern for an instance (e.g., dbss_instance.name like '%YOW%').
What if we had a list of objects (hosts, instances, etc.) that we wish to black out? We can use a service definition, built in the Service Builder dashboard, to do that. For a refresher on how to build one, take a look at this post first.
There are two parts to this process. First, you need to add components to a service definition. You can add individual instances, or create rules that add them by matching a pattern in the name. This can be done over time: as new instances or hosts come under monitoring, you simply add them to the service if they need to follow blackout periods.
In the example below, my service name is "Blackout Hosts". I've added a SQL instance directly (SQLLINUS) and also created a couple of rules (the FSMDynamicManagedComponent bits) that look for instances matching a pattern.
Next, we'll run fglcmd to black out any SQL instances in that service. You may notice the -query parameter is quite different from before.
"DBSS_Instance where $object within^inf (FSMService where name = 'Blackout Hosts')"
Basically, we're looking for DBSS_Instance objects that sit at any level within the "Blackout Hosts" service definition. Normally I'd suggest avoiding "within^inf" and replacing inf with the number of levels at which you expect to find the object. But here we're only running fglcmd at a point in time; the query isn't being evaluated constantly.
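If you script this for more than one service or object type, it can help to generate the query string rather than hand-edit it. Here is a minimal sketch in Python that only builds the text for the -query parameter. The function name and its depth argument are my own illustration, not part of Foglight, and the bounded form (for example within^3) assumes a number simply takes the place of inf, as described above.

```python
from typing import Optional


def service_query(object_type: str, service_name: str,
                  depth: Optional[int] = None) -> str:
    """Build a topology query for objects of object_type inside a service.

    Hypothetical helper: it only formats text for fglcmd's -query parameter.
    depth=None searches every level (within^inf); a positive integer limits
    the search to that many levels.
    """
    if depth is not None and depth < 1:
        raise ValueError("depth must be a positive integer or None")
    # No escaping rules are assumed for the query language, so names that
    # contain a single quote are rejected instead of guessed at.
    if "'" in service_name:
        raise ValueError("service name must not contain a single quote")
    levels = "inf" if depth is None else str(depth)
    return (f"{object_type} where $object within^{levels} "
            f"(FSMService where name = '{service_name}')")


if __name__ == "__main__":
    # Same query as in the post, searching every level of the service.
    print(service_query("DBSS_Instance", "Blackout Hosts"))
    # Bounded variant for queries that are evaluated often.
    print(service_query("DBSS_Instance", "Blackout Hosts", depth=3))
```

The first call prints exactly the query shown above; the second swaps inf for a fixed number of levels. Pass the resulting string, quoted, as the value of -query.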
We can then test that the blackout worked. There are typically alarms that will always fire, or that we can at least make fire. The database backup related alarms are good examples. You can create a new database and wait for the "no backup" alarm to fire.
After running fglcmd to black out the objects within the service, the alarms do not reappear.
But wait, there's more!
I noted above that there are two parts to this process. Running fglcmd to implement the blackout for objects within a service applies only to the members of the service at the moment the command is run. But I also noted that service definitions can be dynamic: you can create a rule that automatically updates the service definition as new instances, hosts, etc. are added. That doesn't mean the blackout automatically applies to those new objects! We need to re-run fglcmd to enforce the blackout on the newly added objects. In a future post, we'll take a look at how Foglight can help do that.
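To make the point-in-time behaviour concrete, here is a small Python sketch of the gap. It does not talk to Foglight at all; it just compares two sets of names. SQLLINUS comes from the example above, while SQLYOW01, SQLYOW02 and the function name are made up for illustration.

```python
def pending_blackout(current_members, blacked_out):
    """Return service members that joined after the blackout command ran.

    Hypothetical helper: both arguments are plain collections of instance
    names, not anything returned by Foglight.
    """
    return sorted(set(current_members) - set(blacked_out))


if __name__ == "__main__":
    # Members of the service when fglcmd was run (all blacked out).
    blacked_out = {"SQLLINUS", "SQLYOW01"}
    # Members now: a dynamic rule has since pulled in a new instance.
    current = {"SQLLINUS", "SQLYOW01", "SQLYOW02"}
    print(pending_blackout(current, blacked_out))  # ['SQLYOW02']
```

Anything this comparison returns is in the service but still alarming, which is why the command has to be run again.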