Forward Foglight Alarms to a Pager

A customer would like to forward alarms from Foglight with all the appropriate information, such as alarm time, source, severity, message, and the rule name, to a pager script which in turn pages the on-call DBA.

This post was written by my colleague Minh Nguyen. I validated the steps on Foglight 5.7.5 in October 2016.

Assumption: The Foglight Management Server (FMS) is running on Linux/Unix.

  1. Log in to your Foglight console
  2. Go to Dashboards => Administration => Rules & Notifications => Create Rule
  3. The rule wizard will start. Create a rule with the following info:
    1. Simple rule
    2. No scoping query
    3. Event driven
    4. Event Name => AlarmSystemEvent
    5. Give it a description

  1. Click Next. Under the Condition & Actions tab, enter the following code block in the Condition area:

return @event.get("change/name").equals("Fire")

 

Click on the green icon to verify its syntax. 

  1. Click Next. We will use the default schedule.

  

  1. Click Next. Set the Action Behaviors to fire the action after 1 consecutive evaluation.

  1. Click Next. We will leave Rule Variables blank.

  1. Click Finish. The rule is created and saved.
  2. Now we need to edit the rule to add the severity level variables and a command action. Click Edit Rule.

  1. Go to the Condition & Actions tab => Severity Level Variables to define parameters to pass to the pager script.

  1. We will want to add the following variables:

How to add a severity level variable:

Step 1 – select the Type (in our case, all variables are Expression)

Step 2 – enter a name

Step 3 – enter the code block for the variable

Step 4 – add the variable

Repeat steps 1-4 above for each of the following parameters. We recommend using copy-and-paste for the code blocks to avoid typing errors:

  1. Message:

@event.get("message")

 

  1. Rule:

@event.get("ruleName")

 

  1. Severity:

severity = @event.get("severityName");
if (severity.equals("Fatal")) {
    severity = "FATAL";
}
if (severity.equals("Critical")) {
    severity = "CRITICAL";
}
if (severity.equals("Warning")) {
    severity = "WARNING";
}
return severity;

 

  1. Source:

obj = server["TopologyService"].getObject(@event.get("topologyObjectID"));
if (obj.get("monitoredHost") == null) {
    return "Unknown";
}
return obj.get("monitoredHost/name");

  1. Time:

@event.get("createdTime")

When you are done, the Severity Level Variables list should contain five variables: Message, Rule, Severity, Source, and Time.

  1. Click on the Action tab and add a Command Action.

 

  1. Once it is added, click on the CommandAction link. It should take you to the Command Action properties page.

  1. Click on the Default link to change its value. The Action Parameter Editor opens in a separate page.
  2. Click on the User Defined tab. Enter the command string:

<full path to location of pager script> "@Time" @Severity "@Source" "@Message" "@Rule"

The string essentially runs the script and passes the rule variables created above as parameters to the script (see below).
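To see why most of the parameters in the command string are wrapped in double quotes, here is a small shell sketch showing that a quoted value containing spaces stays in one positional parameter, while an unquoted one is split apart. It assumes the command string is tokenized the way a POSIX shell would tokenize it, which this post does not confirm for Foglight itself. The helper `count_args` and the sample message are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints how many positional parameters it received.
count_args() {
    echo "$#"
}

# Sample alarm message containing spaces (invented for this sketch).
msg="CPU utilization is high"

# Quoted: the whole message arrives as a single parameter.
quoted=$(count_args "$msg")

# Unquoted: the shell splits the message on whitespace, one parameter per word.
unquoted=$(count_args $msg)

echo "quoted=$quoted unquoted=$unquoted"
```

Under the same assumption, @Severity can stay unquoted because the Severity variable returns a single word such as FATAL, CRITICAL, or WARNING.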

For scripting purposes, the parameters passed are as follows:

Time=$1

Severity=$2

Source=$3

Message=$4

Rule=$5
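A minimal sketch of what the receiving script might look like, assuming a POSIX shell on the FMS host. The function name `format_page`, the log path in `PAGER_LOG`, and the layout of the page text are all hypothetical. The final step only appends to a log file as a placeholder, so replace it with whatever command your paging system actually uses.

```shell
#!/bin/sh
# Hypothetical pager script sketch. The command action is expected to call it as:
#   <script> "@Time" @Severity "@Source" "@Message" "@Rule"

# Build a one-line page text from the five positional parameters.
format_page() {
    alarm_time=$1
    severity=$2
    source_host=$3
    message=$4
    rule=$5
    printf '[%s] %s on %s: %s (rule: %s)\n' \
        "$alarm_time" "$severity" "$source_host" "$message" "$rule"
}

# Assumed log location; override it by setting PAGER_LOG in the environment.
PAGER_LOG=${PAGER_LOG:-/tmp/foglight_pager.log}

# Only act when all five parameters were supplied.
if [ "$#" -eq 5 ]; then
    # Placeholder for the real paging command: append the text to a log file.
    format_page "$@" >> "$PAGER_LOG"
fi
```

Before wiring the script into the rule, it may help to run it by hand with five sample arguments, as the user that runs the FMS, and check that the expected line shows up in the log file.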

  1. Click Change. Click Go to Action List.

  1. Click Save All to save all changes. Click Go to Rule List.

  1. Go to Rule Management and enable the rule.

NOTE: The customized rule is listed under the Non-Cartridge drop-down menu.

  • As far as getting the service name and drive name from specific alarms, you should have that information in the "message" variable. Since this rule fires when any alarm fires, you won't be able to get the service and drive context without narrowing down the rule to those specific ones. You'll be better off looking at the originating alarm (service down, disk utilization) and making sure that those pass the required variable in the alarm message.