How to create a Derived Metric to count log file errors

Quest Solution Architect Brian Wheeldon steps through an example of a derived metric, including its scoping query and calculation script.

As described in How to create a Derived Metric (to count powered on VMs), derived metrics are calculated on the FMS. They typically augment metrics collected by various agents.

Once calculated, derived metrics are first-class citizens of the Foglight model and can be used in rules, dashboards and reports.

A Foglight Community member wanted to define a derived metric to track alarm occurrences by host.

In the post, he said he had configured the LogFilter agent with the same match criteria on a number of hosts.

These agents send error matches to the FMS and the FMS triggers associated alarms.

His requirement was to count these error matches, grouped by host, for a specified time interval.

One approach to this problem is to use the AlarmService to count the number of alerts generated.

While it's possible to implement this approach, not every error match identified by the LogFilter agent generates an alarm.

Specifically, a new alarm will only be triggered if previous alarms have been cleared (SOL75131).

It turns out to be both simpler and more accurate to count the number of matches returned by the LogFilter agents rather than count the number of alarms generated.
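
To see why the two counts can diverge, here is a minimal sketch in plain Groovy. It is a simplified model of the behaviour described above (a new alarm fires only when no earlier alarm is still active), not the actual Foglight rule engine, and the event timeline is made up for illustration.

```groovy
// Hypothetical timeline for one host during one hour:
// 'match' = the agent reported an error match, 'clear' = someone cleared the alarm.
def events = ['match', 'match', 'match', 'clear', 'match', 'match']

int matches = 0
int alarms = 0
boolean alarmActive = false

events.each { e ->
    if (e == 'match') {
        matches++
        // Simplified assumption: a new alarm fires only if none is currently active.
        if (!alarmActive) {
            alarms++
            alarmActive = true
        }
    } else if (e == 'clear') {
        alarmActive = false
    }
}

println "matches=${matches}, alarms=${alarms}"
```

In this made-up timeline the agent reports more matches than the alarm count would suggest, which is the gap that counting matches directly avoids.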

So how do we implement this?

In order to define a derived metric, we need:

a scope, or place to store the metric
a policy to determine when the metric is calculated
a calculation to define it

To implement this requirement, the scope is the Host object. But we only care about machines where the LogFilter agent has been deployed, so we can specify a more precise scope:

Host: $object = (LogFilterAgent).monitoredHost

This means "select all Host objects that have LogFilterAgent objects whose monitoredHost property references that Host".
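
If the scoping query reads a little backwards, the selection it describes can be sketched in plain Groovy. The maps below are hypothetical stand-ins for topology objects (real Foglight objects are not plain maps, and the host names are invented); the point is only to show which hosts end up in scope.

```groovy
// Hypothetical stand-ins for Host topology objects.
def hosts = [
    [name: 'web01'],
    [name: 'db01'],
    [name: 'app01']
]

// Hypothetical stand-ins for agents, each referencing the host it monitors.
def agents = [
    [type: 'LogFilterAgent', monitoredHost: hosts[0]],
    [type: 'LogFilterAgent', monitoredHost: hosts[2]],
    [type: 'OtherAgent',     monitoredHost: hosts[1]]
]

// Keep every Host that at least one LogFilterAgent references via monitoredHost.
def scopedHosts = hosts.findAll { host ->
    agents.any { it.type == 'LogFilterAgent' && it.monitoredHost.name == host.name }
}

println scopedHosts*.name
```

Only the hosts with a LogFilterAgent attached survive the filter, so the derived metric is not calculated for hosts that can never report a match.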

To create this scoping query, I referred to the $object section of Foglight's Topology query reference.

Next, we need to decide when the metric should be calculated. For our purposes, once every hour is fine. To get clock-aligned collections, specify "Schedule Driven" with the built-in "Hourly" schedule. To refine this specification, we configure the metric to evaluate at the end of each hour and check "Enable Trigger without Data" so that the metric is calculated even if there are no metric updates for the host.
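
As a rough illustration of what "clock-aligned" means here, the sketch below computes the hour boundary that follows a given moment and the one-hour window that ends there. It uses only java.time; the helper name and the sample timestamp are hypothetical, and this is not how the FMS scheduler is implemented.

```groovy
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Hypothetical helper: the next top-of-the-hour boundary after 'now'.
def nextHourBoundary = { LocalDateTime now ->
    now.truncatedTo(ChronoUnit.HOURS).plusHours(1)
}

// Sample timestamp, chosen only for illustration.
def now = LocalDateTime.of(2024, 1, 15, 10, 37, 12)

// Evaluating at the end of the hour with a one-hour look-back
// covers exactly one clock hour.
def windowEnd = nextHourBoundary(now)
def windowStart = windowEnd.minusHours(1)

println "window: ${windowStart} to ${windowEnd}"
```

With evaluation pinned to the hour boundary, each calculated value lines up with one clock hour, which keeps the values comparable across hosts.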

For the calculation, we need to count the number of matches on the given host over the previous hour.

Looking at the agent model in the Data browser, we see that the LogFilter agent has a LogFilter_ErrorVerbose table which has a property that's confusingly also named LogFilter_ErrorVerbose. The property lists a set of complex observations that describe the error matches.

So we'll do this in two lines. In the first line we'll extract all observations from LogFilter_ErrorVerbose on the given host for the last hour:

result = #LogFilter_ErrorVerbose from LogFilter_ErrorVerbose where monitoredHost = $scope for 1 hour#

This metric query returns a collection of lists: the outer collection has one entry for each agent on the host, and each inner list has one entry for each matching observation from that agent.

To count them, we collect them into a list, then flatten the list, then count the items:

result.topologyObjects.collect { result.values(it) }.flatten().size()

Note that if your FMS is 5.5.8 or earlier, you'll need to take out the ".flatten()".
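
The collect / flatten / size chain is ordinary Groovy, so you can try it outside the FMS. The sketch below uses a hypothetical map in place of the metric query result (the agent names and error strings are invented) to show how the nested lists collapse into a single count.

```groovy
// Hypothetical data: one inner list per agent on the host,
// each holding that agent's error-match observations for the hour.
def valuesByAgent = [
    agentA: ['ERROR disk full', 'ERROR timeout'],
    agentB: ['ERROR login failed'],
    agentC: []
]

// collect builds a list of lists, one inner list per agent.
def nested = valuesByAgent.keySet().collect { valuesByAgent[it] }

// flatten merges the inner lists into one list; size counts the observations.
def total = nested.flatten().size()

println total
```

Note that an agent with no matches contributes an empty list, so it adds nothing to the total, and without flatten() the size would be the number of agents rather than the number of matches.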

Here's the complete calculation:

// metric query returns all LogFilter observations for this host

result = #LogFilter_ErrorVerbose from LogFilter_ErrorVerbose where monitoredHost = $scope for 1 hour#

// collect the observations into a list and count them

result.topologyObjects.collect { result.values(it) }.flatten().size()

To create this derived metric, navigate to the Administration/Data/Managed Derived Metrics dashboard and click "Add Derived Metric".
Next, specify the Derived Metric Name "LogFilterErrorCount" and click Add Calculation.
Enter the scoping query above and make sure you validate it by clicking the green check button on the far right.
Next, enter the calculation above in the expression field. There are buttons to test the expression, build the expression and validate the expression on the right.
Make sure that you test the expression and validate it before proceeding.
Next, specify the Trigger Type. I checked "Enable Trigger without Data" to ensure that this metric would always be calculated and available.
Optionally, enter a description and click Add.
Specify the Unit for the metric and add a comment, then click Add to create the metric.

Once created, the derived metric is available for rules, reports and dashboards.

Derived metrics are a great way to enhance the value and capabilities of your Foglight monitoring environment!
