Houston, do we have a problem? Alerting on gaps in the event log collection

Hi,

While pretty much everyone in the log management business does their best to collect as many events as possible and then run analytics on top of them, sometimes the absence of a log is itself the problem.

Let's imagine a service that may stop generating events: maybe it's the logging server of a valuable enterprise application or, even worse, someone has hacked your Sysmon service. The service appears to be still running, but no events are being generated. I have seen similar behavior with some important corporate applications several times.

Luckily, in the Quest InTrust product we can detect that there have been no events for an hour and raise an alert about it.

InTrust alerts use the REL query language, which contains some quite powerful built-in functions. missing is one of them; here is a snippet from the InTrust documentation about the function:

SYNTAX

boolean missing(
    expr condition,
    string start_time,
    string duration
)

DESCRIPTION

This function returns true if the event that matches expr did not occur during one of the specified time intervals. Arguments:

  1. condition: the testing condition

  2. start_time: specified in cron format as five whitespace-delimited fields, each of which can hold a comma-delimited list of values or the asterisk (*) wildcard; the fields are as follows:

    • minute (0-59),

    • hour (0-23),

    • day of the month (1-31),

    • month of the year (1-12),

    • day of the week (0-6 with 0=Sunday).

  3. duration: duration of the time interval; the format is the same as for select() and previous_lim()

Example 1:

missing(Z.Source="Backup", "0 23 * * 3,6", "1:00");

Specifies that an event from the backup system is expected every Wednesday and Saturday from 11:00 PM until midnight.

Example 2:

missing(true, "0 * * * *", "0:30") or missing(true, "30 * * * *", "0:30");

Specifies that at least one event of any kind is expected in the log once every half-hour.

 

To detect missing events, we could use the second example in the description above, or make it even simpler if we allow for a one-hour gap in event generation.

In this case, the rule's matching code comes down to a single missing() check with an hourly start time and a one-hour duration.
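
As a minimal sketch, modeled directly on Example 2 from the documentation snippet above, such a one-hour check might be written as follows. This is an assumption based on that example, not necessarily the exact code in the downloadable rule: one interval starts at the top of every hour and lasts a full hour, so the expression is true when no event of any kind arrived during that hour.

missing(true, "0 * * * *", "1:00");

To watch only one particular source rather than the whole log, the first argument could presumably be a condition such as Z.Source="Backup" from Example 1 instead of true.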

We can create a rule and specify a data source for it: a log that should be generating events constantly. If there is an hour of silence, that is grounds for an alert. You can download an example of the script here (it targets the InTrust Server log, so make sure you change it to the desired log name).

Specify a computer site containing the agents that should be monitored for the absence of events in the specified log, then create a Real-Time Monitoring policy that connects the site with the rule we've created.

Needless to say, this rule runs on the agent, so false positives caused by a slow or broken agent-to-server connection are ruled out automatically.

You can get an example of the rule here.
