By Geoff Vona
In my last blog post, I introduced Script Agents. Script Agents are a great way to get data into Foglight quickly, and Foglight 5 extends the script agent data format in some interesting ways. In this post, we're going to look at data modeling, units, and reserved words.
Script Agent Field Syntax
In a Script Agent, data is sent to the server by specifying a series of field=value pairs. The syntax is shown below:
field[.type[.{id|obs}]][:unit]=value
As we work through this blog we'll learn why and how to use all elements of this syntax.
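Here is a minimal Python sketch that splits a field=value line into the parts of the syntax. It can help to see the syntax as a small grammar. This is a hypothetical helper, not part of Foglight: the names `parse_field`, `FieldSpec` and `FIELD_PATTERN` are my own. It assumes field and type names contain only letters, digits and underscores, and that a unit may contain "/".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical parser for the syntax field[.type[.{id|obs}]][:unit]=value.
# Assumption: field and type names are word characters, and a unit may contain
# "/" to express a rate such as megabyte/minute.
FIELD_PATTERN = re.compile(
    r"^\s*(?P<field>\w+)"
    r"(?:\.(?P<type_name>\w+)(?:\.(?P<suffix>id|obs))?)?"
    r"(?::(?P<unit>[\w/]+))?"
    r"\s*=\s*(?P<value>.*?)\s*$"
)


@dataclass
class FieldSpec:
    field: str
    type_name: Optional[str]  # e.g. "String"; None means no type was given
    suffix: Optional[str]     # "id", "obs", or None
    unit: Optional[str]       # e.g. "percent" or "megabyte/minute"
    value: str


def parse_field(line: str) -> FieldSpec:
    """Split one field=value line into the parts of the syntax."""
    match = FIELD_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a field=value line: {line!r}")
    return FieldSpec(**match.groupdict())


if __name__ == "__main__":
    for line in ("Host.String.id = tor017820",
                 "State.StringObservation.obs = Up",
                 "DiskIO:megabyte/minute = 13"):
        print(parse_field(line))
```

Every optional piece maps to a named group that is simply None when absent, so the later rules can be expressed as checks on those groups.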
How Script Agent Data is Represented in Foglight 5
In Foglight 4, data modeling was uniform for all collections. Data was gathered by agents and organized into tables. The tables were attached to an agent instance. The agents were attached to host instances.
Foglight 5 allows for many different kinds of data models, including the Foglight 4 model. To make the transition easier, Script Agents use the Foglight 4 data model. Data is gathered by an agent and organized into tables attached to that agent, and the agent is attached to a host. This model, which applies to both Foglight 4 and Foglight 5, is visualized below:
This means that Script Agent data models look the same up to the agent level. There is a Host, and it contains an Agent. Pretty straightforward, right?
The differences between Foglight 4 and Foglight 5 are visible at the table level. A Foglight 4 table is like a database table. Each new collection is a new row in the table. A Foglight 5 table is actually a data object. The columns in the collection are turned into properties, metrics or observations, depending on information provided by the script agent.
This adds significant flexibility, but it is tricky to fully understand at first. Let's consider a script agent that collects host data.
Foglight 4 Script Agent | Foglight 5 Script Agent |
---|---|
@echo off | @echo off |
In Foglight 4, you get a table that looks like this:
Host | CPU | Memory | Disk |
---|---|---|---|
tor017820 | 90 | 55 | 20 |
tor017899 | 78 | 35 | 43 |
The problem with this table is that every collection from every host lands in the same table as a new row. You have to do extra work to tease apart the different instances.
In Foglight 5, you get two objects of type ExpandedHost. The Host property is the unique identifier for the object - a new object will be generated for each new Host entry in the script agent. Each object has CPU, Memory and Disk metrics attached.
Each metric is a set of time series data. This gives much more flexibility, at the cost of a bit of up-front complexity. You can find each unique ExpandedHost instance easily, and for each instance you can query the metrics in any number of ways, pulling out the average, minimum, maximum, standard deviation or current value for any time range. This is powerful and new.
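Here is a minimal Python sketch of the shift from a Foglight 4 table to Foglight 5 objects. It is a hypothetical illustration, not how Foglight stores data internally. The first two rows come from the table above, while the third row is a made-up later collection so that the statistics have more than one value. `group_by_identity` and `summarize` are names I've chosen for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical illustration, not Foglight's implementation. Each row is one
# collection, as in a Foglight 4 table. The first two rows match the table
# above; the third is a made-up later collection for tor017820.
ROWS = [
    {"Host": "tor017820", "CPU": 90, "Memory": 55, "Disk": 20},
    {"Host": "tor017899", "CPU": 78, "Memory": 35, "Disk": 43},
    {"Host": "tor017820", "CPU": 70, "Memory": 57, "Disk": 21},
]


def group_by_identity(rows, identity: str) -> Dict[str, Dict[str, List[float]]]:
    """Turn flat rows into one object per identity value, with a time
    series (values in collection order) for each metric."""
    objects: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        metrics = objects[row[identity]]
        for name, value in row.items():
            if name != identity:
                metrics[name].append(value)
    return objects


def summarize(series: List[float]) -> Dict[str, float]:
    """A few of the aggregations you might query on one metric."""
    return {
        "current": series[-1],
        "avg": mean(series),
        "min": min(series),
        "max": max(series),
        "stdev": pstdev(series),
    }


if __name__ == "__main__":
    hosts = group_by_identity(ROWS, "Host")
    for host, metrics in hosts.items():
        print(host, summarize(metrics["CPU"]))
```

The identity field, Host, becomes the key for each object. Every other column becomes a per-object series, which is what makes per-instance queries cheap.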
Controlling the Data Model in Script Agents
The Script Agent data model is built based on a set of assumptions. These assumptions are implemented by a transformation definition that is embedded in each Script Agent .car file.
The rules:
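- By default, every field=value pair is assumed to be a metric of type double.
- If the field includes a type definition (field.type), then the field is assumed to be a property of the specified type.
- If the field includes a type definition and the obs suffix (field.type.obs), then the field is converted to an observation.
- If the field includes a type definition and the id suffix (field.type.id), then the field becomes an identity property, and each new value creates a new object.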
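The rules above, together with the id suffix from the field syntax, can be sketched as a small decision function. This is a hypothetical illustration of the default behaviour only. The actual rules live in the transformation definition inside each Script Agent .car file, and `classify` is my own name.

```python
from typing import Tuple


# Hypothetical sketch of the default transformation rules; the real rules
# live in the transformation definition inside each Script Agent .car file.
def classify(lhs: str) -> Tuple[str, str]:
    """Return (kind, type) for the left-hand side of a field=value pair."""
    spec = lhs.split(":", 1)[0].strip()  # drop an optional :unit
    parts = spec.split(".")
    if len(parts) == 1:
        return ("metric", "Double")             # no type: metric of type double
    if len(parts) == 2:
        return ("property", parts[1])           # field.type: property
    if len(parts) == 3 and parts[2] == "obs":
        return ("observation", parts[1])        # field.type.obs: observation
    if len(parts) == 3 and parts[2] == "id":
        return ("identity property", parts[1])  # field.type.id: identity
    raise ValueError(f"unrecognized field specification: {lhs!r}")


if __name__ == "__main__":
    for lhs in ("CPU:percent", "IPAddress.String",
                "State.StringObservation.obs", "Host.String.id"):
        print(lhs, "->", classify(lhs))
```

Note that the unit has no influence on the kind of field. It only decorates a metric.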
This is pretty straightforward. But what types are available? And what is an "observation"?
In theory, all types are available. However, it must be possible to create the type by assigning the value in the script agent. This means the real type set is limited to the simple Java-based types. These types can be reviewed by looking at the top of the topology-types.xml file in FGLHOME/config.
String, Long, Integer, Number, Double, Float, Boolean
An observation is a way of storing something that might change every collection. Normally this kind of data is stored as a metric, but a metric holds a double, so a non-numeric value such as a state string needs a type. Specifying a type, however, overrides the conversion to a time-series metric and turns the field into a property (remember Rule 2?). Why does this matter? Every property change triggers a topology change event, and a topology change event should be a rare occurrence. Each change event forces the model to rebind rules, derived metrics and other monitoring policies, which can be disruptive if there are too many changes. Beyond that, it doesn't make sense to store a changing value as a property, because none of the historical values are kept. The obs suffix solves this: the field keeps its type but is stored like a metric, with one value per collection.
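Here is a hypothetical Python simulation that shows the cost of storing a changing value as a property. The same State value is collected six times, and the sequence is made up for illustration. Each change to the property is counted as one topology change event, which is an assumption that simplifies whatever Foglight actually does internally. The observation keeps every value and causes no topology changes.

```python
from typing import List, Tuple

# Hypothetical simulation: the same State value collected six times.
SAMPLES = ["Up", "Down", "Up", "Up", "Down", "Up"]


def store_as_property(values: List[str]) -> Tuple[str, int]:
    """Keep only the latest value and count how often it changes.
    Each change stands in for one topology change event."""
    latest, change_events = values[0], 0
    for value in values[1:]:
        if value != latest:
            change_events += 1
            latest = value
    return latest, change_events


def store_as_observation(values: List[str]) -> Tuple[List[Tuple[int, str]], int]:
    """Keep one (collection index, value) entry per collection.
    Nothing about the object's topology changes, so no events."""
    return list(enumerate(values)), 0


if __name__ == "__main__":
    print(store_as_property(SAMPLES))
    print(store_as_observation(SAMPLES))
```

The property ends up with only the last value and four change events, while the observation keeps all six values and triggers none.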
In the last blog post, we worked through an example with three string values. It is worth repeating the results. The three string values were the host name, IP address and host state:
Question | Host | IP Address | State |
---|---|---|---|
Identity | Yes | No | No |
Changes Frequently | No | No | Yes |
This led to the following script agent entries:
echo Host.String.id = tor017820
echo IPAddress.String = 10.4.22.10
echo State.StringObservation.obs = Up
Here's the final reasoning:
Field | Collection Behaviour | Script Agent Syntax | Description | Data Model Behaviour |
---|---|---|---|---|
Host | Never changes, defines identity for the collection | Host.String.id | Identity property of type String | An identity property called "Host" is added to the object. For each new value, a new object is created. |
IP Address | Might change occasionally in some cases | IPAddress.String | Property of type String | A property called "IPAddress" is added to the object, and a value is stored. If the value changes, a topology change event occurs |
State | Might change every collection | State.StringObservation.obs | Observation of type String (StringObservation) | Like a metric: one value is stored per collection |
Setting Units
By default, any metric that comes into the system will not have a unit assigned. This means any numeric value will have the unit "count". This often doesn't matter early in your agent development, because you can get going pretty quickly without worrying about units. But without units, you'll see things like this:
This is a CPU %, but the unit is count. This just looks wrong.
In a Script Agent, it is possible to set the unit using the last part of the field syntax:
field[.type[.{id|obs}]][:unit]=value
The possible units are listed below:
Scale | Memory/Disk | Time | Math | Default |
---|---|---|---|---|
billion, billionth, million, millionth, thousand, thousandth, trillion, trillionth | bit, byte, exabyte, gigabyte, kilobyte, megabyte, petabyte, terabyte | day, hour, microsecond, millisecond, minute, month, nanosecond, second, year | percent | count |
What's great about these units is that they can be combined to make rates - as long as the rates make sense. So for example, a disk I/O rate can be assigned a unit of megabyte/minute like this:
DiskIO:megabyte/minute
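A typo in a unit name is easy to make, so you can check compound units against the table above before deploying. This is a minimal Python sketch of such a check. `KNOWN_UNITS` and `unit_parts` are hypothetical names. The check only confirms that each part is spelled like a listed unit and does not decide whether Foglight accepts a particular combination.

```python
from typing import List

# Unit names from the table above. Hypothetical check: it only confirms that
# each part of a unit is spelled like a listed unit; it does not decide
# whether Foglight accepts a particular combination.
KNOWN_UNITS = {
    # Scale
    "billion", "billionth", "million", "millionth",
    "thousand", "thousandth", "trillion", "trillionth",
    # Memory/Disk
    "bit", "byte", "exabyte", "gigabyte", "kilobyte",
    "megabyte", "petabyte", "terabyte",
    # Time
    "day", "hour", "microsecond", "millisecond", "minute",
    "month", "nanosecond", "second", "year",
    # Math and default
    "percent", "count",
}


def unit_parts(unit: str) -> List[str]:
    """Split a unit such as megabyte/minute and validate each part."""
    parts = unit.split("/")
    unknown = [part for part in parts if part not in KNOWN_UNITS]
    if unknown:
        raise ValueError(f"unknown unit component(s) {unknown} in {unit!r}")
    return parts


if __name__ == "__main__":
    print(unit_parts("megabyte/minute"))
```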
To pull it all together, I've modified the earlier example script to include units for CPU, Memory and Disk. I've also added a DiskIO metric that has a compound unit.
@echo off
@if not "%ECHO%"=="" echo %ECHO%
@if not "%OS%"=="Windows_NT" goto EXIT
echo TABLE HostWithUnits
echo START_SAMPLE_PERIOD
echo Host.String.id = tor017820
echo CPU:percent = 90
echo Memory:megabyte = 55
echo Disk:gigabyte = 20
echo DiskIO:megabyte/minute = 13
echo IPAddress.String = 10.4.22.10
echo State.StringObservation.obs = Up
echo END_SAMPLE_PERIOD
echo START_SAMPLE_PERIOD
echo Host.String.id = tor017899
echo CPU:percent = 78
echo Memory:megabyte = 35
echo Disk:gigabyte = 43
echo DiskIO:megabyte/minute = 17
echo IPAddress.String = 10.4.21.14
echo State.StringObservation.obs = Down
echo END_SAMPLE_PERIOD
echo END_TABLE
:EXIT
The net result is shown below. Each of the curves has a proper unit in the legend.
It is highly recommended that you assign units to all your metrics. Units make your gathered data much more readable in the user interface, starting with chart legends that show what each value actually measures.
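Here is a minimal Python sketch that reads the echoed output of the script above back into tables and samples. It can help to see how that output is structured. This is a hypothetical reader written for this post's examples, not the Foglight server's parser. It assumes well-formed input, and `parse_agent_output` is my own name.

```python
from typing import Dict, List, Optional

Sample = Dict[str, str]


def parse_agent_output(text: str) -> Dict[str, List[Sample]]:
    """Group echoed field=value lines into samples, keyed by table name.
    Hypothetical reader for the format used in this post; it assumes
    well-formed input and is not Foglight's own parser."""
    tables: Dict[str, List[Sample]] = {}
    table: Optional[str] = None
    sample: Optional[Sample] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("TABLE "):
            table = line.split(None, 1)[1]
            tables.setdefault(table, [])
        elif line == "START_SAMPLE_PERIOD":
            sample = {}
        elif line == "END_SAMPLE_PERIOD":
            if table is None or sample is None:
                raise ValueError("END_SAMPLE_PERIOD outside a sample")
            tables[table].append(sample)
            sample = None
        elif line == "END_TABLE":
            table = None
        elif sample is not None and "=" in line:
            field, value = line.split("=", 1)
            sample[field.strip()] = value.strip()
        else:
            raise ValueError(f"unexpected line: {line!r}")
    return tables


if __name__ == "__main__":
    output = """TABLE HostWithUnits
START_SAMPLE_PERIOD
Host.String.id = tor017820
CPU:percent = 90
DiskIO:megabyte/minute = 13
END_SAMPLE_PERIOD
END_TABLE"""
    print(parse_agent_output(output))
```

Each START_SAMPLE_PERIOD/END_SAMPLE_PERIOD pair becomes one dictionary, which corresponds to one collection for one Host identity.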
Reserved Words
Not all field names are available, because many of them are reserved. Script agents create an object for each TABLE entry, and these objects extend a type called F4Table. This type already has properties defined. You are not allowed to replace those properties with new ones, because doing so breaks how the model holds together.
It is possible to figure out exactly what property names are reserved by looking at the type definitions for F4Table and its parent classes. The type hierarchy looks like this:
TopologyObject -> F4Table
The full set of reserved names can be observed by looking at an instance of a script agent in the data browser:
However, this is a bit tricky as of 5.5. The property names that appear are English-readable labels rather than the exact property names. This makes it hard in some cases to figure out which names are reserved. Below is a full list of the reserved names, based on an inspection of the types. I've highlighted the ones that cause the most trouble in the field.
Category | Reserved Names |
---|---|
Object IDs | objectID, id, version, topologyObjectId, topologyObjectVersionId, topologyObjectVersion, effectiveStartDate, effectiveEndDate, lastUpdated |
Object Identity | name, longName |
Object Management | scheduleIds, isBlackedOut, annotations, parents |
Alarms | alarms, aggregateAlarms, localState, aggregateState, localStateSeverity, aggregateStateSeverity, aggregateAlarmState, alarmWarningCount, alarmCriticalCount, alarmFatalCount, alarmTotalCount, alarmAggregateWarningCount, alarmAggregateCriticalCount, alarmAggregateFatalCount, alarmAggregateTotalCount, changeSummary, changeCount, aggregateChangeCount |
Type and Size | topologyTypeName, topologyObjectSize |
Host and Agent | monitoredHost, monitoringAgent |
F4Table | agent |
Here's a sample agent that includes fields that conflict with reserved names:
@echo off
@if not "%ECHO%"=="" echo %ECHO%
@if not "%OS%"=="Windows_NT" goto EXIT
echo TABLE BadFields
echo START_SAMPLE_PERIOD
echo Host.String.id = tor017820
echo CPU:percent = 90
echo id=3
echo localState.String.obs=Up
echo version.String=3.2
echo name.String=Monkey
echo END_SAMPLE_PERIOD
echo END_TABLE
:EXIT
The failure mode is shown below. Note that in this case, a reserved word conflict is fatal on the first instance. We only see an error for id because it is the first reserved name in the sample. The remaining reserved words don't show up in the log because the agent fails.
2009-08-27 22:00:54.159 WARN The badfields_Agent_Table_BadFields.id property exists with a type of Long which is incompatible with the requested type of Metric.
2009-08-27 22:00:54.159 ERROR An unexpected error occurred which may cause undesired behavior. You may want to contact Quest Software customer support if you see this error again:
Canonical data transform failed: java.lang.IllegalStateException: Error processing incoming data for node path: [FglAM::badfields-1_0_0/CDT-1_0_0/topology-adapter.xml]/SPI:SPI/badfields:*/BadFields:*/row:row/id: java.lang.RuntimeException: Existing property is incompatible.
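The conversion stops at the first conflict, so the log above names only id. A check run before deployment can report every conflicting name at once. Here is a minimal Python sketch that scans the echo lines of a batch script against the reserved names in the table above. It assumes a field conflicts only when its base name, the part before any .type or :unit, matches a reserved name exactly, including case. `RESERVED_NAMES` and `reserved_conflicts` are hypothetical names.

```python
from typing import Iterable, List

# Reserved names from the table above. Assumption: a field conflicts only if
# its base name (before any .type or :unit) matches exactly, including case.
RESERVED_NAMES = {
    "objectID", "id", "version", "topologyObjectId", "topologyObjectVersionId",
    "topologyObjectVersion", "effectiveStartDate", "effectiveEndDate", "lastUpdated",
    "name", "longName",
    "scheduleIds", "isBlackedOut", "annotations", "parents",
    "alarms", "aggregateAlarms", "localState", "aggregateState",
    "localStateSeverity", "aggregateStateSeverity", "aggregateAlarmState",
    "alarmWarningCount", "alarmCriticalCount", "alarmFatalCount", "alarmTotalCount",
    "alarmAggregateWarningCount", "alarmAggregateCriticalCount",
    "alarmAggregateFatalCount", "alarmAggregateTotalCount",
    "changeSummary", "changeCount", "aggregateChangeCount",
    "topologyTypeName", "topologyObjectSize",
    "monitoredHost", "monitoringAgent",
    "agent",
}


def reserved_conflicts(script_lines: Iterable[str]) -> List[str]:
    """Return the reserved base names used by echoed field=value lines."""
    conflicts = []
    for raw in script_lines:
        line = raw.strip()
        if not line.lower().startswith("echo ") or "=" not in line:
            continue  # only echoed field=value lines define fields
        lhs = line[len("echo "):].split("=", 1)[0]
        base = lhs.split(":", 1)[0].split(".", 1)[0].strip()
        if base in RESERVED_NAMES:
            conflicts.append(base)
    return conflicts


if __name__ == "__main__":
    bad_fields = [
        "echo Host.String.id = tor017820",
        "echo CPU:percent = 90",
        "echo id=3",
        "echo localState.String.obs=Up",
        "echo version.String=3.2",
        "echo name.String=Monkey",
    ]
    print(reserved_conflicts(bad_fields))
```

Run against the BadFields sample, this sketch flags id, localState, version and name together, instead of one failure at a time.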