By Geoff Vona
Tweet me @GeoffVona
- How Script Agents Work
- Script Agent Format
- Script Agent Example
- Script Agent Pitfalls: String Data
- An Expanded Host Example
- Errors You Might See
- Conclusions
My last blog on Foglight 4 transition got me thinking about script agents. This blog will cover how the script agent format from Foglight 4 is supported in Foglight 5, and how we've extended that format to do some interesting new things.
How Script Agents Work
The Foglight Administrator's Guide has a good overview of the mechanics of script agents. Specifically, Administration and Configuration Guide > Working with Foglight Tooling > Building Script Agents has the details. I highly recommend you refer to that section of the documentation when starting out with script agents. This blog isn't going to reprint those details - I'm more interested in explaining why things work the way they do and outlining pitfalls.
Script agents work by running a prescribed script and processing the output. The actual agent is called JCollector. This agent runs the script, parses the output, and sends the resulting data table samples to Foglight.
There are two ways to make a script run:
- Let JCollector call the script on the sampling interval. (Type 1)
- Allow JCollector to call the script once. The script will then control the sampling interval. (Type 2)
Type 2 scripts are more complex because the script author must handle looping and honour the sampling interval from the server. This might be necessary if the length of the loop is important for doing things like calculating rates. For the purposes of getting started, use Type 1. This will minimize the complexity. Switch to Type 2 once you have a reason for hand-coding the loop.
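To make the difference concrete, here is a minimal Python sketch of the loop a Type 2 script has to own. It is an illustration under assumptions, not the documented mechanism: run_type2, collect and interval_seconds are hypothetical names, and how a script actually learns the sampling interval from the server is covered in the Administration and Configuration Guide, not here.

```python
import sys
import time


def run_type2(collect, interval_seconds, samples, sleep=time.sleep, out=sys.stdout):
    """Call collect() once per interval and write its lines to out.

    collect          -- hypothetical callback returning the lines for one sample
    interval_seconds -- assumed to be supplied by the caller; the real value
                        comes from the server's sampling interval
    samples          -- number of iterations (a real agent would loop until stopped)
    """
    for _ in range(samples):
        started = time.monotonic()
        for line in collect():
            out.write(line + "\n")
        # Flush so the parent process sees each sample as soon as it is written.
        out.flush()
        # Sleep only for the time left in this interval, so slow collection
        # does not stretch the period between samples.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_seconds - elapsed))
```

A Type 1 script is just the body of that loop: it writes one sample and exits, and JCollector takes care of calling it again.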
As mentioned before, JCollector runs the script and parses the output sent to STDOUT. It then sends that output back to the server in the form of tables. There is a special CDT (Canonical Data Transformation) that knows how to interpret the data that is sent. That CDT knows how to deal with the table elements, mostly by parsing the field/column names. More on that later.
The standard model for a script agent is an agent containment model. An agent containment model means that the tables you send from your script agent are contained in an instance of your agent, and that agent is contained inside a host. This is usually good enough for getting started.
Script Agent Format
The output format is easy to understand:
TABLE TableName
START_SAMPLE_PERIOD
Field = Value
END_SAMPLE_PERIOD
END_TABLE
That's it. The TABLE directive tells JCollector to start a new table of data with the specified name. A single script agent can emit multiple tables. Each START_SAMPLE_PERIOD and END_SAMPLE_PERIOD pair inserts one row into that table, and one or more rows are allowed. Each Field = Value entry specifies the name of a table column and its value. For example, if you have a table of data like this:
Host | CPU | Memory | Disk |
---|---|---|---|
tor017820 | 90 | 55 | 20 |
tor017899 | 78 | 35 | 43 |
...then the script agent results should look like this:
TABLE HostData
START_SAMPLE_PERIOD
Host = tor017820
CPU = 90
Memory = 55
Disk = 20
END_SAMPLE_PERIOD
START_SAMPLE_PERIOD
Host = tor017899
CPU = 78
Memory = 35
Disk = 43
END_SAMPLE_PERIOD
END_TABLE
This simple example demonstrates how data tables are translated into script agent format. However, it won't work because of the first column. We'll see why later in this article - it is a classic pitfall in working with script agents in Foglight 5, and deserves some airtime.
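If your data already lives in rows, the translation is mechanical. Here is a small Python sketch of it; format_table is a hypothetical helper name, not part of Foglight, and it inherits the same first-column pitfall as the hand-written output because it emits every value untyped.

```python
def format_table(table_name, rows):
    """Return script agent output lines for one table.

    rows is a list of dicts; each dict becomes one sample period and each
    key/value pair becomes one "Field = Value" line.
    """
    lines = ["TABLE " + table_name]
    for row in rows:
        lines.append("START_SAMPLE_PERIOD")
        for field, value in row.items():
            lines.append("{} = {}".format(field, value))
        lines.append("END_SAMPLE_PERIOD")
    lines.append("END_TABLE")
    return lines
```

Calling format_table("HostData", rows) with one dict per host and printing the returned lines to STDOUT gives output in the format shown above.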
Script Agent Example
Here's a basic script agent example from Foglight 4 that we can run in Foglight 5.
@echo off
@if not "%ECHO%"=="" echo %ECHO%
@if not "%OS%"=="Windows_NT" goto EXIT
echo TABLE NT
echo START_SAMPLE_PERIOD
echo Field1 = 10
echo Field2 = 20
echo END_SAMPLE_PERIOD
echo END_TABLE
What's great about Foglight 5 is that we can simply upload that script into the server. The server will process the script and create a cartridge around it. The cartridge will contain the script, the Collector agent, and the CDT to process the data.
Here are the steps I like to use to get an agent running:
- Administration -> Script Agent Builder to build the agent
- Agent Status -> Deploy Agent Package to push the agent to a particular host. In 5.5 the local agent manager running on the FMS is great for this kind of testing.
- Agent Status -> Create Agent to start the agent
- Once the agent is started, you have to activate it on the Agent Status page
- After a minute (default sampling frequency), you should be able to verify that the agent is working on the Agents page. I apply the With Agents filter to see hosts with agents. Eventually your host shows up with your agent name.
- If it doesn't show up, then go back to Agent Status and run Get Log for the agent to see what errors have occurred
- To verify the data, select the agent on the Agents page and select the Data option. This will allow us to view the raw data. I like to use the Metric Analyzer view, because it tells me the value and type of the metrics.
That's it! You should try this with the sample script shown above. The result should be something like this:
Note that the script that I uploaded was called type1.bat. That's why the agent name is type1.bat. Foglight names the cartridge and agent after the script you upload. So, for obvious reasons, choose your script names wisely.
Choose the Data option to see the data
As the image above shows, you're getting data into Foglight. Remember that this data is in a table inside the agent instance. If you want more of these, you need to create new agent instances.
Important point: iteration is built in to script agents. If you want to add a field to your script, make the changes, then run through the steps above. The version # will be automatically incremented. Note that you have to redeploy the agent package to the hosts, but you don't have to recreate the agent. The agent will automatically run the new script after upgrade. Try this out!
Script Agent Pitfalls: String Data
Remember the host data example from earlier in this article?
Host | CPU | Memory | Disk |
---|---|---|---|
tor017820 | 90 | 55 | 20 |
tor017899 | 78 | 35 | 43 |
To push this data into the server, we need a script that looks like this:
@echo off
@if not "%ECHO%"=="" echo %ECHO%
@if not "%OS%"=="Windows_NT" goto EXIT
echo TABLE HostData
echo START_SAMPLE_PERIOD
echo Host = tor017820
echo CPU = 90
echo Memory = 55
echo Disk = 20
echo END_SAMPLE_PERIOD
echo START_SAMPLE_PERIOD
echo Host = tor017899
echo CPU = 78
echo Memory = 35
echo Disk = 43
echo END_SAMPLE_PERIOD
echo END_TABLE
I put mine in a script called host.bat, created a script agent, deployed, created an agent, and waited. My agent showed up and collected data. Here's the result:
But wait a minute - where's the host name? I am missing data. In older versions of Foglight, the script agent may have failed outright. But why?
For this problem, I like to use the Log Analyzer under Administration. The pattern to look for with script agent errors is anything related to TopologyAdapter. Here's what I found:
Value = tor017820. Node path = [FglAM::host-1_0_0/CDT-1_0_0/topology-adapter.xml]/SPI:SPI/host:*/HostData:*/row:row/Host java.lang.NumberFormatException: For input string: "tor017820"
2009-07-21 07:45:41.187 ERROR [Data-3-thread-13216] com.quest.nitro.service.agent.TopologyAdapter - The value provided for the metric could not be converted to a double.
Value = tor017899. Node path = [FglAM::host-1_0_0/CDT-1_0_0/topology-adapter.xml]/SPI:SPI/host:*/HostData:*/row:row/Host java.lang.NumberFormatException: For input string: "tor017899"
What does this mean in plain language? It looks like the server is trying to convert the string host names I provided for the Host field into doubles. Why? By default, Foglight assumes all script agent entries are numeric time series data. In other words, Foglight is trying to convert these values to numbers. It isn't working, so the values are getting tossed.
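Here is a rough Python sketch of that behaviour, just to make the failure mode concrete. It is not the server's code: split_numeric is a hypothetical name, and Python's float() stands in for the conversion to a double that the log complains about.

```python
def split_numeric(fields):
    """Separate values that convert to a number from those that do not.

    fields is a dict of field name -> raw string value, as read from
    "Field = Value" lines. Returns (metrics, rejected).
    """
    metrics = {}
    rejected = {}
    for name, raw in fields.items():
        try:
            metrics[name] = float(raw)
        except ValueError:
            # Comparable to the NumberFormatException in the log above:
            # the value is not a number, so it is set aside.
            rejected[name] = raw
    return metrics, rejected
```

Feed it one row of the HostData table and CPU, Memory and Disk come back as metrics while Host lands in the rejected pile, which matches what we saw on the Agents page.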
This is different from Foglight 4. In Foglight 4, the values weren't typed at all. They got put into a table, and the typing occurred when a view was created to look at the data. In other words, Foglight 4 treated all values as strings. Foglight 5 wants to convert the values to something meaningful right away so the data can be rolled up and managed properly. To do that, we assumed that the vast majority of data would be metric data. (It is).
What this means is you need to do something special with your strings. The first thing we need to do is mark it as a String like this:
Host.String = tor017820
But there are actually a couple of options. What you choose will depend on what you want to accomplish. Here are the key questions:
- Does the field uniquely identify the row of data? If it does, then we should mark it as an identity field. An identity field causes a new object instance to be created for each unique value.
- Does the field change frequently? If a string changes frequently, then it should be marked as an observation. That way Foglight will store a new value every sample, and won't track the changes. If a string changes infrequently, then it can be a property. A property has one value stored, and changes are tracked. To determine if something changes frequently, ask the question: "Could this change each sample period in a practical case?"
This is tricky. To understand it, we need to expand our example.
An Expanded Host Example
Suppose we're actually gathering the following data about a host:
Host | CPU | Memory | Disk | IP Address | State |
---|---|---|---|---|---|
tor017820 | 90 | 55 | 20 | 10.4.22.10 | Up |
tor017899 | 78 | 35 | 43 | 10.4.21.14 | Down |
It looks like we have three string values: Host, IP Address, and State. Let's apply the questions to each of the entries:
Question | Host | IP Address | State |
---|---|---|---|
Identity | Yes | No | No |
Changes Frequently | No | No | Yes |
It is clear that Host is the name of a host, and therefore defines its identity. We want to see a new instance of the data for each unique value of Host.
IP Address, on the other hand, is not an identity property. It is unlikely to change from one sample period to the next. In most environments, IP addresses are leased long term. Marking IP Address as a property rather than an observation makes sense.
Finally, the State field is not an identity property. However, it is possible that it could change from sample period to sample period. A host may not go down often, but when it does go down you want to know when. Tracking State as a string observation makes the most sense.
Here's the resulting script:
@echo off
@if not "%ECHO%"=="" echo %ECHO%
@if not "%OS%"=="Windows_NT" goto EXIT
echo TABLE ExpandedHost
echo START_SAMPLE_PERIOD
echo Host.String.id = tor017820
echo CPU = 90
echo Memory = 55
echo Disk = 20
echo IPAddress.String = 10.4.22.10
echo State.StringObservation.obs = Up
echo END_SAMPLE_PERIOD
echo START_SAMPLE_PERIOD
echo Host.String.id = tor017899
echo CPU = 78
echo Memory = 35
echo Disk = 43
echo IPAddress.String = 10.4.21.14
echo State.StringObservation.obs = Down
echo END_SAMPLE_PERIOD
echo END_TABLE
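The naming decisions in that script boil down to the two key questions. A minimal Python sketch of the rule, covering only the three string suffixes used in this article (string_field_name is a hypothetical helper, not a Foglight API):

```python
def string_field_name(name, is_identity=False, changes_frequently=False):
    """Return the typed column name for a string field.

    Only the three suffixes used in this article are covered.
    """
    if is_identity:
        # Identifies the row: a new object instance per unique value.
        return name + ".String.id"
    if changes_frequently:
        # Stored every sample as an observation.
        return name + ".StringObservation.obs"
    # Changes rarely: stored as a property.
    return name + ".String"
```

Host is an identity, so it becomes Host.String.id. IPAddress answers no to both questions and becomes IPAddress.String. State changes frequently, so it becomes State.StringObservation.obs.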
Once you get this working, you'll feel cheated. What you'll see is exactly the same set of metrics - CPU, Memory and Disk. There will be no other indication that this is different from the original script unless you look closely:
Now we actually have two sets of entries - one for tor017820, and one for tor017899. What we never really noticed before is that we were getting two values of the same metrics into the same table before. Now we have two separate table entries - one for each host. So we've fixed at least one bug by adding Host.String.id to the script.
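One way to picture the effect of the identity field is as a grouping key. The Python sketch below is a conceptual illustration only, not how the server is implemented, and group_by_identity is a hypothetical name: rows that carry no ".id" field all share one key, while rows with different identity values are kept apart.

```python
def group_by_identity(rows):
    """Group sample rows by the values of their ".id" fields.

    rows is a list of dicts of field name -> value. Rows without any
    identity field all share the empty key ().
    """
    groups = {}
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k.endswith(".id")))
        groups.setdefault(key, []).append(row)
    return groups
```

With the original script, both hosts' rows fall into the same group. With Host.String.id in place, each host gets a group of its own.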
But where are IP Address and State? IP Address should be visible as a property. To see properties, select the Property Viewer view and scroll down:
That's great, we have accounted for two of our changes. But what happened to the State value?
Observations are a special class of metric. They are, by nature, harder to display. A time series metric of type double can be graphed. But a set of values for a String needs to be shown differently. In general, observations are a little more difficult to deal with than other types. You can still display them, write rules, etc - but you have to use special techniques. I'll cover these special techniques in a later blog. For now we're focused on getting data into the server in the right form. So where is State?
The Property Viewer will show it if we care to scroll down further:
As you can see, the State observation shows up with the other time series metrics in the Property Viewer. Unfortunately, the Metric Analyzer view does not currently show observation data. Recent requests from customers and from the field have ensured this will be considered for a future release.
Errors You Might See
If you see this in your management server log:
...then you've typed Field.String.obs instead of Field.StringObservation.obs. The StringObservation type is a special type that converts a string into a time series metric. You'll need to correct your script. This error is actually fatal to the processing of the agent data. You won't see any data pushed into the server - specifically, no agent entry on the Agents page.
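Since this mistake is fatal and easy to make, it can be worth scanning a script before uploading it. Here is a small Python sketch of such a check; find_bad_observations is a hypothetical helper of my own, and the pattern is case-sensitive and only knows about this one typo.

```python
import re

# Matches a field name ending in ".String.obs" at the start of a
# "Field = Value" line, allowing a leading "echo " from a batch script.
BAD_OBS = re.compile(r"^\s*(?:echo\s+)?(\S+)\.String\.obs\s*=")


def find_bad_observations(lines):
    """Return (line_number, field_name) for each suspect line."""
    problems = []
    for number, line in enumerate(lines, start=1):
        match = BAD_OBS.match(line)
        if match:
            problems.append((number, match.group(1)))
    return problems
```

Pass it the lines of the script, or the lines the script writes to STDOUT, and fix any field it reports by changing String to StringObservation.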
If you see this:
...then everything is okay. This happens because, while processing the data, the server came across your identity declaration. This caused the server to change the definition of the table. You might see this as part of your iterations.
Conclusions
In this blog we've covered the basics of how script agents work. We've seen simple examples from Foglight 4. We've also spent some time reviewing problems that can occur with strings, and how to solve them. Hopefully this is enough for you to get started with script agents! If it isn't, more blogs with examples are coming.