Detect, Diagnose and Really Resolve - Remediation in (v)Foglight

With vFoglight 6.5 we introduced the functionality necessary to actually resolve a problem via a workflow.

I should point out that (v)Foglight has always had the ability to do something when an alarm is raised. At its most useful, that something was to run a user-supplied script. The challenges with such an approach are:

  • it always happens automatically (usually at the most inopportune moment)
  • if there are multiple options for what to do, the rules for choosing between them have to be codified, which makes the script very complex
  • if scripts are supplied OOTB, customers cannot modify them directly because supporting them becomes almost impossible

This is the traditional approach and is followed by many other vendors too. The problem is that it doesn't really work for the customer.

So a new approach is required. The customer wants to control what happens and when. The vendor needs to supply resolution mechanisms OOTB. The customer needs a supported approach to customizing the supplied resolutions. Enter workflow.

A workflow is in effect a template. That template is constructed from a number of actions linked by various forms of flow control, and it can operate on one or more objects. The actions themselves come from a library where they are grouped into ActionPacks according to the type of thing they operate on. The actions tend to correspond to individual commands or API calls. Each action defines its inputs and outputs, and hence how it interacts with the others. So now the customer can modify the templates by adding or removing calls in a manner that can still be supported by the vendor.
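To make the template idea concrete, here is a minimal Python sketch. It is not the (v)Foglight API: the Action and Workflow classes, the FindVM and AddMemory actions and the 2048 MB starting figure are all hypothetical stand-ins, and flow control is reduced to a simple sequence. What it tries to show is that because each action declares its inputs and outputs, a template can be checked after a customer adds or removes a step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# All names below are hypothetical illustrations, not the product's API.


@dataclass
class Action:
    name: str
    pack: str            # the ActionPack this action is grouped under
    inputs: List[str]    # keys the action reads from the workflow context
    outputs: List[str]   # keys the action writes back to the context
    run: Callable[[Dict[str, object]], Dict[str, object]]


@dataclass
class Workflow:
    name: str
    steps: List[Action] = field(default_factory=list)

    def validate(self, initial: List[str]) -> None:
        """Check that every step's inputs are produced by an earlier step
        or supplied up front."""
        available = set(initial)
        for step in self.steps:
            missing = [k for k in step.inputs if k not in available]
            if missing:
                raise ValueError(f"{step.name} is missing inputs: {missing}")
            available.update(step.outputs)

    def execute(self, context: Dict[str, object]) -> Dict[str, object]:
        """Run the steps in order, passing each one only its declared inputs."""
        self.validate(list(context))
        ctx = dict(context)
        for step in self.steps:
            result = step.run({k: ctx[k] for k in step.inputs})
            ctx.update({k: result[k] for k in step.outputs})
        return ctx


# Stub actions standing in for real commands or API calls.
find_vm = Action(
    "FindVM", "VMware", ["vm_name"], ["vm_id"],
    lambda i: {"vm_id": "vm-" + str(i["vm_name"])},
)
add_memory = Action(
    "AddMemory", "VMware", ["vm_id", "extra_mb"], ["new_memory_mb"],
    # 2048 is a made-up current memory size for the stub.
    lambda i: {"new_memory_mb": 2048 + int(i["extra_mb"])},
)

template = Workflow("Increase VM memory", [find_vm, add_memory])
result = template.execute({"vm_name": "web01", "extra_mb": 1024})
```

In this sketch, removing FindVM from the template makes validate() fail straight away because AddMemory no longer has a vm_id to work on. That kind of check is one plausible reason a vendor could still support a modified template.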

Many real-world problems have several possible solutions. Some are chosen by organizations as a matter of policy, some through the experience of the operator, some through knowledge of the systems. Often the individual can make the determination as to which one to use much more easily than the system can. So if the user can be presented with the options and informed of their implications, complexity can be removed from the product.

The administrator or operator often wants to control when things happen. It is better to live with a problem that leaves a system running a little slowly than to have it go offline for 30 minutes while it reboots to make a change take effect. So the user waits until a suitable point in the day and then initiates the remedial action; perhaps they even schedule the change.

Enter alarm remediation in (v)Foglight 6.5

We have the ability to ship OOTB workflows and link them to alarms, so that when an alarm is raised the (v)Foglight user is presented with the appropriate remedial options and can choose if and when to run them. He or she can choose readily from any of the options presented.
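As a rough illustration of that linkage, the Python sketch below maps an alarm type to the remedial options offered to the user and runs one only when the user picks it. Everything in it is assumed for the example: the alarm name, the option names, and the options_for and remediate helpers are hypothetical and do not come from the product.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout; this is not the product's API.
Remedy = Callable[[str], str]  # takes the affected object, returns a status message


def add_memory(target: str) -> str:
    return f"added memory to {target}"


def migrate_vm(target: str) -> str:
    return f"migrated {target} to another host"


# Alarm type -> named remediation workflows linked to it.
REMEDIES: Dict[str, Dict[str, Remedy]] = {
    "VM memory utilization": {
        "Increase memory": add_memory,
        "Move VM to another host": migrate_vm,
    },
}


def options_for(alarm_type: str) -> List[str]:
    """List the remedial options to present when this alarm is raised."""
    return list(REMEDIES.get(alarm_type, {}))


def remediate(alarm_type: str, target: str, choice: Optional[str]) -> Optional[str]:
    """Run the chosen workflow. A choice of None means the user decided
    not to act (yet), so nothing happens."""
    if choice is None:
        return None
    return REMEDIES[alarm_type][choice](target)
```

The point of the design is that nothing runs as a side effect of the alarm itself: raising the alarm only produces a list of options, and the action happens when, and only if, the user makes a choice.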

In the future many more OOTB workflows linked to alarms will be provided (the users can construct and add their own too). And what about the ability to schedule it to happen during off hours? Well that is for the next release.

So now we have something that users can really use to solve problems.

And then comes the future. Imagine that the (v)Foglight user has been running the same remediation workflow and has come to know, love and trust it; why not give them the ability to say "don't ask me again, just do it automatically in future"? It takes us closer to where we are today, but with the user's trust and approval. Thoughts, anyone?

Today we provide workflow templates primarily for VMware and Hyper-V, though there are extensions for PowerShell, Linux, SMS, email and others.

-Mike C