Foglight High-Availability Experiences?

We're looking to move our production monitoring environment to HA.  I wanted to check to see if anyone out here has some experiences they'd like to share, especially gotchas to look out for, before we start down this path.  I've reviewed the information related to HA in the documentation, but would like to hear from some of you with real-world experience before changing over my development, then production environment.

  • A big "gotcha" is you'll have to touch each FglAM - add the additional HA node(s) as a <config:http-upstream> line in each fglam.config.xml
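
To confirm that every agent manager actually picked up the extra node, something like the following could be run against each config file. This is only a rough sketch and not taken from the Foglight documentation: it assumes each upstream is declared as a <config:http-upstream ... url="..."/> element, and the url attribute name, the function names, and the example URLs are all assumptions to check against your own fglam.config.xml.

```python
import re
from pathlib import Path

# Assumption: upstreams appear as <config:http-upstream ... url="..."/> elements.
# A regex is used so the file's XML namespace declaration is not needed.
# Note that an upstream inside an XML comment would still be counted.
UPSTREAM_RE = re.compile(r'<config:http-upstream\b[^>]*?\burl="([^"]*)"')


def missing_upstreams(config_text, expected_urls):
    """Return the expected FMS URLs that are not declared in the config text."""
    declared = set(UPSTREAM_RE.findall(config_text))
    return [url for url in expected_urls if url not in declared]


def check_file(path, expected_urls):
    """Read one fglam.config.xml and report which expected URLs are absent."""
    text = Path(path).read_text(encoding="utf-8")
    return missing_upstreams(text, expected_urls)
```

Calling check_file() with the path to one agent manager's fglam.config.xml and the list of FMS URLs you expect (hypothetical ones such as http://fms1.example.com:8080) returns the URLs still missing from that file; an empty list means every node is listed.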

  • In addition to what Brian mentioned, you can use the FglAM silent installer's '--config' option to add the other HA URL.

    You can also use aliasing for your FMS URL and handle Active/Passive through your network (a more sophisticated architecture).

    You also need to consider your port settings as well as the HA partition logical name.

    How many HA(s) are you planning on adding?


    #AJ Aslam

  • Thank you to both of you!  I'm looking to set up 2 HA environments.  I'd like to have one to test in and my primary one for production. Depending on hardware, licensing, etc., I may have to settle for one.  I'll go ahead and try it out with my DEV environment first, and I think I will explore aliasing the URL to make things more flexible.  Please don't hesitate to let me know if you think of other gotchas, nice-to-knows, etc.

    Thanks again!


  • Another thing you want to take into consideration is network latency between HA members, and between each HA member and your backend database. You should never design an FMS HA environment that splits HA members across a WAN link.
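
For a quick feel of the latency mentioned above, TCP connect time can serve as a rough stand-in for one network round trip. The sketch below is only an illustration under that assumption: the function names are made up for this example, and the hosts and ports you pass in (another HA member, the database listener) depend on your own setup.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP connect to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the connection is closed again immediately
    return (time.perf_counter() - start) * 1000.0


def latency_summary(host, port, samples=5):
    """Take several connect timings and return (median_ms, max_ms)."""
    timings = [tcp_connect_ms(host, port) for _ in range(samples)]
    return statistics.median(timings), max(timings)
```

Running latency_summary() from each planned HA member toward the other members and toward the database host gives numbers to compare between a same-site and a cross-site placement. It raises OSError if the target cannot be reached.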


  • Thank you very much.  I'm now looking at having some level of local high-availability with a geographic fall-back in case of a local data center outage.


  • Thank you everyone, I think we're down to the final plan for what the environment will look like.  We'll have a couple of servers in different locations, not clustered, with the second being a failover server.  Here's a diagram I put together after working through the requirements with our Quest Architect. We'll be pointing the agent managers to load-balanced URLs representing the active management server, and are also considering Oracle vs. MySQL for the data tier.

    I'm sharing this for closure and follow-up to my original post, but please feel free to comment or contact me if you have additional suggestions for our planned solution.

    Foglight HA Diagram (3 node) V2.png

    And thanks again to everyone for your help in getting this solution worked out.


  • Hi Jonathan,

    A few questions

    1) What is the platform where you are running your FMS (both primary and secondary)?

    2) What made you consider MySQL as a backend repository?

    3) Why are you using embedded FglAM?

    4) Have you considered network latency between your two data centers? Are there monitored hosts in your secondary data center?

    5) The diagram above does not indicate where the FglAMs will be running for remote monitoring of OS, SQL, Oracle, etc.?

    Just something I wanted to point out.


    #AJ Aslam

  • Hi AJ,

    1) Red Hat Linux on both

    2) Considering MySQL because of the built-in replication features.  Currently on Oracle, and may stay there, but right now we are looking at the options for going with Master-Master SharePlex, RAC or DataGuard.

    3) Embedded FglAM to monitor the Foglight Server, External FglAM to handle OS, Remote OS, FTR, Oracle, etc.

    4) Yes, there is a concern about network latency, which is the reason for having a Hot/Warm configuration instead of Clustered.  Also concerned about geographic database stability, hence the database replication.

    5) External FglAMs will be running on each of the Foglight servers, in different directory paths.

    We've got the ability to scale vertically in the hardware, but horizontal scaling is really not an option, which is why the External FglAMs are co-located on the same server.

    Our major goal is to isolate the monitoring environment from local datacenter issues as much as possible (NAS Filer, Foglight server, local database issues, etc.), and to continue to provide collection, notification and dashboards from the remote location during these outages.

    Hope that answers your questions.



  • Hi Jonathan,

    Thanks for the detailed information. Glad you posted this information here.

    For a production environment, I recommend Oracle and/or SQL. There is a licensing cost, but it's worth it!

    The issue with using the embedded FglAM to monitor Foglight itself is that if your FMS goes down, the FglAM reporting to that FMS will be useless, as the FMS is the one that sends out alerts. If it is down - no alert :-) Worst case, your whole server is shut down, in which case both the FglAM and the FMS will be disconnected, so again no alert. Have you thought about placing a FglAM on your secondary FMS and pinging your primary from there, and vice versa? The host to ping would be an alias handled by network name resolution.

    For the multiple external FglAMs that will be co-located on the same Foglight server, you may want to get a sign-off from your Quest architect. Memory usage and the capacity of your hardware will come into play.

    Something to think about.


    #AJ Aslam
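
The "each site watches the other" idea above can also be sketched outside Foglight, for example as a small script running at the opposite site. This is a minimal illustration, not a Foglight feature: the names reachable and PeerWatch, the default of three failures, and whatever alias and port you probe are all assumptions. The point it tries to show is that the alarm should only fire after several consecutive failed probes, and should be sent through a channel that does not depend on the FMS being watched.

```python
import socket


def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class PeerWatch:
    """Signal an alarm only after several consecutive failed probes."""

    def __init__(self, failures_before_alarm=3):
        self.failures_before_alarm = failures_before_alarm
        self.consecutive_failures = 0

    def record(self, ok):
        """Record one probe result; return True when the alarm should fire."""
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_before_alarm
```

A scheduler at the secondary site could call watch.record(reachable(alias, port)) every minute or so against the primary's alias, with the mirror image running at the primary site.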

  • Thanks AJ,

    Yes, I spent time with Ryan yesterday morning and we have a follow-up session next week.  We'll discuss the Oracle vs MySQL decision, and I'm already thinking about each of them monitoring the other.

    Thanks for all of your help and suggestions, it'll help make our environment stronger.


  • Jonathan,

    MySQL Versus Oracle

    MySQL uses NDB as the storage engine for its clustering. This engine is not supported by the Foglight schema.

    You could use MySQL replication. This would give you the same as DataGuard. It means more maintenance for, say, the Foglight admin, unless you have a MySQL admin on staff.

    Oracle gives you more choices, and usually allows Foglight admins to hand administration of the database to the in-house DBAs.

    Just some thoughts


  • Ok, the MySQL option was actually suggested by the DBAs because of replication, so we can talk more about it.  I'm not tied to either, I just want a stable, manageable platform for monitoring.