Agentless Exchange log truncation - is this working for anyone?

This has not yet worked even once for me. I'm working with Quest support, who, after a lot of time & effort, are recommending agent-based protection.

I have Exchange 2016 Enterprise CU19 on Server 2016 Standard, with a full installation of VMware Tools. ESXi and vCenter are 6.7 U2.

The server is fairly new thanks to the Exchange hack earlier this year. We built all new servers and decom'd the old ones. I believe the old one used agent-based protection. I only recently got involved with managing Rapid Recovery, mainly because of the log file situation.

I followed steps in this article, even going beyond and just disabling the firewall completely: https://support.quest.com/rapid-recovery/kb/252578/agentless-protection-with-application-support

I just ran DiskShadow as per my Quest support person's recommendation. Over 1TB of log files finally disappeared.

I really hope I can get this to work.

Thanks
Danny

Parents
  • There are many layers to this, so forgive me, this prompts A LOT of questions.

    Assuming you have not turned off Guest Quiescing inside of Rapid Recovery (it is on by default): when the backup job runs with log truncation (manually forced or on schedule) and you open up your VMware console, does the snapshot succeed, or does it fail? Whether it works or fails, take note of the time. Once the backup/snapshots are finished, go to the Exchange node, open the Windows event logs, and look at the Application and System views for that time to see if there are any errors.

    Again, sorry, as you may have gone through some or most of this, but modern agent-less backups with log truncation have a number of 'moving' parts. Windows Firewall/UAC and all that jazz haven't made anyone's life easier. At least with an agent, the agent is already 'on' the OS, so it basically never sees this 'outside entity' (in this case VMware acting on behalf of your backup vendor, Rapid Recovery). That's not a plug for agent, just dialog.

    The credential that you're using to truncate the logs generally has to be local admin or above for the OS, and the account has to have Exchange admin permissions.

    I mentioned that RR defaults to Guest Quiescing for its VMware snapshots. This is a MUST for any agent-less log truncation; if it is turned off, it'll never work.

    Again, if you are aware, forgive me, but when you take a quiesced snapshot you get 2 VMware snapshots and you get 1 local OS VSS snapshot, so yeah, a few more moving parts. 

    Also, to start with, does the RR GUI say on the summary page that it truncated the logs? That is, does the date/time for the last log truncation change in the RR GUI when you manually truncate the logs or when truncation runs automatically? If the date changes, then MS is telling RR it did it, which narrows down the issue. If it doesn't, then you know that a successful exit code isn't being sent to RR.

    Just for conversation: as far as RR is concerned, it shouldn't affect your license either way whether you go agent or agent-less. The short answer though is yes, and that's not a knock on RR at all. ANY agent-less backup system has the same problems with log truncation: the more the OS and domains try to lock down their systems, the more hoops you have to jump through.
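
The event-log check described above boils down to filtering events by a time window around the backup. Here is a minimal Python sketch of that idea, assuming the events have already been exported from the Application and System logs into a list of dicts. The field names (`time`, `level`, `source`), the five-minute slack, and the sample events are all hypothetical placeholders, not real Exchange or Rapid Recovery output.

```python
from datetime import datetime, timedelta


def events_near_backup(events, backup_start, backup_end, slack_minutes=5):
    """Return Error/Warning events logged during the backup window.

    The window is widened by `slack_minutes` on each side, because the
    noted start/end times are rarely exact.
    """
    slack = timedelta(minutes=slack_minutes)
    earliest, latest = backup_start - slack, backup_end + slack
    return [
        e for e in events
        if earliest <= e["time"] <= latest and e["level"] in ("Error", "Warning")
    ]


# Hypothetical exported events; the sources are placeholders.
events = [
    {"time": datetime(2021, 6, 23, 0, 40), "log": "Application",
     "level": "Error", "source": "ExampleSourceA"},
    {"time": datetime(2021, 6, 23, 1, 4), "log": "Application",
     "level": "Error", "source": "ExampleSourceB"},
    {"time": datetime(2021, 6, 23, 1, 6), "log": "System",
     "level": "Information", "source": "ExampleSourceC"},
]

hits = events_near_backup(
    events,
    backup_start=datetime(2021, 6, 23, 1, 3),
    backup_end=datetime(2021, 6, 23, 1, 20),
)
for e in hits:
    print(e["time"], e["log"], e["level"], e["source"])
```

Only the second sample event survives the filter: the first falls outside the window and the third is informational. Anything that does show up in that window is a candidate for the errors worth reading closely.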

Children
  • Hi phuff, thanks so much for the reply! That's a lot of info but I think I have all the points.

    We have not turned off Guest Quiescing.

    Snapshot creation & removal appear successful in vCenter. There are several “reconfigure virtual machine” events in between. I don't know what those are about.

    We do have some errors in the Exchange logs, but they are not related to snapshots or backups. For example, one server is offline (it got destroyed in shipping) and we never removed it from AD. I think I'll go ahead and remove it to see if that helps.

    I had created firewall rules according to documentation, but have since disabled the firewall just to eliminate that. I have NOT made any changes to UAC though... I will look into that.

    Credentials should be good. I saw some Exchange role requirements in the documentation, and I am using an account that is also a domain admin.

    As for the "2 VMware snapshots and 1 local OS VSS snapshot", I'm not sure, but I don't think anything is happening on the OS. I don't see any VSS events except "The VSS service is shutting down due to idle timeout." And that comes a few minutes after the backup or log truncation starts... Could that be a clue?

    The summary page in RR does indeed have “Last Exchange Log Truncation” and the time & date (currently displaying 6/23/2021 1:03:45 AM). But for the email databases in Exchange admin center, the “Last full backup” time & date does not change. It did change when I ran DiskShadow, and the log files did purge at that point.

    Thanks again!
    Danny

  • Sure thing, Danny. Worth the dialog, and God, it isn't as easy as it was with ESXi 4.1 and Server 2008. Unfortunately, more and more layers got added, and keep getting added, to the onion, so to speak, with agent-less log truncation.

    We have not turned off Guest Quiescing - Good, it's required. 

    Snapshot creation & removal appear successful in vCenter. - Good. If there were a failed snapshot error when taking a quiesced snap, you'd never get log truncation. You can always test this on your own, of course, by going into VMware and taking a quiesced, 'non-memory' snapshot.

    I had created firewall rules according to documentation, but have since disabled the firewall just to eliminate that. I have NOT made any changes to UAC though... I will look into that.

    Credentials should be good. I saw some Exchange role requirements in the documentation, and I am using an account that is also a domain admin - The account would need to be an Exchange admin. If you can see the Exchange info in the RR GUI when you are looking at the Agent's summary page, it is probably sufficient; it's just something to validate.

    As for the "2 VMware snapshots and 1 local OS VSS snapshot", I'm not sure, but I don't think anything is happening on the OS. I don't see any VSS events except "The VSS service is shutting down due to idle timeout." And that comes a few minutes after the backup or log truncation starts - This is an absolute: there MUST be a local VSS snapshot taken, as that is the only way ANY backup vendor can get an agent-less backup to truncate your logs, since the request is coming from ESXi in. With that being said, the fact that it says shutting down means it was called, if it is happening around the same time as the backup. If you go to the Windows Application event log on the Exchange server, however, you should see Exchange events that refer to the DB being flushed or stunned. That is the piece where the truncation is done.

    The summary page in RR does indeed have “Last Exchange Log Truncation” and the time & date (currently displaying 6/23/2021 1:03:45 AM). But for the email databases in Exchange admin center, the “Last full backup” time & date does not change - this is your problem. Period, this is your problem. This means that everything (MS, VMware, Exchange, everyone) is giving RR the thumbs up that it worked; if any of those steps gave RR a failure, that date would not change. So if the 'last log truncation' date is changing, then the Exchange conduit is passing through a success response. That's the only way that happens.

    So, with that being said, when you look in RR on the Summary page for the Exchange node, where it lists the Exchange info and DB paths/log paths, are they the correct paths? If any part of those paths is incorrect, or differs from reality, then there is erroneous data in your EX DB that is being reported to RR.

    Also, was there a DAG involved? 

    Paul
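
The reasoning in this reply can be written down as two small checks: which side updated its timestamp after the backup started, and whether the paths RR shows match the real ones. This is only a rough Python sketch of that logic, assuming you copy the values by hand from the RR summary page and Exchange admin center. The function names, result labels, database names, paths, and the older "Last full backup" date are all hypothetical; only the 6/23/2021 1:03:45 AM timestamp comes from the thread.

```python
from datetime import datetime
from pathlib import PureWindowsPath


def triage(backup_start, rr_last_truncation, exchange_last_full_backup):
    """Classify the outcome from the two timestamps discussed above."""
    rr_updated = rr_last_truncation is not None and rr_last_truncation >= backup_start
    ex_updated = (exchange_last_full_backup is not None
                  and exchange_last_full_backup >= backup_start)
    if ex_updated:
        return "exchange_confirms_backup"
    if rr_updated:
        # RR was told it worked, but Exchange never recorded a full backup.
        return "rr_reports_success_only"
    return "nothing_reported"


def mismatched_paths(reported, actual):
    """Names whose RR-reported path differs from the actual path.

    PureWindowsPath compares case-insensitively and ignores trailing
    separators, which matches how Windows treats these paths.
    """
    return sorted(
        name for name in reported
        if PureWindowsPath(reported[name]) != PureWindowsPath(actual.get(name, ""))
    )


status = triage(
    backup_start=datetime(2021, 6, 23, 1, 0),
    rr_last_truncation=datetime(2021, 6, 23, 1, 3, 45),
    exchange_last_full_backup=datetime(2021, 6, 1, 0, 0),  # hypothetical older date
)
print(status)

# Hypothetical log paths: as shown in RR vs. as they exist on the server.
reported = {"DB01": "D:\\Logs\\DB01\\", "DB02": "E:\\Logs\\DB02"}
actual = {"DB01": "d:\\logs\\db01", "DB02": "F:\\Logs\\DB02"}
print(mismatched_paths(reported, actual))
```

With the values from this thread, the first check lands in the "RR reports success only" case, which is the one where comparing the paths is the natural next step.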