vRanger 7.5.1: Backup task hangs at 100% of VMDK backup until the timeout is reached, then aborts

We have a vRanger 7.5.1 job that backs up 13 VMs on two ESXi 6.0U3 hosts. This job typically runs in 1 to 1.5 hours. We have been seeing an issue where a single VM within this job backs up its only VMDK to 100% (as shown in the GUI) and then hangs there, never completing the task, until the 14-hour timeout is reached. At that point vRanger aborts the task and cleans up the attached disk, snapshot, etc. It is not necessarily the same VM every night, and the affected VM usually backs up successfully the next night. This does not occur every night, but it happens often, and it always seems to be just one VM at a time. As you can see in the log below, at 7:14 PM it indicates that the VMDK backup is complete (the GUI shows 100% at this point), but the task just hangs there until 10:07 AM the next morning, when it aborts with "Backup task using VDDK HotAdd failed: The operation was cancelled."

 

[2017-07-09 19:00:43.546]: vRanger Backup & Replication - v7.5.1.0
[2017-07-09 19:00:43.546]: Selected Options: Backup powered on machines only. | Update notes with the latest backup results. | Enable Active Block Mapping™ (ABM). | Selected SpaceSavingTech: Differential.
[2017-07-09 19:00:43.546]: Task for virtual machine <VM NAME> was queued.
[2017-07-09 19:06:58.122]: SourceVm:<VM NAME> | Uuid:4208d2f1-bb41-f42b-20f8-fe106869c837 | VC:<vCenter Name>, Host:<Host Name> [ESXi 6.0.0]
[2017-07-09 19:06:58.122]: Beginning backup task for Backup '<Name of Backup>'-<VM Name>
[2017-07-09 19:06:58.122]: Starting task validation.
[2017-07-09 19:06:58.122]: Connection to <vCenter Name> was properly validated.
[2017-07-09 19:06:58.122]: <Host Name> is properly licensed.
[2017-07-09 19:06:58.122]: Test connection to repository <Data Domain Boost volume> starting...
[2017-07-09 19:07:04.309]: Test connection to repository <Data Domain Boost volume> successful!
[2017-07-09 19:07:04.309]: Ending task validation... success!
[2017-07-09 19:07:04.309]: Beginning initialization of backup information.
[2017-07-09 19:07:04.544]: Retrieving the tasks parent information.
[2017-07-09 19:07:04.559]: Retrieving save points for any full backups associated with this job.
[2017-07-09 19:07:04.825]: Verifying content of repository.
[2017-07-09 19:07:20.873]: Finished initialization of backup information successfully.
[2017-07-09 19:07:20.873]: Initialization was sucessful. Backup type to run: Differential
[2017-07-09 19:07:32.982]: Retrieving the VM BIOS configuration completed.
[2017-07-09 19:07:34.701]: Checking mounted disks completed.
[2017-07-09 19:07:36.248]: Creating a snapshot for vRanger completed.
[2017-07-09 19:07:44.170]: Loading virtual machine '<VM Name>' information completed.
[2017-07-09 19:07:49.123]: Local machine is a VMware virtual machine.
[2017-07-09 19:07:49.373]: Backup task will attempt to use Machine-based VDDK HotAdd.
[2017-07-09 19:08:04.843]: Filtering out content of pagefile for disk: [<Volume Name>] <VM Name>/<VM Name>.vmdk
[2017-07-09 19:08:04.843]: Using filter type(s) active map for disk: vix:1:r:<vCenter Name>\:443:0:[<Volume Name>] <VM Name>/<VM Name>.vmdk
[2017-07-09 19:14:27.387]: Backing up disk 'vix:1:r:<vCenter Name>\:443:0:[<Volume Name>] <VM Name>/<VM Name>.vmdk:moref=vm-508897:snapshot-534167:4' completed.
[2017-07-10 10:07:18.680]: Backup task using VDDK HotAdd failed: The operation was cancelled.
[2017-07-10 10:07:31.180]: Check the manual for compatibility issues when attempting Lan-free operations.
[2017-07-10 10:08:02.619]: Removing snapshot for vRanger completed.
[2017-07-10 10:08:04.494]: Updating notes for <VM Name> completed.
[2017-07-10 10:08:04.494]: Updating VM notes completed.
[2017-07-10 10:08:04.947]: Setting VM event completed.
[2017-07-10 10:08:10.228]: Cancel detected; exiting task. (150)
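
To check other task logs for the same stall, the timestamps can be scanned for long silent periods. The following is only a rough sketch, assuming every log line follows the `[YYYY-MM-DD HH:MM:SS.mmm]: message` layout shown above. The function name `find_gaps` and the one-hour threshold are illustrative choices, not part of vRanger.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[2017-07-09 19:14:27.387]: Backing up disk ... completed."
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]: (.*)$")


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Return (start, end, last_message, next_message) for each pair of
    consecutive log entries separated by at least `threshold`."""
    gaps = []
    prev = None  # (timestamp, message) of the previous parsed entry
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines or anything that is not a log entry
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msg = match.group(2)
        if prev is not None and ts - prev[0] >= threshold:
            gaps.append((prev[0], ts, prev[1], msg))
        prev = (ts, msg)
    return gaps


# Three entries taken from the log above, with the disk path shortened.
SAMPLE_LOG = """\
[2017-07-09 19:08:04.843]: Filtering out content of pagefile for disk
[2017-07-09 19:14:27.387]: Backing up disk completed.
[2017-07-10 10:07:18.680]: Backup task using VDDK HotAdd failed: The operation was cancelled.
"""

if __name__ == "__main__":
    for start, end, before, after in find_gaps(SAMPLE_LOG.splitlines()):
        print(f"Silent for {end - start} after: {before!r}")
        print(f"  next entry: {after!r}")
```

Run against a saved task log, this lists the last message written before each long pause, which makes it easier to confirm whether the hang always follows the "Backing up disk ... completed" entry.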

  • Let me summarize: two ESXi hosts, vRanger installed on a VM, and the backup running in machine-based HotAdd.
    The hang is probably related to how Windows deals with newly attached disks.
    I would suggest two things:
    1. Change the job transport to Custom and disable the Advanced option (HotAdd in this case). vRanger will then use the slower LAN transport, but it should be more reliable.
    2. If both ESXi hosts are members of a cluster, deploy the vRanger VA (4 CPUs/4 GB RAM) in cluster mode (a single VA to handle both hosts). If the source VMs reside on shared storage, VA-based HotAdd should work. You can also deploy two VAs, one per ESXi host.
    The Linux proxy (VA) works much better than Windows.
    One drawback of the vRanger VA: if you keep your backups in a CIFS repository, the VA is hardcoded to use SMB1 to communicate. If SMB1 is disabled in your environment because of the recent virus attacks, the VA is not an option. All 7.x VAs are affected. The developers are looking into it.
  • Wondering if you ever found a fix for this. I am experiencing exactly the same problem on a brand new vRanger server. Every night 3-4 different VMs just hang after the backing-up step completes, forcing me to either cancel them or wait out the timeout period. It happens with both the SAN and LAN transport methods. Very frustrating, to say the least.
  • In reply to adrian.alba:

    No, unfortunately I haven't found a fix. One of my three vRanger installs does this almost every night; the other two have no problem. The problematic one is a new 7.5.1 install, while the other two were upgrades. It replaced another vRanger in the same cluster, and the previous one, which was also an upgrade, did not have this problem. My backup target is an EMC Data Domain, but I don't think that is related.
  • In reply to ceestep:

    Interesting. I have six vRanger servers in total, and two of them do this. The one where it happens every single night is a new install. We also use Data Domain, but via CIFS, not DDBoost. I will open a ticket with Quest on Monday and see if they can help. I will update if I find a solution.
  • In reply to adrian.alba:

    DDBoost here. Server 2012 R2.
  • In reply to ceestep:

    ceestep, adrian.alba:
    Can you try the vRanger Virtual Appliance as suggested earlier, at least for a few days, to see if there is a difference?