7.5.1: Backup task hanging at 100% of VMDK backup until Timeout reached, aborting

We have a vRanger 7.5.1 job that backs up 13 VMs on two ESXi 6.0U3 hosts. The job typically runs in 1 to 1.5 hours. We have been seeing an issue where a single VM in this job backs up its only VMDK to 100% (as shown in the GUI) and then hangs there, never completing the task, until the 14-hour timeout is reached. At that point vRanger aborts the task and cleans up the attached disk, snapshot, etc. It is not necessarily the same VM every night, and the affected VM usually backs up successfully the next night. This does not occur every night, but it happens often, and it always seems to be just one VM at a time. As you can see in the log below, at 7:14 PM the log indicates the VMDK backup is complete (the GUI shows 100% at this point), but the task just hangs until 10:07 AM the next morning, when it aborts with "Backup task using VDDK HotAdd failed: The operation was cancelled."

 

[2017-07-09 19:00:43.546]: vRanger Backup & Replication - v7.5.1.0
[2017-07-09 19:00:43.546]: Selected Options: Backup powered on machines only. | Update notes with the latest backup results. | Enable Active Block Mapping™ (ABM). | Selected SpaceSavingTech: Differential.
[2017-07-09 19:00:43.546]: Task for virtual machine <VM NAME> was queued.
[2017-07-09 19:06:58.122]: SourceVm:<VM NAME> | Uuid:4208d2f1-bb41-f42b-20f8-fe106869c837 | VC:<vCenter Name>, Host:<Host Name> [ESXi 6.0.0]
[2017-07-09 19:06:58.122]: Beginning backup task for Backup '<Name of Backup>'-<VM Name>
[2017-07-09 19:06:58.122]: Starting task validation.
[2017-07-09 19:06:58.122]: Connection to <vCenter Name> was properly validated.
[2017-07-09 19:06:58.122]: <Host Name> is properly licensed.
[2017-07-09 19:06:58.122]: Test connection to repository <Data Domain Boost volume> starting...
[2017-07-09 19:07:04.309]: Test connection to repository <Data Domain Boost volume> successful!
[2017-07-09 19:07:04.309]: Ending task validation... success!
[2017-07-09 19:07:04.309]: Beginning initialization of backup information.
[2017-07-09 19:07:04.544]: Retrieving the tasks parent information.
[2017-07-09 19:07:04.559]: Retrieving save points for any full backups associated with this job.
[2017-07-09 19:07:04.825]: Verifying content of repository.
[2017-07-09 19:07:20.873]: Finished initialization of backup information successfully.
[2017-07-09 19:07:20.873]: Initialization was sucessful. Backup type to run: Differential
[2017-07-09 19:07:32.982]: Retrieving the VM BIOS configuration completed.
[2017-07-09 19:07:34.701]: Checking mounted disks completed.
[2017-07-09 19:07:36.248]: Creating a snapshot for vRanger completed.
[2017-07-09 19:07:44.170]: Loading virtual machine '<VM Name>' information completed.
[2017-07-09 19:07:49.123]: Local machine is a VMware virtual machine.
[2017-07-09 19:07:49.373]: Backup task will attempt to use Machine-based VDDK HotAdd.
[2017-07-09 19:08:04.843]: Filtering out content of pagefile for disk: [<Volume Name>] <VM Name>/<VM Name>.vmdk
[2017-07-09 19:08:04.843]: Using filter type(s) active map for disk: vix:1:r:<vCenter Name>\:443:0:[<Volume Name>] <VM Name>/<VM Name>.vmdk
[2017-07-09 19:14:27.387]: Backing up disk 'vix:1:r:<vCenter Name>\:443:0:[<Volume Name>] <VM Name>/<VM Name>.vmdk:moref=vm-508897:snapshot-534167:4' completed.
[2017-07-10 10:07:18.680]: Backup task using VDDK HotAdd failed: The operation was cancelled.
[2017-07-10 10:07:31.180]: Check the manual for compatibility issues when attempting Lan-free operations.
[2017-07-10 10:08:02.619]: Removing snapshot for vRanger completed.
[2017-07-10 10:08:04.494]: Updating notes for <VM Name> completed.
[2017-07-10 10:08:04.494]: Updating VM notes completed.
[2017-07-10 10:08:04.947]: Setting VM event completed.
[2017-07-10 10:08:10.228]: Cancel detected; exiting task. (150)
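
For anyone comparing several of these task logs, the stall can be located mechanically by finding the largest gap between consecutive timestamps. The following is a minimal Python sketch, assuming the log lines keep the `[YYYY-MM-DD HH:MM:SS.mmm]: message` layout shown above. The helper names (`parse_entries`, `largest_gap`) are made up for illustration and are not part of vRanger. The sample text is three lines copied from the log above.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, taken from the log above: "[2017-07-09 19:14:27.387]: message"
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]: (.*)$")


def parse_entries(log_text):
    """Return (timestamp, message) pairs for every line that matches the layout."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            entries.append((stamp, match.group(2)))
    return entries


def largest_gap(entries):
    """Return (gap, message_before, message_after) for the longest silent period."""
    best = None
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, msg0, msg1)
    return best


# Three consecutive lines copied from the log in this thread (raw string because of "\:").
SAMPLE = r"""
[2017-07-09 19:08:04.843]: Using filter type(s) active map for disk: vix:1:r:<vCenter Name>\:443:0:[<Volume Name>] <VM Name>/<VM Name>.vmdk
[2017-07-09 19:14:27.387]: Backing up disk 'vix:1:r:<vCenter Name>\:443:0:[<Volume Name>] <VM Name>/<VM Name>.vmdk:moref=vm-508897:snapshot-534167:4' completed.
[2017-07-10 10:07:18.680]: Backup task using VDDK HotAdd failed: The operation was cancelled.
"""

STALL_THRESHOLD = timedelta(hours=1)  # arbitrary cut-off chosen for this sketch

gap, before, after = largest_gap(parse_entries(SAMPLE))
if gap > STALL_THRESHOLD:
    print(f"Stalled for {gap} after: {before[:60]}")
    print(f"Next entry: {after}")
```

Run against a full task log, this points at the step after which the task went quiet, which makes it easier to check whether every affected VM stalls at the same place.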

  • Let me summarize: 2 ESX hosts, Ranger is on a VM, and the backup is running in Machine-based HotAdd.
    This is probably something to do with how Windows deals with newly attached disks.
    I would suggest 2 things:
    1. Change the job transport to Custom and disable Advanced (HotAdd in this case). Ranger will then proceed over the slower LAN transport, but it should be more reliable.
    2. If both ESX hosts are members of a cluster, deploy a Ranger VA (4 CPUs/4 GB RAM) in cluster mode (a single VA to handle both hosts). If the source VMs reside on shared storage, VA-based HotAdd should work. You can also deploy 2 VAs, one per ESX host.
    The Linux proxy (VA) works much better than Windows.
    One bad thing about the Ranger VA: if you keep your backups in a CIFS repo, the VA is hardcoded to use SMB1 to communicate. If SMB1 is disabled in your environment due to recent virus attacks, the VA is not an option. All 7.x VAs are affected. Devs are looking into it.

  • Wondering if you ever found a fix for this. I am experiencing the exact same problem on a brand new vRanger server. Every night 3-4 different VMs just hang after the "Backing up disk" step completes, forcing me to either cancel them or wait out the timeout period. It happens with both SAN and LAN transport methods. Very frustrating, to say the least.
  • No, unfortunately I haven't found a fix. One of my three vRanger installs does this almost every night; the other two have no problem. The problematic one is a new 7.5.1 install, while the other two were upgrades. It replaced another vRanger in the same cluster, and the previous one, which was also an upgrade, did not have this problem. My backup target is an EMC Data Domain, but I don't think that is related.
  • Interesting. I have 6 vRanger servers in total, and two of them do this. The one where it happens every single night is a new install. We also use Data Domain, but via CIFS, not DDBoost. I will open a ticket with Quest on Monday and see if they can help. Will update if I find a solution.
  • ceestep, adrian.alba.
    Can you try the Ranger Virtual Appliance as suggested earlier? At least for a few days, to see if there is a difference.
  • The vRanger appliance(s) didn't make a difference, still had the same issue. No offense but the appliance kind of sucks. It's a helluva lot slower than backing it up from the vRanger machine itself. And disabling HotAdd is by no means a solution at all and it's really a joke you'd suggest that at all. With HotAdd this backup takes less than an hour. With LAN it's over 8 hours. That's not a solution.
  • Took only 4 months to respond.
    > appliance kind of sucks.
    VA-based transport is the preferred one. VA-based HotAdd is the 2nd fastest transport. The fastest is Server-based (Windows proxy) SAN, but it is expensive.
    I hope you configured the VA as suggested back in July and did not abuse concurrency (number of tasks per VA).
    > disabling HotAdd is by no means a solution at all.
    It was not supposed to be. It was a way to troubleshoot and isolate the issue.
    I would open a support case if you are eligible so our engineer can take a closer look.
    Extra note. Ranger 7.6 has been released but I would not rush to upgrade yet.
  • I have had a support case open for a couple of months now, no solution or root cause has been found yet....
  • > Took only 4 months to respond.
    > appliance kind of sucks.

    It took me 4 months because I was trying different variations with different settings for the VA over that time before I scrapped it. Not only did it not fix the problem, it was significantly slower. I've tried numerous times over the years to use the appliance in several different environments, as everyone seems to say that is the way to go, but it has never been fast. It has always been significantly slower than backing up VMs "on this vRanger machine". Hence my conclusion that the appliance sucks.