Virtual Standby exports to ESXi are slow

Appliance: DL4300

Software Version: RapidRecovery 6.1.2.115

ESXi Version: 6.0

ESXi Server Spec: Dell R730x, 24 15k SAS Drives in RAID 6 - Adaptive Read Ahead - Write Back, 64GB RAM, Dual Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

Problem: Exports from RR to ESXi run below 10 MB/s on a 10 Gb/s network. Many of our standbys are in the TB-per-day range, meaning most take approximately 20-30 hours to complete. During this time, backups of that agent do not take place, leaving the server vulnerable.

 

Troubleshooting steps taken with Support:

 - All software and hardware updates completed

 - Direct 10 Gb/s fibre between Core and ESXi

 - Pausing all other tasks on the Core and restarting the server

 - Changed RAID policy to Adaptive Read Ahead and Write Back

 

Performing an export of an agentless backup last night seemed to run in the 100 MB/s+ range; however, after checking the logs this morning, it had dropped back down to 18 MB/s after an hour.

 

Any help is appreciated, and even a comment to say you are experiencing the same and info about your setup will assist.

 

James

  • Hello James,

    The situation that you are describing comes up fairly often in data protection, as this is one of the few times that a third-party entity (in this case the RR Core) has to write a large amount of data into VMware for an extended period of time, and often. My reply will use words like 'generally' and 'usually', since many assumptions are involved when, in the end, we are talking about a Windows OS (the Core) writing into VMware. By all means, though, I am happy to have a discussion.

    One thing I will throw out, and it will be a recurring theme: writing from the outside into VMware (in this case, our Windows Core writing out to ESXi and then to a datastore) is not nearly as fast as anyone would like it to be. It is generally a fraction of the speed you would get if you took the same hardware and did a plain file copy. Regardless of the product I have used or supported over the years, this sentiment comes up time and time again: writing into VMware is not as fast as one would think. That is especially true when you factor in steps (as with Rapid Recovery) where the data has to be decompressed and rehydrated before it can be written, which makes an already slow process even slower.

    Having said that, this is specifically why the HotAdd and LAN-Free SAN features exist: to 'boost' performance a bit. Your post suggests that you may have tried to set up LAN-Free SAN, or already did; is that correct? Normally that does not single-handedly improve single-job performance; it tends to add 'towing capacity', that is, the ability to run multiple jobs without incurring as much of a drop in performance. LAN-Free SAN is not intended to make one task faster, but to allow more jobs to run simultaneously. The alternative technology, HotAdd, on the other hand, is known for a higher per-job boost. HotAdd is used when your Core runs on a VMware VM and has access to the same datastores as the VMware VMs it is backing up. Both technologies, however, are known (even notorious) for slowing down over time; it tends to be more evident with HotAdd, but both do.

    What we commonly see, and what we relay to customers, is an average of 10-20 MB/s when writing into VMware. Some customers are lower, some are higher, but the vast majority fall into this range. These rates refer to the average speed throughout the job; burst speeds come and go, but this is where most jobs sit. What I can tell you is that even with a 1 Gb NIC, a transport into VMware is more than likely not going to max out that single NIC, let alone a 10 Gb one. As an example, with LAN-Free SAN and 10k disks both in VMware and in my repository, right now I am humming along at ~17 MB/s; when I watch it, it climbs into the 20s and then drops back, usually into the teens. I also kicked off a HotAdd restore, again with 10k disks, and it started in the 30s and within minutes was back in the low 20s. This is exactly why virtual standbys are configured: so you can avoid that latency at restore time, as long as you have the disk space available. As for speed, those are the metrics I give customers when asked, since they are realistic and reflect the average of what we see. Some are higher, some lower, but the higher ones tend to be burst speeds, or environments where vSAN (or another form of flash media) is involved for the repository or the datastores.
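    As a back-of-the-envelope illustration (my own arithmetic, not a Rapid Recovery formula), the 20-30 hour completion times described in the original post fall straight out of these average rates:

```python
# Rough sanity check: how long does an export take at a given
# sustained write rate into VMware? Decimal units are assumed
# (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).

def export_hours(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to write `size_tb` terabytes at `rate_mb_s` MB/s."""
    seconds = (size_tb * 1e12) / (rate_mb_s * 1e6)
    return seconds / 3600

# At the 10-20 MB/s averages discussed above, a 1 TB export lands
# squarely in the 14-28 hour window James is reporting.
for rate in (10, 15, 20):
    print(f"1 TB at {rate} MB/s ~ {export_hours(1, rate):.1f} hours")
```

    Put another way, sustaining a 1 TB daily export in a reasonable overnight window would need average rates well above what writes into VMware typically achieve.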

    You mentioned that your virtual standbys are TBs in size; is that the initial export, or do you really have TBs' worth of change on a daily basis? That would be a higher change rate than what we normally see day to day.

    Also, you mentioned that while the virtual standby is running you can't perform a backup of the machine; that should not be the case. You should be able to back up a protected machine while an export of it is running. I can't recall off-hand whether that used to be a problem in AppAssure, but in Rapid Recovery an export of a protected machine does not block a transfer for that protected machine.