
Data Transfer Speeds Between Core and Client

Hi All

I have a dedicated 10Gb link between my core and a protected server. I note the transfer speed is maxing out at around 120 MB/s. I have investigated a little and can see that I can amend the transfer rates, but I can only find articles referring to throttling the speed.

Can I increase settings to improve data transfer rates? If so, are there any recommendations?

Many Thanks!

  • Hi Paulw:
    I would like to add my "tuppence" to the conversation.

    When sizing Rapid Recovery deployments, a common approach is to focus on the network connection characteristics and give little, if any, importance to storage performance on both the protected machine and the core. In fact, the most important characteristic of any system transferring huge amounts of data is the available IOPS on the storage systems at the two ends of the transfer pipe. For instance, if the source system hosts a SQL server that is hammered with I/O requests, there may be very little headroom left to transfer the backup data. On the repository side, attachability and mountability checks, data transfers, replication and exports, plus "invisible" background jobs such as deferred deletes, all running at the same time, subtract IOPS from the available pool; it is not uncommon to see 100% disk active time and queue lengths of 5 or more (according to Microsoft, a queue length of 2 already indicates a bottleneck).
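
    If you want a quick, rough check of how many IOPS the disks are serving during a backup window, a small script along these lines can help (a minimal sketch assuming Python with the psutil package is installed; Performance Monitor exposes the same counters, and the 10-second window is arbitrary):

    ```python
    # Rough IOPS estimate during a backup window, using psutil disk counters.
    import time
    import psutil

    SAMPLE_SECONDS = 10  # arbitrary sampling window

    before = psutil.disk_io_counters()
    time.sleep(SAMPLE_SECONDS)
    after = psutil.disk_io_counters()

    reads = after.read_count - before.read_count
    writes = after.write_count - before.write_count
    print(f"~{(reads + writes) / SAMPLE_SECONDS:.0f} IOPS over the last {SAMPLE_SECONDS}s")
    ```

    If the array is already near its IOPS ceiling (100% active time, queue length above 2), no transfer-rate setting will help.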

    Another limitation, inherent to Rapid Recovery, is the fixed block size of 8KB (while Windows typically works in 64KB chunks). This was chosen in order to achieve high deduplication/compression ratios (despite its simplicity, it is one of the best in the business).
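
    The practical effect of the small block size is easy to put in numbers. A back-of-the-envelope calculation (plain Python, reusing the 120 MB/s figure from the original question) shows how many random I/Os the repository must absorb per second at 8KB versus 64KB:

    ```python
    # IOPS implied by a given throughput, assuming one block per I/O.
    def iops_needed(throughput_mb_s: float, block_kb: int) -> float:
        return throughput_mb_s * 1024 / block_kb

    for block_kb in (8, 64):
        print(f"{block_kb}KB blocks @ 120 MB/s -> ~{iops_needed(120, block_kb):,.0f} IOPS")
    # 8KB blocks @ 120 MB/s -> ~15,360 IOPS
    # 64KB blocks @ 120 MB/s -> ~1,920 IOPS
    ```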

    Another factor diminishing performance is the size of the inline dedupe cache: the larger it is, the longer it takes between the moment data is received on the wire and the moment it is committed to storage. Moreover, the dedupe cache is flushed to disk every hour or so, and during this process all repository jobs are suspended. In certain cases this operation can take 5, 10, even 20 minutes every hour (the case of a large dedupe cache on slow storage, e.g. 64GB being flushed to 5400 RPM disks with a 512MB cache in RAID 1). Last but not least, data read on the protected machine and committed to the repository is read and written in a random pattern, which considerably diminishes performance compared with sequential operations.
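
    To get a feel for the flush penalty, here is a rough estimate (the sustained write speeds below are illustrative assumptions, not measured figures):

    ```python
    # Approximate time the hourly dedupe-cache flush takes on different storage.
    def flush_minutes(cache_gb: float, sustained_write_mb_s: float) -> float:
        return cache_gb * 1024 / sustained_write_mb_s / 60

    for label, mb_s in (("5400 RPM RAID 1", 90), ("SATA SSD RAID 1", 450)):
        print(f"{label}: ~{flush_minutes(64, mb_s):.0f} min to flush a 64GB cache")
    # 5400 RPM RAID 1: ~12 min to flush a 64GB cache
    # SATA SSD RAID 1: ~2 min to flush a 64GB cache
    ```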

    It does not make much sense to compare a Windows copy with a data backup. First, Windows copy caches data and reports data still in memory as already transferred, whereas a backup operation should report only committed data. Second, Windows dedupes data (if so enabled) at a time when the system is otherwise idle, while Rapid Recovery backups deduplicate before committing the data to the repository. Third, Windows copy is unable to sustain copying large volumes of data without errors for long periods of time, as it is optimized toward small files, whereas backup jobs should be able to transfer terabytes without error.

    So, returning to the question of how to improve performance: the first thing to do is to make sure that the storage systems on the core and the protected agents perform properly (fast disks, SAS rather than SATA, write-back caching and adaptive read-ahead on RAID arrays, etc.). Second, see if jobs can be staggered in such a way that most of the time they run with minimal "competition". Third, on the networking side, set end-to-end jumbo frames, avoid using VLANs if possible, and use proven-quality switches. If you have a chance, build a RAID 1 out of two SSDs with high sustained write capability and move the dedupe cache location there. For improved reliability I would choose SSDs at least twice the optimal size of the dedupe cache (which won't be that difficult, taking into account that the maximum recommended size is around 32GB). Keep in mind that the core keeps two copies of the cache, so plan for double the storage size.
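
    One way to read that sizing rule, as a quick sketch (the headroom factor and the two cache copies are taken from the advice above; adjust to taste):

    ```python
    # Dedupe-cache SSD sizing: at least 2x the cache size, times two cache copies.
    def ssd_gb_needed(cache_gb: float, headroom: float = 2.0, copies: int = 2) -> float:
        return cache_gb * headroom * copies

    for cache_gb in (8, 16, 32):
        print(f"{cache_gb}GB dedupe cache -> plan for ~{ssd_gb_needed(cache_gb):.0f}GB per SSD")
    # 8GB -> ~32GB, 16GB -> ~64GB, 32GB -> ~128GB per SSD
    ```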

    Last but not least, depending on the quality of your core storage, you may attempt to increase the values of the transfer settings for agents (e.g. the segment size). The benefits of changing the transfer settings are rather volatile, so I won't get into details; suffice it to say that there are better chances of success on iSCSI storage.
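
    As a purely illustrative toy model (this is not Rapid Recovery's internal transfer logic, just generic arithmetic showing why a larger segment can help when per-request overhead dominates, as it often does on iSCSI):

    ```python
    # Effective throughput when each request pays a fixed overhead on top of the
    # wire time; the 1 ms overhead and 1000 MB/s link are illustrative numbers.
    def effective_mb_s(segment_kb: float, link_mb_s: float = 1000.0,
                       overhead_ms: float = 1.0) -> float:
        transfer_s = segment_kb / 1024 / link_mb_s
        return (segment_kb / 1024) / (transfer_s + overhead_ms / 1000)

    for seg_kb in (8, 64, 256, 1024):
        print(f"{seg_kb}KB segments -> ~{effective_mb_s(seg_kb):.0f} MB/s effective")
    # 8KB -> ~8, 64KB -> ~59, 256KB -> ~200, 1024KB -> ~500 MB/s
    ```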

    Hope that this helps.