Replication bandwidth

Question

We've just moved our replica core offsite but, typically, our source core has decided to take a base image of a very large volume on one of our FS. Not sure why it did, but it has, and it now wants to copy 2.2 TB up to the offsite replica. Luckily we have a 1 Gb/s line to our offsite location, but the replication job is only running at about 10 MB/s. I've got the replication set as follows, so it shouldn't be throttled, and there isn't anything else restricting traffic. Any ideas on how to increase the wire speed? Is there a setting somewhere else?

Tudor.Popescu · Answer

Hi dtghelp,
There are a few things you can try to increase the replication speed, although they may not apply to the job you are currently running. First, though, there are a few items to consider.
Replication has a few separate steps, each of which affects overall performance.
 
The first step is retrieving data on the source core. Its rate depends mainly on repository performance, which in turn is determined by hardware capabilities, drivers/firmware, and overall load on the core. Assuming the hardware performs well and the drivers and firmware (for both storage and hard drives) are current, you need to monitor the load on the core. Besides the regular backups, a host of other jobs consume storage IOPS. In my experience, mountability checks, attachability checks, RP checks, rollups (sometimes with huge amounts of deferred deletes), and other replications can considerably slow down data retrieval for replication. If possible, limiting these operations during large replication jobs may increase performance several times over. Please note that, by default, the repository accepts 64 concurrent operations, and most jobs are composed of multiple operations. For instance, each backup uses 8 streams by default, and replication uses a total of 8 streams.
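
To make the concurrency figures above concrete, here is a rough Python sketch of the budget. It assumes that each stream counts as one repository operation and that only backups and replication are running; both are simplifications of mine, not something the product documents in this thread, and the function names are hypothetical.

```python
# Back-of-the-envelope budget for repository operations.
# ASSUMPTION: one stream == one concurrent repository operation.
# ASSUMPTION: other jobs (checks, rollups) are ignored here.

REPO_LIMIT = 64            # default concurrent operations per repository
BACKUP_STREAMS = 8         # default streams per backup job
REPLICATION_STREAMS = 8    # default total streams for replication


def slots_in_use(backup_jobs: int,
                 replication_streams: int = REPLICATION_STREAMS) -> int:
    """Operations consumed by running backups plus replication."""
    return backup_jobs * BACKUP_STREAMS + replication_streams


def max_concurrent_backups(
        replication_streams: int = REPLICATION_STREAMS) -> int:
    """Backup jobs that still fit under the limit next to replication."""
    return (REPO_LIMIT - replication_streams) // BACKUP_STREAMS


if __name__ == "__main__":
    print("3 backups + replication use", slots_in_use(3), "of", REPO_LIMIT)
    print("backups that fit with 8 replication streams:",
          max_concurrent_backups(8))
    print("backups that fit with 16 replication streams:",
          max_concurrent_backups(16))
```

Under these assumptions, raising replication to 16 streams leaves room for one fewer concurrent backup, which is why it helps to keep other jobs quiet during a large replication.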
 
The second step is data ingestion on the target core. Its rate is influenced by all of the above (except backup jobs) plus the inner workings of the dedupe cache on the target core. Please note the "read-match-write" process, which is intended to send over the wire only the blocks that cannot be found in the cache.
 
The third step is the actual WAN pipe performance. 1 Gb/s is obviously a very good speed (125 MB/s theoretical maximum), but the replication process was not optimized for it, so it is unlikely to reach the line's full potential. However, if replicating locally worked better than replicating remotely does now, you may need to troubleshoot the connection. For instance, depending on the provider and your service agreement, the upload speed may be considerably slower than the download speed. Since replication depends on uploading data to the target core, it makes sense to check whether that is the case. Another possibility is that WAN speed is limited on a per-client basis at your organization level. For instance, once you account for various performance factors and the way some providers calculate the WAN speed they offer, 10 MB/s may correspond to a 100 Mb/s link. This is a rather common issue that needs to be addressed with the sysadmins in your organization.
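
The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: it assumes the 2.2 in the question means decimal terabytes, and it ignores protocol overhead, dedupe and compression, so the real numbers will differ.

```python
# Rough wire-speed arithmetic for the figures quoted in this thread.
# ASSUMPTION: sizes are decimal (1 TB = 1,000,000 MB); no overhead,
# dedupe or compression is modelled.


def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in Mb/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def transfer_hours(size_tb: float, rate_mbyte_per_s: float) -> float:
    """Hours needed to move size_tb at a sustained rate in MB/s."""
    size_mb = size_tb * 1_000_000
    return size_mb / rate_mbyte_per_s / 3600


if __name__ == "__main__":
    print("1 Gb/s line   :", mbit_to_mbyte_per_s(1000), "MB/s max")
    print("100 Mb/s link :", mbit_to_mbyte_per_s(100), "MB/s max")
    print("2.2 TB at 10 MB/s :", round(transfer_hours(2.2, 10), 1), "h")
    print("2.2 TB at 125 MB/s:", round(transfer_hours(2.2, 125), 1), "h")
```

A 100 Mb/s link tops out at 12.5 MB/s before overhead, so an observed 10 MB/s is consistent with a 100 Mb/s cap somewhere along the path rather than with the full 1 Gb/s line.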
 
All this being said, and assuming that none of the performance-degrading factors apply, you can increase the number of parallel streams allocated to replication. I would start with 16 and go up or down based on what the pipe can take: if you get errors, reduce the number of streams; otherwise, you may try increasing them until you begin seeing errors.
 
Hope that this helps.