Question - What denotes how quickly an archive is taken?

Question

Does anyone know what denotes how quickly an archive is taken? Through the week, we run 5 archives that will be backed up to a SAN and then copy over to a tape using Netvault. Recently, the archives have slowed right down to a point where some of the archives, generally around 8TB in size, can take 6 days to complete which means that the 5 archives a week can't be done. The archive speed seems to be sporadic, between 5MB/s up to 30MB/s and we've even seen some at 100MB/s but will generally chug along at around 8-10MB/s. 
 It's a DL4300 box with 2x10Core cpus and 128GB of ram, the SAN is on a 10GB link that the DL4300 never comes close to ever saturating. We've tried reboots to clear down some memory usage, we've tried clearing out the antivirus to see if the antivirus was upsetting anything but it still seems to speed up and slow down and chugs along. The slow down seems to have happened around the update from 6.2.1 to 6.3 and we also have repeated Schannel errors in the System Event viewer - "A fatal alert was received from the remote endpoint. The TLS protocol defined fatal alert code is 46." that also correspond with the 6.3 update, haven't found any other people with this issue on a Rapid Recovery machine and all the google links i've looked at in regards to this error point to Exchange servers and certificates, we don't run Exchange and the DL4300 server is off domain and isn't changed at all, apart from for Windows and driver updates, which are all up to date, the only other thing it does is it runs the HyperV server for the exports but only a basic Netvault server is run on that..

Tim Seymour · Accepted Answer

So the disks are maxed out. So speed fluctuations are then based on the location of the data being read, whether it's a random or sequential read, and size of the data that can be read from that location. When the bottleneck is the disks, there isn't anything we can do to speed them up.

simon peart · Answer

Just as an update, we got a job logged with support, second time this has been investigated by support, and it's managed to be sorted. The KB that resolved the issue is below, it does talk about it effecting Server 2008 predominantly and the older 6.1 and 6.2 Core versions. We got the issue on Server 2012 R2 and Core 6.3 
 https://support.quest.com/kb/119686/high-paged-pool-ram-utilization-on-systems-running-appassure-or-rapid-recovery 
 Exports are exporting a heck of a lot quicker and it's night and day for Archives, i've seen speeds of 300-500MB/s on some parts of an archive but the average speeds across the jobs are detailed below. 
 Pre change 
 30th September - 3.06TB at 13.37MB/s 
 4th October - 2.56TB at 23.3MB/s 
 11th October - 2.33TB at 21.78MB/s 
 14th October - 5.93TB at 24.73MB/s 
 20th October - 5.94TB at 20.18MB/s 
 Post change 
 28th October - 1.43TB at 90.77MB/s 
 29th October - 7.75TB at 161.35MB/s

Tim Seymour · Answer

What does the disk queue on the repository disks look like during the archive? It's rare to see a DL max out it's CPU or RAM during an archive. What maxes out is the disks. Rapid Recovery writes everything in an 8 KB fixed block size (that's what the dedupe engine uses) and so when you are doing an archive it's doing reads with an 8 KB IO block size. That generally means slower performance since that's a small IO size. Speed fluctuations then become a function of whether or not the reads are sequential or random. The larger the chunk of data that can be read sequentially for the archive, the faster it will be. Remember also that since the data is deduped each backup is not stored sequentially. More than likely some portion of it is stored randomly since there were blocks that were deduped with other blocks. So even if you were to archive just 1 agent, it's still going to have to do a lot of random reads to get the data for the archive. 
 So, I recommend checking the disk queue to see what it's doing. Any disk queue greater than 1 on the repository disk means that it's maxing out the array (since Windows sees the RAID virtual disk created by the hardware as 1 disk). 
 If there are other tasks running at the same time, especially background tasks like deleting index RPFS files, then you get increased IO competition on the disk and that can slow things down too.

Question - What denotes how quickly an archive is taken?

Top Replies