
Rapid Recovery 6 is slow - Update: it's not as slow anymore, hopefully still improving

Just posting this as an update.

 

I've been quite vocal on here recently about how poor RR6 has been, to the point of being virtually unusable.  We had tried multiple things: write cache settings, firmware and driver updates, and moving metadata locations around between storage systems (we have found a definite improvement from having metadata and data on different disks).

Last weekend, having decided that nothing was really updating anyway, we paused replications for an extended window and now exclude them for roughly 8-10 hours a day.  We also left the virtual standbys disabled.  This has finally allowed the system to tidy the repository up: the Deleting Index RPFS jobs have now gone into the hundreds of MB/s, whereas during replication jobs they were closer to single digits.

It is still deleting index RPFS recovery points, but it now manages this at a reasonable speed even whilst replicating.  (I still have the issue with updating agent metadata, where I apparently have to upgrade the source cores, although it does appear to have improved for one of the cores we replicate from.)

I have even managed to get exports running at a much more reasonable 30 MB/s (previously kB/s) whilst replication is running, without impacting things as much as before.  Almost back to pre-upgrade levels.

 

Basically, it appears to my untrained eye that it was so bad because the repository was heavily fragmented, and this reared its head after the repository update.  Now that it has regained some spots to place new data, it is running better.  It is still going through this process; hopefully I can start some rollups again soon.

 

If you have a similar issue after upgrading, check whether any repository Delete Index RPFS jobs are running.  If they are, stop everything else and give it time to finish them, or at least to get to the stage where it has deleted a lot of them.
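If you'd rather check from a script than keep refreshing the Events page on the Core, the sketch below is the sort of thing I mean. I'm writing the cmdlet name (Get-ActiveJobs) and the job-name strings from memory of the Rapid Recovery PowerShell module, so treat them as assumptions and verify against whatever the module on your Core actually exposes.

```powershell
# Rough sketch only - the cmdlet name, parameters and job-name strings are
# assumptions based on my memory of the Rapid Recovery PowerShell module.
# Check what you actually have with:
#   Get-Command -Module RapidRecoveryPowerShellModule
Import-Module RapidRecoveryPowerShellModule

# List every active job on the Core and look for the repository
# delete work ("Deleting index RPFS" / deferred deletes).
$jobs = Get-ActiveJobs -All
$jobs | Out-String -Stream | Select-String -Pattern 'Deleting index RPFS', 'Deferred delete'
```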

Going forwards, try to have some windows where no replication is happening.  I want to reduce my exclusions, but I'm waiting until everything has caught back up; I suspect I'm a few weeks away from that at this stage.
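For what it's worth, the exclusion window doesn't have to be done by hand in the console every day. Below is a rough sketch of the sort of script you could drop into Task Scheduler twice a day (pause in the evening, resume in the morning). The Suspend-/Resume- cmdlet names and the -All switch are assumptions from memory of the Rapid Recovery PowerShell module, so check them on your install before trusting it.

```powershell
# quiet-window.ps1 - rough sketch, not a tested script.
# Goal: give the Deleting Index RPFS / deferred-delete jobs an
# 8-10 hour window each night with no replication or standby exports.
# Cmdlet names (Suspend-Replication, Suspend-VMExport, etc.) and the
# -All switch are assumptions - verify against your installed module.
param(
    [ValidateSet('pause','resume')]
    [string]$Action = 'pause'
)

Import-Module RapidRecoveryPowerShellModule

if ($Action -eq 'pause') {
    Suspend-Replication -All   # assumed: suspend all replication pairings
    Suspend-VMExport           # assumed: suspend virtual standby exports
}
else {
    Resume-Replication -All
    Resume-VMExport
}
```

Two Task Scheduler entries, e.g. "-Action pause" at 21:00 and "-Action resume" at 07:00, would give roughly the same 8-10 hour window I'm running at the moment.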

 

Anyway, just thought I'd update this.  Even though disk latencies were generally low (<5 ms) and disk activity was under 5 MB/s, it must have been creating such small writes all over the place that they didn't register on those tools but were still impacting performance to a large degree.
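For anyone wanting to see this on their own Core, it shows up in IOPS and average transfer size rather than in MB/s or latency. These are standard Windows PerfMon counters, so something along these lines should work as-is on the Core server (swap _Total for the volume holding the repository):

```powershell
# Sample the disk every 5 seconds for a minute. A high Disk Transfers/sec
# alongside a small Avg. Disk Bytes/Transfer (a few KB) is the pattern of
# lots of tiny scattered writes that MB/s and latency figures hide.
Get-Counter -Counter @(
    '\LogicalDisk(_Total)\Disk Transfers/sec',
    '\LogicalDisk(_Total)\Avg. Disk Bytes/Transfer',
    '\LogicalDisk(_Total)\Avg. Disk sec/Transfer'
) -SampleInterval 5 -MaxSamples 12
```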

I still feel that there are potential improvements to the underlying engine that have shown themselves here, and that certain 'don't try this at home' workarounds could make the required disk reads/writes more sequential, but I'm feeling that I may get through this.

 

But just to add: I have had to find and do all of this myself.  Support have been pointless.  Today was the first time they even looked at the system (once it was already better), and they stated that it appears to be OK (which, to be honest, it is now much more respectable).

  • That is correct, you beat me to it. The DDs (deferred deletes) can be scheduled as a nightly job to try to alleviate some of this job contention; that was put in a build or two ago for exactly what you have experienced. If you can schedule your environment in such a way that the DDs have a clear path to hammer through everything on a nightly basis, then in theory you wouldn't see the performance hit when the jobs conflict with one another. The trick is to find that happy medium where the DDs finish on a daily basis and you don't fall behind, while keeping your backups, replications and exports up to date as well.