
Rapid Recovery 6 is slow - Update: it's not as slow anymore, hopefully still improving

Just posting this as an update.

 

I've been quite vocal on here recently about how poor RR6 has been - virtually unusable. We had tried multiple things: write caching, firmware and driver updates, and moving metadata locations around between storage systems (we did find a definite improvement from keeping metadata and data on different disks).

Last weekend, having decided that nothing was really updating anyway, we paused replications for an extended window and now exclude them for roughly 8-10 hours a day. We also left virtual standbys disabled. This has finally allowed the system to tidy up the repository: the Delete Index RPFS jobs now run at several hundred MB/s, whereas during replication jobs they were closer to single digits.

It is still deleting Index RPFS recovery points, but it now manages this at a reasonable speed even whilst replicating. (I still have the issue with updating agent metadata, where I apparently have to upgrade the source cores, though that does appear to have improved for one of the cores we replicate from.)

I have even managed to get exports running at a much more reasonable 30 MB/s (previously kB/s) whilst replication is running, without it impacting things as much as before. Almost back to pre-upgrade levels.

 

Basically, it appears to my untrained eye that it was so bad because the repository was heavily fragmented, and this reared its head after the repository update. Now that the cleanup has freed some spots to place new data, it is running better. It is still going through this process; hopefully I can start some rollups again soon.
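
For anyone wondering why fragmentation alone could cripple things this badly, here's a rough back-of-the-envelope sketch. The figures are generic spinning-disk assumptions I've picked for illustration, not anything measured from RR's engine - but they show how reading the same amount of data costs orders of magnitude more time once every extent needs its own seek:

```python
# Back-of-the-envelope: why a heavily fragmented repository reads so much
# slower on spinning disks. Figures are typical 7.2k SATA assumptions,
# not measurements from Rapid Recovery's engine.

GIB = 1024 ** 3

seq_throughput = 150 * 1024 ** 2   # ~150 MB/s sustained sequential read
seek_time = 0.008                  # ~8 ms average seek + rotational latency
extent_size = 4 * 1024             # worst case: data scattered as 4 KiB extents

data = 1 * GIB                     # read 1 GiB of recovery point data

# Sequential layout: effectively one long read.
t_seq = data / seq_throughput

# Fragmented layout: one seek per extent on top of the same transfer time.
extents = data // extent_size
t_frag = extents * seek_time + data / seq_throughput

print(f"sequential: {t_seq:7.1f} s")    # ~7 s
print(f"fragmented: {t_frag:7.1f} s")   # ~2100 s (~35 min) for the same data
```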

 

If you have a similar issue after upgrading, check whether any repository Delete Index RPFS jobs are running. If they are, stop everything else and give the system time to finish them, or at least to reach the stage where it has deleted most of them. (If you'd rather script that check than keep refreshing the Core console, see the sketch below.)
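
Something like the following is the shape of a scripted check. To be clear, this is hypothetical: the Core does ship a management web service, but I haven't verified any of these routes, so every URL, parameter, and JSON key here is a placeholder assumption rather than the documented API.

```python
# Hypothetical sketch only: polling the Core for running repository jobs
# before resuming replication/exports. Every URL, parameter and JSON key
# below is a placeholder assumption, NOT the documented Core API.

import time
import requests

CORE = "https://core.example.local:8006"   # placeholder Core address/port
AUTH = ("admin", "password")               # placeholder credentials

def running_delete_index_jobs():
    # Placeholder endpoint - substitute whatever your Core actually exposes.
    resp = requests.get(f"{CORE}/api/jobs?status=running",
                        auth=AUTH, verify=False)
    resp.raise_for_status()
    return [job for job in resp.json()
            if "Delete Index RPFS" in job.get("summary", "")]

# Leave everything else paused until the cleanup jobs have drained.
while running_delete_index_jobs():
    print("Delete Index RPFS still running - keeping replication paused")
    time.sleep(300)
print("Cleanup drained - safe to resume other jobs")
```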

Going forwards, try to have some windows where no replication is happening. I want to reduce my exclusions, but I'm waiting until everything has caught back up; I suspect I'm a few weeks away from that at this stage.

 

Anyway, just thought I'd update this. Even though disk latencies were generally low (<5 ms) and disk activity was <5 MB/s, it must have been creating such small scattered writes that they barely registered on those tools while still impacting performance to a large degree.
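
To put some rough numbers on that (generic figures, not from my own monitoring): a spindle doing nothing but small random writes can be completely saturated while the throughput graph shows almost nothing.

```python
# Why "<5 MB/s" can still mean a busy disk: MB/s counters hide the IOPS
# cost of small scattered writes. Generic example numbers, not
# measurements from this Core.

io_size = 4 * 1024        # 4 KiB random writes
disk_iops_limit = 150     # rough random-write IOPS ceiling for a 7.2k disk

# Throughput at the IOPS ceiling, as the MB/s graph would show it:
throughput_mb = disk_iops_limit * io_size / 1024 ** 2
print(f"{disk_iops_limit} IOPS of 4 KiB writes = {throughput_mb:.2f} MB/s")
# -> ~0.59 MB/s: the spindle is flat out, yet the counter looks idle.
```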

I still feel there are potential improvements to the underlying engine that this episode has exposed, and that certain 'don't try this at home' workarounds could make the required disk reads/writes more sequential in nature, but I'm feeling that I may get through this.

 

But just to add: I have had to find and do all of this myself. Support have been pointless; today was the first time they even looked at the system (when it was already better), and they stated it appears to be OK (which, to be honest, it now is - much more respectable).

  • Not sure how it applies to your situation or how encompassing this fix is, but 6.1.2 release notes say:

    Export rate was slow for recovery points from repositories that have high fragmentation. (Issue 34758 - Virtual export, Repository)

    Since we have talked about this issue a fair bit, it would be great if someone from Quest could give us a bit more detail on what was fixed, what should now work as expected, and what is still an issue when it comes to the fragmentation/performance problem.