Rapid Recovery: Replication shows Repository is full - but it is not full.

I'm having issues with my Rapid Recovery replication for one machine.  It is failing with the error "Repository is full" - but neither repository is full.

The total size of the machine's recovery points is 800GB.  Its local repository is 1.11TB in size (343GB free).  The remote core has a repository that is 950GB in size and has no data stored in it yet. 

Specific symptoms:

  • The machine is protected locally and has successful base image and perhaps other recovery points as well
  • The machine is added to replication without use of seed drive (successfully done multiple times with other larger servers)
  • Replication starts, but fails - usually within a minute or two, but sometimes longer
  • Error: The repository is full. Adjust your retention policy or delete the old recovery point chains to free up space, or add a new storage location to the existing repository. Call to service method [Remote servername/address] POST failed: The repository is full. Adjust your retention policy or delete the old recovery point chains to free up space, or add a new storage location to the existing repository

Other details:

There is a 100Mbps connection between the sites, so we usually replicate without seed drives and it takes approximately 24 hours.  I have it set up with one repository per machine, and other, larger machines are replicating successfully with no issues.
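As a rough sanity check on the numbers above, here is a minimal sketch of the transfer-time arithmetic. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits/s), and the `efficiency` parameter is a hypothetical fudge factor for protocol overhead, not something Rapid Recovery reports.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours needed to move size_gb over a link_mbps connection.

    efficiency is an assumed fraction of the line rate that is actually
    usable (1.0 = ideal link with no overhead).
    """
    bits = size_gb * 1e9 * 8                      # payload size in bits
    usable_bps = link_mbps * 1e6 * efficiency     # usable bits per second
    return bits / usable_bps / 3600               # seconds -> hours


if __name__ == "__main__":
    # 800GB of recovery points over the 100Mbps site-to-site link
    print(f"{transfer_hours(800, 100):.1f} hours at full line rate")
    print(f"{transfer_hours(800, 100, efficiency=0.75):.1f} hours at 75% efficiency")
```

At full line rate this comes out to a little under 18 hours, and closer to 24 hours if only about three quarters of the link is usable, which is in line with the replication times mentioned above.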

Things I've tried:

  • Giving the remote repository more space (at one point it was larger than the local repository)
  • Giving the local repository more space
  • Recreating both the local and remote repositories using new names
  • Removing and re-protecting the machine using the local core
  • Removing/recreating the replication

I'm out of ideas.  I'm giving it another try now, and it has been working on replication for about 30 minutes, but earlier today it worked on replication for about 50 minutes before reporting that the repository was full.

Rapid Recovery Version:

  • Hi Crof:

    The most likely cause of the "repository full" message is data that was transferred during your previous replication attempts but has not been discarded yet. The deferred deletes job is in charge of this cleanup, and it may take some time to finish. (You can see whether any deferred deletes are being processed by clicking the "gear" icon in the Events tab.)  My guess is that the quickest approach, since there is no usable data in your repository, is to delete and recreate it. Additionally, it is good practice to size the repository at least about 1.2 times the total amount of data to protect, although you can probably get by without doing so.

    If you still decide to continue with the current repository, it makes sense to set the value of HKEY_LOCAL_MACHINE\SOFTWARE\AppRecovery\Core\RepositoryService\Clear Duplicates to 1 (it will revert to 0 automatically after restarting the service), bounce the rapidrecoverycore service, and wait for the deferred deletes to finish before attempting a new replication.
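For anyone who prefers to script the registry change and service bounce described above, here is a minimal sketch that only builds and prints the Windows commands by default. The key path, value name, and service name are taken from this thread; the `REG_DWORD` value type, the use of `reg add` and `net stop`/`net start`, and the `--apply` switch are my assumptions, so verify them on your core (from an elevated prompt) before running anything for real.

```python
import subprocess
import sys

# Taken from the thread: registry key, value name, and service name.
KEY = r"HKLM\SOFTWARE\AppRecovery\Core\RepositoryService"
VALUE_NAME = "Clear Duplicates"
SERVICE = "rapidrecoverycore"


def build_commands():
    """Return the commands to set the flag and bounce the service.

    Assumption: the value is a DWORD; check the existing type in regedit first.
    """
    return [
        ["reg", "add", KEY, "/v", VALUE_NAME, "/t", "REG_DWORD", "/d", "1", "/f"],
        ["net", "stop", SERVICE],
        ["net", "start", SERVICE],
    ]


def run(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(subprocess.list2cmdline(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if a command fails


if __name__ == "__main__":
    # Hypothetical switch: pass --apply to actually run the commands.
    run(dry_run="--apply" not in sys.argv)
```

After the service is back up, wait for the deferred deletes to finish before retrying replication, as described above.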

    It makes sense to install the latest cumulative update on the target core while the rapidrecoverycore service is stopped. You can find it at s3.amazonaws.com/.../P-1834.msi

    (It may make good sense to install it on the Source core as well, but that can wait until you have a chance to do it. Never install an update on the Source core first, though.)

    Please let me know how it goes.
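The 1.2x sizing guideline from the reply above can be checked against the figures in the original post. This is a minimal sketch that assumes decimal units (1.11TB taken as 1110GB); the function names are hypothetical and the factor is only the rule of thumb quoted in this thread, not a documented product limit.

```python
def recommended_repo_gb(protected_gb: float, factor: float = 1.2) -> float:
    """Repository size suggested by the rule of thumb (factor x protected data)."""
    return protected_gb * factor


def shortfall_gb(repo_gb: float, protected_gb: float, factor: float = 1.2) -> float:
    """How many GB the repository falls short of the guideline (0 if it meets it)."""
    return max(0.0, recommended_repo_gb(protected_gb, factor) - repo_gb)


if __name__ == "__main__":
    protected = 800  # GB of recovery points for the machine
    repos = {"local (1.11TB)": 1110, "remote (950GB)": 950}
    print(f"guideline size: {recommended_repo_gb(protected):.0f} GB")
    for name, size in repos.items():
        print(f"{name}: short by {shortfall_gb(size, protected):.0f} GB")
```

By this guideline the remote repository is slightly under the suggested size while the local one is above it. Note that the original poster also saw the error after enlarging the remote repository, so sizing alone does not explain it.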

  • Thanks for the suggestions and insight.  I'll report back later with how things go.

    For now, I'm trying to use a seed drive to start the replication process in hopes that if I can get the initial replication done, it'll fix the issue.

    Deferred Deletes Job: Assuming you're referring to the "Deleting index RPFS file" jobs, I did notice those before.  They were working on the remote core's repository to delete around 400GB after some of the failed replications, even though the replications failed after transferring only around 20GB.  Before posting this, I ended up recreating the repository so I wouldn't need to wait for those jobs to finish.

    Cumulative update: Is there a place where I can get notices of updates like this for Rapid Recovery?

  • Hi Crof:

    Yes, you are correct: the "Deleting index RPFS file" jobs are the deferred delete jobs. Normally, cumulative updates are posted both on the support portal and on the License Portal (the downloads section).

    The easiest way to check is going to support.software.dell.com/rapid-recovery

    On the right side there is a fairly large section called "download software". Expand it by clicking the "See all downloads" link. On the new page that opens, scroll down to the "Patches Section" at the very bottom of the page. Check the date and name of the latest patch, and download it if it is more recent than your currently installed Cumulative Update. Please note that you may need to log in using your license portal credentials.

  • Using the seed drive option fixed the issue.

    Specifically, what I did was:

    - Created a seed drive (in my case, I just used a folder on a USB drive connected to the main backup server)

    - Told the replication target to use the seed drive to start replication.  In my case, I never moved the drive, I just had the target server access the information over our 100Mbps site-to-site connection

    *wait 5 days for replication server to consume the seed drive information*

    - Once completed, the off-site replication for that particular machine started working normally.

    As a note for anyone else finding this information later: I have strong suspicions that there are other hardware/software issues on the main backup server.  There are even log entries noting long access times to backup information (15+ minutes).  This may have contributed to the false errors of not having enough space on the remote repository.  I suggest checking your log for similar issues and addressing those first to ensure reliable backups.