Intermittent Failure

I have a 2012 RDS server that we are trying to back up. Every once in a while it fails to perform the backup and gives an error like the one below.

The next time the machine tries to back up, it succeeds. Unfortunately, this error is driving our in-house tech support guys nuts; they don't like errors.

Any thoughts on how to resolve this?

The transfer of the backup of '(Volume Labeled 'System Reserved'); C:\; E:\' on 'gs-app3' failed

Agent is offline

Replay.Core.Contracts.Agents.AgentIsOfflineException: Agent is offline ---> WCFClientBase.ClientServerErrorException: Call to service method https://gs-app3:8006/apprecovery/api/agent/metadata/summaryMetadata PUT failed: Failed to call Create File on disk '\\.\PhysicalDrive45' - The system cannot find the file specified ---> Replay.Common.Contracts.Win32Api.Win32ApiFailedException: Failed to call Create File on disk '\\.\PhysicalDrive45' - The system cannot find the file specified
   at Replay.Common.Implementation.Win32Api.Win32.ThrowLastError(String message, Object[] args)
   at Replay.Common.Implementation.Win32Api.Win32.GetDiskAttributes(String diskName)
   at Replay.Common.Implementation.Storage.DiskInfoBase.get_IsReadOnly()
   at Replay.Common.Implementation.Metadata.CommonMetadataService.GetDiskInformation(ICommonSummaryMetadata metaData, IStorageMetadata storageMetadata)
   at Replay.Common.Implementation.Metadata.CommonMetadataService.GetCommonSummaryMetadata(ICommonSummaryMetadata metadata, Boolean includeNonSnapableVolumes)
   at Replay.Agent.Implementation.Metadata.AgentMetadataService.GetCurrentSummary(MetadataCredentials metadataCredentials)
   --- End of inner exception stack trace ---
   at Replay.Common.Implementation.Utilities.SingletonTask`1.Execute(Func`1 function, CancellationToken cancellationToken)
   at Replay.Core.Implementation.Agents.AgentClient.GetCurrentSummaryMetadata(MetadataCredentials metadataCredentials, CancellationToken cancellationToken)
   at Replay.Core.Implementation.Agents.AgentsMetadataHelper.GetAgentSummaryMetadataInternalClient(AgentDescriptor agentDescriptor, IAgentClient agentClient, CancellationToken cancellationToken)
   at Replay.Core.Implementation.Agents.AgentsMetadataHelper.GetSummaryMetadata(AgentDescriptor agentDescriptor, IAgentClient agentClient, CancellationToken cancellationToken)
   at Replay.Core.Implementation.Agents.ProtectedAgent.<>c__DisplayClass40_0.<GetSummaryMetadata>b__0()
   at Replay.Core.Implementation.Agents.ProtectedAgent.AgentClientSend[TResult](Func`1 func)
   --- End of inner exception stack trace ---
   at Replay.Core.Implementation.Agents.ProtectedAgent.AgentClientSend[TResult](Func`1 func)
   at Replay.Core.Implementation.Agents.ProtectedAgent.GetSummaryMetadata(CancellationToken cancellationToken)
   at Replay.Core.Implementation.Transfer.Validation.Implementation.ProtectedAgentTransferValidator.Validate()
   at Replay.Core.Contracts.Validation.ValidatorBase.AggregateValidator.Validate()
   at Replay.Core.Implementation.Transfer.Queuing.Implementation.TransferQueueService.StartTransfer(TransferQueueEntry entry)
---

About this event: The transfer of a new recovery point from the protected machine has failed

Reply
  • You said that every once in a while it fails a backup, and the error message says "Agent is offline." Is there any consistency to when the backup fails? Is it at the same time each day or week? Does it correlate with something else going on with your network or that server? The fact that it is able to start a backup job tells us that the agent was online when the job was queued (otherwise it would throw an alert that the backup couldn't be started because the agent is offline). That means the job is able to start, and then the agent goes offline while it is trying to do a metadata call and a create shadow copy call. So Emte could be correct that something in VSS is failing when it tries to shadow copy the device, or something could be causing the commands to be blocked (hence the agent offline error), or the agent service could actually be going offline or being stopped and restarted.
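If it helps with the "same time each day or week" question, here is a rough Python sketch that tallies failure times by weekday and hour. It is only an illustration: the timestamps in FAILURES are made-up placeholders, and the sketch assumes you copy the real failure times out of the event log by hand in the format shown.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical failure times (placeholders, not real data).
# Replace them with the timestamps of the failed transfers.
FAILURES = [
    "2019-03-04 02:00",
    "2019-03-11 02:00",
    "2019-03-13 14:00",
    "2019-03-18 02:00",
]


def tally(timestamps):
    """Count failures per (weekday, hour) slot, most frequent first."""
    slots = Counter()
    for text in timestamps:
        when = datetime.strptime(text, "%Y-%m-%d %H:%M")
        slots[(WEEKDAYS[when.weekday()], when.hour)] += 1
    return slots.most_common()


if __name__ == "__main__":
    for (weekday, hour), count in tally(FAILURES):
        print(f"{weekday} {hour:02d}:00  {count} failure(s)")
```

If one slot clearly dominates, check what else runs on the server or the network around that time (another backup product, a scheduled task, maintenance on the storage, and so on). If the failures are spread evenly, a schedule conflict is less likely.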
