
VMs randomly experience connectivity errors

Every so often, backup transfers fail with errors that turn out to be connectivity errors in Rapid Recovery. For example, one VM backed up every hour today without any issues except at 2:00, when I received an error saying it couldn't complete a backup because "The virtual machine 'machine name' paired to another Core." The stack trace shows the following:

Server side:

System.Security.Authentication.AuthenticationException: The virtual machine 'OTTEIQDATACOL' paired to another Core
at Replay.Core.Implementation.VSphere.EsxVirtualMachineClient.GetVirtualMachine(Boolean ignorePairing)
at Replay.Core.Implementation.VSphere.EsxVirtualMachineAgentClient.GetCurrentMetadata(MetadataCredentials metadataCredentials)
at Replay.Core.Implementation.Agents.AgentsMetadataHelper.GetAgentMetadataInternalClient(AgentDescriptor agentDescriptor, IAgentClient agentClient)
at Replay.Core.Implementation.Agents.ProtectedAgent.b__9()
at Replay.Core.Implementation.Agents.ProtectedAgent.AgentClientSend[TResult](Func`1 func)
at Replay.Core.Implementation.VSphere.EsxVirtualMachineAgent.GetMetadata()
at Replay.Core.Implementation.Metadata.Cache.MetadataCacheService.UpdateAgentMetadataCacheEntry(IAgent agent, Boolean isForced, Boolean tryAgentServiceHostRestart)

UI side:

at Replay.Core.Implementation.VSphere.EsxVirtualMachineClient.GetVirtualMachine(Boolean ignorePairing)
at Replay.Core.Implementation.VSphere.EsxVirtualMachineAgentClient.GetCurrentMetadata(MetadataCredentials metadataCredentials)
at Replay.Core.Implementation.Agents.AgentsMetadataHelper.GetAgentMetadataInternalClient(AgentDescriptor agentDescriptor, IAgentClient agentClient)
at Replay.Core.Implementation.Agents.ProtectedAgent.b__9()
at Replay.Core.Implementation.Agents.ProtectedAgent.AgentClientSend[TResult](Func`1 func)
at Replay.Core.Implementation.VSphere.EsxVirtualMachineAgent.GetMetadata()
at Replay.Core.Implementation.Metadata.Cache.MetadataCacheService.UpdateAgentMetadataCacheEntry(IAgent agent, Boolean isForced, Boolean tryAgentServiceHostRestart)

We aren't performing any maintenance or making any changes to our VMware / Rapid Recovery infrastructure. On closer inspection of the VM in the RR console, a banner at the top says "Some actions and metadata are unavailable because machine is unreachable."
I can connect to the VM over RDP just fine. The RR console also reports that the disks are missing, which is not true.

What's causing these errors and how can I prevent them?


phuffers over 8 years ago

Gotcha, thank you. I ask because that symptom is consistent with the abnormal behavior that appears when you back up your vCenter (VC), or the VM hosting your backup solution, agentlessly.

May I direct you to this KB article that we have on the topic of backing up a VC:

support.quest.com/.../229098

This is a common discussion point, especially when your VC is the vCenter Server Appliance, since no agent is available for it. Luckily, the VC is mostly static and does not incur much change. If you're not running distributed switches or vVols/vSAN, most of the data (except the cluster and SSO configuration) is stored on the hosts, and the VC is just your single pane of glass, so frequent agentless backups of it are not recommended. Even if the backups are offset, all it takes is one day when the other backups run long, or the VC backup runs long, and you'll find the VC having to either snapshot itself or close a snapshot on itself while it is opening or closing snapshots for other VMs. That is where problems start to arise.
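The scheduling risk described above (offset schedules still colliding when one job runs long) can be illustrated with a small sketch. It assumes you can obtain each job's actual start time and run duration; the `BackupRun` type and the `overlaps` / `conflicts_with_vcenter` helpers are hypothetical, not part of Rapid Recovery. It simply checks whether the VC backup window overlaps any other job's window. The times are made-up example values: on a typical day the VC backup finishes well before the 2:00 job, but if it runs long, the two windows collide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRun:
    """Hypothetical record of one backup job's actual run window."""
    name: str
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def overlaps(a: BackupRun, b: BackupRun) -> bool:
    # Treat windows as half-open [start, end): a job that starts exactly
    # when another ends does not count as overlapping.
    return a.start < b.end and b.start < a.end


def conflicts_with_vcenter(vc: BackupRun, others: list[BackupRun]) -> list[str]:
    """Return the names of jobs whose windows overlap the VC backup window."""
    return [run.name for run in others if overlaps(vc, run)]


# Hypothetical example data; the date is arbitrary.
base = datetime(2000, 1, 1, 1, 0)  # 01:00
jobs = [BackupRun("OTTEIQDATACOL", base + timedelta(hours=1), timedelta(minutes=45))]  # 02:00-02:45

# Typical day: VC backup runs 01:00-01:30, finishing before the 02:00 job.
vc_run = BackupRun("vCenter", base, timedelta(minutes=30))
# Bad day: VC backup runs long (01:00-02:15) and overlaps the 02:00 job.
slow_vc = BackupRun("vCenter", base, timedelta(minutes=75))

print(conflicts_with_vcenter(vc_run, jobs))   # []
print(conflicts_with_vcenter(slow_vc, jobs))  # ['OTTEIQDATACOL']
```

Checking actual run windows rather than scheduled start times is the point: the schedules in both cases are identical, and only the VC job's duration differs.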

The KB describes essentially the same scenario in a little more detail. That is why I asked about the VC/backup server: this behavior tends to follow that type of configuration.