This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

SQL Locks

Hello,

 

We are using MSSQL 2014, and RR 6.1.1.137.

Our database server has C (170GB), D (2.2 TB), E (360GB), F (5.6TB).

Currently all databases are set to simple recovery, and our schedule is set to 2 hours.

Unfortunately, at least once a day we have a period where our application (custom built .NET web app) throws a bunch of errors saying it cannot write to the database, we are getting locks.  These are happening only at scheduled backup times, and typically when that happens we have 2-3 retries of the backups.

This is, of course, causing client impact.

Our database is not particularly busy, I feel like maybe we are just doing something wrong or Rapid Recovery would never really work for live databases.  In which case I'm unsure how they could be selling it.

One suspicion, we do have some fairly long running transactions in the application.  These are not changeable at this time.  Could this be causing issues like I've mentioned where DB locks are occurring?

 

Looking for any tips at all.

  • It is possible your server may be experiencing a volume shadow copy writer issue. Our agent utilizes Microsoft volume shadow copy writers to prepare our snapshots, and it is the job of these writers to provide the "quiescing" of data with live applications (such as SQL) to prepare application-consistent snapshots.

    You could confirm with this via Windows Event Logs, and troubleshoot their status, but I would strongly recommend reaching out on our online help chat or opening a service request for further investigation. Go to - support.quest.com/rapid-recovery.

    Additionally, I suggest for you to upgrade your core and agent to the latest version (6.1.3) and then monitor for the behavior described.
  • I would not update unless you have a clear indication that this issue was fixed in an upgrade.

    What errors do you see in the SQL logs, Apprecovry log and the Application Event Logs?

    Is this a physical machine or VM, and if so is it vmware?

    RR can backup active DB but it requires Microsoft VSS be able to get a snapshot and the writer (sql in this case) to be functioning

    I have never experienced a DB lock issue with VSS but I suppose its possible. But its hard to imagine if you understand how VSS works.

    I am not saying the backup/ VSS is not involved, but I would expect there to be more issue with the VSS thaw than with the snapshot
  • In the event viewer on the SQL machine, we get an event 3197 (I/O is frozen on database XXXXXXXX. No user action is required." for each database, then 2-3 seconds later 3198 saying the I/O was resumed.

    In the successful cases we get 18264 ("Database backed up. Database: XXXXXXX...") messages.

    In the unsuccessful cases we get VSS event 12340 ("Volume Shadow Copy Error: VSS waited more than 40 seconds for all volumes to be flushed. "). The process will retry it seems like 3 times before a full failure.

    For additional context, it is a physical machine, and this issue is ongoing despite reboots etc

    I dont really know much about VSS, but I agree with the spirit Emte that it would seem like this should be a basic function of the rapid recovery appliance.
  • It is a basic function, the product is not what is failing at this point. It is Microsoft VSS (but I still dont understand why it locks)

    The error you are getting is pretty straight forward. VSS needs disk access to pause for a very short time (miliseconds) If access to the disk can not be paused in 40 seconds, VSS throws this error.

    So you can try to increase the wait time (40 seconds) or decrease the pause time (there are limits) or figure out why the disk IO can not stop. If it is that job you mentioned, try running backups when this job is not active. There maybe a lot of things to dig into for this

    But all of that being said, none of this should lock the DB since VSS does not do anything. He asks for the disk IO to pause, since it does not he never even attempts the snapshot so I would not expect any impact on the SQL files.

    Is there any mention of the issue that causes the lock in sql logs?
  • The transactions that sometimes take a long time are user driven, so not something I can easily change.

    There’s nothing in the sql logs. But if those long transactions are running is that maybe it? Because if they’re writing data in one large transaction sql wouldn’t want to give up doing that to pause for a backup, as it’s considered one operation. These transactions can take half an hour sometimes I’m told.
  • Take everything with a grain of salt, since I don't know your environment.

    I think there are 2 issues

    1) VSS Snapshot Fails
    2) Database is locked

    SQL has a VSSWriter, I would expect this writer to be able to pause DB access for a few miliseconds even if there was a long job running. But maybe there is a certain type of SQL job that will not respond or maybe it cant respond fast enough. I dont know enough about SQL to help with the first but for the second, make sure the disk can handle the IO of this job and snapshots. Maybe try moving the snapshots to another disk, Check if the logs and db are on the same drive, maybe move one. There are a lot of possible causes and fixes

    For the DB lock, I have no idea. Maybe ping Microsoft since both SQL and VSS are their products.

    Open a case with Quest and see if their people can help. It is not their product that is failing but since their product calls VSS, they should have a fair amount of knowledge and should help since in the end, their backups are failing due to a call they are making (vss) failing
  • Just to add to the conversation -- VSS locks only WRITE IOs; READ IOs continue to work.
    The issue may be related to the number (or size) of the databases as there are hard limitations for both as of what VSS can handle.
    If the databases (and logs) are spread over multiple volumes, I would suggest attempting to backup the volume pairs databases+logs separately. For instance if C: is the system volume D: contains some databases and their logs and E: contains databases and F: contains the logs for the databases on E:, back up C; separately, D: separately and E: and F: separately (at 30 min intervals for instance). This would reduce the time the SQLWriter needs to freeze various databases.