Array Performance and Data Protection

vCenter Snapshots freeze MS-SQL server

SOLVED
N/A

We're battling an issue with a virtual SQL server and vCenter-synced snapshots. The snapshots aren't failing, but each one causes just enough disruption that some of our apps time out during VSS snapshot creation. We don't have the problem when using Veeam's application-consistent snapshots, so I think something else must be wrong in my guest OS. I know I can disable unnecessary VSS writers, but I'm unsure which ones are safe to disable while still getting consistent snapshots in Nimble.

Here is our list of enabled VSS writers (output of vssadmin list writers). They can be disabled by following this VMware KB article: https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1031200

vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2005 Microsoft Corp.

Writer name: 'Task Scheduler Writer'
   Writer Id: {d61d61c8-d73a-4eee-8cdd-f6f9786b7124}
   Writer Instance Id: {1bddd48e-5052-49db-9b07-b96f96727e6b}
   State: [1] Stable
   Last error: No error

Writer name: 'VSS Metadata Store Writer'
   Writer Id: {75dfb225-e2e4-4d39-9ac9-ffaff65ddf06}
   Writer Instance Id: {088e7a7d-09a8-4cc6-a609-ad90e75ddc93}
   State: [1] Stable
   Last error: No error

Writer name: 'Performance Counters Writer'
   Writer Id: {0bada1de-01a9-4625-8278-69e735f39dd2}
   Writer Instance Id: {f0086dda-9efc-47c5-8eb6-a944c3d09381}
   State: [1] Stable
   Last error: No error

Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Instance Id: {27c00a72-3215-48b6-a699-7bd01b674fa8}
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
   Writer Instance Id: {c7c46f7d-8a70-4530-9445-30a5a80d7fd2}
   State: [1] Stable
   Last error: No error

Writer name: 'ASR Writer'
   Writer Id: {be000cbe-11fe-4426-9c58-531aa6355fc4}
   Writer Instance Id: {c29e355b-c672-4ef3-8d34-7e75407e96ae}
   State: [1] Stable
   Last error: No error

Writer name: 'Shadow Copy Optimization Writer'
   Writer Id: {4dc3bdd4-ab48-4d07-adb0-3bee2926fd7f}
   Writer Instance Id: {0048c718-52fd-416e-bdcc-fc6efac7c369}
   State: [1] Stable
   Last error: No error

Writer name: 'Registry Writer'
   Writer Id: {afbab4a2-367d-4d15-a586-71dbb18f8485}
   Writer Instance Id: {2c79e02d-8724-469f-8bb8-280e73a21a89}
   State: [1] Stable
   Last error: No error

Writer name: 'BITS Writer'
   Writer Id: {4969d978-be47-48b0-b100-f328f07ac1e0}
   Writer Instance Id: {abc3b9c3-61f6-4de9-a4de-873133b6c8d9}
   State: [1] Stable
   Last error: No error

Writer name: 'COM+ REGDB Writer'
   Writer Id: {542da469-d3e1-473c-9f4f-7847f01fc64f}
   Writer Instance Id: {f0908cd6-c87c-4dc3-9bb1-2d4b7862e538}
   State: [1] Stable
   Last error: No error

Writer name: 'WMI Writer'
   Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
   Writer Instance Id: {b8fc017d-7c54-439d-a612-e70e80bb98f7}
   State: [1] Stable
   Last error: No error

Which of these are necessary in order to get a quiesced snap in VMware?
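As a starting point for the writer question, here is a minimal Python sketch that turns vssadmin list writers output (the format pasted above) into one record per writer and flags any writer that is not Stable or reports an error. It does not say which writers are safe to disable; it only makes the inventory easier to compare before and after a snapshot. SAMPLE is trimmed from the list above, except that the WMI Writer's State and Last error values are hypothetical, altered to show what a failure would look like.

```python
from typing import Dict, List


def parse_vss_writers(text: str) -> List[Dict[str, str]]:
    """Parse 'vssadmin list writers' output into one dict per writer."""
    writers: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Writer name:"):
            # A new writer block starts; the name is wrapped in single quotes.
            current = {"Writer name": line.split(":", 1)[1].strip().strip("'")}
            writers.append(current)
        elif writers and ":" in line:
            # Field lines look like "State: [1] Stable"; split on the first colon only.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return writers


def unhealthy_writers(writers: List[Dict[str, str]]) -> List[str]:
    """Names of writers that are not Stable or that report an error."""
    return [
        w["Writer name"]
        for w in writers
        if not w.get("State", "").endswith("Stable") or w.get("Last error") != "No error"
    ]


# Trimmed from the thread; the WMI Writer's State/Last error are HYPOTHETICAL values.
SAMPLE = """\
Writer name: 'SqlServerWriter'
   Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
   State: [1] Stable
   Last error: No error
Writer name: 'WMI Writer'
   Writer Id: {a6ad56c2-b509-4e6c-bb19-49d8f43532f0}
   State: [9] Failed
   Last error: Timed out
"""

if __name__ == "__main__":
    parsed = parse_vss_writers(SAMPLE)
    print([w["Writer name"] for w in parsed])
    print(unhealthy_writers(parsed))
```

Running the same parser on output captured before and after a synced snapshot would show whether any writer changes state during the freeze.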

22 REPLIES
mkieran59
Advisor

Re: vCenter Snapshots freeze MS-SQL server

Hi Adam,

I see it's been more than 24 hours without a response to this question. Please feel free to contact support@nimblestorage.com and they'll be happy to help you resolve the time-out problem.

Thanks for being part of the NimbleConnect community, and sorry we couldn't get this one answered promptly.

Michael

dbauder92
Valued Contributor

Re: vCenter Snapshots freeze MS-SQL server

I would like to add that they are not the only ones experiencing this issue. We have had to deal with it too and would appreciate a response and a fix for this problem.

Thanks

N/A

Re: vCenter Snapshots freeze MS-SQL server

Thanks for the responses, Michael and Dan. We did have a ticket open with support, but it didn't really lead us anywhere.

dbauder92
Valued Contributor

Re: vCenter Snapshots freeze MS-SQL server

Further testing has led me to believe this is a Nimble scheduled replication issue:

Take a VMware snapshot including memory and quiescing the file system - no errors

From Nimble volume management, click the Take Snapshot button - no errors

Wait for the scheduled Nimble protection snapshot/replication to run - errors occur

To me this pretty much screams that there is a bug in the protection schedule routines, or that they do something very different from the Take Snapshot button. Either case is bad news. Hello Nimble, time to fix this.

N/A

Re: vCenter Snapshots freeze MS-SQL server

This is not the issue we have. We get the same freeze/pause with scheduled snaps, with manual synced snaps from the array, or when checking the quiesce box in VMware. We never use the "snapshot memory" option, and my understanding is that Nimble snaps don't use that option either.

dbauder92
Valued Contributor

Re: vCenter Snapshots freeze MS-SQL server

Understood. Are you seeing Windows event log errors about disks being surprise removed and the SQL Server service failing? If not, it sounds like there may be more than one problem here.

Dan Bauder, VCP 3, 4, 5, DCV

UNC Charlotte, Information & Technology Services

9201 University City Blvd., Charlotte, NC 28223

Phone: 704-687-0274

dbauder@uncc.edu | http://www.uncc.edu

N/A

Re: vCenter Snapshots freeze MS-SQL server

Nope, the VM just pauses a bit longer than I'd like and some applications timeout when connecting to the DB.

milovanov88
Advisor

Re: vCenter Snapshots freeze MS-SQL server

Hello Adam,

I sincerely apologize that the information shared with you during the case was not a satisfactory answer to your question. I would like to clarify the behavior from the Nimble Storage array's perspective.

When Nimble Storage makes an API call to vCenter to create a snapshot, the array has to wait until vCenter reports that the freeze was successful before it can take its snapshot, and only then does it let vCenter know that it has done so. The vCenter quiesce operation depends on which volumes belong to the volume collection on the array, the VMs contained on each volume, and the VSS writers present on each VM, which at times is not optimal.
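The ordering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Nimble's implementation: request_quiesce, take_array_snapshot and report_to_vcenter are hypothetical stand-ins for the array's calls to and from vCenter, and in this strict form a failed freeze simply stops the snapshot.

```python
from typing import Callable


def vcenter_synced_snapshot(
    request_quiesce: Callable[[], bool],
    take_array_snapshot: Callable[[], str],
    report_to_vcenter: Callable[[str], None],
) -> str:
    """Sketch of the freeze -> snapshot -> report ordering; all callables are hypothetical."""
    # 1. Ask vCenter to quiesce and block until it reports back.
    if not request_quiesce():
        raise RuntimeError("vCenter did not report a successful freeze")
    # 2. Only after the freeze is confirmed does the array take its snapshot.
    snapshot_id = take_array_snapshot()
    # 3. Then the array tells vCenter that the snapshot exists.
    report_to_vcenter(snapshot_id)
    return snapshot_id
```

Because step 2 cannot start until step 1 returns, any time vCenter spends quiescing adds directly to how long the whole synced snapshot takes.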

I completely understand the need for the consistent application state for the SQL database and the best course of action is the following:

a) Create a separate Nimble array volume for the database and another one for the logs

b) Connect the VM directly to the Nimble Storage array

c) Move the database to the database volume and the logs to the log volume on the Nimble Storage array

d) Install the Windows Toolkit on the directly attached VM

e) Disable vCenter sync for the operating system volume, which stays on the VMware datastore

f) Enable Nimble array SQL VSS integration on the volume collection containing this VM's database and log volumes

With a direct connection to the Nimble Storage array, quiesce time is minimized because the hypervisor layer is bypassed. Since the array engages only a single VSS writer on a single VM, the time to quiesce is minimized as well.
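The fan-out behind this comparison can be illustrated with a rough count. A minimal sketch, assuming a hypothetical layout (two datastore volumes in one vCenter-synced collection, three VMs, one of them spanning both volumes) and that every VM exposes the 11 writers listed at the top of this thread. The count is only a proxy for quiesce work, not a timing model.

```python
from typing import Dict, Iterable, List, Set


def vms_to_quiesce(collection: Iterable[str], vms_on_volume: Dict[str, List[str]]) -> Set[str]:
    """Every VM on every volume in the collection; a VM spanning volumes counts once."""
    vms: Set[str] = set()
    for volume in collection:
        vms.update(vms_on_volume.get(volume, []))
    return vms


def writer_engagements(vms: Iterable[str], writers_on_vm: Dict[str, List[str]]) -> int:
    """Total VSS writers involved across the VMs being quiesced (a rough proxy)."""
    return sum(len(writers_on_vm.get(vm, [])) for vm in vms)


# HYPOTHETICAL layout: volume and VM names are made up for illustration.
VMS_ON_VOLUME = {"ds-01": ["sql01", "app01"], "ds-02": ["sql01", "web01"]}
# Assumption: each VM exposes the 11 writers from the vssadmin list in this thread.
ELEVEN = [f"writer-{i}" for i in range(11)]
WRITERS_ON_VM = {"sql01": ELEVEN, "app01": ELEVEN, "web01": ELEVEN}

if __name__ == "__main__":
    synced = vms_to_quiesce(["ds-01", "ds-02"], VMS_ON_VOLUME)
    print(sorted(synced), writer_engagements(synced, WRITERS_ON_VM))  # vCenter sync
    print(writer_engagements(["sql01"], {"sql01": ["SqlServerWriter"]}))  # in-guest SQL only
```

Under these assumptions vCenter sync touches three VMs and 33 writer engagements, while the directly attached SQL VM with Nimble SQL VSS integration engages just one.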

Please let Nimble Storage Support know if you need any assistance in completing above steps and ensuring the snapshots for your SQL server are application-consistent.

milovanov88
Advisor

Re: vCenter Snapshots freeze MS-SQL server

Hello Dan,

I am sorry you are having an issue with your snapshots. Have you opened a case with Support to troubleshoot the issue you are having?

From the little information I have from your post, I can state the following:

The replication engine does not involve any host/guest operation; the transfer happens after the snapshot already exists on the source array.

When you take a snapshot at the volume level, the snapshot is crash-consistent on the Nimble array. vCenter is not engaged, because vCenter sync is managed by the volume collection.

When you take a snapshot through vCenter, you are snapshotting a single VM.

When snapshots are taken by a schedule on the Nimble Storage array for a volume collection configured for vCenter sync, the array calls vCenter to quiesce all the VMs on all the volumes in that volume collection.

If vCenter cannot finish the quiesce operation within a reasonable time (because of performance problems or errors), the Nimble Storage array will still take a crash-consistent snapshot.
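The fallback described here, quiesce within a time limit and otherwise take a crash-consistent snapshot anyway, can be sketched with Python's concurrent.futures. The quiesce callable and the timeout value are hypothetical; the array's actual timeout and internals are not described in this thread.

```python
import concurrent.futures as cf
import time
from typing import Callable


def scheduled_snapshot_consistency(quiesce: Callable[[], object], timeout_s: float) -> str:
    """Return the consistency level a scheduled snapshot would end up with.

    quiesce is a HYPOTHETICAL stand-in for the array's quiesce request to vCenter.
    """
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(quiesce)
    try:
        future.result(timeout=timeout_s)  # raises TimeoutError if quiesce is too slow
        return "vm-consistent"
    except cf.TimeoutError:
        # Too slow: the snapshot is still taken, just without a quiesced guest.
        return "crash-consistent"
    except Exception:
        # quiesce raised an error: same fallback.
        return "crash-consistent"
    finally:
        # Do not wait for a slow quiesce; this sketch simply abandons it.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    # HYPOTHETICAL quiesce that takes longer than the allowed window.
    print(scheduled_snapshot_consistency(lambda: time.sleep(0.3), timeout_s=0.05))
```

The sketch abandons a slow quiesce rather than cancelling it, which keeps it short but means the worker thread still runs to completion in the background.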

You can reproduce the behavior of the scheduled snapshots by manually triggering a snapshot of the volume collection in question.

N/A

Re: vCenter Snapshots freeze MS-SQL server

I appreciate your reply. Your recommendations are similar to those of support; however, we have just P2V'd this machine and migrated its data and log volumes to VMDKs, so we aren't interested in reversing that. We were not aware that the snapshots would behave differently, otherwise we wouldn't have chosen this route, but this is where we are now.

I'd really hoped to find a way to get the quiesced snaps to happen more efficiently so that we could keep our RPO low with Nimble replication like we had previously when it was a physical server directly connected to the array.

dbauder92
Valued Contributor

Re: vCenter Snapshots freeze MS-SQL server

Unfortunately I have had zero luck with support when reporting these kinds of issues in the past, and I see nothing to be gained by going that route.

You can have a standalone volume that is protected, in which case you don't have the ability to take a volume collection snapshot. In that case I would hope the Take Snapshot button would use the settings from the standalone protection and integrate with vCenter. It does not appear to work that way.

Thanks for the info.


chrismcqueen62
Occasional Visitor

Re: vCenter Snapshots freeze MS-SQL server

Hi Adam/Dan,

     This sounds very similar to a problem we experienced. We went through Nimble Support, VMware Support and finally Microsoft...

In the Windows event log we had errors about disks being surprise removed. We actually found that all VMs were getting these errors, but we only noticed it on the latency-sensitive ones.

We use Veeam along with VMware. What we found was that Veeam did not display any errors because it uses its own VSS, whereas VMware calls on Microsoft VSS, and Nimble calls on VMware, which in turn calls on Microsoft VSS.

This link relates to the original issue we experienced and the proposed workaround:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2006849


We tested the theory by building new test VMs with both MBR and GPT disks. The MBR VMs continued to have the issue, while the GPT ones were resolved. It was quite a lot of work, as it's a full infrastructure rebuild, but it did resolve the problems.

Further to this, we had the same issue with EMC AppSync with RecoverPoint, and the same solution worked for us.

Thanks

dbauder92
Valued Contributor

Re: vCenter Snapshots freeze MS-SQL server

Chris - Thanks for the input. We've got about a dozen VMs running SQL Server that we placed on volumes with vCenter synchronization enabled. This is where we are seeing the issues, and yes, they do have MBR disks. If I understand your post, it sounds like rebuilding these with GPT disks might help us out. Or do we have to use GPT disks and also implement the VMware KB workaround?

Thanks

milovanov88
Advisor

Re: vCenter Snapshots freeze MS-SQL server

Chris,

Thank you very much! This is a very good bit of information.

ndyer39
Honored Contributor

Re: vCenter Snapshots freeze MS-SQL server

Hello Adam,

Sadly this is the downside to using VMDKs and relying on vCenter snapshots, and it's why, for ultimate best practice, we recommend using in-guest iSCSI. Having said that, I'm told there are significant improvements to vCenter snapshot procedures in vSphere 6. I haven't tested this myself; here's a blog detailing the benefits from a friend at Veeam: http://www.virtualtothecore.com/en/vsphere-6-snapshot-consolidation-issues-thing-past/.

All of the above disappears with Virtual Volumes and NimbleOS 3 you'll be pleased to hear.

chrismcqueen62
Occasional Visitor

Re: vCenter Snapshots freeze MS-SQL server

Hi Dan,

Unfortunately the answer for us was to rebuild the VMs as GPT. We have since updated all our VMware templates to use GPT as standard and no longer use MBR.

In the VMware link, the first Microsoft KB is 2853247; its Resolution 2 is the GPT fix. Resolution 1 requires a single partition on an MBR disk, which is not possible on newer OSes, so we ruled that out straight away.
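To audit existing VMs for the MBR vs GPT difference, here is a minimal Python sketch that classifies the first two sectors of a raw disk and counts MBR primary partitions, which bears on Resolution 1's single-partition condition. It assumes 512-byte sectors and relies on the standard on-disk layout: the 0x55AA boot signature at bytes 510-511, four 16-byte partition entries starting at byte 446 (type byte at offset 4), the 0xEE protective-MBR type, and the "EFI PART" header signature at LBA 1. Inside a Windows guest, Disk Management or PowerShell's Get-Disk (PartitionStyle column) shows the same information more directly.

```python
from typing import List

MBR_SIGNATURE = b"\x55\xaa"   # boot signature at bytes 510-511 of sector 0
GPT_SIGNATURE = b"EFI PART"   # GPT header signature at the start of LBA 1
PROTECTIVE_MBR_TYPE = 0xEE    # partition type a GPT disk puts in its protective MBR


def _mbr_types(disk: bytes) -> List[int]:
    # The MBR partition table starts at byte 446: four 16-byte entries, type byte at offset 4.
    return [disk[446 + 16 * i + 4] for i in range(4)]


def partition_style(disk: bytes, sector_size: int = 512) -> str:
    """Classify the first two sectors of a raw disk as 'gpt', 'mbr' or 'unknown'."""
    if len(disk) < 2 * sector_size or disk[510:512] != MBR_SIGNATURE:
        return "unknown"
    has_protective = PROTECTIVE_MBR_TYPE in _mbr_types(disk)
    if has_protective and disk[sector_size:sector_size + 8] == GPT_SIGNATURE:
        return "gpt"
    return "mbr"


def mbr_primary_partitions(disk: bytes) -> int:
    """Count non-empty primary entries (an extended partition counts as one)."""
    return sum(1 for t in _mbr_types(disk) if t != 0)
```

For example, partition_style(open("disk-head.bin", "rb").read(1024)), where disk-head.bin is a hypothetical file holding the first 1024 bytes of the disk. Logical partitions inside an extended partition are not counted by this sketch.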

It sounds exactly like what we experienced. We went through all the proper support channels to reach this outcome, and it was VMware that pointed us to this article. I'm sure you would anyway, but spin up a VM and test it to make sure it is the same issue you have.

Thanks

dbauder92
Valued Contributor
Solution

Re: vCenter Snapshots freeze MS-SQL server

Thanks for your insights. Yes, I could/would have gone down the support route, but I will take your example and see if it solves our issue too. If not, I guess I will have to go the support route.

Thanks again and have a wonderful weekend - Cheers


dukeboles26
Occasional Visitor

Re: vCenter Snapshots freeze MS-SQL server

Adam,

I have dealt with this since we purchased our Nimbles, and I have solved the problem. It is as @Nick Dyer says: if your application is at all latency-sensitive, you must use directly attached iSCSI disks. There are lots of gotchas that I would be glad to share if you decide to go that route. One is that you don't want in-guest iSCSI if you use VMware Site Recovery Manager and array-based recovery; in that case you need physical-mode Raw Device Mappings. These are identical to in-guest on the Nimble side, but instead of the iSCSI initiator in the guest attaching the volume, the hosts attach it and present it to the VM.

There are many more gotchas I am glad to share if you want.

Duke

edit: P.S. I love Nimble support and they do a great job, but if you take this to them they will lead you down many wrong paths. I got the final key pieces of info from an internal engineer, and was only offered that contact after complaining to my account rep. Once I had access to this person, I had the whole thing solved in two weeks, after eight months of gut-wrenching struggle-bus action.

N/A

Re: vCenter Snapshots freeze MS-SQL server

Hi Nick,


Sorry I let this one go dark. I did upgrade to ESXi 6 and the snapshots seem better; however, I have an unrelated problem where the OS doesn't seem to recognize the quiesced snapshots, as there are no entries in the application log for the freeze, thaw, and I/O resumed events we would expect. We're working with Microsoft now to see who wins the finger-pointing match: Microsoft or VMware.
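One way to measure the pause, and to check whether SQL Server logged a freeze at all, is to pair freeze and resume messages from the application log. A minimal sketch, assuming the entries have already been exported as (timestamp, message) pairs and that the messages begin with SQL Server's usual "I/O is frozen on database ..." and "I/O was resumed on database ..." wording; verify the exact text against your own log. An empty result means no matched freeze/resume pairs were found.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# ASSUMED message prefixes, based on SQL Server's usual VSS freeze/resume wording.
FREEZE_PREFIX = "I/O is frozen on database "
RESUME_PREFIX = "I/O was resumed on database "


def _database(message: str, prefix: str) -> str:
    # "I/O is frozen on database model. No user action..." -> "model"
    return message[len(prefix):].split(".", 1)[0]


def freeze_durations(entries: Iterable[Tuple[datetime, str]]) -> List[Tuple[str, float]]:
    """Pair each freeze with the next resume for the same database; return (db, seconds)."""
    open_freezes: Dict[str, datetime] = {}
    durations: List[Tuple[str, float]] = []
    for when, message in sorted(entries):
        if message.startswith(FREEZE_PREFIX):
            open_freezes[_database(message, FREEZE_PREFIX)] = when
        elif message.startswith(RESUME_PREFIX):
            db = _database(message, RESUME_PREFIX)
            start = open_freezes.pop(db, None)
            if start is not None:
                durations.append((db, (when - start).total_seconds()))
    return durations
```

Database names containing a period would be truncated by this simple parser; a real log export may need a stricter pattern.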

I have a question about the VVols comment you had above.  What will the migration to that look like?  Will there be a way to convert the volumes over to this, or will it be a manual data migration in the OS to the new volumes?  (I fear it is the latter) 

ndyer39
Honored Contributor

Re: vCenter Snapshots freeze MS-SQL server


Adam Bond wrote:

I have a question about the VVols comment you had above.  What will the migration to that look like?  Will there be a way to convert the volumes over to this, or will it be a manual data migration in the OS to the new volumes?  (I fear it is the latter)


Hi Adam!

The great news is that it's very simple to move to and from VVol deployments, as it is all controlled via Storage vMotion. It's possible to move a VMDK in VMFS to VVol, and vice versa, without any gotchas. And now that we finally support XCOPY in NimbleOS 3, the copy burden is taken away from the ESX host and network, meaning these conversions will complete even more rapidly than before.

N/A

Re: vCenter Snapshots freeze MS-SQL server


Nick Dyer wrote:

All of the above disappears with Virtual Volumes and NimbleOS 3 you'll be pleased to hear.


Nick - can you expand on this perhaps?  Specifically - are there differences in the way application aware snapshots are done with VVols?

rottengeek35
Occasional Contributor

Re: vCenter Snapshots freeze MS-SQL server

I would add that, in general, P2V of a SQL server is not going to turn out well. SQL Server essentially has its own OS: it manages memory, CPU and I/O on its own and makes assumptions about its environment. I love running it virtualized, but I would never P2V it. (I'm a DBA)