Array Performance and Data Protection

vladv106
Advisor

Which vmdk after volume restore

I have a simple question that has had me scratching my head all day, and I need some help with it.

Say you want to restore a file from a VM that resided on a volume protected by Nimble's synched snapshots.

When I clone the snapshot and mount it, I see 2 extra vmdks in the VM folder for each of the original ones. Say the VM had a vmdk named test.vmdk; the mounted snapshot clone then has test.vmdk, test-000001.vmdk and test-000002.vmdk. The same thing happens if you manually snapshot the VM from vCenter with "quiesce guest file system" checked.

So my question is: if I don't want to import the VM into inventory and just want to add the vmdk to another VM, which vmdk should I choose? Which vmdk holds the quiesced and committed writes?

Thank you

7 REPLIES
vladv106
Advisor

Re: Which vmdk after volume restore

I want to add a bit of information. To see which is the correct quiesced vmdk, I prepared a test environment composed of 2 VMs, one Win10 and one Server2012R2. After snapshotting both using synched snapshots, I took a look in their VM folders.

In the case of Win 10 (same behavior on Win8.1), there was a win10.vmdk and a win10-000001.vmdk. The VM configuration after the snapshot was taken and before it was committed (the state in which you will find the VM if you clone the snapshot and mount it to the host) pointed to win10-000001.vmdk.

In the case of Server 2012 R2 there was server12.vmdk, server12-000001.vmdk and server12-000002.vmdk. The VM configuration after the snapshot was taken and before it was committed (same as above) pointed to server12-000001.vmdk.

My thoughts, which I am not sure of, are:

In the case of Win 10, going with 000001 is wrong, as it was created after the quiescing ended and writes resumed on the vmdk that the vmx pointed to. So maybe going with win10.vmdk is correct.

In the case of Server 2012 R2, I don't know what to think...

I know importing the VM into inventory and reverting to the snapshot is a safer approach, but it adds steps to something that would be easier if we knew which vmdk is the frozen one.

vladv106
Advisor

Re: Which vmdk after volume restore

I confirmed with support that the base vmdk is the one with writes quiesced. The other 2 in the case of Server 2012, or the other 1 in the case of Windows 10, are delta disks. What threw me off was the extra 2 disks in the Server 2012 case. I still don't know why VMware does this, but I will try to get an answer from the VMware forum.

A couple of minutes ago I got off the phone with VMware support. They didn't know what I was talking about and said that I should NOT be seeing those vmdks. After I pointed them to 3 web sources confirming that for server OSes from 2003 through 2012R2 you should have 2 additional vmdks, they confirmed that 00002.vmdk is in fact the correct vmdk, the one in an app-consistent state.

So take this into consideration when trying to attach a disk to another VM from a cloned synched snapshot. Do not go by the timestamps.
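To see how the files in a cloned VM folder relate to each other without relying on names or timestamps, the small text descriptor of each vmdk can be read: a delta disk's descriptor carries a parentFileNameHint entry naming the disk it sits on, while the base disk has none. Below is a minimal Python sketch of that idea. It assumes the descriptors are available as plain text; the function names and the sample contents are made up for illustration. It only shows the parent/child order of the chain and does not by itself tell you which disk is the app-consistent one.

```python
import posixpath
import re

# Matches simple descriptor entries such as CID=fffffffe or
# parentFileNameHint="test.vmdk"; extent lines and ddb.* entries are skipped.
ENTRY = re.compile(r'^(\w+)\s*=\s*"?([^"]*)"?$')


def parse_descriptor(text):
    """Pull CID, parentCID and the parent file name out of a descriptor."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    parent = fields.get("parentFileNameHint")
    return {
        "cid": fields.get("CID"),
        "parent_cid": fields.get("parentCID"),
        # The hint may be a full path; keep only the file name.
        "parent": posixpath.basename(parent) if parent else None,
    }


def order_chain(descriptors):
    """Given {file name: descriptor text}, return file names base-first."""
    parsed = {name: parse_descriptor(text) for name, text in descriptors.items()}
    bases = [name for name, d in parsed.items() if d["parent"] is None]
    if len(bases) != 1:
        raise ValueError("expected exactly one base disk, found %d" % len(bases))
    chain = [bases[0]]
    while True:
        children = [n for n, d in parsed.items() if d["parent"] == chain[-1]]
        if not children:
            return chain
        if len(children) > 1 or children[0] in chain:
            raise ValueError("descriptor chain is not a simple line")
        chain.append(children[0])
```

Feeding it the three descriptors from a Server 2012 R2 style folder would list the base disk first and the highest delta last, which at least makes clear what each numbered file depends on.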

Sources:

vSphere 5.5 Documentation Center

VMware KB: vSphere API property currentSnapshot may not contain a pointer to a snapshot

VM Snapshots with VSS - Traditional versus VVols - CormacHogan.com

vladv106
Advisor

Re: Which vmdk after volume restore

I am sorry for adding this much contradictory information, but I prefer to update frequently as I receive confirmation from support (VMware or Nimble).

Today I received confirmation from Nimble support that the base vmdk is the correct one. VMware support, after I refused to have the case closed until I received a written answer, came back to me and said that I should not see 2 additional vmdks when taking a quiesced snapshot of a Windows Server OS (Jesus, the amateurism in their lower level support...). After I pointed them to the links from my previous post, they said they would get back to me...

In my testing, the SQL DBs in the highest numbered vmdk and in the base vmdk were the same. I tested both the MD5 and the content of the DB. I took multiple quiesced snapshots while hammering the DB with hammerdb and sqliosim, generating approximately 960000 SQL TPM and 200000 NOPM.

I am still inclined to go with 0002.vmdk instead of the base vmdk because that is what the VDDK documentation says. There is, however, a method that takes the guesswork out of the picture, though it takes a bit more time: import the VM from the cloned snapshot into vCenter, revert to the snapshot, and finally delete all snapshots, all without powering on the VM. After that you can unregister it and attach the required disk to the VM of your choice.
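For anyone who wants to repeat the MD5 part of that comparison, here is a minimal Python sketch. It assumes the same database file has been copied out of each mounted disk to a local path; the function names are made up for illustration, and this is not necessarily the tool used in the test above.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)
```

Calling same_content() on the copy taken from the base vmdk and the copy taken from the highest numbered vmdk returns True only when the two files are byte-for-byte identical (barring an MD5 collision).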

This gets pretty complicated if the DB disk is on a separate volume from the vmx and you included both volumes in a volume collection. In this case, after cloning the respective snapshots of both volumes and importing the VM, you need to manually edit the VM's vmx config file and change the disk path for every disk that does not sit in the same folder as the vmx. Importing the VM will not change those paths for you; they will still point to the production VM datastore and the VM will fail to power on. After that, you need to manually edit the vmsd file so that it points to the datastore holding the correct redo log vmdk (same as in the vmx case). You can then repeat the above procedure for reverting the snapshot.

That is why I wanted confirmation on which vmdk is the consistent one. Otherwise you need to go through multiple steps to import the VM and get it into a correctly configured state, instead of just attaching the required disk to a working VM. Ah, and not to forget: if you attach 00002.vmdk and decide to keep it (on the cloned volume, or migrated to a master volume), you will surely want to have it committed rather than keep the whole chain of snapshot dependencies on the base vmdk. To do this, take a simple snapshot of the VM you have attached this disk to and then consolidate (not delete) the snapshots (vCenter will also inform you that the VM needs consolidation). After that you can delete all snapshots from vCenter.

Hope I clarified more than I confused people.
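The vmx edit described here can be scripted. The following is a minimal Python sketch, assuming the disks on other datastores appear in the vmx as entries of the form scsi0:1.fileName = "/vmfs/volumes/...". The function name and the datastore paths in the usage note are hypothetical. It covers only the vmx; the vmsd file uses different keys and would need its own handling.

```python
import re

# Disk entries such as: scsi0:1.fileName = "/vmfs/volumes/<datastore>/vm/db.vmdk"
DISK_LINE = re.compile(
    r'^(?P<key>[A-Za-z]+\d+:\d+\.fileName)\s*=\s*"(?P<path>[^"]*)"\s*$'
)


def repoint_disks(vmx_text, old_prefix, new_prefix):
    """Rewrite disk paths that start with old_prefix so they use new_prefix.

    Disks referenced by a relative path (same folder as the vmx) and all
    non-disk lines are left untouched.
    """
    rewritten = []
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if match and match.group("path").startswith(old_prefix):
            new_path = new_prefix + match.group("path")[len(old_prefix):]
            line = '%s = "%s"' % (match.group("key"), new_path)
        rewritten.append(line)
    return "\n".join(rewritten) + "\n"
```

For example, repoint_disks(text, "/vmfs/volumes/prod-db/", "/vmfs/volumes/clone-db/") would move every disk that lived on the production DB datastore over to the cloned one. Work on a copy of the vmx and check the result before registering the VM.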

Cheers and happy Holidays.

I will probably post an update if vmware responds.

sio67549
New Member

Re: Which vmdk after volume restore

Why go through all this in the first place would be my question. Why not just attach the vmdk from the clone to the original system and restore the data that way? Attach them, browse them, figure out which is the correct one and restore the data. Done.

vladv106
Advisor

Re: Which vmdk after volume restore

Hi Mark,

I don't think you read my posts. What you are describing is what I want to do, but after you mount the clone you have to choose between 3 vmdks for the disk you are trying to recover data from (that is, if your Nimble snapshot was of a Windows Server VM with synch on).

Later edit: Finding the correct vmdk is not as easy as you'd think when there are multiple application writers to manage. That is why I am trying to get a definitive answer from someone who understands how VMware snapshots behave in the case of app-consistent VM snapshots.

sio67549
New Member

Re: Which vmdk after volume restore

Crash consistent is good enough, and if you are syncing with vCenter you will not scale well and are exposing your environment to more risk for little reward. If crash consistent isn't good enough for your application, then you shouldn't be relying on storage snapshots for data protection.

vladv106
Advisor

Re: Which vmdk after volume restore

I appreciate your opinion but I do not agree.

Of course we have a separate backup solution (Veeam) which we use for replication, off-site replication, backup, off-site backup, etc. But Veeam-ing is a bit more disruptive and takes more time than a simple storage snapshot, which has the added benefit of quickly creating an app-consistent restore point. Because there is no Veeam integration with storage snapshots (in the case of Nimble), we usually schedule replication once every 2-3 hours. Now, with Nimble, we have seen the opportunity to reduce the RPO to 30 mins with minimum impact on production VMs (we rarely see a dropped ping when quiescing). This also reduces RPO when recovering large SQL dbs.

Up until now we have not seen the risk you mentioned, as Nimble simply triggers a VMware procedure which should be stable enough to succeed every time. After that snapshot is taken you can return to it and decide to go the crash-consistent route (just by powering on the respective VM or mounting the running vmdk) or go the app-consistent route by selecting the proper vmdk. In the past, we had situations when Exchange 2010 would not mount a DB that was restored from a simple VMware snapshot. To be able to mount it we lost time checking and replaying logs, thus defeating every procedure and piece of equipment we had in place for these kinds of situations.

Lastly, I usually design our infrastructure or internal procedures based on what tools I have available on site or are available for purchase. I do not decide on a target objective or design and purchase around it. So going with Nimble, which heavily markets vmware integration and backup/restore abilities of their storage, opened up these options. Now I am looking for proper ways to put them to work and decide on internal procedures for different scenarios.