
Re: Number of VMs per Datastore

 
jrich52352
Trusted Contributor

Number of VMs per Datastore

I just set up protection (volume collection snapshots) on my VMware datastore. The snapshots seem to be working well, but I keep getting an error message saying that it can't delete them.

"Failed to delete vCenter snapshot associated with volume collection test schedule sched1 since the vCenter virtual machine snapshot tasks have not yet completed."

I sent this in to support, and they are saying I shouldn't have any more than 10 VMs on a datastore to avoid this. I only have 28 right now, and none have massive amounts of data.

The support tech said he can SSH in and change the timeout for that operation to help prevent this issue.

Is it just me, or does 10 VMs per datastore seem really low?

tjoffs71
Occasional Advisor

Re: Number of VMs per Datastore

The key here is the way that VMware handles storage sharing. Basically, when a VM wants to write to a datastore, it tells ESX. ESX then puts a little token on the datastore, locking it for write for that VM. Once the VM writes the data, the locking token is removed. This is VERY simplified, but it gets my point across. Now, what happens with 25 VMs on 10 different ESXi hosts during a mass snapshot operation? Well, now they are all battling for the token, and it slows a few things down. This is basically why you should size your datastores to hold around 8-12 VMs each (depending on the load). There are variables that can change this, but it is a pretty well-known standard.

jrich52352
Trusted Contributor

Re: Number of VMs per Datastore

Yeah, that makes sense. But those I/Os should be fairly small, again, as you stated, depending on the IOPS of that VM. This is basically the quiescing process, which the tech said I can actually get some details on (how long each VM takes and whether there is a problem VM). I haven't had a chance to dig into that yet. But with SSD (we've got the 4TB cache system) it should be fairly quick.

I completely understand the problem. I just think that unless I'm doing massive I/O (for the most part these VMs should be idle), this shouldn't be too much of a problem.

I guess once I can dig into this a bit more to see what the VMs are doing, I might be able to identify what's causing the issue.

Thanks for the response.

tjoffs71
Occasional Advisor

Re: Number of VMs per Datastore

Generally speaking, true. But snapshots result in changed metadata on the LUN. This then forces a LUN/volume-level lock from the ESXi hosts that triggered the operation. While these locks are very tiny, when triggered all at once they can create a very noticeable storage I/O latency issue, because the retries happen at random intervals and stack up. If you are using ATS (VAAI), the locking is moved away from the ESXi hosts and onto the array, where rather than locking out the volume as a whole, it only locks the specific data being accessed. This can help with the issue and allows for a theoretically (not in reality) unlimited number of VMs per datastore.
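The retry pile-up described above can be illustrated with a toy simulation. This is only a sketch under invented assumptions: every snapshot operation asks for one exclusive volume-level lock at the same instant, holds it for `hold_ms`, and backs off for a random interval when the lock is busy. The function name, the parameters, and the timings are all hypothetical and are not measured VMFS or Nimble values.

```python
import random


def simulate_lock_contention(contenders, hold_ms=1, max_backoff_ms=4, seed=0):
    """Toy model: `contenders` operations each need one exclusive lock at t=0.

    Returns (finish_ms, retries). All parameters are illustrative
    assumptions, not real VMFS or array timings.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    next_try = [0] * contenders        # time of each contender's next attempt
    pending = set(range(contenders))   # contenders that still need the lock
    lock_free_at = 0                   # time at which the lock is next free
    retries = 0

    while pending:
        # The contender with the earliest scheduled attempt goes next.
        who = min(pending, key=lambda c: (next_try[c], c))
        now = next_try[who]
        if now >= lock_free_at:
            # Lock is free: hold it for the metadata update, then finish.
            lock_free_at = now + hold_ms
            pending.remove(who)
        else:
            # Lock is busy: back off for a random interval and try again.
            retries += 1
            next_try[who] = now + rng.randint(1, max_backoff_ms)

    return lock_free_at, retries


for n in (1, 10, 28):
    finish, retries = simulate_lock_contention(n)
    print(f"{n:>2} contenders: done at {finish} ms after {retries} retries")
```

Even in this simplified model, the lock is strictly serial, so total time grows at least linearly with the number of contenders, and every contender but the first retries at least once.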

Now, you must take Queue Depth into account. Basically, each LUN in a VMware environment has a pre-defined Queue Depth of, I believe, 32. That is 32 active I/O streams per host per datastore. Let's look at two options in math terms:

Data:

2 Hosts

20 VMs (10 VMs Per Host)

Number of LUNs Variable

Option One:

5 Datastores * 2 Hosts * 32 QD Streams / 20 VMs = 16 Available I/O Streams Per VM.

Option Two:

2 Datastores * 2 Hosts * 32 QD Streams / 20 VMs = 6.4 Available I/O Streams Per VM.

So, as you can see, there are multiple factors at play, and I have not even touched on the other factors, like how CPU, memory, network, etc. can come into play. I suggest sticking with the rule of 8-12 VMs per datastore; this is a pretty tried-and-true configuration. Of course each environment is different, so test away. I have seen some places get upward of 20-30 per datastore, but they were not doing snapshots and had pretty low IOPS requirements.
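For anyone who wants to plug in their own numbers, the queue depth arithmetic above can be sketched in a few lines of Python. The function name is made up for illustration, and the default of 32 is just the per-LUN queue depth assumed in this post, so check what your hosts and HBAs actually use.

```python
def streams_per_vm(datastores, hosts, vms, queue_depth=32):
    """Rough count of I/O streams available to each VM.

    Uses the same simplification as the post: every host gets
    `queue_depth` slots per datastore, shared evenly by all VMs.
    """
    return datastores * hosts * queue_depth / vms


# The two options from the post: 2 hosts, 20 VMs, queue depth 32.
print(streams_per_vm(datastores=5, hosts=2, vms=20))  # 16.0
print(streams_per_vm(datastores=2, hosts=2, vms=20))  # 6.4
```

This ignores load imbalance between VMs and any array-side or HBA-side limits, so treat the result as a sizing hint and not as a guarantee.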

tjoffs71
Occasional Advisor

Re: Number of VMs per Datastore

By the way, in case you want to dig in: you can change the Queue Depth on your ESXi volumes. While this may help some, there are also the storage vendor and/or HBA (hardware iSCSI card) settings to contend with. Rather than rewriting their KB, I am giving a link to the VMware KB. Use it with extreme care, though: if you modify these settings without knowing the matching vendor specifics (what is Nimble's optimal Queue Depth? Anyone?), you can make things much worse really fast.

VMware KB: Controlling LUN queue depth throttling in VMware ESX/ESXi

jrich52352
Trusted Contributor

Re: Number of VMs per Datastore

So the LUN I'm trying to snapshot is a development environment. I'm building up this base environment, and then when someone needs a new copy of it (agile teams), I can use that snapshot as the zero-write clone, mount it, and sysprep.

Because this is the base system, it isn't used at all. This is mostly why I think the 20+ VMs shouldn't be an issue.

mandersen81
Valued Contributor

Re: Number of VMs per Datastore

Justin,

You are correct. It comes down to what is an "acceptable" I/O pause for your environment. If you aren't experiencing problems, this should be fine. We like 8-10 as a general rule, but there are always exceptions to the rules. You should be fine, but if you start seeing issues as described below, you will know why.

Thanks,
Matthew

tjoffs71
Occasional Advisor

Re: Number of VMs per Datastore

Exactly, and thank you for the response.  It has been a busy day here and I was not yet able to reply!

jrich52352
Trusted Contributor

Re: Number of VMs per Datastore

Well, see, that's the problem: no one is using these VMs. I've just installed software on them to prep them, so it's not like there is high I/O, or really any I/O, actually. When I look at the Nimble performance info, it's usually single-digit MB/s and under 100 IOPS (it's not uncommon to see it register under 10 IOPS).

When I look at InfoSight, the CPU is registered at a MAX of 4%, with an average cache usage of 5% or less (these are based on upgrade %, so I don't think it's actually utilization %).

Basically what I'm saying is: it's the 460G-X2 with a total of around 30 idle VMs.

Also, I'm only using 7.6% of the space, which gives me a near-perfect cache hit ratio.

tjoffs71
Occasional Advisor

Re: Number of VMs per Datastore

I have seen this type of issue in other, non-Nimble environments. I wonder, what are the specs on your vCenter server? I suggest 4 CPUs and 16GB of memory at minimum. Without that, I have seen cases where the snapshots from other solutions cannot get executed by vCenter in a timely manner. This may not be your issue, but 5.1 has a much higher resource requirement than the 4.x versions.