
Known issue with ESX and MSCS (Microsoft Cluster) = very slow BfS. Architectural limitation of ESX with MSCS

 
chuckk281
Trusted Contributor


Terri was working on a Microsoft Clustering issue with VMware ESX and at least reached an understanding of the situation:

 

************************************************************************************

 

This is an FYI only.

 

ISSUE: c-Class BL495 blade servers and BfS (Boot from SAN) with ESX 4.0 U2. When a blade was rebooted, it could take up to 22 minutes to reacquire access to the LUN and actually boot. This was a Virtual Connect solution with Brocade SAN switches.

 

I was able to verify that the delay was not caused by the blade logging back into the Brocade switch. Once the customer accepted this, he worked with VMware tech support, and VMware provided the following information:

 

Using VMDKs would prevent the boot issues by containing the SCSI reservations within a file instead of an actual volume. However, that limits you to keeping both MSCS nodes on the same host, which is not ideal.

 

Unfortunately, other than that, there is nothing we can do about the slow boot issues except adjust the Scsi.UWConflictRetries count to reduce the wait time. This is an architectural limitation of ESX with MSCS, based simply on the reservation types that the two use.

 

One thing we may want to try: we can reduce the amount of time that hostd waits for the volumes. If the driver itself is what is taking the time, this will not help, but if it is actually ESX that is taking that long, it may improve things. To do so, go to Configuration -> Advanced Settings (Software) -> Scsi.UWConflictRetries and change the value to 80. I'd go ahead and make the change on all of your hosts with MSCS clusters, and consider continuing to isolate them as you get downtime for maintenance. Ideally, we'd have the MSCS nodes off in their own 2-4 host cluster so that the normal VM clusters are completely separate from them and unaffected by the other side effects of MSCS clustering.

 

***********************************************************************************
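
For anyone who wants to script the Scsi.UWConflictRetries change described above rather than click through each host, here is a rough sketch. It only builds and prints the service-console commands (a dry run); nothing is executed. Assumptions, not from the VMware response: that the setting is exposed to esxcfg-advcfg under the path /Scsi/UWConflictRetries, and the host names are made up. Check the current value with the get command on one host before changing anything.

```python
# Sketch only: builds the ESX service-console commands for checking and
# setting Scsi.UWConflictRetries. Nothing here touches a host.

# Assumed esxcfg-advcfg path for the Scsi.UWConflictRetries setting.
OPTION_PATH = "/Scsi/UWConflictRetries"
# Value suggested by VMware support in the quoted response.
NEW_VALUE = 80


def get_command(option_path=OPTION_PATH):
    """Return the argv that reads the current value of the option."""
    return ["esxcfg-advcfg", "-g", option_path]


def set_command(value=NEW_VALUE, option_path=OPTION_PATH):
    """Return the argv that sets the option to the given integer value."""
    value = int(value)
    if value < 0:
        raise ValueError("retry count must be non-negative")
    return ["esxcfg-advcfg", "-s", str(value), option_path]


def plan(hosts, value=NEW_VALUE):
    """Return printable command lines per host: first the get, then the set."""
    lines = []
    for host in hosts:
        lines.append(f"{host}: {' '.join(get_command())}")
        lines.append(f"{host}: {' '.join(set_command(value))}")
    return lines


if __name__ == "__main__":
    # Hypothetical host names; replace with the ESX hosts running MSCS nodes.
    for line in plan(["esx-mscs-01", "esx-mscs-02"]):
        print(line)
```

Printing the plan first makes it easy to review exactly what would be run on each MSCS host before doing it during a maintenance window.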

 

Let us know if you have run into this situation and what, if anything, you have done to resolve the problem.