StoreVirtual Storage

vSphere takes 10 minutes to reconnect to volume?

Paul Hutchings
Super Advisor


Doing some testing of a vSphere server with dedicated iSCSI NICs connected to a dedicated iSCSI switch, to which some P4000 nodes are also connected.

The P4000 management group has 2 clusters in it, with each cluster currently having a single volume accessible to the vSphere box.

The vSphere box is set up to use vSphere MPIO with round-robin.

So I pull the power on the iSCSI switch and watch in vSphere: the NICs go down and the iSCSI connection to both volumes is lost.

I put power back to the switch and watch in vSphere: the NICs come back up and the connection to one volume comes back, but the other volume stays offline for around 10 minutes, at which point it comes back of its own accord.

Is there anything I may have missed here that would account for this?

I did find this VMware KB article, which suggests I may need to disable iSCSI load balancing on the Server object in the CMC. Is that right?

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1016836

Thanks in advance.
24 replies
Paul Hutchings
Super Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

Oh, and to confirm/clarify: doing a refresh of the storage or the HBA from within the vSphere client doesn't make the volume reappear.
Paul Hutchings
Super Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

I'd be grateful if anyone has any thoughts on this, as I'm waiting on support to remote in.

I'm confused, as my nodes are in the correct sites/switches, as are my servers, so I would assume that pulling a single site would leave the resources in the remaining site online. That is not what I'm consistently seeing; it seems to vary depending on which node the initial gateway connections are made through.
adolbec
Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

Hello Paul,

You do not specify which version of vSphere you are using. From my experience of using vSphere 4.0 U2 connecting to P4000 VSAs, you may be hitting an "All Paths Down" (APD) condition on your vSphere servers when connectivity to your iSCSI volumes is broken. If this is the case, you will see a bunch of errors in your servers' vmkwarning logs. You would have to patch your servers and apply some configuration changes; search on the term above.

Alain
Paul Hutchings
Super Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

Thanks for the reply.

I've been using 4.1 (VMware edition) but could use 4.0 U2 or the HP ESX ISO (it's not entirely clear which additional drivers/patches that has applied).

I'd appreciate any suggestions, as I don't have an explicit need to use anything other than a 4.x release.
adolbec
Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

I suggest that you first review your ESX servers' logs, especially the vmkwarning one, to see what you have in there, and then apply the appropriate patch and/or workarounds. If APD is the cause, you may want to test whether running "esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD" solves your problem.

Alain
Paul Hutchings
Super Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

Just had a bit of a eureka moment.

I had been pinging all the cluster IPs from a laptop connected to the switch infrastructure, rather than from the vSphere hosts at the time of a dropout/failure.

When the LUNs disappear I can ping one cluster IP but not the other.

This obviously points to the vSphere hosts in some way, shape or form.
Paul Hutchings
Super Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

Oh, and I don't have a /var/log/vmkwarning log.
adolbec
Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

If you are using ESXi instead of ESX, you will find the kernel's messages in /var/log/messages. In any case, you should find the related messages within the /var/log files.
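To sift through those /var/log files for the relevant entries, something like the following could be run against copies of the logs (for example, pulled off the host onto a workstation). This is a minimal Python sketch, not an HP or VMware tool: the function name `scan_logs` is hypothetical, and the default pattern "APD" is only an assumption, since the exact message wording varies between ESX/ESXi builds. Gzip-compressed rotated logs are not decompressed here.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def scan_logs(log_dir: str, pattern: str = r"APD") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each line under log_dir matching pattern.

    The default pattern "APD" is an assumption; replace it with the exact
    text seen in your own vmkwarning or messages logs.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="replace" keeps odd bytes from aborting the scan
            with path.open("r", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if regex.search(line):
                        yield str(path), lineno, line.rstrip("\n")
        except OSError:
            # Skip files we are not allowed to read
            continue
```

For example, `for path, lineno, text in scan_logs("/path/to/copied/logs"): print(f"{path}:{lineno}: {text}")` lists each hit with its file and line number, which makes it easier to line the entries up against the time the switch was powered off.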

Alain
Paul Hutchings
Super Advisor

Re: vSphere takes 10 minutes to reconnect to volume?

Thanks, I'm seeing this sort of thing (attached).

What is odd is that all of the servers are connected to a dedicated switch infrastructure.

When we have the issue, neither vSphere host will ping *one* of the cluster IPs but will ping the other one.

The vSphere hosts will ping all the individual nodes in both clusters.

A laptop connected to the switches will ping both cluster IPs.
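The comparison above (the hosts fail on one cluster IP, the laptop reaches both, and the hosts reach every node) can be tabulated mechanically once the ping results are collected, for example with `vmkping` on each host and an ordinary ping on the laptop. A minimal Python sketch, assuming the results are entered by hand; the function name and the labels in the example are hypothetical. A target that fails only from some sources points at those sources rather than at the target itself, which fits the earlier suspicion that the vSphere hosts are involved.

```python
from typing import Dict, List

# source -> {target: True if the ping succeeded}
PingResults = Dict[str, Dict[str, bool]]


def inconsistent_targets(results: PingResults) -> Dict[str, List[str]]:
    """Map each target reachable from some sources but not others to the failing sources."""
    targets = sorted({t for per_target in results.values() for t in per_target})
    report: Dict[str, List[str]] = {}
    for target in targets:
        # Only consider sources that actually tried this target
        tested = {
            src: per_target[target]
            for src, per_target in results.items()
            if target in per_target
        }
        if any(tested.values()) and not all(tested.values()):
            report[target] = sorted(src for src, ok in tested.items() if not ok)
    return report


# Hypothetical example mirroring the symptoms described above
example: PingResults = {
    "esx-host-1": {"cluster1-ip": True, "cluster2-ip": False, "node-a": True},
    "esx-host-2": {"cluster1-ip": True, "cluster2-ip": False, "node-a": True},
    "laptop": {"cluster1-ip": True, "cluster2-ip": True, "node-a": True},
}
print(inconsistent_targets(example))
# prints {'cluster2-ip': ['esx-host-1', 'esx-host-2']}
```

A target that is unreachable from every source is deliberately left out of the report, since that would point at the target or the network as a whole rather than at any particular source.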