StoreVirtual Storage

Re: vSphere takes 10 minutes to reconnect to volume?

 
adolbec
Advisor

Paul,

Reading your description at the beginning of this thread, I assume that you have at least 4 nodes split into 2 clusters? Is it always the same cluster that gives you trouble, or is it random?

Alain
Paul Hutchings
Super Advisor

We have:

2xP4500 SAS
2xP4300 MDL (we actually have 4 but only have 2 up/configured).

The ping problem varies but is consistent in one respect: one or both cluster IPs won't be pingable from one or both vSphere hosts, yet both cluster IPs are always pingable from a laptop on the same dedicated network.

Regardless of whether one or both cluster IPs can (or cannot) be pinged from a particular vSphere host, the ALB IP of each node can always be pinged.

This is what's so confusing - pinging the nodes would seem to confirm networking is good/present from the vSphere hosts.
adolbec
Advisor

From what I gather from your log file, you are using the software iSCSI adapter. If that is the case, use vmkping instead of the ping utility so that you are pinging from the proper port group rather than from the ESX console. My guess at this time is that you may have a configuration mismatch between your port groups, vSwitch and switch ports.

Alain
Paul Hutchings
Super Advisor

ping/vmkping seem to produce the same results.

Right now behind me I have one host that can ping both cluster IPs and one host that can only ping one cluster IP.

I've tried moving the NICs to different switches (just in case) and it makes no difference.

Yet if I just walk away and come back in 15 minutes, they'll both be pinging away.

It seems to be "something" that is happening either when the iSCSI initiator is upset, or when the NICs are reconnected (I'm using different NICs on a different card right now).
adolbec
Advisor

Then I suggest you take it back one step: disable iSCSI altogether and, if possible, use only vmkping to test your connectivity. If you have a monitoring port on your switches, you may want to sniff the traffic to see whether the ICMP messages are flowing back and forth.
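To put numbers on how long each cluster IP stays unreachable after a switch pull, the test above could be logged with something like the following. This is only a minimal sketch, not something used in this thread: it assumes a Linux-style `ping` that accepts `-c 1 -W 1` (on an ESX host the command tuple would need to be swapped for a `vmkping` invocation), and the function names `is_reachable`, `outage_windows` and `monitor` are made up for illustration.

```python
import subprocess
import time


def is_reachable(ip, ping_cmd=("ping", "-c", "1", "-W", "1")):
    """Send one echo request to `ip`; True if the command exits with 0.

    ping_cmd is an assumption (Linux-style flags); adjust it for the
    platform the test actually runs on.
    """
    result = subprocess.run(
        list(ping_cmd) + [ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(samples):
    """Convert [(timestamp, reachable), ...] into [(start, end), ...].

    An outage starts at the first failed sample and ends at the next
    successful one. An outage still open when sampling stops is
    reported with end=None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(ips, interval=5.0, duration=1200.0):
    """Poll every address in `ips` until `duration` seconds have passed."""
    samples = {ip: [] for ip in ips}
    deadline = time.time() + duration
    while time.time() < deadline:
        for ip in ips:
            samples[ip].append((time.time(), is_reachable(ip)))
        time.sleep(interval)
    return {ip: outage_windows(s) for ip, s in samples.items()}
```

Calling `monitor(["192.0.2.10", "192.0.2.11"])` (placeholder addresses, not the real cluster IPs) just before pulling the switch would return, per address, the start and end timestamps of each window in which pings failed.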
Paul Hutchings
Super Advisor

Yes I think that is "Plan B" - install ESX from scratch and start with a single VMK then add iSCSI etc.

If I'm doing that, does anyone have any thoughts on 4.1 vs. 4.0 U2 vs. the HP December 4.1 ISO?
Paul Hutchings
Super Advisor

Right, iSCSI initiator disabled, same result.

From the vSphere hosts I can ping all the nodes' ALB IPs, but cannot ping either cluster IP from one host following a switch pull/reconnect (and again, I'm guessing in around 10 minutes from now the ping will work).

From the CMC I can ping both VMKs on the vSphere hosts from any node.
adolbec
Advisor

The issue then seems more network related than an iSCSI connectivity problem. From your description of the symptoms, it looks to me like an ARP resolution / cache issue. I do not know offhand how to check for that; I suggest that you review at least your port group and vSwitch configs to ensure that there is no mismatch.

Are you using any port lock down security measures on your switches?
When you are pinging from a workstation, is it located in the same subnet as your P4000 nodes? Is the ARP table the same for the workstation after you restart your switch?
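For the workstation side of that ARP question, a before/after comparison could be scripted roughly as below. It is a sketch under assumptions: it expects the text output of `arp -a` or `arp -an` captured by hand before and after the switch restart, it recognises only IPv4 addresses and MACs written with `:` or `-`, and the names `parse_arp` and `diff_arp` are invented for this example. Entries without a MAC (for instance ones shown as incomplete) are skipped by the parser and so show up as missing in the diff.

```python
import re

# IPv4 address, e.g. 10.0.0.10
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
# MAC address with ':' or '-' separators, e.g. 00:11:22:33:44:55
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5})\b")


def parse_arp(text):
    """Return {ip: mac} for every line that contains both an IP and a MAC.

    MACs are normalised to lowercase, colon-separated, zero-padded form
    so that Windows-style and Unix-style listings compare equal.
    """
    table = {}
    for line in text.splitlines():
        ip = IP_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:
            octets = re.split(r"[:-]", mac.group(1))
            table[ip.group(1)] = ":".join(o.zfill(2).lower() for o in octets)
    return table


def diff_arp(before, after):
    """Return {ip: (old_mac, new_mac)} for entries that differ.

    A value of None means the address was absent from that snapshot.
    """
    changes = {}
    for ip in sorted(set(before) | set(after)):
        old, new = before.get(ip), after.get(ip)
        if old != new:
            changes[ip] = (old, new)
    return changes
```

Feeding the two captured listings through `diff_arp(parse_arp(before_text), parse_arp(after_text))` gives an empty dict if nothing moved; a cluster IP whose MAC changed across the restart would appear with its old and new MAC.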
Paul Hutchings
Super Advisor

Yes that's my thinking now that we appear to have eliminated the iSCSI initiator.

The switches are 2 x 2910al and 1 x 2510g in a redundant loop (so MSTP enabled), with flow-control enabled and a VLAN for iSCSI. That's about all (can post configs if required).

No port lock down, these switches are dedicated for iSCSI, the workstation is just a laptop that I connect to any of the switches.

I will check out the ARP table - I didn't see a way to check that within vSphere though.

(oh and sure enough something timed out and my ping started working again all by itself)
adolbec
Advisor

If you are using multiple switches, it may also be your switch configuration that affects the vSphere host. VMware recommends disabling any STP methods on ESX-connected ports. As an additional step, you could test your switch reboot scenario by disabling / disconnecting one of the ESX server interfaces that is a member of the iSCSI segment (assuming that you only use 2 interfaces). You will lose connectivity during the reboot, but it should not take up to 10 minutes before you can ping your cluster IPs again.