StoreVirtual Storage

Re: SV3200 Flowcontrol dropping heartbeat packages causing SC failovers?

 
Fred Blum
Valued Contributor

SV3200 Flowcontrol dropping heartbeat packages causing SC failovers?

Since a storage controller broke down in October and was replaced by HP with a new controller, we have not been able to get the SV3200 back to an operational state.

HPE support has been over the machine and the logs multiple times but has not come up with a cause or a solution. Following the advice of an HP engineer after an on-site visit, we checked the Flow Control and Jumbo frame settings, patched our DL380-G7 firmware and drivers, brought W2012 R2 up to the current update level, and changed the port cache on the 3500yl switches to the recommended 512 KB, as the HP engineer suggested that heartbeat packets were being dropped, causing the SCs to fail over.

We are still seeing the same problems: every 2 minutes there are iSCSI link errors, storage controllers failing over, coming back up, resyncing, etc. During a live move of VM storage or a backup, this cascades into a situation where the cluster disks fail.

If I create a RAID0 volume and do a live move, I again see the same iSCSI link errors and SCs failing over, only now isolated to one Storage Enclosure, as the other one is not involved.

We are using DL380-G7 servers running a W2012 R2 FOC, HP 3500yl switches, and the SV3200.

Has anyone else experienced this situation?

4 REPLIES
Fred Blum
Valued Contributor

Re: SV3200 Flowcontrol dropping heartbeat packages causing SC failovers?

We have spent another day trying to find the cause of this situation.

With one volume connected over one active path we get a maximum of 122 MB/s. With two volumes connected, on different IP addresses and each with one active path, we get 200 MB/s. This increases with a third volume and reaches the maximum throughput with a fourth. There were no SV3200 events during this period.
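
For reference, the 122 MB/s single-path figure is close to what one 1 Gb Ethernet link can carry. Below is a rough sketch of that arithmetic. It assumes standard Ethernet framing overhead and plain IPv4/TCP headers without options, ignores iSCSI PDU headers, and uses an illustrative function name that is not taken from any tool mentioned in this thread.

```python
# Rough upper bound on TCP payload throughput over one Ethernet link.
# Assumptions (not from the thread): IPv4 + TCP without options,
# iSCSI PDU headers ignored, link fully utilised.

ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble + header + FCS + inter-frame gap
IP_TCP_HEADERS = 20 + 20         # IPv4 header + TCP header


def max_payload_mb_s(mtu: int, link_bits_per_s: float = 1e9) -> float:
    """Return the approximate payload ceiling in MB/s (10^6 bytes/s)."""
    payload = mtu - IP_TCP_HEADERS
    wire_bytes = mtu + ETH_OVERHEAD
    return link_bits_per_s / 8 * payload / wire_bytes / 1e6


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: ~{max_payload_mb_s(mtu):.1f} MB/s")
```

Under these assumptions a 9000-byte MTU gives a ceiling just under 124 MB/s per 1 Gb link, so 122 MB/s on a single path means that link is essentially saturated.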

However, when MPIO with multiple active paths is used, throughput occasionally drops to zero in IOMeter, with errors on the SV3200 about iSCSI links being down, storage controllers failing over, and excessive I/O packet errors on the FOM.

MPIO uses the default Windows DSM with Round Robin with Subset. The servers' iSCSI NICs have jumbo frames enabled (9014) with Flow Control TX and RX on. The SV3200 uses ALB on the bonds, Flow Control is custom-set to RX and TX on, and the maximum jumbo frame size is 9000.
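
To make the path policy concrete, here is a minimal model of how Round Robin with Subset is generally described: I/O rotates over a set of primary paths, and standby paths are used only when no primary path is left. This is an illustration only, not the Microsoft DSM code, and the class, field and path names are hypothetical.

```python
# Illustrative model of "Round Robin with Subset" path selection.
# Names are hypothetical; this is not the Microsoft DSM implementation.
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    name: str
    primary: bool          # True: member of the round-robin subset
    healthy: bool = True


class RoundRobinWithSubset:
    def __init__(self, paths: List[Path]) -> None:
        self.paths = paths
        self._next = 0

    def pick(self) -> Path:
        """Rotate over healthy primary paths; use standby only if none remain."""
        pool = [p for p in self.paths if p.primary and p.healthy]
        if not pool:
            pool = [p for p in self.paths if not p.primary and p.healthy]
        if not pool:
            raise RuntimeError("no healthy path available")
        path = pool[self._next % len(pool)]
        self._next += 1
        return path
```

With two primary paths on the server side, consecutive requests alternate between both server NICs, so both can send towards the same volume IP address at the same time.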

The SV3200 assigns an IP address to a storage volume. All traffic from the Windows Hyper-V cluster to this volume is directed to this IP address. On the switch we see an excessive number of TX packet drops on the corresponding switch port. There are no Giant counters, so there is no jumbo frame fragmentation and there are no drops caused by jumbo frames.

The excessive TX drops, about a thousand per second, suggest that the Windows FOC MPIO is sending faster than the SV3200 NIC can receive, and that there is no load balancing over the NICs, Storage Controllers or Storage Enclosures as there was with our old LeftHand P4300. There are also no packet drops on any of the LeftHand P4300 ports.

The 3500yl switches use distributed trunking. We see the MAC address corresponding to the volume's IP address on the port that has the TX drops. It moves to the other switch port and bond NIC when the connection is physically broken. As the bond has two IP addresses and two MAC addresses, we see the other MAC address on the other switch's port, as if they were separate IP addresses rather than both NICs being actively engaged in send/receive as in ALB.

So where does this ALB show? It is obvious that one 1 Gb NIC cannot keep up with multiple active paths in the server, and that is exactly the problem showing in IOMeter.
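
A back-of-the-envelope estimate shows how quickly a single egress port would start dropping in that case. The sketch assumes two senders at full 1 Gb line rate, no flow control or TCP back-off, and that a 512 KB buffer is entirely available to this one port; none of these are confirmed values for the 3500yl, and the function name is illustrative.

```python
# Back-of-the-envelope oversubscription estimate for one switch egress port.
# Assumptions (hypothetical): senders run at line rate, no flow control or
# TCP back-off, and the whole buffer is available to this one port.


def seconds_until_drops(ingress_bits_per_s: float,
                        egress_bits_per_s: float,
                        buffer_bytes: int) -> float:
    """Time until the egress buffer is full and frames start to drop."""
    excess_bytes_per_s = (ingress_bits_per_s - egress_bits_per_s) / 8
    if excess_bytes_per_s <= 0:
        return float("inf")   # port keeps up, buffer never fills
    return buffer_bytes / excess_bytes_per_s


if __name__ == "__main__":
    # Two active 1 Gb paths converging on one 1 Gb storage port, 512 KB buffer.
    t = seconds_until_drops(2e9, 1e9, 512 * 1024)
    print(f"buffer full after ~{t * 1000:.1f} ms")
```

Under these assumptions the buffer fills in a few milliseconds, so sustained multi-path traffic to one 1 Gb port would produce TX drops almost immediately unless something throttles the senders.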

 

 

Venkat-V
HPE Pro

Re: SV3200 Flowcontrol dropping heartbeat packages causing SC failovers?

Hi Fred Blum,

Please find below an initial action plan for the ESX side, which should help improve the situation.

1. Adjust Maximum Queue Depth for Software iSCSI adapter to value 16

https://pubs.vmware.com/vsphere-55/index.jsp?topic=%2Fcom.vmware.vsphere.troubleshooting.doc%2FGUID-0D774A67-F6AC-4D8A-9E5A-74140F036AD2.html


2. Adjust the Maximum Outstanding Disk Requests value to 16

https://pubs.vmware.com/vsphere-55/index.jsp#com.vmware.vsphere.troubleshooting.doc/GUID-88A16E71-161E-493E-97D3-2B154819E6BF.html

3. Disable ATS Heartbeating

http://h20566.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-a00009136en_us&hprpt_id=HPGL_ALERTS_1967118&jumpid=em_alerts_us-us_May17_xbu_all_all_1017565_1967118_StorageOptions_critical__/

4. Disable Delayed ACK (KB 1002598: ESXi hosts might experience read or write performance issues with certain storage arrays)

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598


5. Check the IOPS limit:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2069356


6. I think that on the VMware side the I/O size is set in MB most of the time; it can be tuned down to KB to get better performance. Please see the article below:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003469 

 

7. Check whether it works without jumbo frames, or reduce the size from 9000 to 8900 and check whether that works.

8. The DSM is not supported with the SV3200; please refer to page 5 of the release notes below.

     https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00035652en_us&docLocale=en_US
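
For item 7, a don't-fragment ping is a common way to confirm that a given MTU works end to end. The sketch below only computes the payload size to use for such a test, assuming IPv4 without options. The helper name is illustrative.

```python
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# Assumption: IPv4 header without options (20 bytes) + ICMP header (8 bytes).
IPV4_HEADER = 20
ICMP_HEADER = 8


def max_ping_payload(mtu: int) -> int:
    """Return the ICMP payload size that exactly fills the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


if __name__ == "__main__":
    for mtu in (9000, 8900, 1500):
        # On Windows this size would be used as: ping -f -l <size> <target>
        print(f"MTU {mtu}: payload {max_ping_payload(mtu)} bytes")
```

If a ping with the don't-fragment flag and that payload size fails while a smaller one succeeds, some device on the path is not passing the configured frame size.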

Please let me know if there is anything further I can assist with.

 

Regards,

I am an HPE Employee


Fred Blum
Valued Contributor

Re: SV3200 Flowcontrol dropping heartbeat packages causing SC failovers?

 

Hi Venkat,

This is not on VMware but on a Microsoft W2012 R2 Failover Cluster.

Regards,

Fred

 

 

Venkat-V
HPE Pro

Re: SV3200 Flowcontrol dropping heartbeat packages causing SC failovers?

Hi Fred,

 

The DSM is not supported with the SV3200; please refer to page 5 of the release notes below.

     https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00035652en_us&docLocale=en_US 

Please check whether patch 136-009 is installed. If the issue still persists, I think it should go to the HPE labs for further investigation.

 

Regards,

Venkat

I am an HPE Employee
