

 
chuckk281
Trusted Contributor

Pause flood protection issue in VC 4.10 Environment

Chris was looking to help a customer with "Pause Frame" flooding issues:

 

**************

 

Experts,

I have a customer (IHAC) that has experienced an issue in their VC 4.10 environment with “pause flood protection”. He has emailed me with the following statement:

 

 

We had some issues today with a Virtual Connect module shutting down the uplinks to a blade because of "pause flood protection". We saw this once before in early January on a separate blade, and while we are able to get it back online, it's not a great situation that our VMware hosts are suddenly losing network connectivity. I've found a few articles, linked below, that talk about how to resolve it, but I haven't figured out why it is happening. I am inclined to disable pause flood protection to prevent this from happening again. What are your thoughts on that? Also, we are on SPP 2013.09 and Virtual Connect 4.10.

  

Here are the articles that they reference above.

  

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_ba847bafb2a2d782fcbb0710b053ce01=wsrp-navigationalState%3DdocId%253Demr_na-c02623029-6%257CdocLocale%253D%257CcalledBy%253D&javax.portlet.tpst=ba847bafb2a2d782fcbb0710b053ce01&sp4ts.oid=3794423&ac.admitt...

  

https://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/mostViewedDisplay/?javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken&javax.portlet.prp_efb5c0793523e51970c8fa22b053ce01=wsrp-navigationalState%3DdocId%253Dmmr_kc-0105640-3%257CdocLocale%253Den_US&javax.portlet.tpst=efb5c0793523e51970c8fa22b053ce01&sp4ts.oid=4144084&ac.admitted=139...

  

Aside from the usual blanket statement to update the firmware and the NIC drivers on the system that may be creating the issue, do we know of a root cause for this problem? The customer is very astute and will want to resolve the problem by correcting exactly what is causing it.

  

Thank you in advance for any assistance you may be able to provide.

 

***********

 

Input from Kant:

 

**************

 

The cause is an incompatibility between firmware and driver (faulty NIC), which causes the NIC to send continuous pause frames. As a result, VC buffers are consumed in response to the pause frames. If the traffic pattern is largely unicast, it takes longer before the buffers are consumed. If there is a lot of multicast/broadcast traffic, it may not take long for VC to get into an unpredictable state.
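To put rough numbers on "continuous pause frames", here is a minimal sketch of the pause-time arithmetic. It assumes standard IEEE 802.3x link-level flow control, where the pause time is a 16-bit count of quanta and one quantum is 512 bit times. The function names are illustrative only and say nothing about how VC or the NIC implement this internally.

```python
# Sketch of IEEE 802.3x pause-time arithmetic (function names are hypothetical).
PAUSE_QUANTUM_BITS = 512   # one pause quantum = 512 bit times
MAX_PAUSE_QUANTA = 0xFFFF  # pause_time is a 16-bit field


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Time the link partner is asked to stop sending for one pause frame."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause quanta must fit in 16 bits")
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


def frames_per_second_to_hold_pause(quanta: int, link_bps: float) -> float:
    """Pause frames per second needed to renew the pause before it expires."""
    return 1.0 / pause_duration_seconds(quanta, link_bps)


if __name__ == "__main__":
    longest = pause_duration_seconds(MAX_PAUSE_QUANTA, 10e9)
    print(f"Longest single pause on 10Gb: {longest * 1e3:.3f} ms")
    rate = frames_per_second_to_hold_pause(MAX_PAUSE_QUANTA, 10e9)
    print(f"Frames/s to keep the port paused: {rate:.0f}")
```

On a 10Gb port a single pause frame can hold off the sender for only a few milliseconds, so a NIC that keeps a port paused for long periods has to be emitting pause frames at a steady rate, and that is what shows up in the pause frame counters.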

 

The proper fix should come from the NIC side, by not sending continuous pause frames when firmware and driver are incompatible. We have communicated this requirement to the NIC team. The fix in VC is short term.

 

Cisco follows a similar approach for excessive pause frames by disabling the switch port:

http://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5000/sw/troubleshooting/guide/N5K_Troubleshooting_Guide/n5K_ts_fcoe.html#wp1030594

Switch ports err-disabled due to pause rate-limit

 

Switch ports go into error-disable state due to pause rate limit

 

Possible Cause

 

If the switch interface receives excessive Xoff pause frames from the server, ports become error-disabled due to the high rate of pause frames received. Usually the port goes into an err-disable state due to pause frames, only if the drain rate is less than 5Mbps on a 10Gb port. This means that the server is very slow and is sending a large number of pause frames to the switch ports.
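As a rough illustration of the drain-rate figure quoted above, the sketch below computes a drain rate from two transmit byte counter readings and compares it with the 5Mbps figure from the Cisco text. The function names and the idea of sampling a byte counter are assumptions for illustration. This is not how the Nexus or VC firmware actually makes the decision.

```python
# Sketch only: compare an observed drain rate with the 5 Mbps figure quoted above.
DRAIN_THRESHOLD_MBPS = 5.0  # from the quoted Cisco text, for a 10Gb port


def drain_rate_mbps(tx_bytes_start: int, tx_bytes_end: int, interval_s: float) -> float:
    """Average rate at which the port actually drained traffic, in Mbps."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    if tx_bytes_end < tx_bytes_start:
        raise ValueError("counter went backwards (reset or wrap)")
    return (tx_bytes_end - tx_bytes_start) * 8 / interval_s / 1e6


def is_below_drain_threshold(rate_mbps: float) -> bool:
    """True when the port drains slower than the quoted threshold."""
    return rate_mbps < DRAIN_THRESHOLD_MBPS


if __name__ == "__main__":
    # Hypothetical sample: 375,000 bytes sent in one second is 3 Mbps.
    rate = drain_rate_mbps(0, 375_000, 1.0)
    print(rate, is_below_drain_threshold(rate))
```

For scale, 5Mbps is 0.05% of a 10Gb link, so a port only trips this condition when the server is letting almost nothing through.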

 

************

 

And from Hoa:

 

**********

 

If the pause frame issue continues after the FW/driver update, we have been successful in eliminating it by replacing the offending NIC/LOM.

 

*************

 

Other comments or suggestions?

4 REPLIES
marcelkoedijk
Frequent Advisor

Re: Pause flood protection issue in VC 4.10 Environment

Do not disable the protection. Doing so can let your VCM memory fill up and can crash the VC domain.

In VCM, check the downlink and uplink port status. Go to:

 

>Enclosure > InterConnect Bays > Uplink Ports / Downlink Ports > Port Statistics

Look at the "Dot3InPauseFrames" counter to find the port that has received a very large number of pause frames.
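If you note the counter values twice, a few minutes apart, a small script can rank the ports by pause frame rate. This is only a sketch: the port names and numbers are made up, and it assumes you have copied the "Dot3InPauseFrames" values by hand into two dictionaries. It does not talk to VCM.

```python
# Sketch: rank ports by pause frames received per second between two readings.
# Port names and counter values are hypothetical examples.
from typing import Dict, List, Tuple


def pause_frame_rates(
    before: Dict[str, int], after: Dict[str, int], interval_s: float
) -> List[Tuple[str, float]]:
    """Return (port, pause frames per second), highest rate first."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    rates = []
    for port, end_count in after.items():
        if port not in before:
            continue  # no baseline for this port, skip it
        delta = end_count - before[port]
        if delta < 0:
            delta = end_count  # counter was reset between readings
        rates.append((port, delta / interval_s))
    rates.sort(key=lambda item: item[1], reverse=True)
    return rates


if __name__ == "__main__":
    first = {"bay1:d1": 100, "bay1:d2": 0, "bay1:d3": 42}
    second = {"bay1:d1": 160, "bay1:d2": 600_000, "bay1:d3": 42}
    for port, rate in pause_frame_rates(first, second, 60.0):
        print(f"{port}: {rate:.1f} pause frames/s")
```

A port whose rate stands far above the others is the one to look at first for a driver/firmware mismatch or a bad NIC.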

 

In most cases it will be a bad interface that has to be replaced.

 

You can also turn off flow control temporarily in VCM, but if you use iSCSI this can degrade your iSCSI performance.

 

 

 

GrahamZulauf
Occasional Advisor

Re: Pause flood protection issue in VC 4.10 Environment

The two links included at the beginning of this post are no longer working. Would it be possible to update them?

Dennis Handly
Acclaimed Contributor

Re: Pause flood protection issue in VC 4.10 Environment

>The two links included at the beginning of this post are no longer working.

 

Sure they are, just not very well.  :-)

You have to click on the "HP Support Center - Hewlett Packard Enterprise" links:

http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c02623029-6

http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0105640-3

andyalder
Occasional Contributor

Re: Pause flood protection issue in VC 4.10 Environment

You mention a NIC/firmware mismatch, but how are we to tell which to use when the download for the 536FLB lists both Broadcom and QLogic drivers and firmware?