HPE StoreVirtual Storage / LeftHand

P4500G2, Cisco Nexus 5000 and flow-control

RemyZ
Advisor


Hi all,

 

Is anyone using P4500 (G2) nodes together with Cisco Nexus 5000 switches? I am interested in your flow-control settings on both the P4500 nodes and the Cisco switches.

 

We have always configured the nodes with flow-control enabled. The Nexus switches had flow-control enabled as well. A while ago we had serious performance problems with our Windows fileserver cluster (which turned out to be an SMB bug in the end). We have had many suppliers on the floor to help us: HP, Cisco, etc.

 

During the debugging period, Cisco advised our networking guys to disable flow-control on the switches, since that would affect only layer 2 traffic. Since iSCSI is layer 3 traffic, which is passed transparently by the switches, it would implement its own flow-control on layer 3.
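
For reference, the layer 2 flow control discussed here is the IEEE 802.3x PAUSE mechanism. The sketch below builds such a frame in Python to show why it never leaves a single link: it is addressed to the reserved MAC Control multicast address 01-80-C2-00-00-01, which switches do not forward. The source MAC address and the helper names are made up for illustration; this is not taken from the P4500 or Nexus software.

```python
import struct

# Reserved MAC Control multicast address; bridges do not forward it.
PAUSE_DST = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return an 802.3x PAUSE frame (without the trailing FCS)."""
    if len(src_mac) != 6:
        raise ValueError("src_mac must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("quanta must fit in 16 bits")
    header = PAUSE_DST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Pad with zeros to the 60-byte minimum frame size (FCS excluded).
    return (header + payload).ljust(60, b"\x00")


def pause_seconds(quanta: int, link_bps: float) -> float:
    """One pause quantum is 512 bit times on the link."""
    return quanta * 512 / link_bps


# Hypothetical, locally administered source address.
frame = build_pause_frame(bytes.fromhex("020000000001"), 0xFFFF)
print(len(frame), frame[:18].hex())
print(pause_seconds(0xFFFF, 10e9))  # longest single pause on a 10GbE link
```

TCP, which carries iSCSI, has its own end-to-end windowing and works independently of these frames.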

 

First of all, we do not see any problems with this setup. However, I would like to know if this is a 'supported' setup. The P4500 manuals are clear: when jumbo frames are enabled or DSM/MPIO is used (as in our case), flow-control must be enabled end-to-end. From HP support we learned that enabling flow-control on the P4500 affects both layer 2 and layer 3.

 

So what would be the effect of turning layer 2 flow-control off on the switches? Is there anything to watch for in the logs of both the nodes and the switches? How do you configure your nodes and switches?

 

Thanks

Remy

 

--------------------------------------
Remy Zandwijk
VU University Amsterdam
7 REPLIES
Emilo
Trusted Contributor

Re: P4500G2, Cisco Nexus 5000 and flow-control

Hello,

 

I don't think I would concern myself too much with it being on or off. In this version of SANiQ you cannot enable flow-control on the SAN without it being enabled on the switch. Currently flow control is only on the "receive" side, not on the transmit side. The best practices recommend it not only for the SANs but also for all devices on that VLAN. You can see whether you will benefit from flow control being on or off by looking at the switch port stats. If this is a Cisco device using the command line interface, you can just do a show port status XX/XX, where XX/XX is the blade and port. If you see lots of dropped packets along with retransmissions, don't be surprised if you see some on the transmit side (especially if you are using ALB on the NICs on the SAN). If you are not seeing any dropped packets on the receive side, then you won't benefit much from flow control.

 

To answer your question: I don't believe you will be able to get the NICs on the SAN to have flow control on without it being enabled on the switch.

 

I am curious to see what you come up with. Would you mind letting me know?

Thanks

RemyZ
Advisor

Re: P4500G2, Cisco Nexus 5000 and flow-control

Hi Emilio, thanks for your answer.

 

The thing is, flow-control on the switches was turned off after we configured the nodes. I suspect the nodes won't complain when flow-control is configured on them but switched off somewhere along the path.

 

Therefore, we shuffled some nodes around to free up one for a test. I started by deleting the bond and setting flow-control to off. Then I set flow-control to on, which the CMC allowed me to do. After that I created an ALB bond again.

 

So I guess the flow-control setting on the Cisco switches does not interfere with the flow-control settings on the nodes.

 

I'm still waiting for the networking guys to send me the show status output of the switches. Once I have received it, I'll post it here.

 

Thanks.

 

--------------------------------------
Remy Zandwijk
VU University Amsterdam
Emilo
Trusted Contributor

Re: P4500G2, Cisco Nexus 5000 and flow-control

I would be very surprised if flow control stayed enabled when you created the bond while it was turned off at the switch.

Flow control must be enabled prior to creating a bond. It is recommended that both interfaces have an IP address assigned so that flow control can be configured properly.

 

For flow control to function properly, you must enable it on both the switches and the NICs/iSCSI initiators. If it is not enabled everywhere, the network defaults to the lowest common denominator, which is flow control disabled.

RemyZ
Advisor

Re: P4500G2, Cisco Nexus 5000 and flow-control

OK, recap:

 

  • Flow-control on Cisco Nexus is disabled/off.
  • Flow-control on Cisco Nexus works on layer 2 traffic only.
  • Flow-control on Cisco Nexus does not affect layer 3 traffic; it's transparent (according to the network guys).
  • Flow-control on nodes can be enabled without problems.
  • Flow-control on nodes stays enabled after creating a bond (and fwiw: it survives a node reboot).

 

So I still think we're OK. I wish someone from HP/LeftHand could confirm this. HP support only points to the manual. But hey, we aren't the only ones using Cisco Nexus and LeftHand, right?

 

 

--------------------------------------
Remy Zandwijk
VU University Amsterdam
Emilo
Trusted Contributor

Re: P4500G2, Cisco Nexus 5000 and flow-control

Yes that is perfect.

Flow control should be enabled on a per port basis.

 

RemyZ
Advisor

Re: P4500G2, Cisco Nexus 5000 and flow-control

I got some results back from the networking guys. The switch counters were reset 14 weeks ago.

 

Up until now, all counters have remained 0 (Align-Err,FCS-Err,Xmit-Err,Rcv-Err,UnderSize,OutDiscards,Single-Col,Multi-Col,Late-Col,Exces-Col,Carri-Sen,Runts,Giants,SQETest-Err,Deferred-Tx,IntMacTx-Er,IntMacRx-Er,Symbol-Err).

 

That's a good thing.

 

--------------------------------------
Remy Zandwijk
VU University Amsterdam
vmdewd
Occasional Visitor

Re: P4500G2, Cisco Nexus 5000 and flow-control

I know this post is a little old but I figured what the heck, this info might help someone.

 

I read this post a while back and figured I was ok leaving flow control enabled on the nodes and not on the Nexus 5000 switch until I started having latency issues during a restriping operation. While looking through some of the logs I noticed tons of errors, overruns, dropped packets, and frame errors on the bonded NICs (We are using the Emulex Dual 10GbE cards). However, the Nexus switch was showing absolutely no errors on the individual ports or the port channel (using Link Aggregation 802.3ad). Here's a sample of the "hist.ifconfig.log" from one of the storage nodes. This log is found under the Log Files tab of the Diagnostics section under the nodes.

 

 

bond0   Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx 
inet addr: xx.xx.xx.xx Bcast: xx.xx.xx.xx Mask:255.255.252.0
inet6 addr: xx::xx::xx::xx::xx/0 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1
RX packets:74840480223 errors:283939 dropped:7158 overruns:10142 frame:145029
TX packets:75982815662 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:55620360247819 (50.5 TiB) TX bytes:74460339653861 (67.7 TiB)

 

 

Granted, these log files are cumulative and the numbers are from an extended period of time, but the number of errors was enough to warrant some research.
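
Because the counters are cumulative, a rate relative to the packet count is easier to compare between nodes or between two snapshots than the raw totals. A minimal Python sketch, assuming the classic ifconfig "key:value" layout shown in the sample above; the function names are made up for illustration and are not part of CMC or SANiQ.

```python
import re

# The two counter lines from the bond0 sample above.
SAMPLE = """\
RX packets:74840480223 errors:283939 dropped:7158 overruns:10142 frame:145029
TX packets:75982815662 errors:0 dropped:0 overruns:0 carrier:0
"""


def parse_counters(text):
    """Return {"RX": {...}, "TX": {...}} from ifconfig-style counter lines."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*(RX|TX) packets:\d+", line)
        if not match:
            continue  # skips e.g. the "RX bytes:" line
        fields = re.findall(r"(\w+):(\d+)", line)
        counters[match.group(1)] = {name: int(value) for name, value in fields}
    return counters


def per_million(direction_counters, field):
    """Occurrences of `field` per one million packets."""
    return direction_counters[field] / direction_counters["packets"] * 1_000_000


stats = parse_counters(SAMPLE)
for field in ("errors", "dropped", "overruns", "frame"):
    print(f"RX {field}: {per_million(stats['RX'], field):.3f} per million packets")
```

For the sample above this works out to a little under 4 RX errors per million received packets, while the TX side is clean. Taking the difference between two snapshots of the log would show whether the counters are still increasing.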

 

Now I don't know if something else changed around the same time but it seems that these errors stopped appearing after I disabled flow control on the storage nodes. Also, just an FYI, with the latest version of CMC and SANiQ (9.5.00.1222), and the latest firmware, you have the option of enabling/disabling flow control for both Tx and Rx individually.

 

Another tip I learned from an HP support rep is to adjust the local bandwidth priority. This option is set by editing the management group and moving the slider. If you increase the local bandwidth you're allowing local operations, such as resyncing and restriping, to use more bandwidth and those operations complete much faster than if you're using the default setting of 16 MB/s. Keep in mind this takes bandwidth away from applications such as ESX server so be sure to monitor performance after adjusting this. I was told that setting it to 35 MB/s was safe and my systems haven't suffered from this change.
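
To get a feel for the difference, here is a rough back-of-the-envelope calculation in Python. It assumes the slider value acts as a sustained rate for the operation and that 1 MB means 10^6 bytes; both are assumptions for illustration, and the 1 TB data size is a made-up example, not a figure from our systems.

```python
def transfer_hours(data_bytes, rate_mb_per_s):
    """Hours needed to move data_bytes at a sustained rate in MB/s (1 MB = 10**6 bytes)."""
    seconds = data_bytes / (rate_mb_per_s * 1_000_000)
    return seconds / 3600


ONE_TB = 10**12  # hypothetical amount of data to restripe

for rate in (16, 35):
    print(f"{rate} MB/s: {transfer_hours(ONE_TB, rate):.1f} hours per TB")
```

Under those assumptions, 1 TB takes about 17.4 hours at 16 MB/s and about 7.9 hours at 35 MB/s, so the operation finishes a bit more than twice as fast.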

 

I can't guarantee that everyone with this configuration will have the same problems and success, but it's a place to start if you're having latency issues.

 

Matt