BladeSystem - General

Testing Flexfabric failover and Linux bonding

 
Enrico Ferro
Occasional Advisor

Testing Flexfabric failover and Linux bonding

We are performing some tests on an enclosure with VC FlexFabric. We removed a VC module from the enclosure (!). We have Linux hosts configured with bonding + multipath + boot from SAN, running on BL460c G7 blades.
The failover worked well: the hosts continued working after a short delay. We observed something strange when the VC module was inserted into the enclosure again. While multipathd discovered that the FC path was available again, the bonding driver still reported the NIC as down. Only after restarting networking did the driver detect that the port was up again (checked with cat /proc/net/bonding/bond0).
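
For anyone repeating this test, here is a rough sketch of how the check against /proc/net/bonding/bond0 could be scripted instead of reading the file by eye. It assumes the usual layout of that file, where each "Slave Interface:" line is followed by an "MII Status:" line for that slave. The function names are made up for illustration, and bond0 is simply the bond name used above.

```python
from pathlib import Path


def slave_mii_status(bonding_text):
    """Map each slave interface name to the MII status reported for it."""
    status = {}
    current = None  # name of the slave section we are currently inside
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" appears before any slave
            # section, so it is skipped while current is still None.
            status[current] = value
    return status


def down_slaves(bonding_text):
    """Return the sorted names of slaves whose MII status is not 'up'."""
    states = slave_mii_status(bonding_text)
    return sorted(name for name, state in states.items() if state != "up")


if __name__ == "__main__":
    # Assumption: the bond is called bond0, as in the test described above.
    path = Path("/proc/net/bonding/bond0")
    if path.exists():
        print("slaves reported down:", down_slaves(path.read_text()))
```

Running this before pulling the VC module, after pulling it, and again after reinserting it would show whether the bonding driver ever marks the slave as up again without a network restart.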

The test was performed with VC firmware 3.15 and CNA firmware 2.102.517. The behaviour was the same on both RHEL 5.5 and SLES 1.1 running with the default drivers.

Thank you for your suggestions,
best regards,
pedro-chicago
Advisor

Re: Testing Flexfabric failover and Linux bonding

I don't think the issue you describe is exactly the same as the issues that we have encountered, but it sounds like it might be related. I was planning on posting something to the forums about our issues anyway.

The first issue we encountered was with ESX (4.0 U2 or 4.1) running on BL460c G7 blades with the NC553i (Emulex). After installing ESX from the HP-provided ISO (required because the be2net network driver for these NICs is not included in the standard ESX install media), powering on a VM would cause a loss of network connectivity on the vSwitch/VMkernel interfaces mapped to LOM:1-a and LOM:2-a. We tried many different things (different chassis, different blades, updating firmware and drivers, reconfiguring the VC profile, reconfiguring ESX networking, etc.) and eventually opened cases with both VMware and HP. After a few months we were able to do a debugging session with some Emulex engineers. They had us add the following initialization parameter to the be2net driver: "vlan_offload=0". After doing this, the issue went away. Apparently the driver was trying to offload VLAN-related processing to the card itself (a new feature). There is obviously a problem with this, and Emulex is now using information obtained from the debugging session to address it. For now we are just using the "vlan_offload=0" parameter until they come up with a permanent fix.

The other issue we encountered was with CentOS (Red Hat) running on similar hardware. There were problems setting up VLAN tagging on interfaces that carried multiple VLANs, specifically when mapping an OS NIC (ethX) to a VC network that was mapped to an external uplink port with VLAN tunneling enabled. This issue is still not resolved, but I believe it is related to the ESX issue described above. Emulex is aware of this issue as well. They are currently working on the ESX issue, and hopefully the necessary driver and firmware fixes will also be applied to the Linux drivers.

If anyone wants more details about any of this, please let me know. I tried to keep this as short as possible by leaving out all of the gory details of working with various support groups for the past three months.
Enrico Ferro
Occasional Advisor

Re: Testing Flexfabric failover and Linux bonding

Hi, the problem was fixed simply by using the HP driver for the NIC (the one available on the HP website) instead of the default one.