
Portchannel issue Virtual Connect (VC) to Cisco 4500

 
chuckk281
Trusted Contributor


Pere was looking for some Cisco Catalyst help:

 

******************

 

We have two local customers with a very similar issue involving Virtual Connect connected to a Catalyst 4500 (one of the customers runs version 12.2(53)SG1), both serving an ESXi 4.1U1 farm. One customer is running G6 blades and the other is already on G7.

 

The Virtual Connect configuration is an Active/Active setup with SUS-A/SUS-B, 10/15 vNets that map to real customer VLANs, Smartlink, and server profiles serving a redundant pair of LOMs, which form a vSwitch with an Active/Active teaming strategy on the ESXi side.

 

The connection between VC and the Cisco 4500 consists of two portchannels of 2x1Gb (RJ-45) each, named Po230 on the Cisco side. During startup the portchannels formed properly and all failover tests completed successfully.

 

Virtual Connect version is 3.15 in both cases. About once a month, one of the portchannels has an issue similar to the following:

 

- One of the ports that form the portchannel goes down, and when the 4500 tries to bundle it into the portchannel again, the following appears in the log (7/38 and 7/39 are the Cisco-side ports, connected to X2 and X3):

 

840380: Nov  8 12:48:05.758: %EC-SP-5-CANNOT_BUNDLE_LACP: Gi7/39 is not compatible with aggregators in channel 230 and cannot attach to them (flow control send of Gi7/39 is off, Gi7/38 is on)

 

- Since portfast is not enabled, the blocking-learning-forwarding period can be seen in the Cisco logs. After this period finishes, the port goes down again and the sub-portchannels Po230A and Po230B are created.

 

So it looks like, for some reason, the Cisco is not able to bundle both ports into the same Po230 interface. Because only one port is affected at a time, the SUS-A definition does not go completely down, Smartlink does not take action, and the ESXi hosts using this side are black-holed and can potentially lose connectivity to the outside world.
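To make the mismatch in that log entry explicit, here is a minimal Python sketch that pulls the channel number, the rejected port, and both flow control states out of the message. The function name `parse_cannot_bundle` and the regular expression are illustrative assumptions written against the single log line quoted above; other variants of the message are not handled.

```python
import re

# Pattern for the flow-control variant of the message quoted in the post.
# Other reasons inside the parentheses are not handled by this sketch.
PATTERN = re.compile(
    r"%EC-SP-5-CANNOT_BUNDLE_LACP: (?P<port>\S+) is not compatible with "
    r"aggregators in channel (?P<channel>\d+) and cannot attach to them "
    r"\(flow control send of (?P=port) is (?P<port_state>\w+), "
    r"(?P<peer>\S+) is (?P<peer_state>\w+)\)"
)


def parse_cannot_bundle(line):
    """Return a dict describing the mismatch, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "channel": int(m.group("channel")),
        "rejected_port": m.group("port"),
        "rejected_send": m.group("port_state"),
        "bundled_port": m.group("peer"),
        "bundled_send": m.group("peer_state"),
    }


# The log line from the original post, copied verbatim.
LOG_LINE = (
    "840380: Nov  8 12:48:05.758: %EC-SP-5-CANNOT_BUNDLE_LACP: Gi7/39 is not "
    "compatible with aggregators in channel 230 and cannot attach to them "
    "(flow control send of Gi7/39 is off, Gi7/38 is on)"
)

result = parse_cannot_bundle(LOG_LINE)
print(result)
```

Read this way, the entry says that Gi7/39 (send off) was refused because Gi7/38, already in channel 230, has send on.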

 

Actions deployed:

 

- Asked the customer to apply the portfast trunk configuration on the Po230 interfaces to minimize the “downtime” period while the ports are being reinserted into the portchannel.

- Validated that the pause frame counters related to flow control, as per advisory c02623029, are 0 or close to 0. This applies to both the downlink and uplink ports.

- Checked DNS settings and other specifics on the VC side: nothing found. No notable logs on the OA/VC side. The configuration looks fine and works from a port-mapping perspective.

 

Has anybody seen this behavior with VC and the Cisco 4500?

 

We have several customers connected using Catalyst gear (3750, 2960, 6500, …) and no issues have been reported so far. A variety of VC versions are also running without any issue similar to this one.

 

******************

 

Input from Mark:

 

***********

 

The “some reason” is not a mystery; it is in the log entry you posted.

The flow control settings are what caused the LAG/PortChannel on the Cisco side to dissolve the PortChannel Group.

Virtual Connect by default is TX/RX Flow Control = OFF for Gigabit Ethernet uplink ports.

Cisco typically by default is TX=Desired and RX=OFF for Gigabit Ethernet ports.

[OFF, ON, DESIRED] are likely the valid choices on the Cisco side.

 

The log you posted shows that the Cisco negotiated flow control to TX=ON on port Gi7/38; that is the port that doesn’t match what it should be.

The Cisco should have negotiated the port to TX/RX=OFF, not TX=ON, RX=OFF.

 

So you have some work to do to determine if someone has changed the Flow Control settings on VC and what the Cisco ports are configured for.

But honestly, the question is: if VC has Flow Control TX/RX = OFF, why isn’t the Cisco configured the same way?
To change this, you have to shut down the portchannel and re-enable it afterwards.

 
Switch(config)# interface g7/38
Switch(config-if)# flowcontrol send off
Switch(config-if)# flowcontrol receive off
 
Switch(config)# interface g7/39
Switch(config-if)# flowcontrol send off
Switch(config-if)# flowcontrol receive off
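
If the channel has more members, the same lines can be generated rather than typed. The following Python sketch only assembles text; it does not talk to a switch. The helper name `build_flowcontrol_fix` is hypothetical, and wrapping the change in a shutdown / no shutdown of the Port-channel interface is an assumed reading of the note above about shutting down the portchannel and re-enabling it afterwards.

```python
def build_flowcontrol_fix(channel, members):
    """Return IOS config lines that pin flow control to off on each member.

    Assumption: the port channel is shut before the change and re-enabled
    after it, following the note in the post.
    """
    po = f"interface Port-channel{channel}"
    lines = [po, " shutdown"]
    for port in members:
        lines += [
            f"interface {port}",
            " flowcontrol send off",
            " flowcontrol receive off",
        ]
    lines += [po, " no shutdown"]
    return lines


# Channel number and member ports taken from the original post.
config = build_flowcontrol_fix(230, ["Gi7/38", "Gi7/39"])
print("\n".join(config))
```

Review the generated lines before pasting them, since shutting the port channel is disruptive for everything behind it.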

 

DNS, pause frames, etc. are not relevant to the behavior you describe, and portfast is only a partial workaround for when it does happen.

 

********************

 

From Dave:

 

*********************

 

I have seen this behavior when you pull the cable from the SFP on the VC and then you reconnect it again.

In such cases, flow control on the Cisco was negotiated with “output flow control” ON when it should be OFF, making aggregation impossible with the other LACP members, which are OFF. Are you sure this is the cause of the problem, or are these the logs recorded when you tried to fix the problem by pulling the cable?

 

In my experience with the C6500, if you >>shut/no shut<< the Cisco port, it renegotiates the flow control setting to OFF, as it should by default.

 

As Mark indicates, the Cisco default flow control settings are: input flow-control OFF and output flow-control “desired”.

VC has by default flow-control set to “auto” (OFF for uplinks/external ports and ON for downlinks/blades).

Cisco should negotiate like this: input flow-control is off, output flow-control is off

 

But if cables are removed from the VC ports and plugged in again, flow control on the Cisco renegotiates as: input flow-control is off, output flow-control is ON. I don’t know why.

-> The Cisco port needs a “shut”/“no shut” to fix it.

 

As said, with “flowcontrol send off” and “flowcontrol receive off” you will prevent this problem from occurring.
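
A quick way to spot which member is in the bad state is to compare the negotiated flow control line of every member against the expected “off”. The Python sketch below assumes the status line has already been collected per port into a dictionary; the function name `ports_needing_bounce` and the sample data are hypothetical, built from the two status strings quoted above.

```python
import re

# Matches the status line quoted in the post, case-insensitively.
FLOW_RE = re.compile(
    r"input flow-control is (?P<rx>\w+),\s*output flow-control is (?P<tx>\w+)",
    re.IGNORECASE,
)


def ports_needing_bounce(status_lines, expected_tx="off"):
    """Return the ports whose negotiated output flow control is not expected_tx."""
    bad = []
    for port, line in status_lines.items():
        m = FLOW_RE.search(line)
        if m is None:
            raise ValueError(f"no flow-control status found for {port}")
        if m.group("tx").lower() != expected_tx:
            bad.append(port)
    return bad


# Hypothetical sample: Gi7/38 renegotiated to ON after a cable re-plug.
sample = {
    "Gi7/38": "input flow-control is off, output flow-control is ON",
    "Gi7/39": "input flow-control is off, output flow-control is off",
}

print(ports_needing_bounce(sample))
```

Any port this returns is a candidate for the shut / no shut described above, or for pinning flow control off explicitly.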

 

The question remains why the uplink port suddenly starts to malfunction. I can’t answer that.

 

******************

 

Any other questions or suggestions on this issue?

Trippes
New Member

Re: Portchannel issue Virtual Connect (VC) to Cisco 4500

 

We experienced this too on a port channel between a 6509 and an HP Virtual Connect switch. For some reason, 1 port dropped out of a 4-port bundle (part of a port channel).

 

There is no discernible reason for it dropping out of the bundle. But it would not rejoin the bundle, and exhibited exactly the same behaviour as described above. We suspected a GBIC issue on the HP (after much cisco troubleshooting - new port / new port on new card / new cable) - reseating the HP GBIC resolved it. But as soon as the cable is disconnected, the problem reoccurs upon reconnection.

 

Flow control on the VC we have is set to auto, as is send flow control on our end. To be honest, we stuck with defaults all the way, and nothing was different on any of the port channel member ports.

 

I tried modifying the flow control on the individual port itself (without much hope, as it meant introducing a change to something I knew hadn't changed), to no avail.

 

I didn't try disabling the port channel at any point (as this would have been disruptive), so that may be something I try. What I too cannot grasp is why this suddenly happened. There were no changes taking place; the ports had happily sat there for 434 days and then one dropped out.

 

Clearly something isn't right - I just haven't found it yet :-)

Only1SB
New Member

Re: Portchannel issue Virtual Connect (VC) to Cisco 4500

Hello chuckk281

 

the above didn't work for me either, but...

 

I encountered "%EC-SP-5-CANNOT_BUNDLE_LACP:" as well when trying to trunk to an HP enclosure with Virtual Connect...

 

try this, it worked for me,

 

!

int range te13/7 - 8
!
shut
no channel-group 16 mode active
channel-group 16 mode passive
no shut
!


6500#sh etherchan 16 sum

 

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
16     Po16(SU)        LACP      Te13/7(P)      Te13/8(P)

 

---No  "%EC-SP-5-CANNOT_BUNDLE_LACP:" log error messages!

 At this point the server guys confirmed both ports are now active/active, so I changed it back to mode active -----

 

!

int range te13/7 - 8
!
shut
no channel-group 16 mode passive
channel-group 16 mode active
no shut
!

 

6500#sh etherchan 16 sum

 

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
16     Po16(SU)        LACP      Te13/7(P)      Te13/8(P)

 

Still no log error messages and all looks good. I have yet to get the all clear from the infrastructure server guys, but if it fails I'll simply change it back to 'channel-group 16 mode passive' and do further testing if necessary.
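
The same check can be scripted against the summary row. This is a minimal Python sketch, assuming the row layout shown in the output above and assuming that the (P) flag marks a port bundled in the port channel, as the switch's own flag legend normally states. The function name `unbundled_ports` is hypothetical.

```python
import re

# One data row of the summary table: group, Po name with flags, protocol, ports.
ROW_RE = re.compile(
    r"^(?P<group>\d+)\s+(?P<po>\S+?)\((?P<po_flags>\w+)\)\s+"
    r"(?P<proto>\S+)\s+(?P<ports>.*)$"
)
# One member entry such as Te13/7(P).
PORT_RE = re.compile(r"(?P<name>\S+?)\((?P<flag>\w+)\)")


def unbundled_ports(row):
    """Return member ports in a summary row whose flag is not P."""
    m = ROW_RE.match(row.strip())
    if m is None:
        raise ValueError("not a summary data row")
    return [
        p.group("name")
        for p in PORT_RE.finditer(m.group("ports"))
        if p.group("flag") != "P"
    ]


# Row copied from the output in the post.
ROW = "16     Po16(SU)        LACP      Te13/7(P)      Te13/8(P)"

print(unbundled_ports(ROW))
```

An empty list means every member carries the (P) flag; any name returned is a port that did not join the bundle and is worth checking for the flow control mismatch.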

 

 

Good Luck!

SB