Trinh_Nguyen
Advisor

Two S9500E IRF core “leak” packets

Hello,

 

My cluster of S9500E switches (SW version S9500E-CMW520-R1726) connects to two VM-farm Dell blades via two single 10 Gb fiber ports. Using Wireshark connected to a switchport, with all broadcast, multicast, ARP, DNS, etc. filtered out, I observe:

 

When the blade switches in the VM farm send about 100-500 Mbps across the HP S9500s, about 5-10 percent of these TCP packets are visible in Wireshark. At the peak, many other packets from other hosts in different protocols such as GRE, ICMP, and SNMPv3 also become visible at all HP S9500 switchports; the S9500 cluster acts like a hub! The volume of these TCP packets caused latency on the switched network.

 

What is the problem?  Any advice is much appreciated!

 

Best Regards,

Trinh Nguyen

Fredrik Lönnman
Honored Contributor

Re: Two S9500E IRF core “leak” packets

CAM / MAC-table depletion?

---
CCIE Service Provider
MASE Network Infrastructure [2011]
H3CSE
CCNP R&S

Peter_Debruyne
Honored Contributor

Re: Two S9500E IRF core “leak” packets

Hi,

 

Can you give some information about the setup (cleaned configuration)?

It would be good to understand the connection and configuration between the VM hosts and the switch.

 

Just to verify the MAC info: I believe that on an IRF system, from system view you can use "irf switch-to x" (replace x with the slave IRF member number) to get into the slave context. In this context you can use the display commands again, so you can verify the IP/MAC information on that member.

 

You can also run any debugging locally on the slave device using this view.
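
A minimal sketch of that workflow (member number 2 is an assumption for the slave; exact prompts and syntax can vary by Comware release):

system-view
 irf switch-to 2
 display mac-address
 display arp
 quit

Comparing the slave's MAC/ARP entries against the master's would show whether one IRF member has failed to learn addresses that the other knows.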

 

In general I have seen this behavior in these situations:

* teaming software misconfigurations on the server side

* Microsoft NLB running on VMs, which by design uses unknown MAC addresses by default (so it requires switch configuration if you want to avoid the packet-flooding hub behavior; see the sketch below)
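
For the NLB unicast-mode case, one hedged example of such switch configuration on Comware (the MAC address, interface, and VLAN below are hypothetical placeholders for the real cluster values) is to pin the cluster MAC as a static entry toward the port that actually hosts it, so the rest of the network is not flooded:

system-view
 mac-address static 0200-0a65-0105 interface Ten-GigabitEthernet1/3/0/1 vlan 101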

 

Best regards, Peter

Trinh_Nguyen
Advisor

Re: Two S9500E IRF core “leak” packets

Peter,

 

Thank you for reading and providing your advice.

My team (Network Infrastructure) provided the VM/server team with two 10 Gb ports connected to their two Dell blade switches. I was told the blade switches work in active/standby mode, and for troubleshooting purposes the standby Dell blade is now disabled.

From the two-member S9505E IRF, we provided:

  • Two 10 Gb tagged-VLAN ports, with spanning tree disabled, to the Dell blades as mentioned above.
  • Dozens of closet switches with LACP. Again, for troubleshooting purposes one link of each bundle is now disabled, so all LACP bundles run on single links.
  • Many 1 Gb ports to servers. The servers are configured with neither NLB nor teaming.

Best Regards,

Trinh Nguyen

 

Peter_Debruyne
Honored Contributor

Re: Two S9500E IRF core “leak” packets

Hi,

 

OK, can you provide some configuration information?

* core switch config (cleaned up)

* VM host configuration (the networking part: how the vswitch uplinks are configured, and the type of redundancy)

* sample access switch config (cleaned up)

 

Next I would start tracing a concrete example of the flooded traffic you found in Wireshark.

Just pick a destination MAC/IP that you see being flooded in Wireshark, and start reviewing the switches' MAC/ARP tables to see if you can find it:

* dis mac-address xxxx

* dis arp

Traffic typically gets flooded when the destination MAC/IP is not learned by the switches, or when it is learned with a wrong/invalid MAC.

Try to trace it down to the source switch of the destination address, and see whether that switch can actually learn the MAC address info.

Run "dis mac-add" and "dis arp" several times to check whether the MAC/IP is flapping between ports.
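
A concrete trace might look like this (a sketch; the MAC, VLAN, and IP values are hypothetical placeholders taken from a flooded packet):

display mac-address 000c-2901-0203
display mac-address vlan 101
display arp 10.1.101.20

If the entry is missing, or points at a different port on each run, follow that port toward the switch that should be learning the address.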

 

 

Trinh_Nguyen
Advisor

Re: Two S9500E IRF core “leak” packets

Hi Peter,

 

Using a process of elimination, we determined that the flooding observed from the VM farm is the effect, not the cause, as I had posted. We saw the most unicast flooding from the farm because it carries the most traffic.

Furthermore, we found the unicast flooding is triggered every time an MSTP TC (topology change) is generated.

 

Some configuration of the uplink and downlink ports. All switches run MSTP with only instance 0.

 

TEN GIGABIT PORT from HP to DELL BLADE:

[SW1-H3C-Ten-GigabitEthernet1/3/0/1]dis this
#
interface Ten-GigabitEthernet1/3/0/1
 port link-mode bridge
 description DELL BLADE SW
 port link-type trunk
 port trunk permit vlan 1 to 3 101 105
 undo jumboframe enable
 stp disable
#

UPLINK PORT at DELL BLADE to HP:

#
interface ethernet 1/e48
spanning-tree disable
switchport trunk allowed vlan add 2-3,101,105
#

INT PORT-CHANNEL at CLOSET UPLINK to HP:

!
interface port-channel 0
        trusted
        trusted vlan 1-4094
        switchport mode trunk
        switchport trunk allowed vlan 1-6,8-4094
!

 

Best Regards,

Trinh Nguyen

 

 

Trinh_Nguyen
Advisor

Re: Two S9500E IRF core “leak” packets

 

We still have dozens of old Dell switches that do not support MSTP, and they were causing massive TC messages and unicast flooding. By disabling STP on all downlink ports to these Dell switches, my network is now quiet.

 

Each Dell switch is now its own STP root. If a loop occurs on one of these Dell switches, its local STP can still block the looped port.
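
For reference, the change on the Comware side was roughly this per downlink port (a sketch; the interface name is a hypothetical placeholder):

interface GigabitEthernet1/1/0/10
 description OLD DELL CLOSET SW
 stp disable

With STP disabled, the port no longer processes BPDUs from the old switches, so their TCs cannot flush the core's MAC tables.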

Am I at any risk by disabling STP on these downlink ports of my HP switch?

 

Regards,

Trinh Nguyen

 

Peter_Debruyne
Honored Contributor

Re: Two S9500E IRF core “leak” packets

OK, more questions:

Can you also review the MAC tables of the Dell blade? (or the 2 Dell blades?)

Are the 2 S9500 devices configured with IRF or as standalone devices?

Do you have xSTP, link aggregation, or some kind of smart link between the blade switches and the core?

Are the blade switches running as independent switches, or is it a real stack (really 1 switch)?

 

I had a similar issue in the past; the problem then was 2 blade switches (the interconnect between the 2 switches was offline, or blocking with STP). The servers were doing transmit load balancing, which means that all traffic out (from the server) via nic1 was sent with source mac a1, and all traffic out via nic2 with source mac a2. (The idea is that the switches do not need link-aggregation/port-channel configuration for the 2 server NICs, while at the same time MAC flapping is avoided, since the source MAC on the 2 NICs is different, and the server can still do outbound load balancing.)

 

For the inbound traffic (to the server), the server would only reply with 1 MAC to ARP requests, so all traffic to the server would be sent over 1 link only; let's assume mac a1 is used for this example.

 

The 2 blade switches would see the 2 MAC addresses as individual hosts, but 1 MAC would be "locally" connected (mac a1 on blade switch1), while the other, mac a2, would be learned via the core uplink (which had learned it via the port to the other blade switch2).

 

Since mac a2 is only used as a source MAC and never as a destination MAC (it is never mentioned in the ARP replies), mac a2 is not part of the problem.

 

Now consider that server2 wants to send data to server1: it will send an ARP request, get an ARP reply from server1 with mac a1, and so use mac a1 as the destination address.

Based on server2's NIC teaming, it will select either nic1 or nic2 to transmit the data. So assume it takes nic2 to send the data. This will pass via blade switch2, over the core, to blade switch1, and on to the server.


No problem so far.

Now, the OS ARP cache typically lasts 10-15 minutes, so server2 will keep sending data for the next 10-15 minutes without an additional ARP request. But if server1 is no longer sending any ARP broadcasts, blade switch2 will not see any traffic from that MAC, and after 5 minutes mac a1 will be removed from the MAC table on blade switch2 (blade switch1 is the switch receiving the data from server1's nic1 with source mac a1, so that switch will still know and re-learn the MAC).

When that happens, all traffic from server2 to server1 via blade switch2 is sent to a destination MAC that is unknown from blade switch2's point of view, so it starts flooding the traffic to all the switch ports, flooding it to all the server ports (at least all the nic2 server ports, that is).

 

This took quite some time to troubleshoot, but the solution was very simple in the end: increase the MAC aging time of the switches from the default 5 minutes to e.g. 60 minutes, or even several hours.
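
On Comware 5 that would look roughly like this (a sketch; 3600 seconds, i.e. 60 minutes, is an assumed value, and the blade switches would need the equivalent change in their own syntax):

system-view
 mac-address timer aging 3600

The aging time should comfortably exceed the hosts' ARP cache lifetime, so a destination MAC cannot expire from the tables while hosts are still sending to it.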

 

I do not know if this logic helps in your troubleshooting, but I really recommend reviewing the MAC tables of all the switches and blade switches, especially for the flooded MAC addresses you found.

 

Best regards, Peter.

 

Trinh_Nguyen
Advisor

Re: Two S9500E IRF core “leak” packets

Kudos to Peter for a well-written explanation and the time you devoted to my problem(s).

 

To answer your questions:

Can you also review the MAC tables of the Dell blade? (or the 2 Dell blades?)

Yes, they are almost identical to the MAC tables on the two S9505E core switches: about 4000 entries.

 

Are the 2 S9500 devices configured with IRF or as standalone devices?

The two S9505s are configured as a single IRF.

 

Do you have xSTP, link aggregation or some kind of smart link between the blade switches and the core?

Yes, MSTP at the core and the Dell blades, but no LACP. They are single links. We even shut down one link, but the problem remained.

 

Are the blade switches running as independent switches or is it a real stack (really 1 switch) ?

They are independent; there is no stacking.

 

I tried increasing the MAC address aging time to 600 seconds, but it did not help.

 

After disabling spanning tree on all downlink ports to the old switches that do not support MSTP, my unicast storm went quiet. We are going to replace these old switches this year.

 

Best Regards,

Trinh Nguyen

Richard Brodie_1
Honored Contributor

Re: Two S9500E IRF core “leak” packets

A TC (topology change) flushes the MAC forwarding tables, so that at least is expected.
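
If you want to confirm that correlation, one option on the Comware side (a sketch; command availability and output format vary by release) is to watch the STP topology-change statistics while the flooding occurs:

display stp tc
display stp history

A steadily climbing TC/TCN receive count on a port points at the downstream device generating the topology changes; each flush then causes a burst of unknown-unicast flooding until the MAC entries are re-learned.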