Unicast Flooding network-wide

Jeffrey Belles · ‎11-01-2010

Hello,

We are seeing some strange behaviour in our fully HP-switched Layer-2 network.

Basically our setup is as follows: we have a ring of 5 core switches (8212zl), connecting into our datacenters to a wide variety of rackswitches (anything from the 1800 to the 28xx series).
For a while now, we have been seeing unicast packets flooded out of all the core ports.

I recently connected a Linux box straight into one of our core-routers and ran a tcpdump, and I am seeing all traffic for any given VLAN show up in the capture. Source and destination MAC addresses are known in the core, so these frames should not be forwarded to this Linux box.

On a side-note, we are also seeing a spanning-tree topology change every 42 seconds on the switches. It will be hard to track down where this change occurs (?)

Does anyone have a clue where to begin?

Richard Brodie_1 · ‎11-01-2010

Since topology changes will cause unicast flooding, I would try to track those down first.
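To illustrate why a topology change leads to flooding, here is a minimal sketch of a learning switch in Python. It is a toy model, not how any ProCurve firmware works: the class and method names are made up for illustration, and it assumes a topology change simply empties the whole MAC table (real switches flush or fast-age entries per port).

```python
class LearningSwitch:
    """Toy layer-2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Process one frame and return the list of ports it is sent out of."""
        self.mac_table[src] = in_port  # learn where the sender lives
        out = self.mac_table.get(dst)
        if out is None:
            # Unknown destination: flood out of every port except the ingress.
            return [p for p in self.ports if p != in_port]
        return [out]

    def topology_change(self):
        """Simplified assumption: a topology change flushes all learned entries."""
        self.mac_table.clear()


sw = LearningSwitch(ports=[1, 2, 3, 4])
sw.receive(1, "aa", "bb")            # "bb" not learned yet: flooded
sw.receive(2, "bb", "aa")            # "aa" is known: sent to port 1 only
known = sw.receive(1, "aa", "bb")    # both known: sent to port 2 only
sw.topology_change()
flooded = sw.receive(1, "aa", "bb")  # table flushed: flooded again
print(known, flooded)
```

Until the destination sends a frame again and is re-learned, every frame towards it is flooded, which is why repeated topology changes show up as persistent unicast flooding.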

I would start by looking at the logs on the core switches, to see whether either the root or the blocked link moves around. Do you have the root on one of the 8212zl switches?

If that's OK, you could try temporarily disabling spanning tree on individual (simply connected) edge parts of your network with bpdu-filter. That forces the port to forwarding and drops BPDUs. You might be able to isolate the fault that way.

Peter Tobin · ‎11-02-2010

Hi Jeffrey,

I hope you have resolved this.

Another thought: you may want to enable the Instrumentation Monitor feature and view the results. Not that this is a DoS attack; however, I have read that DoS attacks can cause a CPU to take too long to respond to new events, which can lead to a breakdown of Spanning Tree or other features. A delay of several seconds typically indicates a problem. Information on this can be found in the HP ProCurve Switch Software Access Security Guide.

I hope this helps, and I look forward to reading the resolution to this.

Jeffrey Belles · ‎11-02-2010

Thanks,

Pretty sure it is not a DoS attack, since our Arbor doesn't see any of this.
It will be very hard to track down the constant topology changes, I think.
The network consists of over 300 rackswitches, all connecting to the 5 core switches.

Shutting down portions of the spanning-tree topology will not work, I'm afraid.

-J

Jeffrey Belles · ‎11-02-2010

Further on this.
All rackswitches are dual-connected to at least 2 core switches, so starting to filter BPDUs will create loops...

Richard Brodie_1 · ‎11-02-2010

That does sound like it's going to be tricky. My choice would be to script SNMP queries across several boxes and see what is flapping.

Finding where dot1dStpRootCost or dot1dStpRootPort are changing might be a place to start.
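As a rough sketch of what such a script could look like in Python: the code below polls a few BRIDGE-MIB scalars on each switch and reports values that changed between polls. Assumptions on my part: the switches expose the standard BRIDGE-MIB over SNMP v2c, net-snmp's `snmpget` is installed on the polling host, and the community string and hostnames are placeholders. I have also added dot1dStpTopChanges alongside the two objects named above, since it counts topology changes directly.

```python
import subprocess
import time

# Scalar objects from the standard BRIDGE-MIB, with the .0 instance suffix.
OIDS = {
    "dot1dStpTopChanges": "1.3.6.1.2.1.17.2.4.0",
    "dot1dStpRootCost": "1.3.6.1.2.1.17.2.6.0",
    "dot1dStpRootPort": "1.3.6.1.2.1.17.2.7.0",
}


def snmp_get(host, oid, community="public"):
    """Fetch one value with net-snmp's snmpget (assumed to be installed)."""
    result = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return result.stdout.strip()


def poll(hosts, fetch=snmp_get):
    """Return {host: {object name: value}} for every host and OID."""
    return {h: {name: fetch(h, oid) for name, oid in OIDS.items()} for h in hosts}


def diff(previous, current):
    """List (host, name, old, new) for every value that changed between polls."""
    changes = []
    for host, values in current.items():
        for name, new in values.items():
            old = previous.get(host, {}).get(name)
            if old is not None and old != new:
                changes.append((host, name, old, new))
    return changes


def watch(hosts, interval=10, rounds=6, fetch=snmp_get):
    """Poll repeatedly and print whatever changed since the previous poll."""
    previous = poll(hosts, fetch)
    for _ in range(rounds):
        time.sleep(interval)
        current = poll(hosts, fetch)
        for host, name, old, new in diff(previous, current):
            print(f"{host}: {name} changed {old} -> {new}")
        previous = current
```

Something like `watch(["core1", "core2"], interval=10)` (hypothetical hostnames) would then show which boxes see their root port or root cost move, and which ones only see the topology-change counter tick up.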

Good hunting!

Olaf Borowski · ‎11-02-2010

Jeffrey, start by looking at the log file of the core switches. They should give you some indication on what's going on (show logg -r).

Olaf

rick jones · ‎11-02-2010

I'm not HP Networking, but out of semi-idle curiosity, just how many distinct MAC addresses are there in this layer-2 network? The 8212s seem to sport a rather large forwarding table of 64,000 entries, but I thought I might check, particularly if there are lots of virtual machines, or in the unlikely event there is a node or three deliberately trying to overwhelm the forwarding tables.

there is no rest for the wicked yet the virtuous have no pillows

Jeffrey Belles · ‎11-02-2010

Gents,
Many thanks for your replies.
Managed to track down the evil by issuing a:
"show spann debug instance <1>" on the DR switch and looking for increasing counters for received TCN packets. Followed them all the way down until we hit it.
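For anyone repeating this hunt, the per-hop step (take two readings of the counters, see which port's received-TCN count went up, move to the switch behind that port) can be sketched in Python as below. The function name and the counter values are made up for illustration; parsing the actual switch output is left out, since the exact format is not shown here.

```python
def busiest_tcn_port(before, after):
    """Return (port, increase) for the port whose TCN-received counter grew most.

    `before` and `after` map port name -> counter value, taken from two
    readings some seconds apart. Returns None if no counter increased.
    """
    deltas = {port: after[port] - before.get(port, 0) for port in after}
    port = max(deltas, key=deltas.get, default=None)
    if port is None or deltas[port] <= 0:
        return None
    return port, deltas[port]


# Hypothetical counter values, typed in by hand from two successive readings.
first = {"A1": 120, "A2": 4, "B7": 0}
second = {"A1": 121, "A2": 4, "B7": 0}
print(busiest_tcn_port(first, second))
```

Repeating this on the switch connected to the reported port walks the path towards the source of the topology changes.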

Turned out to be a buggy Dell blade switch that was in a constant reboot loop, causing a continuous topology change.
(For what it's worth: Dells take exactly 42 seconds to reboot :-))

Shut down the links and all is stable again.

PS: there are roughly 3000 MAC addresses in the table. Not cool to flush them every 40 seconds.

Thanks for your responses, it's all quiet on the western front now.

-J

rick jones · ‎11-02-2010

Nice debugging work. I guess the Dell switch was the exception to your "fully HP-switched Layer-2 network" assertion in the initial post?

If you like I'd be more than happy to try to get you in touch with an HP sales type to help you replace that nasty, buggy Dell equipment :)

there is no rest for the wicked yet the virtuous have no pillows
