
complete network was down for 5 minutes!

Mayer_5
Occasional Visitor

Hi All!

We had a problem in our company network.
The network is a mix of HP ProCurve 5300, 2650, and 2626 switches. Rapid Spanning Tree is enabled to block the redundant links on the fiber-optic ports.

The network has been running for about 2 years without any problems. There has not been any change in the network for a long time.
This morning, however, there was a complete network crash. For about 5-10 minutes no one could log in to the servers or other resources.

While troubleshooting I saw that, for this time range, the logfile of every switch in the network showed "High Collision/Drop Rates" messages on every port. On the fiber-optic ports there were "Excessive Broadcasts detected" messages.

Can somebody tell me what was going on in the network during these 5-10 minutes? And above all, why did these problems appear on all ports at the same time? This makes no sense to me.

This is especially puzzling on the main switch in our central server room: the logfile of this switch was completely clean before this!

Any help would be greatly appreciated!

best regards

Alexander Mayer
7 REPLIES
Matt Hobbs
Honored Contributor

Re: complete network was down for 5 minutes!

It sounds like there was a spanning-tree instability which flooded the network with broadcast traffic.

If the firmware you're running is 2 years old then my very strong recommendation would be to update the firmware. There have been many fixes in regards to spanning-tree which should help prevent this from occurring again.

Check 'show span' to see how recent the last topology change was. My guess is that during the outage the topology change would have been incrementing quite quickly.
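
As a rough sketch of how to act on this: note the Topology Change Count from 'show span' twice, a known interval apart, and work out how fast it is rising. The function name and the sample readings below are hypothetical and only illustrate the arithmetic; a stable network should show a rate at or near zero.

```python
def changes_per_minute(count_before, count_after, seconds_between):
    """Rate at which the topology change counter rose between two readings."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (count_after - count_before) * 60.0 / seconds_between

# Hypothetical readings taken five minutes (300 s) apart, not real data.
rate = changes_per_minute(500, 530, 300)
print(f"{rate:.1f} topology changes per minute")
```

A counter that keeps climbing while no links are being plugged or unplugged would point toward spanning-tree instability rather than a one-off event.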
Mayer_5
Occasional Visitor

Re: complete network was down for 5 minutes!

Yes, I think so too.

It was not the only crash today. For about 1 hour there was the next downtime.

The time since the last RSTP change is only minutes, so I think the downtime comes from RSTP operation. The Topology Change Counters on the switches are between 500 and 700. The uptime is between 100 and 300 days.

Hmm, yes, a firmware upgrade might help.
The current firmware is H.08.67 on the 2650
and E.09.22 on the 5300.
I think I will upgrade these to the most recent versions this evening.

I have never seen such mighty broadcast storms. I couldn't ping (ICMP) any device in the network for about 10 minutes. All active ports on the switches were under heavy load during this time.

The STP blocks are still at the same point in the network. I think a loop should not be possible.

Thank you very much for your message!

best regards

Alexander Mayer
Ladrouz
Advisor

Re: complete network was down for 5 minutes!

Hi,
Sounds like a broadcast storm or a loop in the architecture.
Look at:
sho tech instrumentation
sho logging
What is your spanning tree configuration? Do you use trunking?
bye
Mayer_5
Occasional Visitor

Re: complete network was down for 5 minutes!

Hi,

No, I don't use trunking.

As an example, I am pasting some output from the central switch. It is a 5300xl.
The time is not correct. SNTP sucks with my linux chronyd....:-(

The ports A1-A4 are the important ones. These are fibre-optic ports to the other switches.

W 01/25/07 05:29:11 FFI: port F1-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F2-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F3-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F4-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F7-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F14-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F15-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F16-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F17-High collision or drop rate. See help.
W 01/25/07 05:29:11 FFI: port F23-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C1-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C7-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C8-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C10-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C12-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C15-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C16-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C19-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C20-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C21-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C23-High collision or drop rate. See help.
W 01/25/07 05:29:22 FFI: port C24-High collision or drop rate. See help.
W 01/25/07 05:30:06 FFI: port A2-Excessive Broadcasts. See help.
W 01/25/07 05:30:06 FFI: port A3-Excessive Broadcasts. See help.
W 01/25/07 05:30:12 FFI: port G1-High collision or drop rate. See help.
W 01/25/07 05:30:36 FFI: port B2-High collision or drop rate. See help.
W 01/25/07 05:30:36 FFI: port B3-High collision or drop rate. See help.
W 01/25/07 05:30:36 FFI: port B4-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D2-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D6-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D8-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D11-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D12-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D14-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D20-High collision or drop rate. See help.
W 01/25/07 05:30:40 FFI: port D23-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E2-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E6-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E7-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E8-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E9-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E12-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E13-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E16-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E19-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E21-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E23-High collision or drop rate. See help.
W 01/25/07 05:31:01 FFI: port E24-High collision or drop rate. See help.
W 01/25/07 05:31:09 FFI: port A1-High collision or drop rate. See help.
W 01/25/07 05:31:09 FFI: port A3-High collision or drop rate. See help.
W 01/25/07 05:31:09 FFI: port A4-High collision or drop rate. See help.
W 01/25/07 05:32:44 FFI: port A2-Excessive Broadcasts. See help.
W 01/25/07 05:32:44 FFI: port A3-Excessive Broadcasts. See help.
I 01/25/07 05:32:54 ports: port E6 is now off-line
I 01/25/07 05:32:56 ports: port E6 is Blocked by LACP
I 01/25/07 05:32:58 ports: port E6 is now off-line
I 01/25/07 05:33:00 ports: port E6 is Blocked by LACP
I 01/25/07 05:33:00 ports: port E6 is Blocked by STP
I 01/25/07 05:33:00 ports: port E6 is now on-line
I 01/25/07 05:33:01 ports: port E6 is now off-line
I 01/25/07 05:33:03 ports: port E6 is Blocked by LACP
I 01/25/07 05:33:03 ports: port E6 is Blocked by STP
I 01/25/07 05:33:03 ports: port E6 is now on-line
W 01/25/07 05:37:07 FFI: port A2-Excessive Broadcasts. See help.
W 01/25/07 05:37:07 FFI: port A3-Excessive Broadcasts. See help.
I 01/25/07 05:38:37 ports: port E6 is now off-line
I 01/25/07 05:38:39 ports: port E6 is Blocked by LACP
I 01/25/07 05:38:41 ports: port E6 is now off-line
I 01/25/07 05:38:42 ports: port E6 is Blocked by LACP
I 01/25/07 05:38:42 ports: port E6 is Blocked by STP
I 01/25/07 05:38:42 ports: port E6 is now on-line
I 01/25/07 05:38:44 ports: port E6 is now off-line
I 01/25/07 05:38:46 ports: port E6 is Blocked by LACP
I 01/25/07 05:38:46 ports: port E6 is Blocked by STP
I 01/25/07 05:38:46 ports: port E6 is now on-line
W 01/25/07 05:39:00 FFI: port B2-High collision or drop rate. See help.
W 01/25/07 05:39:00 FFI: port B3-High collision or drop rate. See help.
W 01/25/07 05:39:00 FFI: port B4-High collision or drop rate. See help.
W 01/25/07 05:39:05 FFI: port E6-Excessive CRC/alignment errors. See help.
W 01/25/07 05:39:05 FFI: port E6-Excessive Broadcasts. See help.
W 01/25/07 05:39:26 FFI: port E6-High collision or drop rate. See help.
W 01/25/07 05:39:34 FFI: port A1-High collision or drop rate. See help.
W 01/25/07 05:39:34 FFI: port A3-High collision or drop rate. See help.
W 01/25/07 05:39:34 FFI: port A4-High collision or drop rate. See help.
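
With this many Fault Finder lines it can help to group them by fault type and port before reading them one by one. The following is a minimal sketch, assuming the log lines have exactly the format pasted above; the regular expression and the function name are my own and not part of any HP tool.

```python
import re
from collections import Counter

# Matches Fault Finder (FFI) warnings in the format shown in the log above.
FFI_RE = re.compile(
    r"^W (\d{2}/\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) FFI: port (\w+)-(.+?)\. See help\.$"
)

def summarize_ffi(lines):
    """Return (event count per fault type, set of ports per fault type)."""
    counts = Counter()
    ports = {}
    for line in lines:
        match = FFI_RE.match(line.strip())
        if not match:
            continue  # skip non-FFI lines such as the 'ports:' messages
        _date, _time, port, fault = match.groups()
        counts[fault] += 1
        ports.setdefault(fault, set()).add(port)
    return counts, ports

# A few lines copied from the log above.
sample = [
    "W 01/25/07 05:29:11 FFI: port F1-High collision or drop rate. See help.",
    "W 01/25/07 05:30:06 FFI: port A2-Excessive Broadcasts. See help.",
    "W 01/25/07 05:30:06 FFI: port A3-Excessive Broadcasts. See help.",
    "I 01/25/07 05:32:54 ports: port E6 is now off-line",
    "W 01/25/07 05:39:05 FFI: port E6-Excessive CRC/alignment errors. See help.",
]
counts, ports = summarize_ffi(sample)
for fault, n in counts.most_common():
    print(f"{fault}: {n} events on ports {sorted(ports[fault])}")
```

Grouped this way, the ports reporting "Excessive Broadcasts" (the uplinks A2 and A3, and later E6) stand out from the many ports that only report drops.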

; J4819A Configuration Editor; Created on release #E.09.22

hostname "hpswitch1"
snmp-server location "Rechenzentrum"
time timezone 60
time daylight-time-rule Western-Europe
cdp run
web-management ssl
no telnet-server
module 1 type J4878A
module 2 type J4821A
module 3 type J4820A
module 4 type J4820A
module 5 type J4820A
module 6 type J4820A
module 7 type J4821B
interface A1
speed-duplex 1000-full
exit
interface A2
speed-duplex 1000-full
exit
interface A3
speed-duplex 1000-full
exit
interface A4
speed-duplex 1000-full
exit
ip default-gateway 172.24.1.20
sntp server 172.24.1.10
timesync sntp
sntp unicast
sntp 300
logging 172.24.1.10
snmp-server community "public" Unrestricted
vlan 1
name "DEFAULT_VLAN"
untagged A1-A4,B1-B4,C1-C24,D1-D24,E1-E24,F1-F24,G1-G4
ip address 172.24.1.31 255.255.0.0
exit
fault-finder bad-driver sensitivity high
fault-finder bad-transceiver sensitivity high
fault-finder bad-cable sensitivity high
fault-finder too-long-cable sensitivity high
fault-finder over-bandwidth sensitivity high
fault-finder broadcast-storm sensitivity high
fault-finder loss-of-link sensitivity high
ip authorized-managers 172.24.1.10
access-controller vlan-base 2000
spanning-tree
no spanning-tree A1 edge-port
no spanning-tree A2 edge-port
no spanning-tree A3 edge-port
no spanning-tree A4 edge-port
ip ssh
ip ssh key-size 1024
password manager
password operator

Rapid Spanning Tree (RSTP) Information

STP Enabled : Yes
Force Version : RSTP-operation

Switch Priority : 32768 Hello Time : 2
Max Age : 20 Forward Delay : 15

Topology Change Count : 811
Time Since Last Change : 56 mins

Root MAC Address : 000d9d-b8ce80
Root Path Cost : 40000
Root Port : A3
Root Priority : 32768

show instrumentation
Current Prev Prev Prev Since
Value 5 Minutes Hour Day Boot
Free CPU msgs# 919 919 918 919 918
Low * * 918 919 915
High * * 919 919 919
Free CPU pkts# 1025 1003 1026 1026 1025
Low * * 1026 1026 1003
High * * 1026 1026 1026
Mesh bcast chg cnt# 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
Mesh address count# 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
Mesh num switches * * * * *
Low * * * * *
High * * * * *
MAC addr count# 162 160 164 113 94
Low * * 131 68 14
High * * 188 203 203
IGMP Joined Mcasts 0 0 0 0 0
Low 0 0 0 0 0
High 0 0 0 0 0
IGMP Filtered Mcasts 0 0 0 0 0
Low 0 0 0 0 0
High 0 0 0 0 0
Number of VLANs 1 1 1 1 1
Low 1 1 1 1 1
High 1 1 1 1 1
lldp num neighbors 4 4 4 4 3
Low 4 4 4 4 1
High 4 4 4 4 5
System delay# 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
Mesh addr learns/min 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
Mesh addr moves/min 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
Mesh addr deletes/min 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
MAC moves/min 0 0 0 0 0
Low * * 0 0 0
High * * 1 0 2745
Learn discards/min 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 2057
Locked MAC moves/min 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
Lockout MAC rjcts/min 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
lldp neighbors lost 0 0 0 0 0
Low * * 0 0 0
High * * 0 0 0
DbgLgMSGDrops-CPU 0 0 0 0 0
DbgLgMSGDrops-MSG 0 0 0 0 0
Mesh port toggle cnt 0 0 0 0 0
Mesh oversub count 0 0 0 0 0
Mesh flush tbl cnt 0 0 0 0 0
VLAN downs 0 0 0 0 0
sFlow CP MSG Drops 0 0 0 0 0
sFlow sendto Fails 0 0 0 0 0
Min free CPU msgs 811 811 811 811 811
Min free CPU pkts 934 934 934 990 991
Rx unkn IP mcasts Off Off Off Off Off
IGMP Enable Off Off Off Off Off
Routing enable Off Off Off Off Off
Data is updated at 5-minute intervals.
* - Data unavailable.
# - Polled value. Low and high are sampled at 5-minute intervals.
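
The "High" rows in the "Since Boot" column are the interesting part of this output: MAC moves/min peaked at 2745 and Learn discards/min at 2057, which is what one would usually expect to see when addresses flap between ports during a loop. Below is a small sketch for pulling those peaks out of pasted text. It assumes each row ends in exactly five whitespace-separated columns as above; the function name is hypothetical.

```python
def high_since_boot(lines):
    """Map each metric name to the 'High' value in the 'Since Boot' column."""
    peaks = {}
    current = None
    for line in lines:
        tokens = line.split()
        if len(tokens) < 6:
            continue  # not a data row (too few columns)
        name = " ".join(tokens[:-5])
        values = tokens[-5:]
        if name == "High":
            # Last column is 'Since Boot'; '*' means data unavailable.
            if current is not None and values[-1].isdigit():
                peaks[current] = int(values[-1])
        elif name != "Low":
            current = name.rstrip("#")  # drop the 'polled value' marker
    return peaks

# Rows copied from the output above.
sample = [
    "MAC addr count# 162 160 164 113 94",
    "Low * * 131 68 14",
    "High * * 188 203 203",
    "MAC moves/min 0 0 0 0 0",
    "Low * * 0 0 0",
    "High * * 1 0 2745",
    "Learn discards/min 0 0 0 0 0",
    "Low * * 0 0 0",
    "High * * 0 0 2057",
]
peaks = high_since_boot(sample)
print(peaks)
```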

bye

alex



Ladrouz
Advisor

Re: complete network was down for 5 minutes!

Hi,
You can limit broadcasts by running the broadcast-limit command.
Is there a mirror port configured?
What about port E6?
Are the fiber GBIC modules correctly plugged in?

Try disconnecting the A ports one by one.
bye
Mayer_5
Occasional Visitor

Re: complete network was down for 5 minutes!

Hi,

My colleagues will check port E6 at the next opportunity.

At the moment there is no mirror-port configured on the switch.

Can you tell me more about the broadcast-limit command? I don't think I can pass any parameters to it. I can enter it in the CLI, but I do not see it in the "show run" output.

Is it enabled by default on the 5300xl?

I cannot find the command in the documentation for this switch. I only found it for the 5400yl, with percentage parameters.

Alex
Mayer_5
Occasional Visitor

Re: complete network was down for 5 minutes!

Oh, sorry. I was wrong.
I can see the command in the "show run" output.
It was too far up at the top of the output :-))