
Network going down due to switches 'freezing'

TsAmE
Occasional Contributor

Hello. I am managing a number of HP ProCurve and Aruba switches. I have experienced an issue where part of the network goes down. This happens at random: sometimes weekly, other times more than once in a day. When the problem occurs, all the switches are affected and their LEDs go solid (when the LEDs are blinking, the network is fine). Interestingly, I am still able to SSH into the switches while the problem is occurring. Syslog has been enabled on all switches.

I noted the syslog events that were logged around the time the network went down and the switch LEDs were solid. I did this on four different occasions when the network went down. After going through these logs, the events below are the ones that occur around the time the network goes down:

Some edge ports change between "on-line", "off-line" and "Blocked by STP" more than once:

I 08/01/17 09:09:05 00076 ports: port 24 is now on-line

I 08/01/17 09:09:05 00435 ports: port 24 is Blocked by STP

I 08/01/17 09:08:58 00077 ports: port 24 is now off-line

There is a high collision or drop rate reported on some ports:

W 08/01/17 09:01:23 00331 FFI: port 9-High collision or drop rate.

Excessive CRC/alignment errors on some ports:

W 07/21/17 09:08:13 00329 FFI: port 18-Excessive CRC/alignment errors.

There are some messages containing Applying Power to PD on some ports:

I 07/31/17 08:43:01 ports: port 24 Applying Power to PD.

I 07/31/17 08:43:01 ports: port 24 PD Detected.

I 07/31/17 08:42:58 ports: port 24 PD Removed.

I have also updated the firmware on all switches to the latest version, but this has not solved the problem. Is there a way to troubleshoot this and find the cause of the network downtime?
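
One way to line up the syslog entries with each outage is to parse them and keep only the entries that fall within a few minutes of the time the network went down. Below is a minimal sketch in Python. It assumes the log lines have the layout shown above (severity letter, date as MM/DD/YY, time, an optional five-digit event ID, subsystem, message). The function names and the 10-minute window are illustrative assumptions, not part of any HP tool.

```python
import re
from datetime import datetime, timedelta

# Assumed layout, based on the sample lines in this post:
#   I 08/01/17 09:09:05 00076 ports: port 24 is now on-line
# The five-digit event ID is optional (the PD lines above do not have one).
LINE_RE = re.compile(
    r"^(?P<sev>[A-Z]) (?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?:(?P<event_id>\d{5}) )?(?P<subsystem>\w+): (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["date"] + " " + m["time"], "%m/%d/%y %H:%M:%S")
    return {
        "severity": m["sev"],
        "timestamp": ts,
        "event_id": m["event_id"],
        "subsystem": m["subsystem"],
        "message": m["message"],
    }


def events_near(lines, outage_time, window_minutes=10):
    """Return parsed events within +/- window_minutes of outage_time, oldest first."""
    window = timedelta(minutes=window_minutes)
    events = [e for e in map(parse_line, lines) if e is not None]
    near = [e for e in events if abs(e["timestamp"] - outage_time) <= window]
    return sorted(near, key=lambda e: e["timestamp"])
```

For example, calling `events_near(lines, datetime(2017, 8, 1, 9, 5))` on the lines quoted above would keep only the 08/01/17 entries, which makes it easier to see whether the same events really recur at every outage.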

4 REPLIES
Rob_Tr
Occasional Visitor

Re: Network going down due to switches 'freezing'

We had the same problem in our company. As a workaround, STP was switched off and the switches became stable. But an HP service technician told me to update the firmware to the latest version as soon as possible, too.

Vince-Whirlwind
Honored Contributor

Re: Network going down due to switches 'freezing'

Your times look similar but the dates are different, so not sure you should be looking at these 4 different types of log events at the same time.

Log#1 is normal when a device comes up or restarts, same as Log#4.

Logs#2 & #3 should be investigated. Find out what is in those ports and have a closer look at what they are up to.

TsAmE
Occasional Contributor

Re: Network going down due to switches 'freezing'


Rob_Tr wrote:

We had the same problem in our company. As a workaround, STP was switched off and the switches became stable. But an HP service technician told me to update the firmware to the latest version as soon as possible, too.


I disabled STP on all switches to test whether they would become stable, but part of the network still went down today. I have already updated all the switches to the latest firmware version.

TsAmE
Occasional Contributor

Re: Network going down due to switches 'freezing'


Vince-Whirlwind wrote:

Your times look similar but the dates are different, so not sure you should be looking at these 4 different types of log events at the same time.

Log#1 is normal when a device comes up or restarts, same as Log#4.

Logs#2 & #3 should be investigated. Find out what is in those ports and have a closer look at what they are up to.


I traced the device shown in Log#2 (port 9-High collision or drop rate) to an IP phone and left it disconnected, but part of the network still went down regardless.

I have monitored the logs when the network has gone down and have found that Log#3 (port 18-Excessive CRC/alignment errors.) only sometimes appears in the logs when there is downtime.

What I noticed recently is that when part of the network goes down, some devices connected to a given switch (call it switch A) lose connectivity, while other devices connected to the same switch keep it. It seems that only some of the switch ports are going down (as opposed to the whole switch).
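
To narrow down which ports are actually affected, the log lines collected during an outage could be tallied per port. The sketch below is a rough Python example that counts "off-line" transitions and FFI warnings per port. It assumes the message wording matches the samples earlier in this thread; the function name `tally_port_events` is hypothetical.

```python
import re
from collections import Counter

# Patterns based on the sample messages quoted in this thread (assumed wording).
OFFLINE_RE = re.compile(r"ports: port (\S+) is now off-line")
FFI_RE = re.compile(r"FFI: port (\S+?)-(.+?)\.?$")


def tally_port_events(lines):
    """Count off-line transitions per port and FFI warnings per (port, warning)."""
    offline = Counter()
    ffi = Counter()
    for raw in lines:
        line = raw.strip()
        m = OFFLINE_RE.search(line)
        if m:
            offline[m.group(1)] += 1
            continue
        m = FFI_RE.search(line)
        if m:
            ffi[(m.group(1), m.group(2))] += 1
    return offline, ffi
```

Running this over the logs from several outages and comparing the results with the list of ports that lost connectivity might show whether the affected ports share something in common, such as the same uplink or the same PoE devices.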