HPE Aruba Networking & ProVision-based

HP Procurve 5412zl - seemingly random port dropouts causing chaos

 
smacdonald
New Member


We have a 5412zl, about three and a half years old, that seems to be causing chaos on our network. It is our core switch, connected directly to our main SANs and VMware hosts, along with hundreds of other devices.

Over the past couple of months we have seen many occasions where several ports simply lose their link for no apparent reason, usually in quick succession over a few seconds. At first it happened maybe once a week; now it is happening several times per day.

Here's a quick copy-paste of some recent syslog entries; I've added notes on which devices are connected to the affected ports:

Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I3 - SAN Node
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I4 - SAN Node
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I5 - backbone to remote switch stack 1
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I6 - backbone to remote switch stack 2
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I7 - backbone to remote switch stack 3
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I8 - backbone to remote switch stack 4
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J1 - ESXi Host 1
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J2 - ESXi Host 2
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J3 - SAN Node
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J4 - SAN Node
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J5 - backbone to remote switch stack 1
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J6 - backbone to remote switch stack 2
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J7 - backbone to remote switch stack 3
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J8 - backbone to remote switch stack 4
Wed Aug 30 15:42:37 2017 Warning Loss of link Lost connection to multiple devices on port K1 - backbone to remote switch stack 1
Wed Aug 30 15:42:37 2017 Warning Loss of link Lost connection to multiple devices on port K2 - backbone to remote switch stack 1
Wed Aug 30 15:43:01 2017 Warning Loss of link Lost connection to multiple devices on port L2i - 10G to something
Wed Aug 30 15:43:59 2017 Warning Loss of link Lost connection to multiple devices on port A4 - iLO Host 1
Wed Aug 30 15:43:59 2017 Warning Loss of link Lost connection to multiple devices on port A9 - Port 1 Host 1
Wed Aug 30 15:43:59 2017 Warning Loss of link Lost connection to multiple devices on port A12 - Legacy Physical Server
Wed Aug 30 15:43:59 2017 Warning Loss of link Lost connection to multiple devices on port A17 - Kitchen PC
Wed Aug 30 15:44:01 2017 Warning Loss of link Lost connection to multiple devices on port B1 - SAN 1
Wed Aug 30 15:44:01 2017 Warning Loss of link Lost connection to multiple devices on port B5 - VM Host2
Wed Aug 30 15:44:01 2017 Warning Loss of link Lost connection to multiple devices on port B9 - not in patch docs - probably PC
Wed Aug 30 15:44:01 2017 Warning Loss of link Lost connection to multiple devices on port B24 - not in patch docs - probably PC
Wed Aug 30 15:44:02 2017 Warning Loss of link Lost connection to multiple devices on port C2 - ICT Manager PC
Wed Aug 30 15:44:02 2017 Warning Loss of link Lost connection to multiple devices on port C5 - ICT Suite PC
Wed Aug 30 15:44:06 2017 Warning Loss of link Lost connection to multiple devices on port F22 - Room 6 PC
Wed Aug 30 15:44:09 2017 Warning Loss of link Lost connection to multiple devices on port H11 - Main Building AP3
Wed Aug 30 15:44:09 2017 Warning Loss of link Lost connection to multiple devices on port H17 - Hall Aerohive AP
Wed Aug 30 15:44:09 2017 Warning Loss of link Lost connection to multiple devices on port H18 - Main Building AP6
Wed Aug 30 15:44:09 2017 Warning Loss of link Lost connection to multiple devices on port H19 - Main Building AP2
Wed Aug 30 15:44:11 2017 Warning Loss of link Lost connection to multiple devices on port I1 - ESXi Host 1
Wed Aug 30 15:44:11 2017 Warning Loss of link Lost connection to multiple devices on port I2 - ESXi Host 2
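To check the "quick succession" pattern more systematically than by eye, a short script can group the "Loss of link" entries into bursts and list which chassis modules each burst touches. This is a minimal Python sketch, not an HPE tool: it assumes the exact line layout shown in the excerpt, assumes the leading letter of each port name is the module slot, and uses an arbitrary 60-second gap to separate bursts. The helper names (`parse_events`, `group_bursts`, `summarize`) are hypothetical, and only a subset of the entries above is included.

```python
import re
from datetime import datetime, timedelta

# A subset of the entries from the syslog excerpt; the parser assumes this
# exact layout (timestamp, severity, "Loss of link" text, port name).
SAMPLE_LOG = """\
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I3
Wed Aug 30 15:42:34 2017 Warning Loss of link Lost connection to multiple devices on port I5
Wed Aug 30 15:42:36 2017 Warning Loss of link Lost connection to multiple devices on port J1
Wed Aug 30 15:42:37 2017 Warning Loss of link Lost connection to multiple devices on port K1
Wed Aug 30 15:43:01 2017 Warning Loss of link Lost connection to multiple devices on port L2i
Wed Aug 30 15:43:59 2017 Warning Loss of link Lost connection to multiple devices on port A4
Wed Aug 30 15:44:01 2017 Warning Loss of link Lost connection to multiple devices on port B1
Wed Aug 30 15:44:02 2017 Warning Loss of link Lost connection to multiple devices on port C2
Wed Aug 30 15:44:06 2017 Warning Loss of link Lost connection to multiple devices on port F22
Wed Aug 30 15:44:09 2017 Warning Loss of link Lost connection to multiple devices on port H11
Wed Aug 30 15:44:11 2017 Warning Loss of link Lost connection to multiple devices on port I1
"""

LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) .*Loss of link.* on port (?P<port>\S+)"
)


def parse_events(text):
    """Return (timestamp, port) pairs for every 'Loss of link' line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            events.append((ts, m.group("port")))
    return events


def group_bursts(events, max_gap=timedelta(seconds=60)):
    """Start a new burst whenever consecutive events are more than max_gap apart."""
    bursts = []
    for ts, port in sorted(events):
        if bursts and ts - bursts[-1][-1][0] <= max_gap:
            bursts[-1].append((ts, port))
        else:
            bursts.append([(ts, port)])
    return bursts


def summarize(burst):
    """Return the burst's duration in seconds and the module letters it touches."""
    duration = (burst[-1][0] - burst[0][0]).total_seconds()
    # Assumption: the first character of a port name is the module slot letter.
    modules = sorted({port[0] for _, port in burst})
    return duration, modules


if __name__ == "__main__":
    for burst in group_bursts(parse_events(SAMPLE_LOG)):
        duration, modules = summarize(burst)
        print(f"{len(burst)} ports down over {duration:.0f}s across modules {', '.join(modules)}")
```

With the sample above this reports a single burst of 11 ports over 97 seconds touching modules A, B, C, F, H, I, J, K and L, i.e. the dropouts are spread across most of the chassis rather than confined to one module. Tightening `max_gap` to 30 seconds splits the same data into two bursts, so the threshold is worth tuning against the full log.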

When the ESXi hosts lose their connection to vCenter and to their SANs, the failover manager goes crazy and starts trying to shuffle them around. The disconnections can last seconds or minutes, but stabilizing the VMware environment after one of these "blips" takes a lot longer.

If only the VMware environment (SANs and hosts) were affected, I'd suspect the fault lay there, but since it also affects a seemingly random collection of devices on other ports, I'm increasingly convinced that this switch is dying.

I can't find direct evidence that the switch is faulty, but I can't think of anything else that would cause this issue. Despite the huge inconvenience and work involved in a swap-out, I would really like to try a replacement switch, even if just to rule the switch out as the cause.

Does anyone have suggestions for troubleshooting steps or tests to run, or any experience with faults like these? I'm just trying to gather more information before I raise a call with HPE support.

I would like to add that I'm by no means a networking expert, but I have worked in this environment for several years.

Thank you very much for reading.

parnassus
Honored Contributor

Re: HP Procurve 5412zl - seemingly random port dropouts causing chaos

Hello, first things first: the output of the show flash, show system power-supply and show modules CLI commands (sanitized, i.e. without the modules' serial numbers) would help us understand which software version your 5400 zl switch is currently running, the status of its power supplies, and which modules it is currently equipped with.

The numerous logged "Loss of link" warnings indicate that the link is being lost on (a) uplink ports to other switches (where each port has learned multiple MAC addresses) and/or on (b) access ports to devices such as workstations and servers (where each port has learned only one MAC address).
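The rule of thumb in the reply can be written down as a tiny Python sketch: given the MAC addresses a port has learned (however you collect them, e.g. from the switch's MAC address table), more than one address suggests an uplink and exactly one suggests an access port. The function names and the sample MAC table are hypothetical; the sample addresses come from the 00:00:5E:00:53:xx documentation range, not real devices.

```python
from typing import Dict, Iterable


def classify_port(learned_macs: Iterable[str]) -> str:
    """Several learned MACs suggest an uplink; a single one suggests an access port."""
    count = len(set(learned_macs))  # ignore duplicate entries
    if count > 1:
        return "uplink"
    if count == 1:
        return "access"
    return "unknown"  # nothing learned, e.g. nothing currently connected


def classify_ports(mac_table: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Classify every port in a port -> learned-MACs mapping."""
    return {port: classify_port(macs) for port, macs in mac_table.items()}


if __name__ == "__main__":
    # Hypothetical MAC table using documentation-range addresses.
    mac_table = {
        "K1": ["00:00:5e:00:53:01", "00:00:5e:00:53:02", "00:00:5e:00:53:03"],
        "A17": ["00:00:5e:00:53:10"],
        "B9": [],
    }
    for port, role in classify_ports(mac_table).items():
        print(f"{port}: {role}")
```

It is only a heuristic: a port facing an ESXi host with several VMs will also learn multiple MAC addresses without being a switch uplink.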

That's pretty strange. Are frequent, random topology changes occurring? Is STP enabled?


I'm not an HPE Employee