
Faulty Port on DL 380 G7 NIC causing the core switch CPU to go up 100%

 
systemssuck
Occasional Visitor


We faced an unusual problem. Our ProLiant DL 380 G7 server runs VMware ESXi 4.1. One of the four ports on its NIC (model HP NC375T PCIe 4Pt Gigabit) malfunctioned, and that alone was enough to bring the whole network down.

 

The port was directly connected to the core switch, and it caused the core switch CPU to spike to 100% every time it was brought online.

 

The NIC port was later disabled and a different port used. However, we would very much like to know how a single NIC port on a virtualization host could overload the core switch like this.

 

 

3 REPLIES
Johan Guldmyr
Honored Contributor

Re: Faulty Port on DL 380 G7 NIC causing the core switch CPU to go up 100%

You may find some answers in a capture of what the port is doing. Perhaps cable that NIC to another server where tcpdump/wireshark is listening?
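If such a capture is taken, a quick summary of who is sending and how fast can be a useful first look. The following is only a minimal sketch: it assumes the frames have already been exported from tcpdump/Wireshark into `(timestamp, source MAC, destination MAC)` tuples, and the function name `summarize_frames` and the sample MAC addresses are made up for illustration.

```python
from collections import Counter

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def summarize_frames(frames):
    """Summarize captured frames.

    frames: iterable of (timestamp_seconds, src_mac, dst_mac) tuples,
    assumed to have been exported from a capture beforehand.
    """
    frames = list(frames)
    if not frames:
        return {"total": 0, "rate_fps": None,
                "broadcast_share": 0.0, "top_sources": []}

    times = [t for t, _, _ in frames]
    duration = max(times) - min(times)

    # Count frames per source MAC to see which sender dominates.
    by_source = Counter(src.lower() for _, src, _ in frames)

    # Count frames addressed to the Ethernet broadcast address.
    broadcasts = sum(1 for _, _, dst in frames
                     if dst.lower() == BROADCAST_MAC)

    return {
        "total": len(frames),
        # Frames per second over the capture window (None if undefined).
        "rate_fps": len(frames) / duration if duration > 0 else None,
        "broadcast_share": broadcasts / len(frames),
        "top_sources": by_source.most_common(3),
    }


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real incident.
    sample = [
        (0.0, "00:11:22:33:44:55", BROADCAST_MAC),
        (1.0, "00:11:22:33:44:55", BROADCAST_MAC),
        (2.0, "66:77:88:99:aa:bb", "00:11:22:33:44:55"),
    ]
    print(summarize_frames(sample))
```

An unusually high frame rate or broadcast share from the suspect port's MAC address would point toward a software/firmware-level fault that shows up in the capture at all.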

Matti_Kurkela
Honored Contributor

Re: Faulty Port on DL 380 G7 NIC causing the core switch CPU to go up 100%

It might have been useful to check the port statistics of the core switch when the faulty NIC was connected to it.

 

Tcpdump or wireshark may be enlightening if the fault is at the software/firmware level. But if there is a low-level electrical fault in the NIC hardware causing the NIC to spew an endless stream of electrical "noise", it might be filtered out by the receiving NIC hardware/driver, only manifesting itself in the statistics counters as e.g. a large number of malformed packets being received.
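To illustrate the statistics-counter approach: if the suspect port is cabled to a Linux test machine, the receive counters can be sampled before and after the link comes up and compared. This is a rough sketch under assumptions, not a diagnostic tool: it reads the standard Linux files under `/sys/class/net/<iface>/statistics/`, the interface name `eth1` is a placeholder, and which errors a given driver actually counts varies. It does not apply to the core switch or to ESXi itself.

```python
import time
from pathlib import Path

# Receive counters exposed by the Linux kernel for each interface.
COUNTERS = ("rx_packets", "rx_errors", "rx_crc_errors",
            "rx_frame_errors", "rx_dropped")


def read_counters(iface, base="/sys/class/net"):
    """Read the current receive counters for one interface."""
    stats_dir = Path(base) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def error_delta(before, after):
    """Return per-counter deltas and a rough receive error ratio."""
    delta = {name: after[name] - before[name] for name in before}
    # Rough indicator only: drivers differ in whether errored frames
    # are also included in rx_packets.
    seen = delta["rx_packets"] + delta["rx_errors"]
    ratio = delta["rx_errors"] / seen if seen > 0 else 0.0
    return delta, ratio


if __name__ == "__main__":
    iface = "eth1"  # placeholder: the port cabled to the suspect NIC
    before = read_counters(iface)
    time.sleep(10)  # sample interval, chosen arbitrarily
    after = read_counters(iface)
    delta, ratio = error_delta(before, after)
    print(delta)
    print(f"receive error ratio: {ratio:.1%}")
```

A large jump in `rx_errors` or `rx_crc_errors` with few valid packets would be consistent with the electrical-noise scenario described above, where little or nothing useful appears in a packet capture.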

 

In that case, you might need a hardware-level Ethernet analysis to get more information on the nature of the problem (i.e. you might have to look at the physical signals on the wires and compare them to what proper gigabit Ethernet signaling should look like). This is beyond the scope of most network administrators' work: usually, seeing a large number of malformed packets and/or an absurdly high switch CPU load and tracing the cause to a particular NIC is enough to conclude that "it's broken."

 

If you are curious and happen to have a University of Technology or some other school of Electrical Engineering near you, you might donate the faulty NIC to them on the condition that they tell you what exactly was wrong with it. They might actually be happy to receive a real-world example of faulty network signaling, for the students to puzzle over in the lab exercises.

MK
systemssuck
Occasional Visitor

Re: Faulty Port on DL 380 G7 NIC causing the core switch CPU to go up 100%

The counters on the core switch looked more or less the same as usual, and that made identifying the source of the problem even more difficult.

 

We ended up replacing the NIC and the faulty one is no longer with us.

 

HP support suggested that, even though our issue was a different one, we apply the patch for our NC375T card as outlined in:

 

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02964542

 

https://my.vmware.com/web/vmware/details/dt_esxi40_qlogic_qlcnic_40727/ZHcqYnQqQGhiZEBlZA

 

The patch has been applied, and hopefully we won't have to face the same thing again. Thank you all for your suggestions!