JaapHoetmer
Occasional Visitor

ML350 Gen10 Network errors

Hi there.

We recently installed a ProLiant ML350 Gen10 running VMware (HPE Customized Image ESXi 6.7.0 Update 2, version 670.U2.10.4.1). We are using a single NIC for the moment, and the iLO interface is also connected.

We are monitoring the machine remotely using Check_MK and noticed packet errors on both the network interface used for VMware and the iLO interface. The errors are inbound only.

On the main NIC the error rate is slightly under 1%; on the iLO NIC it is about 10%.
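Percentages like these are a ratio of an error counter to a packet counter over one polling interval. A minimal sketch, assuming two samples of monotonically increasing counters (no wrap or reset) and a denominator of all received packets; Check_MK's own formula may differ, for example in what it uses as the denominator, so this is an illustration rather than its implementation. The function name is hypothetical.

```python
def error_rate_percent(errors_before: int, errors_after: int,
                       packets_before: int, packets_after: int) -> float:
    """Percentage of received packets counted as errors between two samples.

    Assumes both counters only increase between samples and that the
    packet counter includes all received frames (good and bad).
    """
    d_err = errors_after - errors_before
    d_pkt = packets_after - packets_before
    if d_pkt <= 0:
        return 0.0  # no traffic in the interval, so no meaningful rate
    return 100.0 * d_err / d_pkt
```

For example, 9 errors out of 1000 received packets in an interval gives roughly 0.9%.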

The switch shows no errors at all.

While debugging in VMware, I noticed that all errors are receive length errors, which suggests a packet size issue.

I checked the interface, the switch, and the monitoring system; all are set to an MTU of 1500, so I would not expect to see packets longer than 1500 bytes. Yet when I run tcpdump on the ESXi host, large packets are arriving.
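Drivers differ in exactly what they count as a receive length error. One possibility (an assumption here, not something this thread confirms) is a frame whose IEEE 802.3 length field disagrees with the amount of data actually received. The sketch below classifies a raw Ethernet frame, assumed to be captured without FCS and without a VLAN tag, by its length/type field: values of 1536 (0x0600) and above are EtherTypes, values up to 1500 are 802.3 lengths, and a length-framed packet may carry extra bytes only as padding up to the 46-byte minimum payload. The function name and return labels are hypothetical.

```python
import struct

ETH_HEADER_LEN = 14       # destination MAC (6) + source MAC (6) + length/type (2)
MAX_8023_LENGTH = 1500    # 0x05DC: largest value interpreted as a length
MIN_ETHERTYPE = 0x0600    # 1536: smallest value interpreted as an EtherType
MIN_PAYLOAD = 46          # minimum payload; shorter payloads are padded

def classify_frame(frame: bytes) -> str:
    """Classify a raw Ethernet frame (no FCS, no VLAN tag) by its length/type field."""
    if len(frame) < ETH_HEADER_LEN:
        return "runt"
    (field,) = struct.unpack("!H", frame[12:14])
    if field >= MIN_ETHERTYPE:
        return "ethertype"          # e.g. 0x0800 IPv4, 0x86DD IPv6
    if field > MAX_8023_LENGTH:
        return "undefined"          # 1501..1535 is neither a length nor a type
    payload_len = len(frame) - ETH_HEADER_LEN
    # A length-framed packet must contain at least `field` bytes of data, and
    # extra bytes are only legitimate as padding up to MIN_PAYLOAD.
    if payload_len < field or payload_len > max(field, MIN_PAYLOAD):
        return "length-mismatch"
    return "length-ok"
```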

This is the only ProLiant that shows this behaviour. We have other ProLiants under our watch, but not the same models, and none show any symptoms similar to this one.

So I am wondering whether the NICs in this machine use offloading to reassemble fragmented packets before handing them over to the OS. However, I couldn't find anything related to this in VMware, so I am first asking here to see whether anyone else has seen this.

Does this make sense? Or does anyone else have any other explanation or experience?

Thanks, regards,

Jaap

DANDKS
HPE Pro

Re: ML350 Gen10 Network errors

The dedicated iLO 5 port and the four embedded NIC ports are 1Gb ports. The iLO 5 port is used only for the management network, and large packet transfers are not expected on the management network. The management and production networks should not be on the same subnet; if they are, we recommend moving them to separate networks.

Please try the following steps:

Make sure the iLO 5 firmware is at the latest version
Set the server's network ports and the switch ports to the same speed
Set the same duplex mode on both ends
Connect a laptop directly to the server ports (iLO 5 and NIC) and test, to isolate the issue
Replace the NIC cable with a known-good cable

Thank you


I am an HPE employee
DanRobinson
HPE Pro

Re: ML350 Gen10 Network errors

"when I run tcpdump on VMware, there are large packets arriving"

From where? You should be able to see the source MAC address and trace it back to the remote server.


I work for HPE


JaapHoetmer
Occasional Visitor

Re: ML350 Gen10 Network errors

After more checks, it appears the errors occur only on the production interface. iLO simply reports the same errors against the same MAC address, which resulted in two alerts for the same problem, albeit with different percentages.

[Image: Interface errors VMware on Proliant]

The large packets seen in VMware are apparently normal; we also see them on hosts that do not report any receive length errors on their NICs.

The cables have been checked, and no problems were found.

We've updated the NIC driver in VMware, but this did not resolve the issue.

I have also posted in the VMware Community (https://communities.vmware.com/thread/622803), since the iLO interface turns out not to be affected.

Tips or ideas very welcome. Thanks in advance.

JaapHoetmer
Occasional Visitor
Solution

Re: ML350 Gen10 Network errors

Hi all,

This issue was resolved.

The problem originated at the firewall: the LAN interface of the firewall (FortiGate) was configured to forward spanning tree packets. As this wasn't required, switching it off made the errors disappear completely.
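To check whether spanning tree frames are reaching a host in a similar setup, one could look for IEEE 802.1D (and RSTP) BPDUs in a capture: they are sent to the bridge group address 01:80:C2:00:00:00 as 802.3 length-framed packets with LLC DSAP 0x42, SSAP 0x42 and control 0x03. This sketch assumes raw Ethernet frames without FCS or VLAN tag; it does not recognize vendor variants such as Cisco PVST+, which use a different destination address. The function name is hypothetical, and whether forwarded BPDUs were exactly what the NIC counted as length errors is not established by the thread.

```python
import struct

STP_DST_MAC = bytes.fromhex("0180c2000000")  # IEEE 802.1D bridge group address
STP_LLC = b"\x42\x42\x03"                   # DSAP 0x42, SSAP 0x42, UI control field

def is_stp_bpdu(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame looks like an 802.1D/RSTP BPDU."""
    if len(frame) < 17:                      # 14-byte header + 3-byte LLC header
        return False
    if frame[0:6] != STP_DST_MAC:
        return False
    (field,) = struct.unpack("!H", frame[12:14])
    if field > 1500:                         # BPDUs carry a length, not an EtherType
        return False
    return frame[14:17] == STP_LLC
```

Counting `sum(is_stp_bpdu(f) for f in frames)` over frames from a host-side capture, and noting their source MACs, would show whether BPDUs from the firewall are arriving at the server.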

Hope this helps someone else in a similar situation.

Kind regards,