
Server crashes, ESXi 6.0 on an HPE ProLiant ML350 G6

SOLVED
Joe_Papa
Advisor

I have an old ML350 G6 set up as my home server. It has 2 x Xeon X5675 CPUs, 72GB of ECC RAM, a 1TB SSD, and a 1.6TB SAS RAID array of eight 2.5" SAS HDDs. It runs 3 VMs, all of its hardware is compatible, and it has run stably for months. It just started PSoDing almost nightly. The first two PSoD messages referenced a Mellanox driver that my system doesn't use, so I removed all the associated libs. The last PSoD didn't reference a driver at all. Here's the PSoD information.

[Image: PSoD screenshot]

In the iLO2 logs it says the following:

Critical | PCI Bus | 01/04/2021 10:11 | 01/04/2021 10:11 | 1 | Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 3, Function 0, Error status 0x00000010)

[Image: ML350 G6 Devices]
Does anyone have insight into what I should look for? Is this likely a motherboard failure? A driver failure? A CPU failure?
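One way to make the iLO2 entry above easier to act on is to pull out the PCI location and decode the status value. The sketch below is a hypothetical helper (`decode_ilo_pcie_entry` is not an HPE or VMware tool). It assumes the bus/device/function numbers in the iLO text are decimal, and that "Error status" is a raw copy of the PCIe AER Uncorrectable Error Status register, where bit 4 (0x10) is Data Link Protocol Error. That mapping is an assumption and is not confirmed by the log itself. The bit table covers only a subset of the register.

```python
import re

# Subset of bit positions in the PCIe AER Uncorrectable Error Status register.
# ASSUMPTION: iLO's "Error status" value is a raw copy of this register.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request Error",
}

# Matches the location/status part of an iLO2 "Uncorrectable PCI Express Error" line.
PATTERN = re.compile(
    r"Bus (\d+), Device (\d+), Function (\d+), Error status (0x[0-9A-Fa-f]+)"
)


def decode_ilo_pcie_entry(line: str) -> dict:
    """Hypothetical helper: extract the PCI location and decode the status bits."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("no PCIe location/status found in log line")
    # ASSUMPTION: iLO prints bus/device/function in decimal.
    bus, dev, fn = (int(g) for g in m.group(1, 2, 3))
    status = int(m.group(4), 16)
    # Collect the names of every known bit that is set in the status value.
    errors = [
        name
        for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
        if status & (1 << bit)
    ]
    return {
        "bdf": f"{bus:02x}:{dev:02x}.{fn:x}",  # bus:device.function, hex, as lspci prints it
        "status": status,
        "errors": errors,
    }


if __name__ == "__main__":
    entry = decode_ilo_pcie_entry(
        "Uncorrectable PCI Express Error (Embedded device, Bus 0, "
        "Device 3, Function 0, Error status 0x00000010)"
    )
    print(entry["bdf"], entry["errors"])
```

For this entry the helper reports location 00:03.0 with only the Data Link Protocol Error bit set. On this class of Intel chipset, a device number this low on bus 0 is typically a PCIe root port, so the error would point at whatever card sits behind that port rather than at the CPU. Treat that as a lead to confirm against the server's own slot map, not as a diagnosis.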
StorageMike
HPE Pro

Hi

The ML350 G6 doesn't have USB 3 - https://support.hpe.com/hpesc/public/docDisplay?docId=emr_na-c01713311

Do you have a USB 3 card plugged in?

I work for HPE


Joe_Papa
Advisor

Yes. I have a 2-port USB 3.0 card that my UPS is connected to.

StorageMike
HPE Pro
Solution

If I understand the screenshot above correctly, it's the USB card in that slot reporting the error. I'd try moving the card to another slot and see whether the error follows it. If it doesn't, the motherboard may be the problem.

If it does, the problem is more likely the card or its driver...


Joe_Papa
Advisor

Thanks. I will try that. 

** UPDATE ** 

I pulled all of the PCIe cards, cleaned the contacts, and reseated them in the same slots, except for the USB 3.0 card, which I moved down one slot. There have been no crashes in the 4 days since. We'll see if that was it.