
ML350p Gen8 reboots under load

 
Palustec
Occasional Contributor


Backup depository server reboots when under network/disk load.

ML350p Gen8 / 64 GB RAM / 1x Xeon E5-2620 / P420i 2 GB / 4x 4 TB RAID 5

Management Log Viewer captures this error (and many similar ones):

Critical 109 CPU 3 9/14/2019 20:58 9/14/2019 20:58 1 Uncorrectable Machine Check Exception (Board 0 Processor 1 APIC ID 0x00000003 Bank 0x00000003 Status 0xBE000000'00800400 Address 0x00000000'0000E18A Misc 0x00000000'00000000)
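
For anyone trying to read the Status value in that log entry, here is a minimal sketch of how it could be split into fields. It assumes the value follows the architectural IA32_MCi_STATUS layout described in the Intel Software Developer's Manual; the function names are hypothetical and not part of any HPE tool. Under that assumption the entry is a valid, uncorrected error with processor context corrupt set, and the low 16 bits (0x0400) match the "internal timer error" simple error code. The model-specific code and the meaning of bank 3 are vendor specific and are not interpreted here.

```python
# Sketch only: flag positions are taken from the architectural
# IA32_MCi_STATUS layout in the Intel SDM (an assumption about this log).
FLAG_BITS = {
    "VAL": 63,    # register holds a valid error
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds valid data
    "PCC": 57,    # processor context may be corrupted
}


def parse_logged_status(text: str) -> int:
    """Convert the log's 0xHHHHHHHH'HHHHHHHH notation to an integer."""
    return int(text.replace("'", ""), 16)


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit status value into flag bits and error codes."""
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["mca_error_code"] = status & 0xFFFF               # bits 15:0
    fields["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return fields


if __name__ == "__main__":
    decoded = decode_mci_status(parse_logged_status("0xBE000000'00800400"))
    for name in FLAG_BITS:
        print(f"{name}: {decoded[name]}")
    print(f"MCA error code: {decoded['mca_error_code']:#06x}")
    print(f"Model-specific code: {decoded['model_specific_code']:#06x}")
```

The same two functions can be applied to the other similar entries in the log to check whether they all carry the same error code or differ between banks.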

This problem has been in existence since Gen 6 and Gen 7. My server is up to date on firmware, both motherboard and P420i (2 GB) RAID controller.

I have found two published advisory bulletins that offer workarounds: changing the BIOS settings to Maximum Power and Performance and making sure the RAM frequency is set to AUTO. That has helped but not fixed the problem. The bulletins say that a permanent solution will be forthcoming. WHEN? These bulletins were published 5 and 7 years ago. Where is the fix?

Advisory Document ID c04046303 : R date 2013-12-13 : U date 2014-03-13

Advisory Document ID c02914393 : R date 2011-07-11 : U date 2012-02-13

I've rotated memory sticks: still reboots. I've added a standalone SATA drive on the motherboard SATA ports to bypass the P420i as the target drive for the backup: still reboots. I've changed the cache memory on the P420i: still reboots. I've reduced network speed from gigabit to 100 Mbps to see whether either the network or the storage channels were saturating: still reboots.

A little help would be most welcome.

Palustec

AshutoshM
HPE Pro

Re: ML350p Gen8 reboots under load

Hi,

Apart from setting the power profile to "Maximum Performance", setting the Minimum Processor Idle Power State to "No C-states" and Intel QPI Link Power Management to "Disabled" may help.

If this is already done, and given the actions already taken (BIOS, firmware and drivers up to date, correct profiles set in BIOS), then hardware component failure needs to be ruled out next.

For this, please raise a support case so it can be looked into in depth.

I am an HPE Employee
Palustec
Occasional Contributor

Re: ML350p Gen8 reboots under load

I found and changed all the recommended BIOS settings according to the advisories.

EXCEPT, I could not find an entry anywhere in the BIOS menu for the Intel QPI Link Power Management option.

It doesn't exist in my system (unless I'm staring right at it). I only have 1 CPU installed. Would that prevent the QPI option from being listed? (Based on what QPI is supposed to do for the system.)

Firmware for the motherboard is the latest, and firmware for the P420i is the latest. Windows Server 2012 R2 has had all Windows updates applied.

Nothing internal is getting overly warm according to Speccy.

Still, the system reboots when transferring large files across our internal network via FTP.

I have two remaining options to explore:

1 - The Broadcom integrated NICs were disabled because they were incompatible with VMware and Hyper-V VMs (random network failures in the VMs). Intel server NICs were installed, and that resolved the network issues.

For troubleshooting, I can undo that fix and see if the reboots happen with the Broadcom-based integrated NICs.

2 - Replace the installed RAM.

My company purchased these servers prior to HPE existing. Needless to say, we were very unhappy with the wallet grab for support after HPE was spun off.

I look forward to everyone's ideas.

Palustec

AshutoshM
HPE Pro

Re: ML350p Gen8 reboots under load

The symptoms described sound as if some component is going to sleep (low power mode) and is not waking up in time, triggering a crash.

Recommendations:

Firstly, go through the "Power Management Options" menu in BIOS and double-check that no setting is configured to "optimise" power consumption. Also disable Collaborative Power Control if enabled.

You are correct: the Intel QPI setting is not applicable with a single processor.

Note that you may need to test setting changes one by one, to be sure of each effect and to reverse easily if necessary.

Secondly, run the offline diagnostics, preferably with multiple loops, to see if the server crashes outside the OS and also to see if any memory errors etc. can be caught.

These checks are better done before any hardware is swapped around, which may potentially introduce new problems.

I am an HPE Employee
Palustec
Occasional Contributor

Re: ML350p Gen8 reboots under load

Update - my ML350p Gen8 no longer reboots under load.

I removed the Intel I350-T2 Server Adapter I installed 5 years ago.

Enabled the Broadcom integrated NICs and updated their drivers.

Hopefully, with all the updates to the server being applied, my VMs won't suffer weird network freezes like they did with the Broadcom NICs back in the beginning.

Unfortunately, the transfer speed performance is still inadequate for transferring 2 TB in the time window I have available.

Palustec
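
On the transfer speed point, a rough back-of-the-envelope sketch may help frame what a single gigabit link can do for a 2 TB job. The assumptions here are mine, not from the thread: 2 TB is taken as 2e12 bytes, gigabit line rate is taken as 125 MB/s with protocol overhead ignored, and the 8-hour window is a purely hypothetical placeholder since the real window was not stated.

```python
TWO_TB_BYTES = 2e12              # assumption: decimal terabytes
GIGABIT_LINE_RATE_MB_S = 125.0   # 1 Gbit/s divided by 8, overhead ignored


def hours_at_rate(total_bytes: float, rate_mb_s: float) -> float:
    """Hours needed to move total_bytes at a sustained rate in MB/s."""
    return total_bytes / (rate_mb_s * 1e6) / 3600


def required_rate_mb_s(total_bytes: float, window_hours: float) -> float:
    """Sustained MB/s needed to move total_bytes within window_hours."""
    return total_bytes / (window_hours * 3600) / 1e6


if __name__ == "__main__":
    best_case = hours_at_rate(TWO_TB_BYTES, GIGABIT_LINE_RATE_MB_S)
    print(f"Best case on one gigabit link: {best_case:.2f} hours")

    window = 8.0  # hypothetical backup window in hours
    needed = required_rate_mb_s(TWO_TB_BYTES, window)
    print(f"Rate needed for a {window:.0f}-hour window: {needed:.1f} MB/s")
```

Even at full line rate the transfer takes well over four hours, so if the available window is shorter than that, no amount of tuning on a single gigabit link will close the gap.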