HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

 
Shadow_1982
Occasional Advisor

Hi!
Recently I bought two ML330 Gen6 servers for a homelab / virtualization.
I decided that one upgraded server would be better for home / educational purposes than two running separately.
So I bought the 2nd CPU expansion card, installed a 2nd CPU (2x X5650 in total), and installed a 2nd set of RAM DIMMs.

After doing this, the server runs OK until it is under somewhat heavier load (say, CPUs at a constant 60% usage); then it power cycles after an hour or two of running.

iLO reports only: Server power removed, Server power restored, Server reset. And that's about it.

I have 2 identical CPUs, but the RAM is 9x 8192 MB 1600 MHz + 9x 8192 MB 1333 MHz: one set installed for CPU1, the other for CPU2.

I tried using only the 1600MHz RDIMMs, installed in two ways:

  • All on MNB, none on 2nd CPU expansion card
  • 6 DIMMs on MNB + 3 on 2nd CPU exp card

Both approaches ended with a power cycle after an hour or two...

Now I'm testing the 1333MHz DIMMs, installed 3+3.

As for power usage: I have a non-redundant PSU and I haven't seen usage higher than 300W, so this is probably not the problem...

I upgraded BIOS and iLO to the newest possible versions - it didn't help.

Maybe I should set something in BIOS, which I missed?

 

Any hints?

 

Update:

After installing 3+3 1333MHz DIMMs, the server power cycled at 2 AM and then ran stable for about 10 hours, until I started a CPU stress test on my VMs.

The stress test kept both Xeons at about 100% usage for approx. 30 min., and then the server power cycled again.

CPU temperatures were about 50-60 degrees.

iLO was reporting Temp 19 at 85 degrees (where Caution is at 110C and Critical at 115C), so that should be OK.
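The threshold comparison described above can be sketched in a few lines of Python. The reading and thresholds are the ones quoted in the post for sensor "Temp 19"; the function names and status labels are illustrative assumptions, not iLO's own logic.

```python
def classify_temp(reading_c: float, caution_c: float, critical_c: float) -> str:
    """Return 'ok', 'caution' or 'critical' for a single sensor reading."""
    if caution_c > critical_c:
        raise ValueError("caution threshold must not exceed critical threshold")
    if reading_c >= critical_c:
        return "critical"
    if reading_c >= caution_c:
        return "caution"
    return "ok"


def headroom_c(reading_c: float, caution_c: float) -> float:
    """Degrees remaining before the caution threshold is reached."""
    return caution_c - reading_c


# Values quoted in the post: 85 C measured, caution at 110 C, critical at 115 C.
print(classify_temp(85, caution_c=110, critical_c=115))
print(headroom_c(85, caution_c=110))
```

With 25 degrees of headroom at the moment of the reading, a thermal shutdown from this sensor looks unlikely, although a single snapshot says nothing about short spikes between polls.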

Again the reboot cause was: Server power removed, server power restored, server reset.

Any hints? Please

14 REPLIES
TVVJ
HPE Pro

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hello,

From the description of the symptoms, it is not possible to pinpoint the cause. Try removing the non-HPE parts from the server and check whether it stays stable beyond the 1-2 hour threshold. If it does, one or more of those components is the likely cause. Then add them back one at a time and see what happens.
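The "add one at a time" procedure can be written down as a short sketch. Everything here is hypothetical naming: `is_stable` stands in for a manual soak test of several hours on the real hardware, and the part names are just labels.

```python
from typing import Callable, List, Optional, Sequence


def find_first_culprit(
    baseline: Sequence[str],
    candidates: Sequence[str],
    is_stable: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add candidates one at a time; return the first one that breaks stability."""
    installed = list(baseline)
    # If the stripped-down server already fails, the removed parts are not to blame.
    if not is_stable(installed):
        raise RuntimeError("baseline configuration is already unstable")
    for part in candidates:
        installed.append(part)
        if not is_stable(installed):
            return part
    return None  # every candidate was added back without a failure
```

Note the limits of this approach: it stops at the first failing part, and it cannot tell a faulty part apart from a combined effect such as total power draw.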

Regards,



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[All opinions expressed here are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

I get it might be tough to troubleshoot.

Yet everything in there is HP OEM, except: CPUs (Intel), RAM DIMMs (Samsung), 10Gb NIC and hard drives (Seagate and Samsung).

Tam92
HPE Pro

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hello,

 

I believe the below document should be helpful for you

 

https://support.hpe.com/hpesc/public/docDisplay?docId=mmr_kc-0121549

 

You can try enabling virtualization in the BIOS.

To enable VT-d in the BIOS:
1. Reboot or switch on the server.
2. Press the F9 key on the keyboard during POST.
3. In the RBSU screen, navigate to the following menu: System Options -> Processor Options -> Intel VT-d.
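Once the setting is changed, it can be cross-checked from a running Linux system. The sketch below is an assumption-laden helper, not an HPE tool: it looks for the ACPI DMAR table (how firmware describes VT-d to the OS) and counts IOMMU groups in sysfs. The function name and the `root` parameter are illustrative, and on many kernels the groups only appear when the IOMMU is also enabled on the kernel command line.

```python
from pathlib import Path


def vtd_hints(root: str = "/") -> dict:
    """Collect Linux-side hints that VT-d is exposed by firmware and in use."""
    base = Path(root)
    # The ACPI DMAR table is how firmware describes Intel VT-d hardware to the OS.
    dmar = base / "sys/firmware/acpi/tables/DMAR"
    # IOMMU groups are only populated once the kernel has enabled an IOMMU.
    groups = base / "sys/kernel/iommu_groups"
    group_count = sum(1 for _ in groups.iterdir()) if groups.is_dir() else 0
    return {"dmar_table_present": dmar.exists(), "iommu_groups": group_count}


if __name__ == "__main__":
    print(vtd_hints())
```

A missing DMAR table suggests VT-d is disabled or unsupported in firmware; a present table with zero groups usually means the kernel has not turned the IOMMU on.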

 

Thanks,

TAM



I work at HPE
HPE Support Center offers support for your HPE services and products when and how you need it. Get started with HPE Support Center today.
[Any personal opinions expressed are mine, and not official statements on behalf of Hewlett Packard Enterprise]
Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hi!

Thanks for the hint, but I already have that enabled.

Based on all the hints from you guys, I gathered all the data from the server. Three things to emphasize:

  1. You'll see in the IML that the last reboot was caused by a fan failure. That was a few days ago, when I was using Noctua fans. All fans have since been replaced with OEM ones (I bought another server chassis, because it was cheaper than buying the fans separately).
  2. All the trouble started when I added the 2nd CPU expansion board. Since then the server reboots (power cycles) from time to time: once every 1-5 hours when it runs idle, and once every 30 min. to 1 hour when I run a CPU stress test.
  3. No temperature reaches the yellow or red level, and the highest power usage I noticed was 300W, so that should be OK.
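To put numbers on "from time to time", the gaps between reset events can be computed from timestamps copied out of the iLO event log. This is a minimal sketch: the function names are hypothetical, and the default timestamp format is an assumption (month/day/year with a 24-hour clock) that may need adjusting to match the actual log.

```python
from datetime import datetime, timedelta
from typing import List


def reset_intervals(timestamps: List[str], fmt: str = "%m/%d/%Y %H:%M:%S") -> List[timedelta]:
    """Return the gaps between consecutive reset events, oldest first."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def summarize(intervals: List[timedelta]) -> str:
    """Summarize the gaps in hours: shortest, longest and mean."""
    if not intervals:
        return "need at least two reset events"
    hours = [i.total_seconds() / 3600 for i in intervals]
    mean = sum(hours) / len(hours)
    return f"min {min(hours):.1f} h, max {max(hours):.1f} h, mean {mean:.1f} h"
```

Comparing the summary for idle runs against the one for stress-test runs would show whether load really shortens the time to failure or whether the two ranges overlap.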

Below I paste my BIOS config - all things that I thought might be interesting, but if you want something more, just drop me a message.

Gallery on imgur: https://imgur.com/a/mTalCPQ 

All iLO and ESXi logs, diagnostics, etc, here: https://docs.google.com/spreadsheets/d/e/2PACX-1vRWPeMcOmgBErcGRqSid8HPKUYmn4zi-eoj0q0tV71HBA6j4cjN3m_sbjOt06oRmQ/pubhtml

I have now removed all non-HP devices from the server: 1x NVMe on PCI-E, 1x SSD, 3 HDDs, and the 10Gb NIC on PCI-E.

They are all unplugged now.

I'll let you know the results after 24h.

 

But - any hints? Maybe something to change in BIOS?

 

Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hi again!

Today, after removing the non-HP stuff, the server has reached 20 hours of uptime and it's still running.

This might be caused by:

  1. The removed devices mentioned before (NVMe SSD, SATA SSD, 3 HDDs and 10Gb PCI-E NIC).
  2. CPUs running idle (no drives, no VMs).
  3. Lower power usage (116W vs. 140-160W before).

I will put a live OS on a USB drive and run a CPU stress test without the non-HP devices in the server.

Still - any hints? BIOS settings to change? Change of approach?

Tam92
HPE Pro

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hello,

 

The settings look good. The only thing you need to change is the power profile: set it to Maximum Performance, then monitor the server.

 

Thanks, 

TAM



Tam92
HPE Pro

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Apart from that, you have a memory error:

 

System Board 18 Memory Yellow 0 65 error
0.7.18.39
09/05/2022 09:23:22 UTC
Memory

Error

 

 

I would recommend getting this rectified by replacing the memory module.

You can log a case with HPE support for further assistance on this.

Thanks,

TAM



Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

About the System Board 18 memory error: I'm trying to sort it out. Today I removed almost all DIMMs, leaving only 2x 8GB on the MNB, but the error is still there. I still need to work out which bank it is (only 2 are left). This error was always there, no matter whether I used the 1333MHz RDIMMs, the 1600MHz ones, or both.

Also, today I'll run the server again without the non-HP devices, boot a live OS, and put it through a stress test.
We'll see how it goes.

Tam92
HPE Pro

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hello,

 

Looks like the non-HPE parts are the main cause.

 

Please use all HPE parts, and if the issue persists, log a case with HPE.

 

Thanks,

TAM



Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

That might be the case.
Yesterday I booted Ubuntu live from USB, with no non-HP stuff in the server (except the Intel CPUs, Samsung DIMMs, and that pendrive). The server was stress tested for 5 hours and kept working. After 5 PM I had to go out, and shortly after I left, the server started running its fans at 100%, stressing my roommates :D, so I had to shut it down remotely.

Today I'm repeating the test; so far so good (circa 2 hours today).

Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

@Tam92 

Got an update.
After 5 hours of stress testing with the Ubuntu live OS on USB, the server has power cycled again.
There are no non-HP devices in the server, except the 2 Intel Xeons, 2 Samsung DIMMs (2x8GB 1333MHz) and that pendrive.
No temperature reached the yellow threshold. iLO reports "server power removed, server power restored, server reset"...
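Since iLO only records that power was removed, a heartbeat file written during the stress test can show exactly how long the server survived under load. The following is a minimal sketch, assuming Python 3 is available on the live OS. The function names and log path are hypothetical, one call only loads a single core (start one process per core for a full load), and the log must live on storage that survives a power loss, which a live session's RAM-backed filesystem does not.

```python
import os
import time
from pathlib import Path
from typing import Optional


def burn_with_heartbeat(log_path: str, duration_s: float, interval_s: float = 10.0) -> int:
    """Busy-loop one core for duration_s seconds, logging a timestamp every interval_s."""
    deadline = time.monotonic() + duration_s
    next_beat = time.monotonic()
    beats = 0
    x = 0
    with open(log_path, "a") as log:
        while time.monotonic() < deadline:
            x = (x * 31 + 7) % 1_000_003  # meaningless arithmetic to keep the core busy
            if time.monotonic() >= next_beat:
                log.write(f"{time.time():.0f}\n")
                log.flush()
                os.fsync(log.fileno())  # push the line to disk before power can drop
                beats += 1
                next_beat += interval_s
    return beats


def last_heartbeat(log_path: str) -> Optional[float]:
    """Return the last logged Unix timestamp, or None if nothing was logged."""
    path = Path(log_path)
    if not path.exists():
        return None
    lines = path.read_text().split()
    return float(lines[-1]) if lines else None
```

After an unexpected power cycle, `last_heartbeat` gives the latest moment the server was known to be alive, which can then be lined up against the iLO event timestamps.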

Temperatures just after reboot (w/o N/A values): [screenshot]

iLO logs: [screenshot]

Today's server power usage stats: [screenshot]

Any hints?

Tam92
HPE Pro

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Hello,

 

Server has automatically rebooted.

 

I would advise getting the hardware logs analyzed by HPE support and getting the required parts replaced.

 

Probably the system board.

 

Thanks,

TAM



Shadow_1982
Occasional Advisor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours

Would you do that with such an old unit?
Who should I look for? Where should I head to?

techin
Valued Contributor

Re: HP ML330 Gen6 - after 2nd CPU upgrade power cycles after 1-2 hours