ProLiant Servers (ML,DL,SL)

DL585 G7's rebooting with no warning..

JimTaylor
Advisor

We have four DL585 G7s, purchased in December 2011, that are to become a VMware cluster. They will join our existing estate of DL585 G5s and G6s. The VMware version is ESX 4.1 U1.

The problem we are having is spontaneous reboots. One of the nodes was simply resetting without any warning: nothing went into the VMware logs and nothing was logged in the IML as being at fault. The only thing we see in any log is the iLO log saying "Server Reset" and "Power Restored".

A firmware update and a replacement of all 128GB of RAM in one server seem to have cured the problem on that server (but for how long?), but it has now happened on another node. Exactly the same behaviour: nothing logged in the IML or within VMware.

Has anyone else been experiencing this? I am at my wits' end and I simply cannot risk putting these servers into production, while the warranty and lifespan on them is burning away. HP have been pretty useless at diagnosing the faults and even had the brass neck to suggest it was a VMware configuration issue. When we put this to VMware support, they were less than impressed!

Thanks in advance.

15 REPLIES
Matti_Kurkela
Honored Contributor

Re: DL585 G7's rebooting with no warning..

"Power Restored" is not the same as "nothing": if it appears without a good reason (such as plugging in the power cables after server installation or maintenance), it implies that the server lost all power at some earlier time. And that would indicate a serious problem with your site's power and/or UPS.

I guess you've probably already ruled out power problems, but since you did not say so explicitly, I have to ask: are you sure there was no interruption or brown-out in the power feeds of those G7s?

MK
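
Matti's inference can be turned into a quick check of an exported iLO event log. This is a minimal sketch in Python, assuming the log has already been parsed into (timestamp, message) pairs; the function name, the pair format and the maintenance-window list are hypothetical and do not reflect any actual iLO export format. It flags every "Power Restored" entry that falls outside a known maintenance window, since by the reasoning above those would point to a complete loss of power.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

# Hypothetical shapes: an event is (timestamp, message); a maintenance
# window is (start, end). A real iLO export would need parsing first.
Event = Tuple[datetime, str]
Window = Tuple[datetime, datetime]


def unexplained_power_restores(events: Iterable[Event],
                               maintenance: Iterable[Window]) -> List[datetime]:
    """Return timestamps of "Power Restored" entries outside any maintenance window."""
    windows = list(maintenance)
    flagged = []
    for ts, message in events:
        if message.strip().lower() != "power restored":
            continue
        # A restore during planned work (e.g. cables plugged in) is expected.
        if any(start <= ts <= end for start, end in windows):
            continue
        # Otherwise the server presumably lost all power beforehand.
        flagged.append(ts)
    return flagged


if __name__ == "__main__":
    # Made-up example entries, for illustration only.
    log = [
        (datetime(2012, 1, 10, 9, 5), "Power Restored"),
        (datetime(2012, 2, 3, 2, 17), "Server Reset"),
        (datetime(2012, 2, 3, 2, 17), "Power Restored"),
    ]
    install = [(datetime(2012, 1, 10, 9, 0), datetime(2012, 1, 10, 10, 0))]
    print(unexplained_power_restores(log, install))
```

Only the second "Power Restored" is reported here, because the first one falls inside the installation window.
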
JimTaylor
Advisor

Re: DL585 G7's rebooting with no warning..

Yeah, we're quite sure there is nothing going on with power in the data centres.

Not only are they in cabinets where other servers are running (and not rebooting), but we also have 4 PSUs in each G7, and according to the BIOS POST information we only need 1 PSU to run the server and 2 for redundancy. There's no way in hell we're losing power to all four PSUs at the same time.

I've read some comments about similar problems with smaller G7s (the DL165, I think) rebooting in the same way, and the advice is to force them into high power mode. Coincidentally, we had similar problems with a DL385 G6 in the past and forcing it into high power mode resolved them, so we are going to try the same with these servers and see how they behave.

Tichx
Occasional Visitor

Re: DL585 G7's rebooting with no warning..

We are having exactly the same issue with our DL585 G7s (512GB RAM / 64 cores); we have 3 of these in an ESX 4.1.0 U2 environment.

The issue can be replicated by putting high load on one of the hosts: the server just reboots, ESX crashes, and the front of the G7 shows a flashing orange light.

I have been working with HP today, hoping for a resolution.

So far we have replaced both memory boards (primary and secondary), but the problem persists...

I have the engineer onsite again tomorrow, so I will keep this forum updated.

Regards,

Tich.

Tichx
Occasional Visitor

Re: DL585 G7's rebooting with no warning..

Jim,

Can you replicate the issue by performing the following test?

1) Migrate the workload (VMs) onto the DL585 G7 and power all the guests off. After 5 to 10 minutes, attempt to power them all on at once. This will stress the server.

In our case, when we power on around 40 to 60 Windows 7 VMs (1 CPU/2GB/40GB) on the DL585 G7 (6200-series processors), the host reboots and then the HP health light starts to flash amber.

It seems like a problem with the second processor board. Do you have one installed in your environment? If so, try removing it and see whether the server remains stable.

Thanks.

Tich.
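
Tich's burst test (power everything on at once rather than one VM at a time) can be scripted so that it is repeatable. This is a minimal sketch in Python, assuming a hypothetical `power_on(vm_name)` callable that starts one guest and returns whether it succeeded. In practice that callable might wrap something like pyVmomi's `PowerOnVM_Task`, but no vSphere API is used or assumed here. The thread pool submits every request up front, so the power-on requests overlap instead of running sequentially.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def power_on_all_at_once(vm_names: List[str],
                         power_on: Callable[[str], bool],
                         max_workers: int = 64) -> Dict[str, bool]:
    """Fire power-on requests for all VMs concurrently and collect the results.

    `power_on` is a hypothetical callable supplied by the caller (e.g. a thin
    wrapper around the vSphere API); it should return True on success.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map submits every call immediately, so they run in parallel
        # (up to max_workers at a time) rather than one after another.
        results = list(pool.map(power_on, vm_names))
    return dict(zip(vm_names, results))


if __name__ == "__main__":
    # Stand-in for a real power-on call; replace with your own wrapper.
    def fake_power_on(name: str) -> bool:
        return True

    guests = [f"win7-{i:02d}" for i in range(40)]  # hypothetical VM names
    outcome = power_on_all_at_once(guests, fake_power_on)
    print(sum(outcome.values()), "of", len(guests), "started")
```

Keeping `max_workers` at or above the number of guests is what makes this a simultaneous burst; a small pool would stagger the requests and might not reproduce the load spike described above.
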

VSS
Occasional Visitor

Re: DL585 G7's rebooting with no warning..

There is a customer advisory related to this topic. After updating to the ROM version it specifies, entries now appear in the IML... but the update didn't solve the problem of unwanted reboots!

Does anyone have updated information?

HampusLind
Occasional Advisor

Re: DL585 G7's rebooting with no warning..

Guys, we have similar problems with our six DL585 G7 servers running ESXi 5. The HP hardware diagnostics test shows no errors, and VMware support are clueless.

I have posted a similar thread on the VMware site, where there is a lot of talk about QLogic drivers and firmware causing issues. What types of HBA cards do you guys run?

My thread in the VMware forum: http://communities.vmware.com/message/2116878#2116878

We have also replaced our HP NICs (QLogic chipset) with Intel ones, and are not using the onboard NICs, because of problems with the new QLogic chipset that is/was used for those NICs; the problems we had were unexpected server reboots.

Has anyone solved this issue, or do we need to replace our server platform?

HampusLind
Occasional Advisor

Re: DL585 G7's rebooting with no warning..

Are you running AMD or Intel servers?

HampusLind
Occasional Advisor

Re: DL585 G7's rebooting with no warning..

Jim, did high power mode resolve your reboot issue? And by high power mode, do you mean "HP Static High Performance Mode"?

Thanks,

Hampus

VSS
Occasional Visitor

Re: DL585 G7's rebooting with no warning..

We run AMDs; both the 61xx and 62xx series experience the unexpected reboots. The curious thing: yesterday we saw an unexpected reboot on a machine that was sitting in VMware maintenance mode, so the problem can't be load-related.

We also experience the problem with the "HP Static High Performance Mode" and "No C-States" options enabled.

Kind regards

Jan Soska
Honored Contributor

Re: DL585 G7's rebooting with no warning..

Hello,

Is it possible to run another version of ESX on one server (ESX 5 U1 or ESX 5.1), just to see whether it makes any difference?

Another hint: I saw a user here in the forum with a similar problem, random reboots. It was caused by UPS devices that were not able to handle the modern G5 and G6 PSUs. Just wondering: is everything in the same server room / same rack / same UPS / same circuits?

Jan
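
Jan's question (same room, rack, UPS, circuits?) amounts to looking for an attribute value that every rebooting server shares and no stable server has. This is a minimal sketch in Python, assuming you have recorded each server's location attributes by hand; the attribute names and the example inventory are hypothetical.

```python
from typing import Dict, List

# Hypothetical inventory record: attribute name -> value, e.g. {"ups": "UPS-A"}.
Server = Dict[str, str]


def suspect_attributes(rebooting: List[Server], stable: List[Server]) -> Dict[str, str]:
    """Return attribute values shared by every rebooting server but by no stable one."""
    if not rebooting:
        return {}
    suspects = {}
    for key, value in rebooting[0].items():
        shared_by_all = all(s.get(key) == value for s in rebooting)
        seen_on_stable = any(s.get(key) == value for s in stable)
        if shared_by_all and not seen_on_stable:
            suspects[key] = value
    return suspects


if __name__ == "__main__":
    # Made-up inventory, for illustration only.
    rebooting = [
        {"room": "DC1", "rack": "R12", "ups": "UPS-B", "circuit": "C3"},
        {"room": "DC1", "rack": "R14", "ups": "UPS-B", "circuit": "C7"},
    ]
    stable = [{"room": "DC1", "rack": "R12", "ups": "UPS-A", "circuit": "C1"}]
    print(suspect_attributes(rebooting, stable))
```

In this made-up inventory, the room is excluded because a stable server shares it, and the rack and circuit are excluded because the rebooting servers differ, which leaves only the UPS. An empty result means no single location attribute separates the two groups.
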

HampusLind
Occasional Advisor

Re: DL585 G7's rebooting with no warning..

We found some BIOS settings that should be turned off when running ESX; ASR seems to be the most important one. Also, please see this KB from VMware on what steps should be taken when the server freezes (press the NMI button to generate an ESX core dump).

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1014767

Recommended BIOS settings for DL585 G7:

- Embedded NIC Boot Options: Disabled. If you don't PXE boot the server, set it to disabled; otherwise, leave it enabled. Setting it to disabled saves you 2 seconds when booting the server.
- HP Power Profile: Custom. Allows you to enable custom power settings specific to vSphere.
- HP Power Regulator: OS Control Mode. Hands power management over to vSphere; the other options give this control to the server itself.
- Redundant Power Supply Mode: High Efficiency Mode (Auto). By default (Balanced Mode), the server uses all installed PSUs. This might look like the most efficient use, but the more power is drawn from a PSU, the more efficiently it operates.
- ASR Status: Disabled. ASR monitors an agent running in the Service Console. When this agent does not respond within 10 minutes, the host is rebooted.
- Automatic Power-On: Disabled.
- Virtual Install Disk: Disabled. This Virtual Install Disk only contains drivers for the Microsoft Windows operating system.

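
The ASR entry above describes a watchdog: once armed, if the monitored agent stops checking in for 10 minutes, the host is reset. The toy model below illustrates why a hung or stalled agent, rather than a hardware fault, could produce a reset that looks spontaneous. It is a simplified sketch in Python, not HP's implementation. The class and method names are hypothetical, the assumption that the timer is armed by the agent's first check-in is mine, and time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

ASR_TIMEOUT_S = 10 * 60  # 10 minutes, per the BIOS setting description


@dataclass
class AsrWatchdog:
    """Toy model of an ASR-style watchdog (hypothetical, for illustration)."""
    enabled: bool = True  # corresponds to the "ASR Status" BIOS setting
    timeout_s: float = ASR_TIMEOUT_S
    # Assumption: None until the agent first checks in (timer not yet armed).
    last_heartbeat_s: Optional[float] = None

    def heartbeat(self, now_s: float) -> None:
        """Called periodically by the monitored agent; (re)arms the timer."""
        self.last_heartbeat_s = now_s

    def should_reset(self, now_s: float) -> bool:
        """True if the watchdog would reset the host at time now_s."""
        if not self.enabled or self.last_heartbeat_s is None:
            return False  # ASR disabled, or the timer was never armed
        return now_s - self.last_heartbeat_s >= self.timeout_s


if __name__ == "__main__":
    wd = AsrWatchdog()
    for t in range(0, 6 * 60, 60):  # agent checks in every minute for 5 minutes...
        wd.heartbeat(t)
    # ...then hangs. The last heartbeat was at t = 300 s.
    print(wd.should_reset(899), wd.should_reset(900))
    print(AsrWatchdog(enabled=False).should_reset(10**6))
```

Disabling ASR, as recommended above, corresponds to `enabled=False`: the watchdog never resets the host, so if reboots continue afterwards, ASR can be ruled out as the cause.
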
HampusLind
Occasional Advisor

Re: DL585 G7's rebooting with no warning..

Jan, we have checked with our DC vendor, and they have checked their power and UPS logs with no indication of a problem.

Do you remember how the power supplies affected the UPS (or the other way around)?

Thanks,

Hampus

Jan Soska
Honored Contributor

Re: DL585 G7's rebooting with no warning..

Hello, I am not able to find the exact link (it was on the old-style forum), but here are some hints. Generally, it was an issue between very modern HP PSUs (G6+) and line-interactive UPSs. Maybe try this link: http://blog.samkendall.net/2010/03/22/hp-proliant-g6-series-server-issues-with-line-interactive-upss/ , and another discussion: http://serverfault.com/questions/199861/ups-with-a-hp-proliant-server

As I remember, the problem was fixed by replacing the UPSs with higher-end models.

In your case, try to find out the exact type of UPS your DC vendor uses, and check its compatibility with your HP servers with HP.

(Again, the UPS problem is only my hypothesis :( )

Jan

HampusLind
Occasional Advisor

Re: DL585 G7's rebooting with no warning..

Thanks, Jan! I don't think that is the problem in our case; it feels like they are talking about low-end UPSs for small offices or similar. We run our servers in a really big DC with huge, and quite new, UPSs.

Pez067
Occasional Visitor

Re: DL585 G7's rebooting with no warning..

Same issue experienced here, but with the smaller DL385 G7s running ESXi 4.1 U1. We have 14 in production across 2 sites, and both clusters experience the same unexpected reboots. I've logged countless cases with VMware and HP, I've lost count of the number of motherboards and CPUs we have replaced, and we still see the same unexpected reboots.