
DL585 G7's rebooting with no warning..

 
JimTaylor
Advisor


We have four DL585 G7s, purchased in December 2011, that are meant to form a VMware cluster. They are to join our existing estate of DL585 G5s and G6s. The VMware version is ESX 4.1 U1.

 

The problem we are having is spontaneous reboots. One of the nodes was resetting without any warning: nothing went into the VMware logs and nothing is logged in the IML as being at fault. The only thing we see in any log is the iLO log saying "Server Reset" and "Power Restored".
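For anyone who wants to line up the reset times across several nodes, here is a minimal Python sketch that pulls the "Server Reset" and "Power Restored" entries out of a log that has been saved as text. The file layout (one event per line, `YYYY-MM-DD HH:MM:SS`, a tab, then the description), the function name `find_reset_events` and the sample lines are all assumptions made for illustration; this is not the actual iLO export format, so the parsing would need adjusting to whatever your export looks like.

```python
from datetime import datetime

# Event descriptions quoted in the post above.
RESET_MARKERS = ("Server Reset", "Power Restored")


def find_reset_events(lines):
    """Return (timestamp, description) pairs for lines that mention a reset marker.

    Assumes each line looks like "YYYY-MM-DD HH:MM:SS<TAB>description".
    That layout is hypothetical; lines that do not match it are skipped.
    """
    events = []
    for line in lines:
        stamp, sep, description = line.strip().partition("\t")
        if not sep:
            continue  # no tab, so not in the assumed layout
        if not any(m.lower() in description.lower() for m in RESET_MARKERS):
            continue  # some other event we are not interested in
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # timestamp not in the assumed format
        events.append((when, description))
    return events


if __name__ == "__main__":
    # Made-up sample lines, only to show the call.
    sample = [
        "2012-01-10 03:14:07\tServer Reset",
        "2012-01-10 03:14:09\tPower Restored",
        "2012-01-10 04:00:00\tBrowser login",
    ]
    for when, description in find_reset_events(sample):
        print(when.isoformat(), description)
```

Running this over the saved logs from each node and comparing the timestamps would show whether the resets cluster in time (pointing towards something shared, such as power) or are scattered per host.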

 

A firmware update and a replacement of all 128GB of RAM in one server seem to have cured the problem there (but for how long?), but it has now happened on another node. Exactly the same behaviour. Nothing logged in the IML or within VMware.

 

Has anyone else been experiencing this? I am at my wits' end and simply cannot risk putting these servers into production; meanwhile the warranty / lifespan on them is burning away. HP have been pretty useless at diagnosing the faults, even having the brass neck to suggest it was a VMware configuration issue. When we put this to VMware support, they were less than impressed!

 

Thanks in advance. 

15 REPLIES
Matti_Kurkela
Honored Contributor

Re: DL585 G7's rebooting with no warning..

"Power Restored" is not the same as "nothing": if it appears without a good reason (such as the power cables being plugged in after server installation or maintenance), it implies the server lost all power at some earlier time. And that would indicate a serious problem with your site's power and/or UPS.

 

I guess you've probably already ruled out power problems, but since you did not explicitly say so, I'll have to ask: are you sure there was no interruption or brown-out in the power feeds of those G7s?

MK
JimTaylor
Advisor


Yeah we're quite sure there is nothing going on with power in the datacentres.

Not only are they in cabinets where other servers are running (and not rebooting), we also have four PSUs in each G7, and according to the BIOS POST information we only need one PSU to run the server and two for redundancy. There's no way in hell we're losing power to all four PSUs at the same time.

 

I've read some comments about similar problems with smaller DL G7s (the DL165, I think) rebooting in the same way, and the advice there is to force them into high power mode. Coincidentally, we had similar problems with a DL385 G6 in the past and forcing it into high power mode resolved them, so I'm going to try the same with these servers and see how they behave.

Tichx
Occasional Visitor


We are having the exact same issues with our DL585 G7s (512GB RAM / 64 Processors); we have three of these in an ESX 4.1.0 U2 environment.

 

The issue can be reproduced by putting a high load on one of the hosts: the server just reboots, crashing ESX, and the front of the G7 shows a flashing orange light.

 

I have been working with HP today hoping for a resolution.

 

So far we have replaced both memory boards (primary and secondary) but the problem persists....

 

I have the engineer onsite again tomorrow so will keep this forum updated..

 

Regards,

 

Tich.

Tichx
Occasional Visitor


Jim,

 

Can you replicate the issue by performing the following test:

 

1) Migrate the workload (VMs) onto the DL585 G7 and power all the guests off. After 5 to 10 minutes, attempt to power them all on at once. This will stress the server.

 

In our case, when we power on around 40 to 60 Windows 7 VMs (1CPU/2GB/40GB) on the DL585 G7 (6200 processors), the host reboots and the HP Health light then starts to flash amber.

 

It seems like a problem with the second processor board. Do you have one installed in your environment? If so, try removing it and see if the server remains stable.

 

Thanks.

 

Tich.

VSS
Occasional Visitor


There is a customer advisory related to this topic. After updating to the proper ROM version, entries do appear in the IML, but that didn't solve the problem of unwanted reboots!

 

Or does anyone have updated information?

HampusLind
Occasional Advisor


Guys, we have similar problems with our six DL585 G7 servers running ESXi 5. The HP hardware diagnostics test shows no errors, and VMware support are clueless.

 

I have posted a similar thread on the VMware site, where there is a lot of talk about QLogic drivers and firmware causing issues. What types of HBA cards are you guys running?

 

My thread on the VMware forum: http://communities.vmware.com/message/2116878#2116878

 

We have also replaced our HP NICs (QLogic chipset) with Intel ones, and we are not using the onboard NICs because of problems with the new QLogic chipset that is/was used for them. The problem we had was unexpected server reboots.

 

Has anyone solved this issue, or do we need to replace our server platform?

HampusLind
Occasional Advisor


Are you running AMD or Intel servers?

HampusLind
Occasional Advisor


Jim, did the high power mode resolve your reboot issue? And by high power mode, do you mean "HP Static High Performance Mode"?

 

Thanks,

Hampus

VSS
Occasional Visitor


We run AMDs; both the 61xx and 62xx experience the unexpected reboots. The curious thing is that yesterday we noticed an unexpected reboot on a machine that was sitting in VMware maintenance mode. So the problem can't be load-related.

 

We also experience the problem with "HP Static High Performance Mode" and "No C-States" options enabled.

 

Kind regards