
Re: BL465c G7 hangs on reboot

 
Rob Buxton
Honored Contributor

BL465c G7 hangs on reboot

Hi All,
Not sure if this is a VMware or HP server issue.
We have a number of BL465c G7 servers. They're running the HP variant of ESXi 4.1.

We are seeing a couple of issues where the server just seems to hang.
Putting servers into maintenance mode has triggered this, but not always.
Rebooting a server always triggers it. The server hangs on shutdown and becomes completely unresponsive. The console still shows the standard ESXi screen, but F12 etc. don't work. It takes a press-and-hold power cycle to free it up.
cnb
Honored Contributor

Re: BL465c G7 hangs on reboot

Not sure if this fits your symptoms but they just released new BIOS which addresses several hang issues:


http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=4132949&swItem=MTX-2268411d681d444ba751c87e47&prodNameId=4132827&swEnvOID=4091&swLang=13&taskId=135&mode=4&idx=3


Also look at the revision history notes for the June fix.

Hope this helps.


Rgds,

Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Thanks, just found that myself and downloaded it. Looks promising, but it's not exactly what I'm seeing. It must be fairly recent too, as I'm sure I checked a few days ago.
cnb
Honored Contributor

Re: BL465c G7 hangs on reboot

Not sure if you already saw this one also:

New Blade Matrix document that came out today:

http://www13.itrc.hp.com/service/cki/docDisplay.do?docLocale=en&docId=emr_na-c02458781



Rgds,
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Alas the upgraded FW made no difference.
Tested putting the server into maintenance mode and then initiating a restart using F12 from the console rather than from vCenter - same result.
It hangs almost immediately: the screen shows Restart Host, Restart In Progress... and that's it. It takes a hard reset to get it to move again.
Also updated with the very latest ESXi Offline Bundle.
Chris Gardner
Occasional Advisor

Re: BL465c G7 hangs on reboot

Hi Rob,
I'm seeing the same thing with 4 of these BL465c G7 blades (2x 12-core AMD chips, 96GB RAM) running the HP ESXi 4.1.

The VMware cluster isn't in production yet so I've the luxury of playing with them.
So far, it seems a reboot or shutdown of the server will cause it every time - maintenance mode isn't required to trigger it.

Shutting down from the ESXi console causes a purple screen and coredump with either an "Uncorrected ECC error" or "#PF Exception 14" (page fault) which suggests something in the memory sub-system.
Shutting down from vCenter sometimes triggers the coredump but more often just hangs, requiring a cold boot.
The ECC errors are less frequent and correspond with an entry in the blade's IML.

50+hrs of Memtest86+ v4.10 were run with ECC checks turned on and no problems were reported.

The updated BIOS/system ROM 30/09/2010 (released 15th Oct) hasn't helped.

I've had the RAM and a motherboard replaced, neither of which helped.

A bare-metal Red Hat install on one of these blades didn't seem to cause a lock-up on reboot/shutdown, but I need to test this further.

regards,
Chris
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Chris, I shouldn't say this, but excellent news!
If you want to e-mail me on rob dot buxton at wcc govt nz we can further compare notes.
I've a call logged with HP. If you do the same we may be able to move things on a bit more.

I've also found that doing guest migrations causes network connectivity to drop. I thought it was hanging the server, but found the console was still responsive.

Seems we have a similar setup, AMD 12-core. We're only running a single processor, so if you're running two, the processor count isn't the issue. We have 88GB of memory: 5 x 16GB plus the original 2 x 4GB.
bradfordc
Occasional Advisor

Re: BL465c G7 hangs on reboot

Rob, I work with Chris Gardner.

I was wondering if you are also suffering from any ECC memory issues logged in the iLO IML and the vSphere/vCenter console for your G7s.

We have now completed the following actions:
> Patched BIOS to A19
> Replaced System Board
> Replaced
> Replaced CPUs
> Tested 8 and 12 core CPUs

I have also tried installing the following updates:
> HP NMI Sourcing Driver for VMware ESX/ESXi 4.1
> HP ESXi Offline Bundle for VMware ESXi 4.1
> VMware ESX/ESXi 4.x Driver CD for ServerEngines BladeEngine 10Gb Ethernet Controller

We are still suffering from the PSOD on reboot issue (PF Exception 14) and we're still getting the ECC memory issues.

The non-HP ESXi image will not install so I can't test this.

Have you progressed at all with this issue? Any updates from HP?

Thanks.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

No further updates from HP. We sent them the VMware export bundle.
We had some memory issues but a replacement of several memory modules fixed that. Only one server affected.

We have an outstanding issue with an array battery not charging. That needs the array card replaced. It's the only error we currently see in the IML.

We tried removing all of the 16GB memory and just using the default 8GB (2 x 4GB), but the hang-on-shutdown issue persists.

Similar to you, we used the HP ESXi install as the VMware ESXi would not install.
We've updated the BIOS FW to the latest rev and applied the ESXi 1.0a offline and NMI 1.1.02 offline bundles.

I generated a PSOD when attaching a non-existent device via the iLO. Finger trouble, not deliberate testing! The operations guys have mentioned other PSODs, but I'm not sure what triggered them.
woby
Advisor

Re: BL465c G7 hangs on reboot

Hi @all,

We were facing the same problem: esx4.1 custom HP image, server did not reboot.

I removed the mezzanine cards (QLogic QMH2462 and HP NC532m) and the CMOS battery (1 minute), then reinserted the CMOS battery and booted. Rebooted esx4.1_hp - reboot worked! Tried to reboot a second time, still worked.

Inserted the mezzanine cards again, booted, rebooted esx4.1_hp - reboot did not work!!

Removed the CMOS battery again (1 minute), reinserted the battery, booted.
Rebooted esx4.1_hp - reboot did not work!!

Removed the CMOS battery again (1 minute), removed the HP NC532m mezzanine card, booted.
Rebooted esx4.1_hp - reboot worked!!

To me it looks like the problem is related to the mezzanine cards, possibly the HP NC532m mezzanine card specifically.
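
For anyone wanting to repeat this kind of elimination test on more blades, here is a rough sketch of how the four trials above could be written down and checked. It is only a bookkeeping aid: the `trials` layout and the `suspects` helper are made up for illustration and are not part of any HP or VMware tool. Four trials are also a very small sample, so treat the result as a hint, not proof.

```python
# Each trial records which mezzanine cards were installed and whether the
# reboot completed. Data is transcribed from the steps described above;
# the structure and names are illustrative assumptions.
trials = [
    {"cards": set(), "reboot_ok": True},                    # both cards removed
    {"cards": {"QMH2462", "NC532m"}, "reboot_ok": False},   # both cards reinserted
    {"cards": {"QMH2462", "NC532m"}, "reboot_ok": False},   # after another CMOS reset
    {"cards": {"QMH2462"}, "reboot_ok": True},              # NC532m removed
]

def suspects(trials):
    """Return cards present in every failing trial and absent from every passing one."""
    all_cards = set().union(*(t["cards"] for t in trials))
    failing = [t["cards"] for t in trials if not t["reboot_ok"]]
    passing = [t["cards"] for t in trials if t["reboot_ok"]]
    return {
        card for card in all_cards
        if all(card in f for f in failing) and not any(card in p for p in passing)
    }

print(suspects(trials))  # prints {'NC532m'}
```

A trial with only the NC532m installed (QLogic card removed) would be the obvious next row to add to confirm it.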

hope this helps a little

regards
t.