
BL465c G7 hangs on reboot

 
Rob Buxton
Honored Contributor

BL465c G7 hangs on reboot

Hi All,
Not sure if this is a VMware or HP server issue.
We have a number of BL465c G7 servers. They're running the HP variant of ESXi 4.1.

We are seeing a couple of issues where the server just seems to hang.
Putting servers into maintenance mode has triggered this, but not always.
Rebooting a server always triggers it. The server hangs on shutdown and becomes completely unresponsive. The console still shows the standard ESXi screen but F12 etc. don't work. It takes a press-and-hold power cycle to free it up.
38 REPLIES
cnb
Honored Contributor

Re: BL465c G7 hangs on reboot

Not sure if this fits your symptoms but they just released new BIOS which addresses several hang issues:


http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=3709945&prodSeriesId=4132949&swItem=MTX-2268411d681d444ba751c87e47&prodNameId=4132827&swEnvOID=4091&swLang=13&taskId=135&mode=4&idx=3


Also look at the revision history notes for the June fix.

Hope this helps.


Rgds,

Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Thanks, just found that myself and downloaded it. Looks promising but not exactly what I'm seeing. It must be fairly recent as well, as I'm sure I checked a few days ago.
cnb
Honored Contributor

Re: BL465c G7 hangs on reboot

Not sure if you already saw this one also:

New Blade Matrix document that came out today:

http://www13.itrc.hp.com/service/cki/docDisplay.do?docLocale=en&docId=emr_na-c02458781



Rgds,
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Alas the upgraded FW made no difference.
Tested putting the server into maintenance mode and then initiating a Restart using F12 from the console rather than from vCenter - same result.
Hangs almost immediately, comes up with Restart Host, Restart In Progress... and that's it. Takes a hard reset to get it to move again.
Updated with very latest ESXi Offline bundle.
Chris Gardner
Occasional Advisor

Re: BL465c G7 hangs on reboot

Hi Rob,
I'm seeing the same thing with 4 of these BL465c G7 blades (2x 12core AMD chips, 96GB RAM) running the HP ESXi 4.1.

The VMware cluster isn't in production yet so I've the luxury of playing with them.
So far, it seems a reboot or shutdown of the server will cause it every time - maintenance mode isn't required to trigger it.

Shutting down from the ESXi console causes a purple screen and coredump with either an "Uncorrected ECC error" or "#PF Exception 14" (page fault) which suggests something in the memory sub-system.
Shutting down from vCenter sometimes triggers the coredump but more often just hangs, requiring a cold boot.
The ECC errors are less frequent and correspond with an entry in the blade's IML.

50+hrs of Memtest86+ v4.10 were run with ECC checks turned on and no problems were reported.

The updated BIOS/system ROM 30/09/2010 (released 15th Oct) hasn't helped.

I've had RAM and a motherboard replaced, neither of which helped.

A bare-metal RedHat on one of these blades didn't seem to cause a lock-up on reboot/shutdown but I need to test this further.

regards,
Chris
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Chris, I shouldn't say this, but excellent news!
If you want to e-mail me on rob dot buxton at wcc govt nz we can further compare notes.
I've a call logged with HP. If you do the same we may be able to move things on a bit more.

I've also found that doing guest migrations causes network connectivity to drop. I thought it was hanging the server, but found the console was still responsive.

Seems we have a similar setup, AMD 12 core. We're only running a single processor, so if you're running 2 that means that's not the issue. We have 88GB memory, 5 x 16 plus the original 2 x 4.
bradfordc
Occasional Advisor

Re: BL465c G7 hangs on reboot

Rob, I work with Chris Gardner.

I was wondering if you are also suffering from any ECC memory issues logged in the iLO IML and the vSphere/vCenter console for your G7s.

We have now completed the following actions:
> Patched BIOS to A19
> Replaced System Board
> Replaced
> Replaced CPUs
> Tested 8 and 12 core CPUs

I have also tried installing the following updates:
> HP NMI Sourcing Driver for VMware ESX/ESXi 4.1
> HP ESXi Offline Bundle for VMware ESXi 4.1
> VMware ESX/ESXi 4.x Driver CD for ServerEngines BladeEngine 10Gb Ethernet Controller

We are still suffering from the PSOD on reboot issue (PF Exception 14) and we're still getting the ECC memory issues.

The non-HP ESXi image will not install so I can't test this.

Have you progressed at all with this issue? Any updates from HP?

Thanks.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

No further updates from HP. We sent them the VMware export bundle.
We had some memory issues but a replacement of several memory modules fixed that. Only one server affected.

We have an outstanding issue with an Array Battery not charging. That needs the array card replaced. That's the only error we currently see in the IML.

We tried removing all of the 16GB memory and just using the default 8GB (2 x 4GB) but the hang on shutdown issue persists.

Similar to you, we used the HP ESXi install as the VMware ESXi would not install.
We've updated the BIOS FW to latest rev, applied ESXi 1.0a offline and the NMI 1.1.02 offline bundles.

I generated a PSOD when attaching a non-existent device via the iLO. Finger trouble, not deliberate testing! Operational guys have mentioned other PSODs but I'm not sure what triggered them.
woby
Advisor

Re: BL465c G7 hangs on reboot

Hi @all,

We were facing the same problem: ESX 4.1 custom HP image, server did not reboot.

I removed the mezzanine cards (QLogic QMH2462 and HP NC532m) and the CMOS battery (1 minute); inserted the CMOS battery and booted. Rebooted esx4.1_hp - reboot worked! Tried to reboot a second time, still worked.

Inserted the mezzanine cards again, booted, rebooted esx4.1_hp - reboot did not work!!

Removed the CMOS battery again (1 minute), inserted the battery, booted.
Rebooted esx4.1_hp - reboot did not work!!

Removed the CMOS battery again (1 minute), removed the HP NC532m mezzanine card, booted.
Rebooted esx4.1_hp - reboot worked!!

To me it looks like the problem is related to the mezzanine cards, maybe to the HP NC532m mezzanine card.

hope this helps a little

regards
t.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Thanks for that.
We're running the QMH2562 which is the 8Gb variant. We're not running the NC532m.

I'll try and do some testing with the Mezz card removed.
What do you have beyond the Mezz card, an FC Switch or pass-through?
Just trying to get an idea of what's common here.
Also included an e-mail address above if you want to swap notes directly.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Removing the only Mezz card (QMH2562) made no difference here.
woby
Advisor

Re: BL465c G7 hangs on reboot

Hi Rob,

We are using 4 Flex-10 modules and 2 FC switches in this enclosure. I can reproduce the hang by adding the mezzanine card. I am not at the office today, but can do a little more testing tomorrow. By the way, have you also removed the CMOS battery?

regards
w.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

No, but we do have an Array battery problem.
I've not been the one opening up the server (well, not usually). Where's the CMOS battery?
woby
Advisor

Re: BL465c G7 hangs on reboot

Hi Rob,

I don't mean the array battery; the CMOS battery is seated under the array battery. I removed the CMOS battery because I found this article describing a similar problem.
https://supportforums.cisco.com/message/3135172

rw
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

rw,
Different result here.
I removed the CMOS battery for a minute, removed the only Mezz card we have installed and retested. Server hung on the first reboot test.
bradfordc
Occasional Advisor

Re: BL465c G7 hangs on reboot

Reboot Issue
-------------
Can confirm this issue is not mezz card related for us. I have tested 2 BL465c G7 blades without any mezz cards - they exhibit the same problem.

On further analysis I have looked at the tech support local console during the reboot and the server hangs on 'Requesting system reboot.' When comparing this to a BL460c G6 this appears to be the last output prior to the power being reset.

We have made some progress on other fronts.

ECC Issue
-----------
I implemented the below, and this has since resolved the ECC memory issues.

1. Reboot the server and enter RBSU
2. Select Power Management Options
3. Select HP Power Profile
4. Select Maximum Performance
5. Verify that HP Power regulator is now set to HP Static High Performance Mode

HP Support will give more information on this fix.

PSOD Issue
-----------
There is also a new CNA driver available from HP support, version 2.102.486. This resolves the PSOD #PF Exception 14 errors we have been getting.


We're still left with the reboot issue, but 2/3 isn't too bad at this stage. The servers appear to be stable now, and I'm not looking to reboot them all that often, so the urgency has gone from this issue.
bradfordc
Occasional Advisor

Re: BL465c G7 hangs on reboot

OK, an interesting development. If I unassign the Virtual Connect profile from one of the ESXi 4.1 blades it will reboot without any issue. I have tested this several times.

Rob, can you replicate this in your environment?

I know without a profile the servers are far from useful, but this may help identify the root cause.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

I can confirm we saw the same here.
As you say, removing the profile has quite an impact. The host obviously is not registering with vCenter etc.
Removed the profile and an F12 / Restart from the console worked as expected.
bconstant
Advisor

Re: BL465c G7 hangs on reboot

Hello,

I've got exactly the same behaviour in my environment.
I'm using the HP VC Flex-10 Enet Module for Ethernet connectivity and the HP B-series 8/24c SAN Switch for SAN connectivity.
I have server profiles applied to all my blades and get a PSOD when I reboot them.
Did you progress on this issue?

Regards,
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Not yet. I'm not seeing PSODs though.
We are in the process of updating the following:
be2net driver to 2.102.518.0
be2net firmware to 2.102.485.4
Virtual Connect switch firmware to 3.15
Onboard Administrator to 3.21

It's complicated by a problem we have where vMotion seems to cause the host to lose network connectivity. It's a different issue from the hang as the server is still operational. But we do then need to cold boot it to get connectivity back. That means we can't put the server into maintenance mode without losing a bunch of VMs. We only have development servers on it at the moment, but losing those also causes quite a bit of ire in the dev community!

Once we have this all updated I'll retest. From what Chris has said it doesn't seem as though this will fix the hang. If it fixes the vMotion issue I'll be a lot happier though. I can live with the hang issue for a bit.
Rob Buxton
Honored Contributor

Re: BL465c G7 hangs on reboot

Also note that my e-mail address is listed above. Feel free to drop me a line. We can swap HP call numbers. Chris and I have already made HP aware of the associated calls. The more the merrier.
bradfordc
Occasional Advisor

Re: BL465c G7 hangs on reboot

I've been in contact with someone in Turkey with similar issues this week.

My most recent update on this is as follows:
> New CNA driver 2.102.518.0 from VMware does NOT resolve this issue
> New CNA firmware 2.102.517.6 does not resolve this issue.

I have found that executing the following commands from the console (unloading the be2net driver manually before reboot) allows the server to reboot.
1. /sbin/services.sh stop
2. /sbin/esxcfg-module -u -f be2net
3. reboot
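
The three steps above could be wrapped in a small helper, sketched below. This is only an illustration: the function names `run` and `unload_and_reboot` and the `DRY_RUN` switch are made up for this example, and it assumes the ESXi 4.1 tech support shell accepts ordinary POSIX sh functions. It defaults to a dry run that only prints the commands, so nothing is stopped or rebooted by accident.

```shell
#!/bin/sh
# Sketch only: wraps the manual workaround (stop services, force-unload
# be2net, reboot). Names and the DRY_RUN switch are hypothetical.

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

unload_and_reboot() {
    # 1. Stop the management services.
    run /sbin/services.sh stop || return 1
    # 2. Force-unload the be2net driver; abort if this fails, since a
    #    plain reboot would then be expected to hang as before.
    run /sbin/esxcfg-module -u -f be2net || return 1
    # 3. Reboot the host.
    run reboot
}
```

Calling `unload_and_reboot` with the default setting lists the three commands; setting `DRY_RUN=0` first would actually run them on the host. Stopping after a failed unload is a choice made for this sketch, not something HP or VMware has recommended.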

HP have all of the support case numbers, so we're slowly building a case. If anyone else has a similar issue please contact me:
chris dot bradford AT spicers dot co dot uk

HP are now trying to replicate our setup, they have a full virtual connect and chassis config dump.

As soon as anything else comes along I'll be sure to update here.
bconstant
Advisor

Re: BL465c G7 hangs on reboot

Fyi, I can also trigger a PSOD (PF Exception 14) if I remove the last vNetwork Distributed switch from a host.
This is not a common operation for me, but the PSOD occurs every time I do it.
I also noticed vCenter is triggering an alarm on memory health even though the IML is empty.
Only one of my blades had IML entries related to ECC memory errors so I got a replacement from HP but the alert is still present in vCenter.
ronsexton
Occasional Visitor

Re: BL465c G7 hangs on reboot

Only see the PSOD issue now but we haven't widely tried the BIOS power setting 'solution' yet.
BL465c G7 2.1GHz 2 x 12 core AMD 6172

Sept HP BIOS
iLO 1.15
be2net 518 driver and 517 firmware
Running off 4GB MicroSD (Kingston - $10) no hard drives or added array controller exist.
HP LP1205 mezzanine card.
HP September ESXi 4.1 image for install. All updates applied to current.
VC-F10 virtual connects (bay 1 and 2) and VC-FC 8GB virtual connects(bay 3 and 4).
OA 3.21
VC 3.10 and 3.15 (I would go with 3.15)

Right now only the PSOD issue exists for us and it is very infrequent and not reproducible which makes it hard to resolve.