BladeSystem - General

iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

 
ccpyle
Occasional Contributor

I have multiple bl460c servers that every now and then will begin consuming 10% of CPU on "Hardware Interrupts". If I reset the iLO2, the CPU consumption subsides. All I can see is in Procexp.exe (Process Explorer) where the Interrupts process is taking the CPU cycles. I've stopped all HP services, SNMP, Citrix services, etc. with no change in the behavior. The only way for me to correct this issue is to reset the iLO2 which is only a temporary fix. I can't figure out what triggers the event, either.
All BIOS, Firmware, drivers, etc. are latest as of 8/5/09. This is happening to most of 16 new servers we just received. Some are SAN attached, others are local drives.
Ideas?
9 REPLIES
ukon97
New Member

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

Have you opened a ticket for this issue with HP? We opened a ticket one month ago and still have no solution. I was curious to see what information, if any, they have provided you.
ccpyle
Occasional Contributor

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

Opened a ticket. Found one component that was not updated. The Power Management Controller version was 3.3 which I was told is very problematic and was even removed from the HP web site. Updated to 3.4 and haven't seen the problems pop back up yet. I'll wait a few more days to say for sure the problem is gone.
ukon97
New Member

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

We are on all the latest revs, as HP support requires that before they will investigate the issues we submit. We went to the 3.4 power rev about 2 weeks ago, I think. We still have the issue. It seemed to go away for about 2 days and then it started again. I was just hoping you had already opened a ticket so they could start getting good visibility on this issue and take it seriously.
TJ Toohey
Frequent Visitor

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

I was having the exact same issue with about 7 of our blade centers with BL469C G1s in them. I believe we have finally resolved the issue, however.

In my troubleshooting, I also found that resetting the ILO temporarily relieved the issue. While looking at the VCAgent and all the firmware/driver versions I noticed that the ILO management controller had a new version that was not on the latest PSP. I have updated all of our servers to ILO fw 1.78 and the controller driver to 1.11.2.0. So far so good.

I hope this helps.

Tim
ccpyle
Occasional Contributor

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

I should have followed up on this sooner.
Just like above, the driver update version 1.11.2.0 solved our problem. The issue has not returned in over a month.
DMartin_1
New Member

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

This same problem happened to me on a ProLiant DL360 G5 today running iLO2 FW 1.77 and iLO2 Management Controller Driver 1.11.1.0.

I will be updating both to the latest versions and I assume it will resolve the problem in my case as well. Thanks!
Donald J Wood
Frequent Advisor

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

I'm hoping HP can provide a solution really quickly for this. It's wreaking havoc all over the place in our shop.

I'm seeing it on iLO2. It's on many of our bl465G5 and bl685G5 with iLO firmware 1.78 and HP iLO Management Channel Interface Driver 1.14.0.0. It really cranks on one of the cores and moderately on the others. The I/O it creates causes performance problems for the applications and for many of our nightly processing jobs that need to finish before the real-time day starts.

Other symptoms include:
Servers hang when you do a graceful shutdown or a reboot. The only way to get them back up is to power them down. I created an NMI dump and sent it off to Microsoft for analysis. They said they found the iLO driver waiting and waiting and waiting, like it was stuck in a never-ending loop.

Clusters produce event ID 1123 because the I/O is gobbling up too many interrupts. This makes the cluster think it can't talk over the heartbeat and causes the cluster service to eventually terminate after receiving several bus resets. We've had several failovers too. My customer is trying to cut over from a couple of Dell 6800s (4-year-old servers) and he can't because of this problem.

This issue is really bad news for our servers, as it's dragging down performance and taking a lot of time to resolve. I wouldn't recommend any of this to anybody who wants to roll out a working server fleet. This is far less than I expected to see from a world-class computer company. To put it bluntly, it's what I would expect to see from a beginner.
Donald J Wood
Frequent Advisor

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

Here's an update on this issue:
Customer advisory c01802766 was released 2010-01-20. The advisory says nothing about the excessive interrupt issue that we've seen. Instead, it talks about Event ID 57 and unexpected reboots. The fix is to ensure your iLO is at firmware 1.78 (or later) and iLO 2 Management Controller Driver Version 1.11.2.0 (or later).
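
For anyone checking a batch of servers against the minimums quoted above, here is a rough sketch of the comparison in Python. The function names are made up for illustration, and the version strings have to be collected by hand or by whatever inventory tool you already use; nothing here talks to the iLO. It only compares dotted version numbers piece by piece, so that 1.11.2.0 sorts above 1.11.1.0.

```python
# Minimums quoted from the advisory text in this thread.
MIN_ILO2_FIRMWARE = "1.78"
MIN_ILO2_DRIVER = "1.11.2.0"


def parse_version(text):
    """Turn a dotted version string such as '1.11.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_advisory_minimums(firmware, driver):
    """Hypothetical helper: True if both versions are at or above the minimums."""
    return (
        parse_version(firmware) >= parse_version(MIN_ILO2_FIRMWARE)
        and parse_version(driver) >= parse_version(MIN_ILO2_DRIVER)
    )


if __name__ == "__main__":
    # Version pairs reported by posters in this thread.
    reported = {
        "DL360 G5 (firmware 1.77, driver 1.11.1.0)": ("1.77", "1.11.1.0"),
        "BL460c (firmware 1.78, driver 1.11.2.0)": ("1.78", "1.11.2.0"),
    }
    for label, (firmware, driver) in reported.items():
        status = "OK" if meets_advisory_minimums(firmware, driver) else "below minimum"
        print(f"{label}: {status}")
```

Passing this check only means the versions match what the advisory asks for. It does not by itself confirm the interrupt problem is gone.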

Off the record, HP says that it fixes the high interrupts-per-second count and the reboot hang. I've asked for a customer advisory stating the above.

I applied the fix to 4 systems and it fixed nothing. After contacting HP again, they told me that the blade would have to be reseated, either physically or using the e-fuse command, in order to implement it successfully. I did that 10 hours ago and so far I have not seen the issue occur again. I'll repost here in a few days.

Here's the link to the customer advisory.

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&objectID=c01802766
Donald J Wood
Frequent Advisor

Re: iLO2 on bl460c G1 consuming CPU "Hardware Interrupts"

This system is running W2k3 enterprise sp2.

Day two after implementing the iLO driver update from cp010792.exe and reseating the blade. The ILO firmware was already at 1.78.

Since this one cluster was built we've seen constant event ID 57 and event ID 1123, cluster failovers and bus errors due to loss of the heartbeat, abnormally high cycles on one of 8 cores, and the inability to do a graceful shutdown and reboot.

Since I've implemented the recommended fix this one cluster has been trouble free.

I'll post back in a week.