HP agents related to crash under Debian

Bill Blough
Occasional Visitor



I have a dozen DL360 G5 servers, all with similar (but not identical) configurations.

They all run fine, except for one server, which randomly crashes after starting the HP health (OpenIPMI) and SNMP agents. If I don't start the HP agents, it doesn't crash.

All of these machines are running Debian Lenny, with the 2.6.26-2-xen-686 kernel, and all are Xen hosts (xen-hypervisor-3.2-1-i386).

I've upgraded the agents on all servers to 8.25 with no improvement.

I've attempted to update the firmware on the failing server (using the bootable CD via ILO) but hardware discovery fails consistently.

I've also tried isolating differences between the failing server and the others, but none of the changes have helped. For example, the problem server's processor supports hyperthreading (which was enabled), but none of the working servers do. I disabled hyperthreading, but it made no difference.

At this point, I'm not sure if it's a kernel issue, an HP agent issue, or a hardware/firmware issue.

Has anyone seen anything like this? Can anyone offer suggestions?



10 REPLIES
Goran Koruga
Honored Contributor

Re: HP agents related to crash under Debian

Hello.

Perhaps start by defining what a crash means in this case.

Can you ping the box in question? Are the keyboard LEDs blinking?

Regards,
Goran
Viktor Balogh
Honored Contributor

Re: HP agents related to crash under Debian

> They all run fine, except for one server, which randomly crashes after starting the HP health (OpenIPMI) and SNMP agents. (if I don't start the HP agents, it doesn't crash)

Start these in debug mode and look for the error message. That's why I like *nix systems: they tell me their problem so I can solve it. ;)
****
Unix operates with beer.
Daniel Frazier
Frequent Advisor

Re: HP agents related to crash under Debian

hp-OpenIPMI is not supported in lenny. HP did release a version of it for sarge, but it is neither necessary nor tested on lenny.

I'd suggest removing the hp-OpenIPMI package.

If the crash persists, please provide a console log of the crash.
Bill Blough
Occasional Visitor

Re: HP agents related to crash under Debian


Goran Koruga:

I meant a kernel crash. The entire machine is unresponsive - no network, no I/O, etc.

As for the keyboard LEDs, I have no idea. It's headless and I only access it via ILO.


Daniel Frazier:

I apologize - I misspoke. It's not actually running hp-OpenIPMI, but rather the distribution ipmi.

I've attached one of the crash logs. This particular crash was when hyperthreading was still enabled. I seem to recall the crashes without HT being different. While I don't have logs from any of the non-HT crashes, I can get one later tonight and post it for comparison.
Daniel Frazier
Frequent Advisor

Re: HP agents related to crash under Debian

FYI, I was unable to reproduce this on a DL380 G5 running 2.6.26-xen-686 version 2.6.26-19lenny2:

dl380g5:~# cat /proc/version
Linux version 2.6.26-2-xen-686 (Debian 2.6.26-19lenny2) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Wed Nov 4 23:23:33 UTC 2009

Is this the same version you are using?
Bill Blough
Occasional Visitor

Re: HP agents related to crash under Debian


It looks the same to me:

bblough@XXXXXXXX:~$ cat /proc/version
Linux version 2.6.26-2-xen-686 (Debian 2.6.26-19lenny2) (dannf@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Wed Nov 4 23:23:33 UTC 2009
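
When the same comparison has to be made across a dozen hosts, checking the strings by eye gets error-prone. A minimal sketch, assuming each host's `/proc/version` output has already been collected as text: the helper names `parse_proc_version` and `same_kernel` are hypothetical, and the regular expressions are only fitted to the Debian-style line shown above.

```python
import re


def parse_proc_version(text):
    """Pull the kernel release, Debian package version, and build stamp
    out of one line of /proc/version output (Debian-style format assumed)."""
    line = text.strip()
    release = re.search(r"Linux version (\S+)", line)
    # The first "(Debian <version>)" group is the kernel package version;
    # a later one inside the gcc banner describes the compiler package.
    package = re.search(r"\(Debian (\d[^)\s]*)\)", line)
    build = re.search(r"#\d+ .*$", line)
    return {
        "release": release.group(1) if release else None,
        "package": package.group(1) if package else None,
        "build": build.group(0) if build else None,
    }


def same_kernel(text_a, text_b):
    """True if both hosts report the same release, package, and build stamp."""
    return parse_proc_version(text_a) == parse_proc_version(text_b)
```

Comparing the parsed fields rather than the raw strings keeps stray whitespace or a trailing newline from producing a false mismatch.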


It doesn't surprise me that you can't reproduce it - I have 11 other DL360s running the exact same software configuration and I can't reproduce it on them either. This is the only box that has problems.

When comparing the hardware, there are differences in CPU model/speed, amount of RAM, and firmware revisions.

The RAM amounts and firmware versions, though different, are all relatively close.

However - something just jumped out at me, though I don't know how relevant it is.

All of the working servers have CPUs that are 5130, 5140 or 5160 Xeons. The failing server has a P4 Xeon (a 5050 I think).

Could this be related?
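
One way to make this kind of hardware comparison systematic is to group the hosts by the CPU model they report, so that an odd one out is obvious. A minimal sketch, assuming the contents of `/proc/cpuinfo` have been gathered from each host into a dictionary keyed by hostname: the function names `cpu_models` and `hosts_by_model` are hypothetical, and under Xen the values seen in dom0 may not describe the physical CPUs completely.

```python
from collections import defaultdict


def cpu_models(cpuinfo_text):
    """Return the distinct 'model name' values found in /proc/cpuinfo text."""
    models = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            # Collapse runs of whitespace so equal models compare equal.
            models.add(" ".join(value.split()))
    return models


def hosts_by_model(cpuinfo_by_host):
    """Map each CPU model string to the sorted list of hosts reporting it."""
    groups = defaultdict(list)
    for host, text in cpuinfo_by_host.items():
        for model in cpu_models(text):
            groups[model].append(host)
    return {model: sorted(hosts) for model, hosts in groups.items()}
```

A model that maps to a single host, while every other model maps to several, points at the same kind of outlier described above. It only shows a difference, not that the difference causes the crash.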


Daniel Frazier
Frequent Advisor

Re: HP agents related to crash under Debian

Not sure if that could be related - but I would suggest updating the firmware (system and iLO2) to the latest versions.
Bill Blough
Occasional Visitor

Re: HP agents related to crash under Debian

I did manage to upgrade the ILO2 firmware via ILO.

Unfortunately, when I try to upgrade the rest of the firmware (by booting off of the firmware CD image) it fails to discover the hardware. I'm not really sure where to go from there, but then, I haven't had a lot of time to research it yet.
Goran Koruga
Honored Contributor

Re: HP agents related to crash under Debian

Hello.

Looks like a crash in the IPMI subsystem, probably best to discuss this with the maintainers of it.

You can find the details in the MAINTAINERS file (I purposely don't want to advertise their email address here).

Regards,
Goran
Bill Blough
Occasional Visitor

Re: HP agents related to crash under Debian


OK, I'll confirm that my first impression was wrong: the issue isn't directly related to the HP agents. As it turns out, not loading the agents does not stop the crashes; it just decreases the frequency (days between crashes, rather than hours).

As Goran pointed out, it looks like the issue is with IPMI. I'm guessing that the agents exacerbated the issue since, presumably, IPMI would be used more when the agents were loaded than when they were not.


After jumping through some hoops, I did manage to get the rest of the firmware updated. I'm going to wait and see if that makes any difference before I contact the kernel team.

I appreciate all of the help from everyone. Hopefully I'll be able to get this sorted out sooner rather than later.

Cheers!