

 
Lee Harris_5
Valued Contributor

Generate NMI to hung server - get me a nice vmcore :-)

Hello,

We have a number of Intel ProLiant servers running Red Hat Enterprise Linux 3. From time to time, and for no apparent reason, these boxes just hang. We can't SSH to them, but they still respond to ping. We connect to the iLO and try to log in on the console, but after we enter the username and password it just hangs again and never gives us a command prompt.

To figure out why this is happening, we need to force a box hung in this state to perform a crash dump so we can send it off for analysis.

So I set up a netdump-server and set up the server that crashes as a client. I tested it and it all worked OK. I changed the kernel parameter kernel.unknown_nmi_panic from 0 to 1, so that when I send it an NMI it should die on its arse and give me a nice dump...
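
For anyone repeating this, the setting can be double-checked on the client before the next hang. A minimal Python sketch, assuming the sysctl is exposed under /proc/sys in the usual way and that a modern Python 3 is available; the helper names `read_sysctl` and `nmi_panic_enabled` and the `root` parameter are made up for illustration:

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl such as 'kernel.unknown_nmi_panic'."""
    # Dotted sysctl names map onto paths under /proc/sys
    # (kernel.unknown_nmi_panic -> /proc/sys/kernel/unknown_nmi_panic).
    path = Path(root).joinpath(*name.split("."))
    return path.read_text().strip()


def nmi_panic_enabled(root="/proc/sys"):
    """True when an unknown NMI is configured to panic the kernel."""
    return read_sysctl("kernel.unknown_nmi_panic", root) == "1"
```

Calling `nmi_panic_enabled()` on the client should return True if the change took effect. A value set only at runtime does not survive a reboot, so it is worth re-checking after the box has been restarted.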

However, the box hung this morning. I logged on to the iLO and generated an NMI, but all it did was dump a couple of log lines onto the netdump server; no actual crash dump was produced.

Have I missed anything out here? I did this on another couple of RHEL3 test boxes and got a lovely big vmcore file of about 4 gig on my netdump server, but I'm getting nothing on the server I actually WANT a crashdump from.

Thanks in advance - Lee
Craig Gilmore
Trusted Contributor

Re: Generate NMI to hung server - get me a nice vmcore :-)

It sounds like you are on an early version of RHEL 3, or at least one prior to RHEL 3 U7.

There is a known problem with memory management that will cause a silent hang. The fix is to get the updated memory manager that Red Hat released with Update 7 for RHEL 3.

The hang happens at various times with various different loads. I've seen this problem more than a few times, and the update usually resolves the hangs.

Good luck!

CG
Rick Beldin
HPE Pro

Re: Generate NMI to hung server - get me a nice vmcore :-)

Check /var/log/messages on the netdump server. You can run into this situation if:

- you don't have enough space in /var on the server to capture the dump

- you have a noisy network (netdump uses UDP)

- you have one of the many bugs in early versions of netdump :-)
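
The first point is easy to check ahead of time. A rough Python sketch, run on the netdump server; the helper name `has_room_for_dump` is hypothetical, and the directory to check is an assumption that should be replaced with wherever your netdump-server actually writes its dumps under /var:

```python
import shutil


def has_room_for_dump(dump_dir, expected_dump_bytes):
    """True if the filesystem holding dump_dir has room for the vmcore."""
    # disk_usage reports total, used and free bytes for the filesystem
    # that contains dump_dir.
    free = shutil.disk_usage(dump_dir).free
    return free >= expected_dump_bytes
```

For example, `has_room_for_dump("/var", 4 * 1024**3)` asks whether /var could hold a vmcore of about 4 GB, the size seen on the test boxes. A full vmcore is roughly the size of the client's RAM, so size the check to the machine you actually want the dump from.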

There were issues prior to RHEL3 U6 that prevented netdump from working properly. Make sure you are NOT using bonding on the netdump server interface OR the client if you are attempting this prior to U6.
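
A quick way to see whether bonding is in play on either machine is sketched below in Python. It assumes the bonding driver lists its devices under /proc/net/bonding, as current kernels do; older kernels may expose this elsewhere. The helper name `bonded_interfaces` is made up for illustration:

```python
import os


def bonded_interfaces(bonding_dir="/proc/net/bonding"):
    """List bond devices known to the bonding driver; empty if none are present."""
    # The directory only exists when the bonding driver is loaded.
    if not os.path.isdir(bonding_dir):
        return []
    return sorted(os.listdir(bonding_dir))
```

An empty list from `bonded_interfaces()` on both the netdump server and the client suggests bonding is not a factor; any entry such as bond0 is worth a closer look before the next attempt.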

Your second option is diskdump. diskdump was enhanced starting with RHEL3 U6 to allow for dumping to cciss (Smart Array) devices.
Necessary questions: Why? What? How? When?
Bill McNAMARA_1
Honored Contributor

Re: Generate NMI to hung server - get me a nice vmcore :-)

What's your release?
If it's Red Hat, you need a fix or workaround.

See the notes on cciss and diskdump:

http://www.redhat.com/docs/manuals/enterprise/RHEL-3-Manual/release-notes/as-x86/RELEASE-NOTES-U6-x86-en.html

Bill
It works for me (tm)