
Re: Kdump

 
Duffs
Regular Advisor

Kdump

Hi,

Can someone please advise whether installing kdump on my RH AS4 server is overkill when I already have the "sosreport" script for analysis in the event of a crash?

If it is, what are the pros of kdump over sosreport, and should I really have kdump installed on all of my production servers?

R,
D.
Matti_Kurkela
Honored Contributor

Re: Kdump

When a system crashes, it often means the kernel has detected some corruption in its own data structures. If the structures related to disk I/O are damaged, the crashing kernel cannot write to disk any information about the crash.

Kdump works by setting up an area of memory for a minimal Linux kernel and a special crashdump environment. When the regular kernel crashes, the kdump kernel will start up using only its dedicated memory area. It will re-detect all the hardware from scratch. It will mount the appropriate disk, and then it will salvage all the system state information it can find from the crashed kernel's memory area.
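
As a rough way to see whether that dedicated memory area and crash kernel are actually in place, something like the sketch below could be used. It assumes the reservation is requested with a "crashkernel=" boot parameter visible in /proc/cmdline and that the kernel exposes /sys/kernel/kexec_crash_loaded; neither is mentioned above, and older kernels may lack the sysfs file, so treat the paths and the function names as illustrative assumptions.

```python
from pathlib import Path


def crashkernel_setting(cmdline: str):
    """Return the value of the crashkernel= boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def kdump_status(cmdline_path="/proc/cmdline",
                 loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Report the memory reservation and whether a crash kernel is loaded.

    Both paths are assumptions about the running kernel, not something
    taken from the thread.
    """
    cmdline = Path(cmdline_path).read_text()
    loaded_file = Path(loaded_path)
    # The sysfs file may be missing on older kernels; report that as unknown.
    loaded = None
    if loaded_file.exists():
        loaded = loaded_file.read_text().strip() == "1"
    return {
        "crashkernel": crashkernel_setting(cmdline),
        "crash_kernel_loaded": loaded,
    }


if __name__ == "__main__":
    print(kdump_status())
```

If "crashkernel" comes back as None, no memory has been set aside, and the kdump kernel would have nowhere to run when the regular kernel crashes.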

The information kdump attempts to salvage includes the process listing at the time of the crash, the message buffer (the "dmesg" output) of the crashed kernel, the list of active network sockets, and a lot of kernel-internal information useful for kernel or driver debugging.

The "sosreport" output can be produced only after the system is rebooted. At that time, any information that was not written into system or application logs is already lost.

For example, the OOM killer messages included in sosreport might tell you the system crashed because it was running out of RAM and swap space. But what was the biggest running process at the exact time of the crash? Sosreport cannot tell you that, but kdump might.

Whether kdump is overkill or not depends on your situation. Are you likely to analyze system crashes in depth yourself, or to open support calls with Red Hat for them? Or will you just move an unreliable server to a less critical job (or to a testbench for reproducing the problem) and replace it with a spare?

My suggestion: test kdump by triggering it intentionally in a test environment. Examine the information it can provide, and decide how useful it is to you.
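
One common way to trigger a crash on purpose is the kernel's magic SysRq interface: writing "c" to /proc/sysrq-trigger forces a crash. That mechanism is my assumption here, not something kdump itself requires, and the function names below are made up for illustration. The sketch only prints what it would do unless explicitly told to go ahead; run it as root on a test machine only, because it takes the system down immediately.

```python
SYSRQ_ENABLE = "/proc/sys/kernel/sysrq"   # assumed path: enables magic SysRq
SYSRQ_TRIGGER = "/proc/sysrq-trigger"     # assumed path: accepts SysRq commands


def crash_test_plan():
    """Return the (path, value) writes that would force a kernel crash."""
    return [
        (SYSRQ_ENABLE, "1"),   # allow SysRq functions
        (SYSRQ_TRIGGER, "c"),  # "c" asks the kernel to crash deliberately
    ]


def run_crash_test(confirm=False):
    """Perform the writes from crash_test_plan(); refuses unless confirm=True."""
    if not confirm:
        raise RuntimeError("refusing to crash the system without confirm=True")
    for path, value in crash_test_plan():
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run only: show the steps without touching the system.
    for path, value in crash_test_plan():
        print(f"would write {value!r} to {path}")
```

After the reboot, look for the dump in whatever location your kdump configuration specifies (on Red Hat systems this is often under /var/crash, but check your own setup) and see what you can get out of it.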

MK