
Error mesg in kernel

 
dawn_jose85
Frequent Advisor

Error mesg in kernel

Hi

I found this error message in /var/log/messages:
Unable to handle kernel NULL pointer dereference at 0000000000000001 RIP

My server rebooted at about 2:00, and I found the message above when I checked the log afterwards.

Can anyone help me find the root cause of this error message?
4 REPLIES
Matti_Kurkela
Honored Contributor

Re: Error mesg in kernel

The message means something in the main kernel or one of the kernel modules attempted to use (dereference) a pointer variable that was set to a special NULL value. The NULL value means roughly "there is no meaningful value in this variable". Yet something tried to use it as if it was a meaningful value.

Normally this message is followed by many other lines that will provide more details about what the processor was doing when the error happened.

From this one line only, it is impossible to say anything more.

An example of a complete "Unable to handle kernel NULL pointer dereference..." message is in the first post of this Fedoraforum thread:
http://forums.fedoraforum.org/showthread.php?t=52508

All the lines dated "Apr 22 13:00:06" were generated by a single error event. Much of the numeric contents are useful for serious kernel programmers only, but the "Call Trace" might be the easiest to understand: it is a list of kernel functions executed just before the error was detected.

If the error happens again and the Call Trace is the same as before, you may have found a kernel bug. If the system reboots again but the Call Trace is different every time, a failing hardware component (processor, RAM or system board) is a likely cause.
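
Comparing two captured error messages by hand is tedious, so here is a minimal Python sketch of the idea. It assumes each message has been saved as text and that every Call Trace entry contains a function name followed by "+offset" (for example "{name+518}" or "name+0x45/0x80"). The exact layout differs between kernel versions and architectures, so the pattern is an assumption, and the function names in the usage note are made up for illustration.

```python
import re

# Assumed pattern: a symbol name followed by "+<offset>", decimal or hex.
# Real Call Trace layouts vary, so adjust this to match your own logs.
SYMBOL_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_.]*)\+(?:0x[0-9a-fA-F]+|\d+)")


def call_trace_symbols(oops_text):
    """Return the function names that appear after the 'Call Trace' marker."""
    _, marker, tail = oops_text.partition("Call Trace")
    if not marker:
        return []  # no Call Trace section in this text
    return SYMBOL_RE.findall(tail)


def same_trace(oops_a, oops_b):
    """True if both messages list the same functions in the same order.

    Addresses and offsets are ignored on purpose: only the names and
    their order are compared.
    """
    a = call_trace_symbols(oops_a)
    b = call_trace_symbols(oops_b)
    return bool(a) and a == b
```

With two hypothetical messages such as "Call Trace:<ffffffff80110000>{example_read+518} <ffffffff80120000>{example_irq+77}" and a second one listing the same two names with different offsets, same_trace() returns True. A message with different names, or without a Call Trace section at all, gives False.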

MK
dawn_jose85
Frequent Advisor

Re: Error mesg in kernel

Hi,

The message log is showing the following messages.

Mar 20 02:03:01 vpp211 crond(pam_unix)[25484]: session closed for user linus
Mar 20 02:04:01 vpp211 crond(pam_unix)[25809]: session opened for user linus by (uid=0)
Mar 20 02:04:02 vpp211 crond(pam_unix)[25809]: session closed for user linus
Mar 20 02:05:01 vpp211 crond(pam_unix)[26141]: session opened for user linus by (uid=0)
Mar 20 02:05:01 vpp211 crond(pam_unix)[26141]: session closed for user linus
Mar 20 02:06:01 vpp211 crond(pam_unix)[26467]: session opened for user linus by (uid=0)
Mar 20 02:06:02 vpp211 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000001 RIP:
Mar 20 02:20:33 vpp211 syslogd 1.4.1: restart.
Mar 20 02:20:33 vpp211 syslog: syslogd startup succeeded
Mar 20 02:20:33 vpp211 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Mar 20 02:20:33 vpp211 kernel: Bootdata ok (command line is ro root=LABEL=/ hdd=ide-scsi console=tty0 console=ttyS1,115200)


After that error, the server restarted automatically (ASR is enabled on this server).
We are using the RHEL (64-bit) kernel:
Linux 2.6.9-78.ELsmp #1 SMP Wed Jul 9 15:46:26 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Some scheduled cron jobs run automatically on this server. They were running when the error happened, and nobody had manually interrupted any process at that time.
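
To find earlier occurrences of the same pattern in older logs (the error line, then silence until syslogd restarts), something like the following Python sketch might help. It only assumes the classic "Mon DD HH:MM:SS" syslog prefix in an English locale. The year is not recorded in these logs, so it has to be supplied by the caller, and the value used in the usage note is arbitrary.

```python
from datetime import datetime

OOPS_MARKER = "Unable to handle kernel NULL pointer dereference"


def parse_stamp(line, year):
    """Parse the 'Mar 20 02:06:02' prefix of a syslog line.

    Returns None for lines that do not start with a timestamp,
    for example wrapped continuation lines.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def gaps_after_oops(lines, year):
    """Return (oops_time, next_time, seconds) for every error line.

    'next_time' is the timestamp of the first timestamped line that
    follows the error, so a large gap suggests the system stopped
    logging until it was rebooted.
    """
    results = []
    pending = None  # timestamp of an error line still waiting for a successor
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if pending is not None:
            results.append((pending, stamp, (stamp - pending).total_seconds()))
            pending = None
        if OOPS_MARKER in line:
            pending = stamp
    return results
```

Run against the lines from 02:06:01 to 02:20:33 above, for example with gaps_after_oops(open("/var/log/messages"), 2010), this reports one gap of 871 seconds between the error and the syslogd restart.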
dawn_jose85
Frequent Advisor

Re: Error mesg in kernel

Hi,
The server hangs at least once a month. Since ASR is enabled, it recovers from the hang (an automatic reboot occurs), so I'm not able to capture the exact error, and the message logs do not show proper information.
Can I assume this frequent hanging is a hardware issue? Is that a possibility?
I'm hoping for a reply.
Matti_Kurkela
Honored Contributor

Re: Error mesg in kernel

Yes, it's possible it is a hardware issue. But without any further information and no other symptoms, it's hard to say for sure.

Apparently the error causes problems with the disk I/O subsystem, because the kernel seems to be unable to write the full error message to /var/log/messages. (You might want to check driver versions and/or firmware levels. If newer versions are available, read the release notes to see if any of the descriptions of fixed problems matches your problem.)

There are a few things you might want to do to gather more information when the error happens again:

* If the server can remain in a hung state for a while (i.e. not a critical service) and the server is conveniently accessible physically, you might disable the ASR and wait for the problem to happen. Most likely it will cause the server to hang. Then look at the console display: the full error message might be visible there, even if it cannot be written to disk. Once you've captured the error message from the screen (a camera might be useful here), you can reboot the system and enable ASR again.

* If the server has a serial port that is not used for anything else, you might connect it to another system and send all kernel messages to that serial port. Then the other system can be used to record all incoming data from the serial port. (A serial port is a very simple device, and it may remain functional even if more complex parts of the system are disabled because of this error.)
For example, if you want to send all kernel messages to serial port ttyS0 (=COM1), you'll need to add this to your boot options:

console=ttyS0 console=tty0

This will keep the normal console display working as usual, but all kernel messages are sent to the serial port too.

* You might also try configuring a crash dump feature (in your RHEL 4, either diskdump or netdump is available). This may require more configuration changes than the other options, but it has a chance of providing much more information for troubleshooting.

These Red Hat Knowledge Base documents might be helpful (Red Hat Network access required):

https://access.redhat.com/kb/docs/DOC-5413
https://access.redhat.com/kb/docs/DOC-7075
https://access.redhat.com/kb/docs/DOC-6855
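
For the serial port option above, the receiving system needs something that records whatever arrives. A terminal program with logging will do, but as a rough illustration, here is a minimal Python sketch that copies lines from an input stream to a log and prefixes each with the receiver's own time. The device and file names in the usage note are assumptions, and the sketch assumes the receiving port has already been set to the same speed as the sender.

```python
import time


def default_stamp():
    """Local time of the receiving system, used to date each line."""
    return time.strftime("%Y-%m-%d %H:%M:%S")


def record_stream(source, sink, stamp=default_stamp):
    """Copy lines from 'source' to 'sink', prefixing each with a timestamp.

    Each line is flushed immediately so that nothing is lost if the
    recording itself is interrupted. Returns the number of lines copied.
    """
    count = 0
    for line in source:
        sink.write(f"{stamp()} {line.rstrip()}\n")
        sink.flush()
        count += 1
    return count
```

On the receiving system this could be used roughly as record_stream(open("/dev/ttyS0", errors="replace"), open("console.log", "a")), where both names are only examples. The timestamps come from the receiver, so they remain usable even if the failing server's own clock or logging has stopped.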

You might also want to run memtest86 or a similar hardware test utility in a loop over a weekend or so. If the test utility produces error messages or crashes, it's a hardware issue.

MK