DL585 G1 + RHEL4 U4 (X86) == NMI received for unknown reason

Siert Zijl
Advisor

We are running RHEL4 U4 on a DL585 G1 and occasionally get kernel oopses during backup or restore.

The tape device is an Ultrium 3/960 attached to an "LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI" SCSI controller.

The following kernel messages are logged to syslog/dmesg a couple of minutes before the oops:

Uhhuh. NMI received for unknown reason 31 on CPU 2.
Uhhuh. NMI received for unknown reason 31 on CPU 3.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 31 on CPU 1.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 31 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 21 on CPU 2.
Uhhuh. NMI received for unknown reason 21 on CPU 0.
Uhhuh. NMI received for unknown reason 21 on CPU 3.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 21 on CPU 1.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?

Has anyone else run into the same problem, or does anyone have a possible solution? The kernel is -not- tainted.
Linux system administrator
Siert Zijl
Advisor

I fixed it.

During a restore, the kernel logged the following to syslog:

st0: Failed to read 32768 byte block with 1024 byte transfer.
st0: Failed to read 32768 byte block with 1024 byte transfer.

The server crashed (oops) with the line:

Unable to handle kernel NULL pointer dereference at virtual address 00000044

After more research, I found that the tape drive was being driven through the st.o (SCSI tape) kernel module instead of sg.o (SCSI generic). Since I changed that and reconfigured the tape device in Netvault, the behaviour described above has not returned.

Conclusion:
My backup software (Netvault) only supports the sg.o module for a standalone tape device; the st.o module should not be used.
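As a quick sanity check in the spirit of this fix, the Python sketch below reads /proc/modules and reports whether the sg (SCSI generic) and st (SCSI tape) drivers are loaded. The function names are hypothetical, and it assumes both drivers are built as loadable modules (a driver compiled into the kernel does not show up in /proc/modules). It does not change anything in Netvault; pointing the backup software at the sg device still has to be done there.

```python
import os


def loaded_modules(proc_modules_text):
    """Return the set of module names in /proc/modules-style text.

    Each line starts with the module name, followed by its size, use count,
    dependencies and state; only that first field is used here.
    """
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def tape_driver_warnings(modules):
    """Hypothetical helper: flag module states relevant to the fix above.

    Assumption: the backup software should talk to the drive through sg
    (SCSI generic, /dev/sg*), not st (SCSI tape, /dev/st*).
    """
    warnings = []
    if "sg" not in modules:
        warnings.append("sg (SCSI generic) is not loaded as a module; try 'modprobe sg'")
    if "st" in modules:
        warnings.append("st (SCSI tape) is loaded; make sure the backup software "
                        "is configured for /dev/sg*, not /dev/st*")
    return warnings


if __name__ == "__main__":
    path = "/proc/modules"
    if os.path.exists(path):  # only meaningful on a Linux host
        with open(path) as f:
            msgs = tape_driver_warnings(loaded_modules(f.read()))
        for msg in msgs or ["ok: sg loaded, st not loaded"]:
            print(msg)
```

Having st loaded is not necessarily a problem by itself; what mattered here was which device Netvault was configured to use.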
Siert Zijl
Advisor

See last post.
Maarten Verwijs
Advisor

RHEL4? Siert, you know better than that: you _know_ you should use Debian. ;-)


-mverwijs@cistron.nl^Wsron.nl