
NMI Error Messages

 
Connie Fadriquela
Occasional Contributor

NMI Error Messages

Hello Support,

I'm currently investigating problems on our HP server (DL380-G3), which has RHEL 4 U2 Linux installed.

This server hangs frequently.

I found error messages in the Linux messages log file.

The messages are the following:
May 23 02:27:09 dgpsvr25 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
May 23 02:27:09 dgpsvr25 kernel: cpqphp: power fault interrupt
May 23 02:27:09 dgpsvr25 kernel: You probably have a hardware problem with your RAM chips
May 23 02:27:09 dgpsvr25 kernel: cpqphp: power fault bit 0 set

In some cases these errors cause the server to keep rebooting, which is why we tried physically replacing the memory.

But the errors keep occurring.

I tried to find out what NMI relates to, but I'm having difficulty understanding what these error messages mean.

Now I'm asking for your help: do we have a problem with the memory (the RAM itself)? It keeps happening even after we changed the RAM.

Is the problem related to memory slot?

Is it a parity error in PCI (this is what I found through research, but I don't fully understand it)?

Or do we have a problem with the motherboard or some other part of the server hardware?

This problem is giving us headaches: over the past two weeks, the server has hung several times with the error logs above. This is causing us a lot of downtime.
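To see how often each of these messages shows up and whether they always arrive together, a small script can tally them per timestamp. This is only a minimal sketch: the labels (`nmi`, `ram_hint`, `power_fault`) and the function names are my own, the match strings are copied from the excerpt above, and the sample lines stand in for the real log file. Note that `cpqphp` is the Compaq PCI Hot Plug driver, so its power fault lines are tallied separately from the RAM hint.

```python
import re
from collections import Counter

# Syslog layout as seen in the excerpt: "May 23 02:27:09 dgpsvr25 kernel: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+(?P<msg>.*)$"
)

# Substrings copied from the quoted messages; the labels are hypothetical names.
PATTERNS = {
    "nmi": "NMI received",
    "ram_hint": "hardware problem with your RAM chips",
    "power_fault": "cpqphp: power fault",
}


def classify(line):
    """Return (timestamp, host, label) for a matching kernel line, else None."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    for label, needle in PATTERNS.items():
        if needle in m.group("msg"):
            return m.group("stamp"), m.group("host"), label
    return None


def summarize(lines):
    """Count matching messages per (timestamp, label)."""
    counts = Counter()
    for line in lines:
        hit = classify(line)
        if hit:
            stamp, _host, label = hit
            counts[(stamp, label)] += 1
    return counts


SAMPLE = [
    "May 23 02:27:09 dgpsvr25 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue",
    "May 23 02:27:09 dgpsvr25 kernel: cpqphp: power fault interrupt",
    "May 23 02:27:09 dgpsvr25 kernel: You probably have a hardware problem with your RAM chips",
    "May 23 02:27:09 dgpsvr25 kernel: cpqphp: power fault bit 0 set",
]

if __name__ == "__main__":
    for (stamp, label), n in sorted(summarize(SAMPLE).items()):
        print(stamp, label, n)
```

To run it against the real log, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/messages"))`, assuming the log lives at that path on your system.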
7 REPLIES
Steven E. Protter
Exalted Contributor

Re: NMI Error Messages

Shalom,

You probably have a hardware problem with your RAM chips

I'd do a full hardware diagnostic. If it's a server-class box, there should be a boot disk that came with it for doing these tests.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
dirk dierickx
Honored Contributor

Re: NMI Error Messages

We had this a while back; you will need to change the RAM, just like it says.
Connie Fadriquela
Occasional Contributor

Re: NMI Error Messages

We did change the physical memory many times, but the problem keeps coming back after a few days.
Connie Fadriquela
Occasional Contributor

Re: NMI Error Messages

Reply to Steven:

Can you help me with how to diagnose the hardware?

Or can you point me to a link or documentation that will help me do that?

Any help is really appreciated.

Thanks in advance.
Luca Conti
New Member

Re: NMI Error Messages

Hi Connie,
I have the same problem (I have a DL 385 G1 with RHEL 4 Linux). I opened a hardware call and the operator told me to download the SmartStart CD 7.80 (latest version) from the HP site, boot from that CD, and use the diagnostic tools on it.
I haven't tried anything yet and I'm waiting for further analysis by HP.

Hope this helps,
Luca
Galbrun
New Member

Re: NMI Error Messages

Hi guys,
I've got the same problem. See /var/log/messages:
Jun 30 17:25:17 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Jun 30 17:25:17 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Jun 30 17:25:17 kernel: You probably have a hardware problem with your RAM chips
Jun 30 17:25:17 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Jun 30 17:25:17 kernel: You probably have a hardware problem with your RAM chips
Jun 30 17:25:17 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue
Jun 30 17:25:17 kernel: You probably have a hardware problem with your RAM chips
Jun 30 17:25:17 kernel: You probably have a hardware problem with your RAM chips
Jun 30 17:25:17 hpasmd[2812]: WARNING: hpasmd: ASR Lockup Detected: (casm device driver alerted)
Jun 30 17:25:18 shutdown: shutting down for system reboot

And the server reboots ...
We already changed the RAM chips but the problem continues ...
I contacted technical support, but the technician told me it is not a hardware problem (!). He advised upgrading the Linux kernel to a newer version but did not tell me which version to install ... Is this a joke?
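In case it helps anyone compare logs, here is a rough sketch for checking whether each reboot is preceded by an NMI followed shortly by the `hpasmd` ASR lockup warning, as in the excerpt above. The function names, the 60-second window, and the placeholder year are all assumptions of mine (syslog lines carry no year), and the match strings are copied from the log lines quoted in this post.

```python
from datetime import datetime, timedelta

# Syslog timestamps have no year; this placeholder is an assumption for parsing only.
ASSUMED_YEAR = 2000


def parse_stamp(line, year=ASSUMED_YEAR):
    """Parse the classic 15-character syslog timestamp at the start of a line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def nmi_then_asr(lines, window=timedelta(seconds=60)):
    """Return (nmi_time, asr_time) pairs where an ASR lockup follows an NMI within `window`."""
    pairs = []
    last_nmi = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            when = parse_stamp(line)
        except ValueError:
            continue  # not a syslog-style line
        if "NMI received" in line:
            last_nmi = when
        elif "ASR Lockup Detected" in line and last_nmi is not None:
            if when - last_nmi <= window:
                pairs.append((last_nmi, when))
            last_nmi = None
    return pairs


SAMPLE = [
    "Jun 30 17:25:17 kernel: Uhhuh. NMI received. Dazed and confused, but trying to continue",
    "Jun 30 17:25:17 kernel: You probably have a hardware problem with your RAM chips",
    "Jun 30 17:25:17 hpasmd[2812]: WARNING: hpasmd: ASR Lockup Detected: (casm device driver alerted)",
    "Jun 30 17:25:18 shutdown: shutting down for system reboot",
]

if __name__ == "__main__":
    for nmi_time, asr_time in nmi_then_asr(SAMPLE):
        print("NMI at", nmi_time.time(), "-> ASR lockup at", asr_time.time())
```

If every ASR lockup in the log pairs up with an NMI like this, that at least suggests the reboots are triggered by the NMI rather than by something unrelated, which may be useful to show to support.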
Connie Fadriquela
Occasional Contributor

Re: NMI Error Messages

Here's what we did, but I'm not sure if it's going to work in your case.

We replaced the motherboard of the server, and one of the Linux guys helped us fix something on the Linux side.

If you do the same, I recommend backing up everything on the server in case Linux crashes.

But I learned that NMI error messages can also be related to memory allocation or usage on the OS side. Hope this helps.