earlysame55
Occasional Advisor

Redhat Linux

Dear all,
I'm running kernel 2.6.9-42.ELsmp on a ProLiant BL685c G1. This machine sometimes hangs at night. I have not yet enabled the SysRq option. I'm seeing the following messages in dmesg. Can one of you please let me know whether they need any corrective action? It would be a great help, as I need to promote these servers to production.

-Please enable the IOMMU option in the BIOS setup

-Total of 8 processors activated (35366.05 BogoMIPS).
..MP-BIOS bug: 8254 timer not connected to IO-APIC
failed.
timer doesn't work through the IO-APIC - disabling NMI Watchdog!
Using local APIC timer interrupts.

-Uhhuh. NMI received for unknown reason 30.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?

-Uhhuh. NMI received for unknown reason 20.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
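To check a box for these messages before and after a BIOS change, here is a minimal Python 3.7+ sketch. It only searches kernel log text for fixed fragments of the messages quoted above. The `WARNING_FRAGMENTS` table and the function names are hypothetical, and `read_dmesg()` assumes the `dmesg` command is available and readable by the current user.

```python
import subprocess

# Fixed fragments of the kernel messages quoted in this post.
# Hypothetical table: extend it with any other messages you want to watch for.
WARNING_FRAGMENTS = {
    "iommu": "Please enable the IOMMU option in the BIOS setup",
    "8254_timer": "8254 timer not connected to IO-APIC",
    "nmi_watchdog": "disabling NMI Watchdog",
    "unknown_nmi": "NMI received for unknown reason",
}


def scan_kernel_log(text):
    """Return {fragment name: [matching lines]} for the given log text."""
    hits = {name: [] for name in WARNING_FRAGMENTS}
    for line in text.splitlines():
        for name, fragment in WARNING_FRAGMENTS.items():
            if fragment in line:
                hits[name].append(line.strip())
    return hits


def read_dmesg():
    """Return the kernel ring buffer as text by running the dmesg command."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout
```

`scan_kernel_log(read_dmesg())` returns the matching lines for each message. An empty `unknown_nmi` list after a reboot would be consistent with the NMI warnings having gone away.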

Thanks in advance.
Matti_Kurkela
Honored Contributor

Re: Redhat Linux

Your hardware is trying to tell you something, but the OS does not understand it. Maybe you don't have the Proliant Support Pack or at least the hpasm package installed?

Go to http://www.hp.com/go/support and select "Download drivers and software". Type in your server model. Then select your OS version from the listed choices and you'll get to a download page.

You can download the Proliant Support Pack to get all the Proliant hardware-specific features with one download.

Or if you want to take a minimalist approach, do this:

Download and install the packages from the category "Driver - System Management".

- The HP OpenIPMI device driver is an enhanced version of the regular OpenIPMI driver included in RedHat, and is needed to make the OS understand the hardware monitoring information from iLO2. This is required by the hpasm package.

- The HP iLO2 Watchdog Timer Driver acts as a replacement for the NMI watchdog, with extra functionality.

Then download and install from the category "Software - System Management":

- "HP System Health Application and Insight Management Agents for RedHat": this is the "hpasm" package. It is required for hardware health monitoring and contains useful command-line tools (hpasmcli, hpimlview).
This package (together with the Watchdog Timer Driver) will "teach" the OS to understand the NMI messages, so the "NMI received for unknown reason xx" messages should go away.

- You may wish to install the "HP Array Configuration Utility" (either the GUI or the CLI version; choose the one you prefer).

- Various diagnostic tools ("HP Insight Diagnostics Online Edition" and "HP Array Diagnostic Utility") might be useful for detecting hardware problems, but they are not mandatory.

- "HP NIC Agents for RedHat" collects statistics information from your network cards and makes it available to other monitoring tools. It is not mandatory.

Finally, from the "Driver - Lights out Management" category, download and install either "HP Lights-out Drivers and Agents" or the newer "HP Proliant Channel Interface Device Driver". Choose the one that matches the major version number of your hpasm package: if your hpasm package version is 8.0.xx, pick a version 8.0.yy from here. This allows you to update your iLO2 firmware without rebooting the OS, if necessary.
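The version-matching rule above (8.0.xx pairs with 8.0.yy) can be written as a small Python sketch. It treats the first two dot-separated fields as the "major version", following the example in the text. The function names and candidate version strings are made up for illustration. On Red Hat, the installed hpasm version can be read with `rpm -q --queryformat '%{VERSION}' hpasm`.

```python
def major_version(version):
    """Return the first two dot-separated fields, e.g. ('8', '0') for '8.0.12'."""
    return tuple(version.split(".")[:2])


def matching_versions(hpasm_version, candidates):
    """Keep only candidate driver versions whose major version equals hpasm's."""
    wanted = major_version(hpasm_version)
    return [c for c in candidates if major_version(c) == wanted]
```

For example, `matching_versions("8.0.12", ["7.9.5", "8.0.3", "8.1.0"])` returns `["8.0.3"]`.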

MK
JKytsi
Honored Contributor
Solution

Re: Redhat Linux

Have you enabled the HPET timer in RBSU?
Remember to give Kudos to answers! (click the KUDOS star)

You can find me from Twitter @JKytsi
earlysame55
Occasional Advisor

Re: Redhat Linux

Gurus,

Thanks for the replies. I found something interesting on this machine. The iLO2 shows:

Proc 1: 2200 MHz
Execution technology: 2/2 cores; 2 threads
Memory technology: 64-bit capable
Processor 1 Internal L1 Cache: 64 KB
Processor 1 Internal L2 Cache: 1024 KB
and there are 4 processors like that.
My uname -a shows:

Linux linuxdevbl51.mediasolv.com 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux

This means my machine architecture/processor class is x86_64.

And the cpuinfo in the OS tells me:
processor : 7
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8214
stepping : 3
cpu MHz : 1004.703
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips : 2009.29
TLB size : 1088 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

That is repeated for each logical processor. However, a similar machine that does not have the hanging issue reports the machine architecture (uname -m) as i686 and the processor type (uname -p) as athlon. Its dmesg shows none of the messages from my original post.
This machine is also a BL685c G1. Following is the cpuinfo from the OS of the machine that doesn't have problems:
processor : 7
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 8214
stepping : 3
cpu MHz : 2210.543
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni
bogomips : 4420.56
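Comparing two cpuinfo dumps by eye is error-prone. This minimal Python sketch parses /proc/cpuinfo-style text into one record per logical processor and pulls out `cpu MHz` and `bogomips`, which makes differences like the 1004.703 vs 2210.543 MHz shown above easy to spot. It assumes the usual `key : value` layout with blank lines between processors. The function names are hypothetical.

```python
def parse_cpuinfo(text):
    """Parse /proc/cpuinfo-style text into one dict per logical processor."""
    processors = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor record.
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a key/value separator
        key = key.strip()
        if key == "processor" and current:
            # A new record starts even if no blank line separated them.
            processors.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        processors.append(current)
    return processors


def clock_summary(processors):
    """Map processor index -> (cpu MHz, bogomips) as floats."""
    return {
        int(p["processor"]): (float(p["cpu MHz"]), float(p["bogomips"]))
        for p in processors
        if "processor" in p and "cpu MHz" in p and "bogomips" in p
    }
```

On each machine, `clock_summary(parse_cpuinfo(open("/proc/cpuinfo").read()))` gives a per-CPU table that can be compared side by side.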

What do you think is wrong? Should I enable HPET and see, or is there anything else that needs to be done?
JKytsi
Honored Contributor

Re: Redhat Linux

Well, you must enable HPET in 64-bit Linux:

http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00781084&lang=en&cc=us&taskId=101&prodSeriesId=2510373&prodTypeId=15351
Jimmy Vance
HPE Pro

Re: Redhat Linux

On the machine that is hanging, according to the output of uname, you have installed an x86_64 (64-bit) kernel. On the system that is not hanging, uname reports an x86 (32-bit) kernel. For x86_64 kernels you need to enable HPET, as others have stated.
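The check described here can be scripted. This Python sketch classifies the `uname -m` value (read via `platform.machine()`, which reports the same field) as 64-bit or 32-bit x86, and flags 64-bit kernels as the case where, per this thread, HPET must be enabled on the BL685c G1. The set of recognised machine strings and the function names are assumptions; only the common x86 names are covered.

```python
import platform

# Assumed uname -m values: only the common x86 names are recognised.
X86_64_NAMES = {"x86_64"}
X86_32_NAMES = {"i386", "i486", "i586", "i686"}


def kernel_arch_class(machine=None):
    """Classify a `uname -m` string as '64-bit', '32-bit' or 'unknown'."""
    if machine is None:
        machine = platform.machine()  # same value as `uname -m` on Linux
    if machine in X86_64_NAMES:
        return "64-bit"
    if machine in X86_32_NAMES:
        return "32-bit"
    return "unknown"


def needs_hpet_enabled(machine=None):
    """True for x86_64 kernels, which per this thread need HPET enabled in RBSU."""
    return kernel_arch_class(machine) == "64-bit"
```

With the values from this thread, `needs_hpet_enabled("x86_64")` is `True` for the hanging blade and `needs_hpet_enabled("i686")` is `False` for the other one.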
No support by private messages. Please ask the forum! 
earlysame55
Occasional Advisor

Re: Redhat Linux

Thanks for all the responses. The NMI messages went away after the BIOS changes. I still get the

"Please enable the IOMMU option in the BIOS setup"

message. I'm not used to blade hardware. How can I correct this?
JKytsi
Honored Contributor

Re: Redhat Linux

Your kernel is not the... hmm, most current =) There is no setting for that in the BL685c BIOS.
earlysame55
Occasional Advisor

Re: Redhat Linux

Thanks for the reply. There is another BL685c that does not have any of these problems, and it runs the same kernel version as well. After I enable the HPET option in the BIOS, do I need to make any changes to the installed OS? Should it be reinstalled? Also, I saw somewhere that a change needs to be made in grub.conf. Can one of you let me know whether anything like that needs to be done?
JKytsi
Honored Contributor

Re: Redhat Linux

No reinstall is needed. Are the server BIOS versions the same? I remember there have been some fixes for Opteron-based server BIOSes.