
Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

 
Mark Addinall
New Member

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64


We have the same issue on two of our new DL385s: AMD64 Opteron, Red Hat ES 4.2 and HP Toolset 7.4.

I'll follow this thread.

Ta,
Mark.
Algimantas
Advisor

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

Hello again,

I have received the following suggestion from HP:
"1. In the /etc/sysconfig/powersave/common file, replace this line
POWERSAVE_CPUFREQD_MODULE=""
with
POWERSAVE_CPUFREQD_MODULE="off"
2. Reboot the system"
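For anyone who has to apply step 1 on several machines, here is a minimal Python sketch of that edit. It assumes the file contains a plain `POWERSAVE_CPUFREQD_MODULE=...` assignment on its own line; the function names `disable_cpufreqd_module` and `patch_file` are my own, not part of any HP or distribution tool, and the reboot from step 2 is still done by hand.

```python
import re
import shutil

# Matches the (uncommented) POWERSAVE_CPUFREQD_MODULE assignment,
# whatever value it currently has.
_PATTERN = re.compile(r'^([ \t]*POWERSAVE_CPUFREQD_MODULE=).*$', re.MULTILINE)


def disable_cpufreqd_module(text: str) -> str:
    """Return the config text with POWERSAVE_CPUFREQD_MODULE set to "off"."""
    return _PATTERN.sub(r'\1"off"', text)


def patch_file(path: str = "/etc/sysconfig/powersave/common") -> bool:
    """Rewrite the file in place, keeping a .bak copy. Returns True if changed."""
    with open(path) as fh:
        original = fh.read()
    updated = disable_cpufreqd_module(original)
    if updated == original:
        return False  # already "off", or the variable is not present
    shutil.copy2(path, path + ".bak")  # hypothetical backup name
    with open(path, "w") as fh:
        fh.write(updated)
    return True
```

Run `patch_file()` as root on the affected host; it only touches the one assignment and leaves every other line as it was.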

Since then the systems have been up and running (uptime 11 days).

Maybe it helps.
Andreas Linnert
Advisor

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

Hello,

After deactivating ASR, the system has been up and running for 8 days without any errors.

The solution provided by our HP support contact is to apply the latest ProLiant Support Pack, 7.52, in which some of the related issues were solved.

- Andreas -

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

I don't think PSP 7.52 is going to be any better. PSP 7.52 contains the same version of the hpasm driver as the 7.51 version, and that seems to be the problematic driver.

Disabling the ASR Lockup Detection in the BIOS does prevent the errors and the reboot from occurring. I'll stick with the workaround for now.
Walt McDaniel
Occasional Contributor

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

I'd be really interested in knowing if anyone is seeing the same issue on AS 3.0 on the dual-core Opterons. We are seeing numerous ASR reboots across our 100+ dual-core Opterons, and the only thing in the hplog is "ASR Detected by System Rom". There is nothing in the system log to indicate there is a system problem.
kenny chia
Regular Advisor

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

Hi all
I am facing the same problem with a DL385G1 server with RH AS 4 update 3 64 bit. PSP is 7.51.

The strange thing is that I have an additional temperature log entry after the ASR entry (not before). Can I conclude that the ASR caused a temperature violation?

0003 Critical 19:19 09/09/2006 19:19 09/09/2006 0001
LOG: ASR Lockup Detected: (casm device driver alerted)

0004 Caution 19:21 09/09/2006 19:21 09/09/2006 0001
LOG: POST Error: 1610-Temperature violation detected Waiting 5 minutes for system to cool Press Esc key to resume booting without waiting for the system to cool. WARNING: Pressing Esc is NOT recommended as the system may shutdow
All Your Bases Are Belong To Us!

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

Kenny,

I saw the same thing on one of my 385s and it has not recurred since disabling the ASR lockup detection.

I notice that the temps run a little higher on this box overall, but I have not had the temps go over the threshold since.

I would recommend that you disable the ASR Lockup detection and then keep an eye on the temps over at least a few days.

- Alex
Tait Sanders
Occasional Advisor

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

I've got six BL20p G3 blades with RHEL 3 Update 7 installed, along with HP PSP 7.52. Every now and then one of the blades will 'kernel panic', and there are no entries in either the system or HP logs (IML, Insight Diagnostics). Then last night I saw an error on the terminal from a resulting kernel panic: "Uhhuh. NMI received. Dazed and confused, but trying to continue."

I've had a job logged with HP for months and am waiting for this issue to be resolved. This issue was also there when I had HP PSP 7.50 installed.

I've disabled the ASR on one of the blades and the only difference is that the server doesn't reboot (which is ASR's function) but just hangs with the kernel panic...

I'm wondering if it is the case that the HP PSP 7.52 version of the 'Insight Manager Health Driver' is buggy?

Has anyone seen a resolution to the kernel panic issue?
Richard Rydström
Valued Contributor

Re: Unexplained Reboots - DL385 RHEL4 AS x86_64

Hello guys
I've seen the same issue on 5 of our servers, and it was related to the hardware combination of SA6404 and FC1243 (QLogic). After replacing the QLogic-based FC1243 with an Emulex-based FC2243, the problem was solved.