RedHat freeze

 

Hello,

On Saturday a RedHat 2.1AS server froze. We couldn't access the machine over ssh or locally, so we rebooted it. This is the second time it has happened, and I'm wondering what is going on. /var/log/messages regularly receives messages from iptables (all input is logged), but once the machine freezes nothing more is logged anywhere. Does anybody know where I should look, or what I should do to find out where the problem comes from?
PS: The server is a DL-380 G3 with Hyper-Threading.

Thanks,
12 REPLIES
Steven E. Protter
Exalted Contributor

Re: RedHat freeze

I got the very same behavior on an ES 3.0 server on Saturday, interestingly enough.

The only possible issue with that server was that the /var filesystem was on its way to getting full.
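
A quick way to rule out a filling /var is to check its usage from a script or cron job. This is only a minimal sketch: the function names and the 90% default threshold are illustrative assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Extract the use% (digits only) from `df -P` output read on stdin.
# `df -P` keeps each filesystem on one line, so line 2 is the data line.
parse_use_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding a path is at or above a threshold.
# The default of 90 is an arbitrary example value.
check_fs() {
    path=$1
    limit=${2:-90}
    pct=$(df -P "$path" | parse_use_pct)
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $path is ${pct}% full"
        return 1
    fi
    echo "OK: $path is ${pct}% full"
}
```

For example, `check_fs /var 90` prints a warning and returns non-zero once /var reaches 90%.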

The behavior was the same: no messages, no nothing. I'd check bugzilla for this issue because mine is a Dell PowerEdge and has few similarities to yours in the hardware realm.

Interesting.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Don_89
Trusted Contributor

Re: RedHat freeze

Here is a good article on troubleshooting this:

http://www.linuxdevcenter.com/pub/a/linux/2001/11/01/postmortem.html

Since Steve had the same issue, maybe it was an up2date package. I checked my /var/log/up2date.1 and didn't see anything being updated. It could be different for you, since it depends on which packages are installed.
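
Besides the up2date log, the RPM database itself records when each package was installed or updated. A minimal sketch, where `recent_packages` is a hypothetical helper name and 20 is an arbitrary default:

```shell
#!/bin/sh
# List the most recently installed or updated RPM packages, newest first.
# `rpm -qa --last` orders the package list by install time.
recent_packages() {
    rpm -qa --last | head -n "${1:-20}"
}
```

Running `recent_packages 10` shortly after a freeze shows whether any package changed just before it.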

Re: RedHat freeze

What is curious is that hpasm is running and, from one second to the next, the Insight console changes the state of the server from "OK" to "not accessible".
Is it possible that Hyper-Threading has some problem with Linux?
I will take a look at bugzilla too.
Don_89
Trusted Contributor

Re: RedHat freeze

I've had nothing but problems with version 7.0 of the HPASM drivers, including automatic system reboots (ASRs). We went back to 6.40 and have not had a problem since on 20+ servers.

Re: RedHat freeze

Since I see you are using hpasm, I will ask just one last question about ASR: do you leave it enabled on your servers?
I disable it because some of my servers are running Oracle and they can't be allowed to reboot whenever they want.

Thanks for all information.
Jan Sladky
Trusted Contributor

Re: RedHat freeze

hi Jean,
in this situation I would consider having syslog log all messages, with this line in /etc/syslog.conf:

*.* -/var/log/allmessages

Maybe you will find the cause there at the next freeze (I hope it will not happen again, but just to be sure).
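
One detail worth weighing with a catch-all rule like this: in the classic sysklogd syslog.conf syntax, a leading "-" before the file name tells syslogd not to sync the file after each message, so the last lines before a hard freeze may never reach the disk. The sketch below adds the rule without the "-", at some cost in write performance. The function name and its arguments are assumptions made for illustration.

```shell
#!/bin/sh
# Append a catch-all rule to a syslog.conf style file, at most once.
# There is no leading "-" before the path, so syslogd syncs after
# every message and the final lines before a freeze are more likely
# to survive on disk.
add_catchall_rule() {
    conf=$1
    log=${2:-/var/log/allmessages}
    if grep -qF "$log" "$conf" 2>/dev/null; then
        return 0            # a rule for this log file already exists
    fi
    printf '*.*\t%s\n' "$log" >> "$conf"
}
```

Typical use would be `add_catchall_rule /etc/syslog.conf` as root, followed by a restart of syslogd (on Red Hat systems of this era, usually `service syslog restart`). If a "-/var/log/allmessages" rule is already present, the function leaves the file alone, so remove the "-" by hand in that case.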

br Jan

GSM, Intelligent Networks, UNIX

Re: RedHat freeze

Thanks,

I will try that too.
Darrin St. Amant
Frequent Advisor

Re: RedHat freeze

For performance reasons, RH does not recommend running RH 2.1 with Hyper-Threading turned on.

That could be the culprit.

cheers!
Vitaly Karasik_1
Honored Contributor

Re: RedHat freeze

Darrin, can you please point me to this RH recommendation?

10x, Vitaly

Re: RedHat freeze


I am interested in that too.

Re: RedHat freeze


One of the best ways to troubleshoot complete system freezes is to use the "NMI watchdog" feature. Your machine needs to either be SMP or have an enabled APIC (if uniprocessor). If you're running the SMP kernel with a single Xeon, that works, too.

To enable it, append "nmi_watchdog=1" to the kernel line in GRUB, or add 'append="nmi_watchdog=1"' if you're using LILO. Once this is done, reboot and you're all set.

The NMI watchdog generates interrupts periodically that will cause the kernel to panic if the processor freezes. You can then use the aforementioned debugging tools to find out what is going on. I used this recently to detect system freezes caused by an errant driver.
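
After rebooting with the option, it is worth confirming that the watchdog is actually running: the NMI line in /proc/interrupts should show counts that keep growing. A minimal sketch of that check follows; the function names and the 10-second default window are illustrative assumptions.

```shell
#!/bin/sh
# Sum the per-CPU NMI counters from a /proc/interrupts style file.
# Prints 0 when the file has no NMI line.
nmi_total() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) total += $i
         }
         END { print total + 0 }' "${1:-/proc/interrupts}"
}

# Succeed if the NMI counter grew during the sampling window.
nmi_is_ticking() {
    before=$(nmi_total)
    sleep "${1:-10}"
    after=$(nmi_total)
    [ "$after" -gt "$before" ]
}
```

If `nmi_is_ticking` fails and the count stays at zero, the watchdog is probably not active on that hardware; some machines need "nmi_watchdog=2" (local APIC mode) instead.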

Hope this helps!

Re: RedHat freeze

Many Thanks,

While we're on the subject: I wanted to stop hpasm on a server (up for about 70 days), and /etc/init.d/hpasm stop failed for the following two processes: cevtd and casmd. They are now totally lost in space and each takes 100% of a CPU. As I have 4 CPUs the server is still up, but really unhappy to work so much for nothing ;).
BTW, I think Insight Manager 7 has some very big problems... Where should I send feedback to HP?
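
For leftover daemons such as the cevtd and casmd processes mentioned above, the usual approach is to ask politely with SIGTERM and fall back to SIGKILL after a short grace period. The sketch below assumes the processes are still killable; a process stuck inside the kernel may ignore both signals until the next reboot. The function names and the 5-second default are illustrative.

```shell
#!/bin/sh
# Send SIGTERM to a PID, wait up to N seconds, then send SIGKILL.
stop_pid() {
    pid=$1
    wait_secs=${2:-5}
    kill -TERM "$pid" 2>/dev/null || return 0    # already gone
    while [ "$wait_secs" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # exited after TERM
        sleep 1
        wait_secs=$((wait_secs - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Apply stop_pid to every process whose command name matches exactly.
stop_by_name() {
    for name in "$@"; do
        for pid in $(ps -e -o pid= -o comm= |
                     awk -v n="$name" '$2 == n { print $1 }'); do
            stop_pid "$pid"
        done
    done
}
```

Run as root, `stop_by_name cevtd casmd` would target the two runaway processes.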