Re: Linux server hang

 
B K Arun Kishore
New Member

Linux server hang

Problem definition:
We have been facing a server hang problem for the past 3 months. We have analyzed all the services we run and the server logs in /var/log/, but could not find the cause. We reboot the server manually to recover it from the hung state.

Action taken:
We have analyzed all the system logs and application logs on all our servers, but we have not found any fixed pattern of messages in the system logs. We take a memory snapshot with the top command every 15 minutes, and each time we found sufficient free memory before the server went into the hang state.
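A periodic snapshot like the one described here is only useful if the last record actually reaches the disk before the machine freezes. The following is a minimal sketch of that idea, not the poster's actual setup: it parses /proc/meminfo-style text and appends one timestamped line, forcing it to disk with fsync. The function names, the log format, and the choice of the MemFree and SwapFree fields are assumptions made for illustration, and the code assumes a Python 3 interpreter, which is newer than what RHEL 3 ships.

```python
import os
import time


def parse_meminfo(text):
    """Return {field: kB} for 'Name: <number> kB' lines of /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> kB" rows; other row layouts are skipped.
        if sep and len(parts) == 2 and parts[0].isdigit() and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def append_snapshot(log_path, meminfo_text, now=None):
    """Append one timestamped line to log_path and force it to disk."""
    info = parse_meminfo(meminfo_text)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    line = "%s MemFree=%s SwapFree=%s\n" % (
        stamp, info.get("MemFree"), info.get("SwapFree"))
    with open(log_path, "a") as fh:
        fh.write(line)
        fh.flush()
        # fsync so the record survives if the machine hangs right afterwards.
        os.fsync(fh.fileno())
    return line
```

To use it, read /proc/meminfo into a string and pass it to append_snapshot from a cron job or a sleep loop. A much shorter interval than 15 minutes narrows down what the system looked like just before the hang.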

Additional Information:
My Linux server is an NFS client. I have already set kernel.sysrq to 1 in sysctl.conf. As the hang occurs at random times, I am not able to use the "Magic" SysRq key.
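A value in sysctl.conf only takes effect at boot or after `sysctl -p`, so it can be worth confirming that the configured value and the running value agree. Here is a small sketch, under the assumption of a Python 3 interpreter; the helper names are made up for illustration. The first function reads the last kernel.sysrq assignment from sysctl.conf-style text, and the second reads the live value from /proc/sys/kernel/sysrq.

```python
def sysrq_setting(sysctl_conf_text):
    """Return the last kernel.sysrq value set in sysctl.conf-style text, or None."""
    value = None
    for raw in sysctl_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "kernel.sysrq":
            value = val.strip()  # a later assignment overrides an earlier one
    return value


def running_sysrq(path="/proc/sys/kernel/sysrq"):
    """Return the value the running kernel currently uses, as a string."""
    with open(path) as fh:
        return fh.read().strip()
```

If sysrq_setting(open("/etc/sysctl.conf").read()) returns "1" but running_sysrq() returns "0", the setting has not been applied to the running kernel yet.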

System configuration:
Red Hat Enterprise Linux ES release 3 (Taroon)
Kernel: 2.4.21-40.EL
Postgres: 7.3.8-2
Redhat Cluster Manager: 1.2.28
RAM: 2GB
Server: HP ML 370 G3, DL 760 G2

Please let me know the scenarios in which a server gets into a hung state, and what we need to check to rectify the server hang problem.

Thank you in advance
5 REPLIES
Alpha977
Valued Contributor

Re: Linux server hang

Hello B K Arun Kishore!

Some years ago I had the same problem with 2 servers, but it turned out to be 2 different problems.
In the first, the cause was an unstable kernel version, which made the system reset or hang. I solved it by compiling a new kernel version.

In the second, it was a BIOS problem: the hardware was sending a reset command continuously.

In both cases, I didn't see anything in /var/log/messages.

Try first to download and compile a new kernel version.

Regards
B K Arun Kishore
New Member

Re: Linux server hang

Hi,

I updated my kernel 2 months back from 2.4.21-4.EL to 2.4.21-40.EL, but I am facing the same problem. Is there any tool available to check this?

Regards,
Alpha977
Valued Contributor

Re: Linux server hang

Hello!

If I'm not wrong, the kernels with an even number are stable versions; the others are only for testing.

ex: 2.4.20 stable
2.4.21 unstable
2.4.22 stable... etc etc

Regards.
Heironimus
Honored Contributor

Re: Linux server hang

I've also seen systems hang like that because of overheating or bad hardware. If you're not seeing any messages on the console and can't use SysRq, both of those are definite possibilities.

The last number of the kernel version is not what differentiates between stable and development kernels; it's the second number. 2.4.x is a release, while 2.5.x and 2.3.x are development trees. In any case, if you're using RHEL you need to use Red Hat's kernels, no matter what version they are. If you're using anything else, they won't support you.
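To make the second-number rule concrete, here is a small sketch that classifies a version string by the parity of its second number. The function name is made up for illustration, and the rule is assumed to apply only to 2.x-era kernels such as the ones discussed in this thread, since later kernel series dropped the even/odd scheme.

```python
import re


def classify_kernel(version):
    """Classify a 2.x-era kernel version by the parity of its second number."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised kernel version: %r" % version)
    minor = int(match.group(2))
    # An even second number marks a release series, an odd one a development tree.
    return "stable" if minor % 2 == 0 else "development"
```

With this rule, both 2.4.20 and 2.4.21 come out as "stable", and so does the Red Hat build 2.4.21-40.EL, because the vendor suffix after the third number is ignored.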

You could ask Red Hat support if they know of any particular issues with that release on your hardware or have any suggestions for troubleshooting.
dirk dierickx
Honored Contributor

Re: Linux server hang

Alpha977, you are mistaken; kernel versioning does not work like that.

there could be plenty of reasons. if possible, try to run a hardware checker. memtest86 (http://www.memtest86.com/) is the best known and most widely used tool for detecting memory problems (but your system will have to be offline for the duration of the test).

you must update your whole RH system to the latest patches. not only the kernel packages are important; things like glibc are too. update your packages with the 'up2date' command.

also important, are you running a pure RH without extra software? are you using unofficial kernel modules? do you really need those? try to run without them. if you are using X, try to disable X and see what happens.