Operating System - Linux

How do I find out the cause for the hang?

 
GnanaShekar
Regular Advisor

How do I find out the cause for the hang?

Hi,

We have a RHEL AS 3 Update 6 server. For the past couple of days it has been hanging every day.
When users reported the problem, I tried to log in (using telnet/ssh). After I type the username at the login prompt, the session hangs.
The only option I have is to go to the lab and power-cycle the machine.

What could be the possible causes? How do I find out the cause of the hang? Please suggest.

# more /etc/redhat-release
Red Hat Enterprise Linux AS release 3 (Taroon Update 6)

Thanks & Regards
Steven E. Protter
Exalted Contributor

Re: How do I find out the cause for the hang?

Shalom,

Lots of possible causes.

Check /var/log/messages.

See what the last messages before the crash say.
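
One way to pull out those last messages is sketched below. It assumes the syslog daemon writes a line matching "syslogd ... restart" at every boot, which is an assumption about this system and not something shown in the thread; `lines_before_restart` is a hypothetical helper name, and the pattern can be overridden if the boot marker looks different.

```shell
# Print the N lines logged just before each syslog restart marker.
# $1 = log file (default /var/log/messages)
# $2 = number of lines of context to show (default 20)
# $3 = boot marker pattern (ASSUMPTION: syslogd logs "restart" at boot)
lines_before_restart() {
    grep -B "${2:-20}" -e "${3:-syslogd .*restart}" -- "${1:-/var/log/messages}"
}
```

Calling `lines_before_restart /var/log/messages 30` after a forced power-cycle should show what was logged shortly before the machine went down, followed by the restart line itself.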

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Vitaly Karasik_1
Honored Contributor

Re: How do I find out the cause for the hang?

In addition to Steven's suggestion, I recommend running CPU/RAM tests. You can use the Ultimate Boot CD: http://ubcd.sourceforge.net/
GnanaShekar
Regular Advisor

Re: How do I find out the cause for the hang?

Hi,

Here are a few things I found in the logs.

[root@bangpcrh32 log]# grep -i error /var/log/messages
Jul 18 06:24:59 bangpcrh32 auditd[1480]: output error
Jul 18 06:24:59 bangpcrh32 auditd[1480]: output error
Jul 18 06:24:59 bangpcrh32 auditd[1480]: output error; suspending execution
Jul 19 04:39:21 bangpcrh32 insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters. You may find more information in syslog or the output from dmesg
Jul 19 04:39:24 bangpcrh32 xinetd[1660]: Error parsing attribute user - DISABLING SERVICE [file=/etc/xinetd.d/auth] [line=14]
Jul 19 04:43:52 bangpcrh32 insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters. You may find more information in syslog or the output from dmesg
Jul 19 04:43:55 bangpcrh32 xinetd[1664]: Error parsing attribute user - DISABLING SERVICE [file=/etc/xinetd.d/auth] [line=14]
[root@bangpcrh32 log]# grep -i error /var/log/messages.1
Jul 14 09:21:00 bangpcrh32 auditd[1469]: output error
Jul 14 09:21:00 bangpcrh32 auditd[1469]: output error
Jul 14 09:21:00 bangpcrh32 auditd[1469]: output error; suspending execution
Jul 17 10:51:23 bangpcrh32 insmod: Hint: insmod errors can be caused by incorrect module parameters, including invalid IO or IRQ parameters. You may find more information in syslog or the output from dmesg
Jul 17 10:51:25 bangpcrh32 xinetd[1674]: Error parsing attribute user - DISABLING SERVICE [file=/etc/xinetd.d/auth] [line=14]
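
To see which daemon accounts for most of these lines, the grep output above can be grouped by the process name in the fifth syslog field. This is only a sketch: `errors_by_source` is a hypothetical helper name, and it assumes the standard "Mon DD HH:MM:SS host process[pid]: message" layout seen in the output above.

```shell
# Count "error" lines per logging process across one or more syslog files.
# Assumes the classic syslog layout, where field 5 is "name[pid]:" or "name:".
errors_by_source() {
    grep -hi error "$@" | awk '
        {
            src = $5
            sub(/\[.*/, "", src)   # drop "[pid]:" if present
            sub(/:$/, "", src)     # drop a trailing ":" otherwise
            count[src]++
        }
        END { for (s in count) print count[s], s }
    ' | sort -rn
}
```

For example, `errors_by_source /var/log/messages /var/log/messages.1` prints one "count name" line per process, highest count first.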
GnanaShekar
Regular Advisor

Re: How do I find out the cause for the hang?

Hi

Thanks to all.

The issue was that the auditd daemon suspends itself when the free-space threshold on the /var filesystem is reached.

While auditd is suspended, any process that tries to deliver an audit event message stalls until the daemon resumes normal processing. That is why logins hung.
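
A quick way to check how full /var is, as a rough sketch: `usage_percent` is a hypothetical helper name, and it reads the capacity column of `df -P`, which assumes the filesystem device name contains no spaces.

```shell
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path. Defaults to /var.
usage_percent() {
    df -P "${1:-/var}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Running `usage_percent /var` prints a single number that can be compared against whatever threshold the audit daemon is configured with.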



We have done the following:

1. Deleted all the old files in the /var/log/audit.d directory.

2. Killed the auditd daemon.

3. Edited its configuration file to delete old files automatically when the threshold is reached.

4. Started the auditd daemon.
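
For step 1, a cautious approach is to list the candidates before deleting anything. The sketch below is not the exact command we ran: `old_audit_files` is a hypothetical helper name and the 30-day default is an arbitrary assumption.

```shell
# List regular files under the audit log directory that were last modified
# more than N days ago. Nothing is deleted here.
# $1 = directory (default /var/log/audit.d, the path from this thread)
# $2 = age in days (default 30, an arbitrary choice)
old_audit_files() {
    find "${1:-/var/log/audit.d}" -type f -mtime +"${2:-30}" -print
}
```

Once the list looks right, the same find expression with `-exec rm -f {} \;` in place of `-print` removes those files. For step 3, systems that ship the newer auditd (which may not be the daemon on RHEL 3) keep related settings such as max_log_file_action and space_left_action in /etc/audit/auditd.conf; check the man page for the release in use.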

Thanks & Regards,
-GnanaShekar-
Jorge Cocomess
Super Advisor

Re: How do I find out the cause for the hang?

GnanaShekar,

Great write-up! Do you know the default free-space threshold for the /var filesystem? And how can I check mine?

Regards,
J