steven Burgess_2
Honored Contributor

Server grinds to a halt with no clues

Hi Everyone

RHEL 3 Update 4

An HTTP server running IBM WebSphere is grinding to a halt daily. I am running the script below to see whether a memory leak is causing the problem. Usage is increasing, but there is still more than 2 GB free.

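# $LOG is assumed to be set earlier in the script to the output file
echo "Checking Stats on $(date)" >> "$LOG"
echo >> "$LOG"
/usr/bin/mpstat -P ALL >> "$LOG"
echo >> "$LOG"
echo >> "$LOG"
/usr/bin/free >> "$LOG"
echo >> "$LOG"
# Processes sorted by virtual memory size (4th column), largest first
/bin/ps -e -o 'user,cpu,pcpu,vsz,args' | sort -rnk 4 >> "$LOG"
echo >> "$LOG"
/usr/bin/vmstat >> "$LOG"
echo >> "$LOG"
# [w]eb keeps grep from counting its own process
echo "number of websphere processes = $(ps -ef | grep -i '[w]eb' | wc -l)" >> "$LOG"
echo >> "$LOG"
# The last line of pmap output is the total mapped memory of the first WebSphere process
mapped_proc=$(pmap $(ps -ef | grep -i WebSphere | grep -v grep | head -1 | awk '{print $2}') | tail -1)
echo "Sum memory for WebSphere = $mapped_proc" >> "$LOG"
echo >> "$LOG"
echo "----------------------------------------------------" >> "$LOG"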


I have also set *.* /var/log/messages in syslog.conf to try to catch everything.

At the moment, I'm not getting anything that points me to the source of the problem. When the server dies and I check at the console, the system doesn't even give me a login prompt. It is running at run level 3.

Any ideas on what else to check?

tia

steve
take your time and think things through
6 REPLIES
Ivan Ferreira
Honored Contributor
Solution

Re: Server grinds to a halt with no clues

To record performance statistics, you should use sar, a command that is part of the sysstat package.

We once had "shutdowns" on a server because the hardware was reporting to the OS that a memory fan had failed, and the operating system then shut itself down to prevent a crash.

You should check that you do not have a hardware problem. We had a hardware console to see this, on an Integrity server.
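
As a rough sketch of the sar suggestion: on Red Hat systems the sysstat cron job normally writes one data file per day under /var/log/sa, and sar can read it back with -f. The directory default and the sa_file helper below are assumptions for illustration, not part of sysstat.

```shell
#!/bin/sh
# Sketch only: /var/log/sa is the usual Red Hat sysstat data directory,
# and sa_file is a hypothetical helper, not part of sysstat.
SA_DIR=${SA_DIR:-/var/log/sa}

# Print the path of the daily sar data file for a two-digit day of month.
sa_file() {
    echo "$SA_DIR/sa$1"
}

today=$(sa_file "$(date +%d)")

# Only query if sar is installed and today's data file is readable.
if command -v sar >/dev/null 2>&1 && [ -r "$today" ]; then
    sar -u -f "$today"   # CPU utilisation over the day
    sar -r -f "$today"   # memory and swap usage
    sar -q -f "$today"   # run queue length and load averages
fi
```

After a hang, the last interval recorded in the data file for that day gives an approximate time for when the system stopped doing useful work.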
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Steven E. Protter
Exalted Contributor

Re: Server grinds to a halt with no clues

Run tail -f /var/log/messages on the console and in a remote session to try to get more clues.

This is probably due to a sudden hardware failure.

I've had servers do this when their power supply was blown but still working after a fashion. I'd check into those types of issues.

I'd also run chkrootkit and make sure the system is not compromised in some way.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Florian Heigl (new acc)
Honored Contributor

Re: Server grinds to a halt with no clues

You might want to try sending messages off to a syslog host, in case the system can't write its last chokes to disk.

I have one (Xen) host that also shows this kind of failure after a few weeks without trouble. I hunted it down a bit, but not completely; most of the time the kernel didn't even manage an 'Oops at 0x0000000' message. :)
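
A minimal sketch of the forwarding setup, assuming the classic syslogd "@host" action and a placeholder host name "loghost". The add_forward helper is hypothetical. The receiving syslogd also has to be started with -r to accept remote messages (on Red Hat systems this is usually set through SYSLOGD_OPTIONS in /etc/sysconfig/syslog).

```shell
#!/bin/sh
# Sketch only: "loghost" is a placeholder name and add_forward is a
# hypothetical helper. The "@host" action is classic syslogd syntax.
add_forward() {
    conf=$1
    host=$2
    # Append the forwarding rule only if it is not already present.
    if ! grep -qF -- "@$host" "$conf" 2>/dev/null; then
        printf '*.*\t@%s\n' "$host" >> "$conf"
    fi
}

# Typical use on the failing server (as root), then restart syslogd:
#   add_forward /etc/syslog.conf loghost
#   service syslog restart
```

Messages sent this way leave the box over UDP as they are generated, so they may survive even when the local disk write never completes.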
yesterday I stood at the edge. Today I'm one step ahead.
steven Burgess_2
Honored Contributor

Re: Server grinds to a halt with no clues

Hi

What is interesting is that the network stack stays up, i.e. the machine responds to pings, yet according to the messages file, cron and everything else stop.

Steve
take your time and think things through
Ivan Ferreira
Honored Contributor

Re: Server grinds to a halt with no clues

You should enable a remote syslog server, as mentioned above.

You should also enable the magic SysRq key, which lets you generate memory dumps, sync the disks and reboot the server. That may be helpful.

/etc/sysctl.conf

kernel.sysrq = 1

See the kernel documentation sysrq.txt for more information about how to use it.
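
A small sketch for checking the setting and a reminder of the most useful keys. The sysrq_configured helper is hypothetical; the key letters are the standard ones described in sysrq.txt.

```shell
#!/bin/sh
# Sketch only: sysrq_configured is a hypothetical helper that checks
# whether a sysctl.conf-style file sets kernel.sysrq to 1.
sysrq_configured() {
    grep -Eq '^[[:space:]]*kernel\.sysrq[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

if sysrq_configured /etc/sysctl.conf 2>/dev/null; then
    echo "kernel.sysrq = 1 is set in /etc/sysctl.conf"
else
    echo "kernel.sysrq = 1 not found; add it and run: sysctl -p"
fi

# Current runtime value (non-zero means enabled), if /proc is available.
if [ -r /proc/sys/kernel/sysrq ]; then
    cat /proc/sys/kernel/sysrq
fi

# On the hung console, hold Alt+SysRq and press:
#   m  dump memory information to the console
#   t  dump the task list
#   s  sync mounted filesystems
#   u  remount filesystems read-only
#   b  reboot immediately
```

If the keyboard still produces output for m or t while the login prompt is dead, the kernel is alive and the hang is more likely resource starvation than a hardware lock-up.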

Sometimes extremely high CPU usage causes this kind of problem.

Does the LOG you are collecting also stop when the server hangs?
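
One way to answer that is to pull the last sample header out of the LOG after a hang and compare its time with the last cron entry in /var/log/messages. This is a sketch under the assumption that the collection script writes a "Checking Stats on <date>" line per sample, as in the original post; last_sample is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: last_sample is a hypothetical helper; it relies on the
# "Checking Stats on <date>" header that the collection script writes.
last_sample() {
    grep 'Checking Stats on' "$1" | tail -n 1
}

# Usage, assuming LOG points at the file the collection script appends to:
#   last_sample "$LOG"
```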

Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Gopi Sekar
Honored Contributor

Re: Server grinds to a halt with no clues


If the server hangs without any log messages or a kernel oops, then it is more likely to be a hardware issue.

I faced a similar problem at a customer site and later found that the server's motherboard had gone bad and had to be replaced.

It would be better to start with a memory check: use the memtest86 utility to check for possible RAM problems. If that does not reveal anything, log a hardware support call.

Regards,
Gopi
Never Never Never Giveup