
System blackout [no response]

Amit Agarwal_1
Trusted Contributor


We are facing strange behavior on our Linux boxes, which run kernel 2.4.21 and are deployed as Tomcat servers. At unpredictable intervals, a system goes "dead" for 50-60 seconds and then resumes by itself. By "dead" I mean the CPU doesn't respond to any user process and spends the whole time in kernel mode.

We suspect the buffer cache (flushing of data to disk) and/or paging to be the culprit, but we haven't been able to pinpoint either. The problem is that, because the system is "dead", we can't collect any data during that period.
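One way around the "no data during the blackout" problem is to collect data around it instead: a small sampler that wakes once a second will notice afterwards that it woke up late, and the /proc/stat counters will show whether the missing time was charged to user or system mode. The sketch below is a hypothetical helper, not something from this thread. It assumes a libc with clock_gettime(CLOCK_MONOTONIC) (older glibc may need -lrt, and a 2.4-era setup may need gettimeofday instead), and that the first line of /proc/stat is the aggregate "cpu" line whose first four fields are user, nice, system and idle in USER_HZ ticks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout. Aggregate tick counters from the first
 * ("cpu ") line of /proc/stat, summed over all CPUs. */
struct cpu_ticks {
    unsigned long long user, nice, system, idle;
};

/* Parse "cpu  <user> <nice> <system> <idle> ..."; returns 0 on success.
 * Per-CPU lines such as "cpu0 ..." are rejected on purpose. */
int parse_cpu_line(const char *line, struct cpu_ticks *t)
{
    assert(line != NULL && t != NULL);
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    if (sscanf(line + 4, "%llu %llu %llu %llu",
               &t->user, &t->nice, &t->system, &t->idle) != 4)
        return -1;
    return 0;
}

/* Read the aggregate counters from /proc/stat; returns 0 on success. */
int read_cpu_ticks(struct cpu_ticks *t)
{
    char line[256];
    int rc = -1;
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) != NULL)
        rc = parse_cpu_line(line, t);
    fclose(f);
    return rc;
}

/* Milliseconds by which (now - prev) exceeded expected_ms, or 0. */
long overshoot_ms(const struct timespec *prev, const struct timespec *now,
                  long expected_ms)
{
    long elapsed = (long)(now->tv_sec - prev->tv_sec) * 1000L
                 + (now->tv_nsec - prev->tv_nsec) / 1000000L;
    return elapsed > expected_ms ? elapsed - expected_ms : 0;
}

/* Sample once per second. A sample more than threshold_ms late means this
 * process could not run for that long; the tick deltas show whether the
 * lost time was charged to user or system mode. */
void monitor(int samples, long threshold_ms)
{
    struct timespec prev, now;
    struct cpu_ticks a, b;

    if (clock_gettime(CLOCK_MONOTONIC, &prev) != 0 || read_cpu_ticks(&a) != 0)
        return;
    for (int i = 0; i < samples; i++) {
        sleep(1);
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0 || read_cpu_ticks(&b) != 0)
            return;
        long late = overshoot_ms(&prev, &now, 1000);
        if (late > threshold_ms) {
            printf("stall: %ld ms late; user +%llu, system +%llu ticks\n",
                   late, b.user - a.user, b.system - a.system);
            fflush(stdout);
        }
        prev = now;
        a = b;
    }
}
```

Called from a tiny main() as, say, monitor(86400, 5000), started with nohup and redirected to a file, it should stay quiet normally and print one line after each blackout. The deltas are summed over all CPUs, so a multi-CPU box can accumulate more ticks than wall-clock time.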

Any help would be appreciated.
Sac_3
Frequent Advisor

Re: System blackout [no response]

Hi Amit,

Since the server goes dead, if you have configured netdump, check whether a netdump was generated in /var/crash on the netdump server.

If not, try the SysRq facility. The procedure to configure SysRq is described here:

http://kbase.redhat.com/faq/FAQ_80_5559.shtm
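For reference, the runtime part of enabling SysRq amounts to writing 1 to /proc/sys/kernel/sysrq; to make it survive a reboot, the usual approach is kernel.sysrq = 1 in /etc/sysctl.conf. The C sketch below only wraps those /proc writes. The helper names are made up for illustration, writing under /proc/sys needs root, and sysrq_show_tasks() assumes your kernel provides /proc/sysrq-trigger, so check that before relying on it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a short string to a /proc (or any) file.
 * Returns 0 on success, -1 on failure. */
int write_proc_value(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) != EOF;
    /* /proc handlers may only see (and reject) the data when it is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read a single non-negative integer setting;
 * returns -1 on error. */
long read_proc_long(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Enable SysRq until the next reboot (same effect as
 * echo 1 > /proc/sys/kernel/sysrq). */
int enable_sysrq(void)
{
    return write_proc_value("/proc/sys/kernel/sysrq", "1\n");
}

/* Dump the state of all tasks to the kernel log, like Alt+SysRq+T.
 * Assumes the kernel provides /proc/sysrq-trigger. */
int sysrq_show_tasks(void)
{
    return write_proc_value("/proc/sysrq-trigger", "t");
}
```

During a real hang no user process gets to run, so the trigger file is mainly useful for testing that SysRq output actually reaches your console or log; during the blackout itself you would press Alt+SysRq+T on the console (or send a break on a serial console) instead.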

If you would like to know how to configure the netdump client and netdump server, please check the links below:

http://kbase.redhat.com/faq/FAQ_43_2467.shtm

http://www.docs.hp.com/en/5991-7402/ch07s09.html

HOPE THIS HELPS.. :o)

Regards,
SaC

Yogeeraj_1
Honored Contributor

Re: System blackout [no response]

hi,

Try dmesg and see if there are any hardware failures or the like.

Could it also be a network failure?

kind regards
yogeeraj
skt_skt
Honored Contributor

Re: System blackout [no response]

At the very least, configure SysRq on all Linux servers (it does not require a reboot; I have implemented it). It is the only hope we have when we are left with no clue. It is not guaranteed to produce a crash dump, but in most cases it helps.

Is there anything in dmesg or /var/log/messages (something reported a few hours earlier)?
Amit Agarwal_1
Trusted Contributor

Re: System blackout [no response]

We discovered it to be an application issue: we were using "pgrep" in a multi-threaded application. Since pgrep uses strtok (which is non-reentrant because it keeps its position in static internal state), the threads were getting into some kind of race condition. We got rid of that part of the code, and the system has been running fine for the last two days.
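If the tokenizing code is your own C code, the standard fix is the reentrant variant strtok_r, which keeps its position in a caller-supplied pointer rather than in static state. A minimal sketch, with count_tokens as a hypothetical helper (for example, counting the PIDs in pgrep-style output):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split a private copy of 'line' on 'delims' and
 * return the number of tokens, or -1 if memory runs out.
 * strtok_r stores its position in the local 'saveptr' instead of a hidden
 * static variable, so concurrent calls from different threads do not
 * interfere with each other. */
int count_tokens(const char *line, const char *delims)
{
    assert(line != NULL && delims != NULL);
    char *copy = strdup(line);          /* strtok_r modifies its input */
    if (copy == NULL)
        return -1;

    int n = 0;
    char *saveptr = NULL;
    for (char *tok = strtok_r(copy, delims, &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, delims, &saveptr))
        n++;

    free(copy);
    return n;
}
```

Because both saveptr and the strdup'd copy are local to each call, several threads can run count_tokens at the same time without sharing any tokenizer state.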

We still need to investigate why the CPU ended up spending its cycles in kernel mode, or whether the time was actually all spent in user mode.
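To answer the kernel-mode vs. user-mode question for one process (say the Tomcat JVM), the per-process counters in /proc/<pid>/stat can be sampled before and after an incident: field 14 is utime and field 15 is stime, both in clock ticks. This is a hypothetical sketch rather than something used in this thread, and it assumes the /proc/<pid>/stat layout described in proc(5).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract utime and stime (fields 14 and 15) from the
 * contents of /proc/<pid>/stat. The command name in field 2 is wrapped in
 * parentheses and may contain spaces, so parsing starts after the last ')'.
 * Returns 0 on success. */
int parse_proc_times(const char *stat, unsigned long *utime, unsigned long *stime)
{
    assert(stat != NULL && utime != NULL && stime != NULL);
    const char *p = strrchr(stat, ')');
    if (p == NULL)
        return -1;
    char state;
    /* Fields 3..13: state ppid pgrp session tty_nr tpgid flags
     * minflt cminflt majflt cmajflt; then utime stime. */
    if (sscanf(p + 1, " %c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &state, utime, stime) != 3)
        return -1;
    return 0;
}

/* Hypothetical helper: read utime/stime for one PID; returns 0 on success. */
int read_proc_times(int pid, unsigned long *utime, unsigned long *stime)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_proc_times(buf, utime, stime);
}
```

A large jump in stime relative to utime between two samples would suggest the process's time was being spent in the kernel on its behalf rather than in the application code itself.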

Thanks for sharing your knowledge. We will definitely look further into SysRq and its possible use in our environment.