
Problem with shell hung

 
R.O.
Esteemed Contributor

Problem with shell hung

I have the following problem:
An application sometimes fails, and the users close their session and open a new one. The application from the closed session then begins to consume CPU and I have to kill it. What would be the best method to detect this and kill it automatically?

Best Regards,

"When you look into an abyss, the abyss also looks into you"
Pete Randall
Outstanding Contributor

Re: Problem with shell hung

R.O.,

How do you detect it manually? I would guess that you use the ps command, piped to grep to search for some distinguishing characteristic. Just set the same thing up in a script that can be run by cron at regular intervals of your choosing. Be sure to use full path names and set any environment variables you require as cron's environment is minimal.


Pete

TSaliba
Trusted Contributor

Re: Problem with shell hung

hi

If you know the name of the process that consumes the CPU, then you can use the attached script.

Note: add an entry for it in cron.

TS
jj
TSaliba
Trusted Contributor

Re: Problem with shell hung

Sorry, I forgot the script.
jj
Michael Schulte zur Sur
Honored Contributor

Re: Problem with shell hung

Hi,

In the following script I assume that the process you want to stop has ppid=1 and is no longer associated with a terminal. Try it without | ksh first and see if it fits your needs.
Substitute the programme you want to stop for process.

greetings,

Michael
R.O.
Esteemed Contributor

Re: Problem with shell hung

Hi,

The problem is that the only way to identify the process is to run "top" and look at the %CPU. Its PPID is not "1". I cannot kill all the processes; I need to isolate the named process that is consuming more than 80% CPU and kill only that one.
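
A minimal sketch of how that check could be scripted, assuming the system's ps accepts -o with the pid, pcpu and comm fields (on HP-UX this normally requires UNIX95 to be set for the ps call). The function name find_hogs, the process name myapp and the limit of 80 are placeholders, not anything from this thread. The %CPU that ps reports may not match what top shows, so check the output by hand before letting anything kill automatically.

```shell
#!/bin/sh
# find_hogs NAME LIMIT   (hypothetical helper)
# Reads "PID %CPU COMMAND" lines on stdin and prints the PID of every
# process whose command name equals NAME and whose %CPU exceeds LIMIT.
# The header line is skipped because its first field is not a number.
find_hogs() {
    awk -v name="$1" -v limit="$2" '
        $1 ~ /^[0-9]+$/ && $3 == name && ($2 + 0) > (limit + 0) { print $1 }
    '
}

# Assumed usage from cron (myapp and 80 are placeholders):
#   UNIX95=1 ps -e -o pid,pcpu,comm | find_hogs myapp 80
# Only once the list looks right, kill each PID it prints:
#   for pid in $(UNIX95=1 ps -e -o pid,pcpu,comm | find_hogs myapp 80); do
#       kill "$pid"
#   done
```

Keeping the filter separate from the kill makes it possible to run the first form for a few days and confirm that it only ever selects the runaway process.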

Thanks and regards,
"When you look into an abyss, the abyss also looks into you"
Michael Schulte zur Sur
Honored Contributor

Re: Problem with shell hung

Hi,

yes, it is a dangerous thing to kill processes automatically. What is the process group leader in such a case, a shell?

greetings,

Michael
Bill Hassell
Honored Contributor

Re: Problem with shell hung

The process that is hung up is likely network-based, probably Xwindows. Most Xwindows programs are poorly written to handle disconnects. Unlike telnet or ftp, the Xwindow connection is 'soft', that is, it simply puts images on a remote screen and gets keyboard and mouse events back. The application is trying to talk to the display but has no intelligence to see that there is no longer any connection.

Now if the application getting hung is hpterm, dtterm or xterm, I would stop using Xwindows completely for those users and run a local terminal program such as Reflection for HP or QCterm or similar. Also, make sure your /etc/profile and/or .profile have not disabled traps with something like: trap "" 1 2 3 15 since this tells the program (and child processes) to ignore things like a hang-up or termination.


Bill Hassell, sysadmin
R.O.
Esteemed Contributor

Re: Problem with shell hung

Hi,

- The process group leader is a shell.
- We have put trap "" 1 2 3 in the users' .profile to prevent them from getting out to the operating system.

Regards,
"When you look into an abyss, the abyss also looks into you"
Michael Schulte zur Sur
Honored Contributor
Solution

Re: Problem with shell hung

Hi,

Why do you trap signal 1? Isn't that the one for hangup?

greetings,

Michael Schulte


Bill Hassell
Honored Contributor

Re: Problem with shell hung

Oops. trap "" 1 2 3 means: do nothing whenever a process is sent a signal to terminate for HUP (hangup which is the same as a connection termination), INT (interrupt) or QUIT (terminate with a core dump). So the application crashes or otherwise terminates abnormally and the session is sent a SIGHUP signal but the users' environment has been told to ignore such signals and thus the session starts to consume CPU time because it has no valid connection. All very normal.
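
The effect can be reproduced with a small test, sketched here under the assumption that sh is a POSIX shell. The function name hup_demo is made up for this example. Each child shell sends SIGHUP (signal 1) to itself; the echo only runs if the shell survives the signal.

```shell
#!/bin/sh
# hup_demo MODE   (hypothetical helper)
# Runs a child shell that sends SIGHUP to itself and reports the result.
#   default : no trap, so the hangup terminates the child
#   ignore  : the child runs  trap "" 1  first, as in the .profile
#   inherit : a grandchild started after  trap "" 1  sends itself the signal
hup_demo() {
    case "$1" in
        ignore)
            sh -c 'trap "" 1; kill -HUP $$; echo "still running"' ;;
        inherit)
            sh -c 'trap "" 1; sh -c "kill -HUP \$\$; echo still running"' ;;
        *)
            sh -c 'kill -HUP $$; echo "still running"' ;;
    esac || echo "terminated by HUP"
}

hup_demo default
hup_demo ignore
hup_demo inherit
```

The inherit case is the one that matters here: a program started from a shell that ignores the hangup signal starts out ignoring it too, so it keeps running after the connection is gone.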

Since your users can't be trusted to handle a shell prompt (ie, not break out of an application with CTRL-C, etc,) you'll have to write a program that acts as their shell, a menu program with just a few choices. And turn traps back on again.
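
A rough sketch of what such a menu could look like, assuming a POSIX shell. The menu entries and the function name menu are invented for illustration, and choice 1 only prints a message where the real application would be started. No signals are ignored. If the menu is started from .profile with exec, a CTRL-C or a hangup that ends the menu also ends the session, so the user never reaches a prompt.

```shell
#!/bin/sh
# Hypothetical menu that takes the place of the shell prompt.
# Signals keep their default action, so a hangup ends the menu.
menu() {
    while :; do
        printf '1) Start application\n2) Show date\nq) Log off\nChoice: '
        read -r choice || break     # end of input: leave the menu
        case "$choice" in
            1) echo "starting application" ;;   # placeholder for the real program
            2) date ;;
            q) break ;;
            *) echo "Unknown choice: $choice" ;;
        esac
    done
    echo "Goodbye"
}

# Assumed last line of the user's .profile (the path is a placeholder):
#   exec /usr/local/bin/menu
```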


Bill Hassell, sysadmin
R.O.
Esteemed Contributor

Re: Problem with shell hung

Hi,

I tried using only trap "" 2 3, and it seems to work. Users cannot get out to the operating system and the sessions no longer hang. I was trying to read about the meaning of the different signals but I did not find much. Can you point me to a good source for this?

Thxs and regards,

R.O.
"When you look into an abyss, the abyss also looks into you"
Michael Schulte zur Sur
Honored Contributor

Re: Problem with shell hung

Hi,

you will find a list of the signals in:
/usr/include/sys/signal.h
with a short description.

greetings,

Michael
Chris Wilshaw
Honored Contributor

Re: Problem with shell hung

A list of signals can be obtained using

kill -l

(that's a lower case L, not a 1). More details can be found in the signal manual page:

man 5 signal
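
As a small illustration, kill -l can also translate a signal number into its name. This sketch assumes a POSIX shell, where the name is printed without the SIG prefix; the function name show_signals is made up for the example. It looks up the four numbers used in the trap commands in this thread.

```shell
#!/bin/sh
# show_signals   (hypothetical helper)
# Prints "number = name" for signals 1, 2, 3 and 15.
show_signals() {
    for num in 1 2 3 15; do
        printf '%s = %s\n' "$num" "$(kill -l "$num")"
    done
}

show_signals
```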