10-27-2005 12:28 AM
My mission is to Trap all the "run-away" application processes and "kill -9" them.
In fact, we have observed entries like the following in top during some time intervals:
CPU TTY   PID USERNAME PRI NI  SIZE   RES STATE   TIME %WCPU  %CPU COMMAND
  0   ? 17750 ias122   154 20  227M  183M sleep  54:58 83.47 83.32 f60webm
Processes (f60webm) whose SIZE and RES are in the M range should be terminated! They are run-away processes resulting from a client crash... a bug in the software.
Any idea how to best do that?
[sorry i have little scripting knowledge]
thanking you all in advance for a reply.
kind regards
yogeeraj
10-27-2005 01:47 AM
For a start, maybe you should test whether the following command returns the top output correctly:
top -d 12 -n 12 -f top_out.txt
PS: I also have little scripting experience, so don't expect a fast resolution. Meanwhile, I will be learning too.
Best Regards,
Eric
10-27-2005 01:50 AM
Re: Script Help! Trap and Kill those run-away processes!!
but I'm sure you will know.
E.g. just filter those procs whose vsize is bigger than some limit (the example below uses 50 MB) and whose command string matches web. Then you could possibly process them like this (but be careful with the kill, especially SIGKILL):
UNIX95= ps -e -o vsz= -o pid= -o ppid= -o comm= | awk '50 < $1/2^10 && $4~/web/' | while read vsz pid ppid comm; do
    # do further filtering or signalling here
    :
done
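A minimal sketch of how the loop above could be tried out safely, assuming the 50 MB limit and the `web` pattern from the one-liner. The function name `big_procs` and the sample lines are made up for illustration; the function reads `vsz pid ppid comm` lines on stdin, so it can be tested on canned data before it is fed the real `ps` output.

```shell
#!/bin/sh
# big_procs LIMIT_MB PATTERN
# Reads "vsz pid ppid comm" lines (vsz in KB) on stdin and prints
# "pid size_in_MB comm" for every process above LIMIT_MB whose command
# name matches PATTERN. Hypothetical helper, not part of any tool.
big_procs() {
    awk -v limit="$1" -v pat="$2" '
        $1 / 1024 > limit && $4 ~ pat { printf "%s %d %s\n", $2, $1 / 1024, $4 }
    '
}

# Canned sample data (invented) standing in for:
#   UNIX95= ps -e -o vsz= -o pid= -o ppid= -o comm=
sample='232448 17750 1 f60webm
  1024   812 1 inetd
 20480 14203 1 f60webm'

printf '%s\n' "$sample" | big_procs 50 web | while read pid mb comm; do
    # Only report here; signalling is a separate, deliberate step.
    echo "candidate: pid=$pid size=${mb}MB command=$comm"
done
```

On the real system the `printf` line would be replaced by the `ps` command itself. Reporting first and killing later keeps a wrong pattern or limit from doing damage.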
HTH
10-27-2005 02:17 AM
Re: Script Help! Trap and Kill those run-away processes!!
Ralph's use of the UNIX95 (XPG4) variant of the 'ps' command is most appropriate, and his skeletal script should provide what you need to get started.
PLEASE do not do a 'kill -9' without attempting a simple 'kill' first.
A 'kill -9' cannot be caught, and thus a process has no (programmatic) chance to clean up temporary files and/or shared memory segments.
Instead, do something like:
kill $mypid > /dev/null 2>&1
sleep 3
kill -9 $mypid > /dev/null 2>&1
If the first 'kill' works, "mypid" will no longer be valid and the second 'kill' will be a "no-op". If the first 'kill' fails, then the second will terminate the process as desired (unless it is waiting on an I/O to complete or is in some other kernel state).
You may wish to issue a simple 'kill' and then escalate to 'kill -1' (SIGHUP) and then if that fails, a 'kill -9' (SIGKILL). I have found this useful too.
Regards!
...JRF...
10-27-2005 02:24 AM
Re: Script Help! Trap and Kill those run-away processes!!
Bill Hassell, sysadmin
10-27-2005 02:30 AM
Re: Script Help! Trap and Kill those run-away processes!!
Indeed, Ralph's command is a better way to go (I knew I was going to learn more than help...)! You can also try to show more columns from ps, like state and flags for example:
UNIX95= ps -e -o vsz= -o pid= -o ppid= -o time= -o state= -o flags= -o comm|grep 'f60'|awk '20 < $1/1024'
Ralph,
Any reason for 2^10 instead of 1024 besides the binary one?
Best Regards,
Eric
10-27-2005 02:34 AM
Re: Script Help! Trap and Kill those run-away processes!!
The following snippet should get you started on the 'guts' of your script, to determine whether your processes are out of control:
#top -d 1 -h -u -f /tmp/top.tmp
#awk '$13~/BAD_PROCESS_NAME/ {print $12}' /tmp/top.tmp
This will print out the percentage of CPU time used by a process named BAD_PROCESS_NAME.
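The same saved top file could also be filtered on the SIZE and RES columns, which is the criterion from the original question. A rough sketch, assuming the 13-column layout quoted in the first post; the function name `runaway_pids` and the second sample line are invented for illustration.

```shell
#!/bin/sh
# runaway_pids NAME FILE
# Prints the PID (column 3) of every line in a saved top listing whose
# COMMAND (column 13) matches NAME and whose SIZE and RES (columns 7 and 8)
# both end in M. Hypothetical helper; column positions are an assumption
# based on the top output quoted in the first post.
runaway_pids() {
    awk -v name="$1" '$13 ~ name && $7 ~ /M$/ && $8 ~ /M$/ { print $3 }' "$2"
}

# Canned top output for a dry run (the second line is made up for contrast).
tmp=${TMPDIR:-/tmp}/top.sample.$$
cat > "$tmp" <<'EOF'
 0 ? 17750 ias122 154 20 227M 183M sleep 54:58 83.47 83.32 f60webm
 1 ? 17801 ias122 154 20 9800K 7200K sleep 0:12 0.40 0.39 f60webm
EOF

runaway_pids f60webm "$tmp"
rm -f "$tmp"
```

Matching on the M suffix is crude, since it treats 1M and 900M alike; a numeric limit on the ps vsz column is probably the more robust test.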
Cheers
Hanwant
10-27-2005 03:32 AM
Re: Script Help! Trap and Kill those run-away processes!!
no, it was a silly choice, since exponentiation is processing-wise the most expensive, I guess (I think some sort of Taylor series approximation is used).
Actually, exponentiation to base 2 is probably achieved quickest by bitwise right shifting (for negative powers), but I didn't know how this is done in awk (in Perl there's the >> operator).
So division by 1024 is much better.
Apart from that, I strongly agree with Bill's statement not to scan all processes with -e,
but instead to restrict the scan to the procs of the user who is running the notorious *web procs (i.e. -u user|uid).
And be extra cautious with sending SIGKILLs (i.e. -9), as it could result in some orphans.
Use the suggested three-step kill.
You can check whether the process survived a signal by sending it signal 0 afterwards (kill -0)
e.g.
sleep 10
if kill -0 $pid 2>/dev/null; then
# probably next kill level
fi
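Instead of one fixed sleep, the kill -0 check could be polled so the script moves on as soon as the process is gone. A small sketch; the function name `wait_gone` and the one-second polling interval are my own choices, not anything from the thread.

```shell
#!/bin/sh
# wait_gone PID SECONDS
# Polls once a second with signal 0. Returns 0 as soon as PID has
# disappeared, or 1 if it is still there after SECONDS. Hypothetical helper.
wait_gone() {
    pid=$1 left=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$left" -le 0 ] && return 1
        sleep 1
        left=$((left - 1))
    done
    return 0
}

# Harmless stand-in for a process that has just been signalled.
sleep 60 &
victim=$!
kill "$victim"
if wait_gone "$victim" 10; then
    echo "process is gone"
else
    echo "still alive, probably next kill level"
fi
```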
10-28-2005 02:22 AM
Re: Script Help! Trap and Kill those run-away processes!!
Thanks for the explanation.
10-30-2005 05:00 PM
Re: Script Help! Trap and Kill those run-away processes!!
Thank you for your precious replies.
Special thanks to Ralph for his insight and great explanations.
Can you please clarify: does your script include the part that checks that SIZE and RES are in magnitudes of Ms (e.g. 227M or 183M)?
Also, I would be grateful if you could explain the "3-step kill process".
thanking you in advance
kind regards
yogeeraj
10-30-2005 07:00 PM
Re: Script Help! Trap and Kill those run-away processes!!
What you see in glance in the RSS column is not what you get with UNIX95. RSS includes the shared memory size used by the program; UNIX95 ps does not include that. If the RSS column shows 3.5GB, you would never see that with UNIX95.
You would do better to use alarmdef to notice the run-away process. An example of it is in /opt/perf/examples/adviser
Also, kill -9 is not a good idea.
I start as follows:
kill -1
kill -2
kill -3
kill -11 and, last, kill -9
kill -11 is equally effective and does cleanup work which -9 does not.
10-30-2005 09:57 PM
Re: Script Help! Trap and Kill those run-away processes!!
Thank you again for your reply.
I usually use top to identify the processes to be "killed"...
I am attaching a snapshot in which the following processes can be identified as run-away and should be killed: 14203, 13399 and 1102
More guidance would be most appreciated!
kind regards
yogeeraj
10-30-2005 10:26 PM
Re: Script Help! Trap and Kill those run-away processes!!
as suggested by RAC.
On the contrary, I would rather expect it to mess up the scene with a core dump, as memory reference violations usually go, than to clean up.
But I know the cleanup was meant differently here, and if it works better in that respect,
why not?
I know that the vsz that ps lists doesn't contain shared memory or memory-mapped pages.
According to the ps manpage it comprises the text, data, and stack segments in KB, unlike the sz option, which reports memory pages.
Thus division by 1024 should show the sum in MB, and it saves the extra multiplication by the page size.
With three step kill I was referring to Bill's three level kill.
i.e. signal in this sequence with intermediate checks if the process still is alive:
1. SIGTERM (-15)
2. SIGHUP (-1)
3. SIGKILL (-9)
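The sequence above could be sketched as a small function like the one below. This is only an illustration: the name `kill_in_steps` and the default pause of 3 seconds are my own assumptions, and the liveness check is the kill -0 test mentioned earlier in the thread.

```shell
#!/bin/sh
# kill_in_steps PID [GRACE_SECONDS]
# Sends TERM, then HUP, then KILL, pausing GRACE_SECONDS (default 3) after
# each one and stopping as soon as signal 0 shows the process is gone.
# Prints the name of the signal that finished the job. Hypothetical helper.
kill_in_steps() {
    pid=$1 grace=${2:-3}
    for sig in TERM HUP KILL; do
        kill -s "$sig" "$pid" 2>/dev/null || return 0   # already gone
        sleep "$grace"
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "$sig"
            return 0
        fi
    done
    return 1   # survived even SIGKILL (e.g. stuck in a kernel state)
}

# Harmless stand-in for a run-away process.
sleep 60 &
kill_in_steps $! 1
```

A non-zero return means the process is still there after SIGKILL, which is worth logging rather than retrying in a loop.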
10-30-2005 10:40 PM
Re: Script Help! Trap and Kill those run-away processes!!
Once you decide that then it becomes easy.
glance is a very good tool for this: you can define an adviser which, when certain limits (as decided by you) are crossed, can email/page you about a culprit process.
10-30-2005 11:26 PM
Re: Script Help! Trap and Kill those run-away processes!!
I agree that Glance may be a better tool.
What I really want is an algorithm/script which can safely be used to kill those run-away processes.
The main reason is that we operate 24x7 and we don't have resources available to sit in front of the screen and monitor any such alerts!
I am attaching a snapshot of a graph plotted from data extracted from MeasureWare! You will see what happened that Sunday!!
Please guide me further!
kind regards
yogeeraj
10-31-2005 01:49 AM
Re: Script Help! Trap and Kill those run-away processes!!
I didn't know you were using MWA already.
Then it's probably best you use this excellent tool.
I think you could write an appropriate event handler script that would safely kill those runaway procs of yours.
But it requires a little reading and experimenting until things are triggered by MWA correctly.
Have a look at /opt/perf/paperdocs/mwa/C.
There you will find the MWA User's guide as PDF and PS as well as PDF and plain ASCII files describing the monitorable system metrics by MWA.
(e.g. PROC_MEM_RES)
Get acquainted with these docs.
Then you can edit /var/opt/perf/alarmdef
(make a backup copy first)
In that file you will find a well-commented set of predefined general-purpose alarms.
In the comments there is also an example of how to trigger an email notification as a very rudimentary "event handler".
Instead of this you could trigger your kill script.
For a quick reference to the so-called adviser syntax you could also start glance and tab to its help screen -> adviser information.
MWA uses the same syntax for its alarmdef file.
Before you restart MWA with your new definitions you should check them for syntax errors with the "utility -xa" command.
During the trial phase, before your handler does the actual killing, it may be a good idea to have it first send you an email with an echoed kill line for the PIDs it would kill, so that you can check that the right procs would be killed.
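One possible shape for such a trial-phase handler is sketched below. The names `DRY_RUN` and `maybe_kill` are hypothetical, and the mail command in the comment (including the recipient) is only an assumption about how the output might be delivered.

```shell
#!/bin/sh
# DRY_RUN=1 only prints the kill lines; set it to 0 to really signal.
# Variable and function names are made up for this sketch.
DRY_RUN=1

# maybe_kill SIGNAL PID...
maybe_kill() {
    sig=$1
    shift
    for pid in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "kill -s $sig $pid"
        else
            kill -s "$sig" "$pid"
        fi
    done
}

# PIDs taken from the snapshot mentioned earlier in the thread.
maybe_kill TERM 14203 13399 1102
# During the trial phase the output could be piped to a mailer, e.g.
#   maybe_kill TERM 14203 13399 1102 | mailx -s "would kill" root
```

Once the mailed lines have matched the real run-away processes for a while, switching DRY_RUN to 0 turns the same handler into the live one.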
Alternatively you could have your processes' memory consumption monitored by freely available tools such as Mon or Nagios.
Especially Nagios is a great system monitor.
For instance it offers a ready made check_procs plug-in that can be passed "-m RSS" or "-m VSZ" as arguments, and that will trigger alerts whenever the Warning (-w
In the Nagios doc there is also an example of how to write an event handler script that restarts a failed webserver.
Best of all Nagios maintains a very active mailing list with many users and developers participating in and willing to help.
But I'm afraid the first setup usually involves some work, no matter what solution you prefer.