
Hung Process


Encountering a problem with a hung process associated with a daily file cleanup job, which simply removes (rm) files from our file content directory. This job has run nightly for more than a year with no problems. However, for the last couple of days the job kicks off a delete process that seems to hang on the rm command. That ultimately causes sluggish performance for the rest of the application running on the same box, which is fighting for resources locked by the hung process.

Our tech staff has been unsuccessful in tracing the root cause of the problem. Only after restarting the box and running the magical kill -9 command as root does the problem seem to go away, until the next evening's clean-up process kicks off.

I am unable to kill the process myself.
Unable to truss the process.
Can't seem to get the UNIX admin staff to do anything other than a cold restart and the kill command, which only fixes the symptom and not the problem.

Any advice?
Is there any sort of disk repair command or known hardware issue that I can recommend our UNIX gurus look into?

Stanimir
Trusted Contributor

Re: Hung Process

Please show us the full line from your crontab.
You may need to capture the error output of
that line. Try running the "rm" by hand from
the command line instead of through cron. Do
you see any message?
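A minimal sketch of capturing that output, assuming a POSIX shell. The hypothetical helper run_logged appends a command's stdout and stderr to a log file, together with start and end timestamps and the exit status. If the job hangs, the log shows a START line with no matching END line, plus whatever rm printed before it stopped.

```shell
# run_logged LOGFILE CMD [ARGS...]
# Hypothetical helper: runs CMD, appending its stdout and stderr to LOGFILE,
# framed by timestamped START/END lines. Returns CMD's exit status.
run_logged() {
    log=$1
    shift
    echo "$(date '+%Y-%m-%d %H:%M:%S') START: $*" >> "$log"
    "$@" >> "$log" 2>&1
    status=$?
    echo "$(date '+%Y-%m-%d %H:%M:%S') END (exit $status): $*" >> "$log"
    return $status
}
```

Inside the crontab entry itself, the equivalent is to append >> <log file> 2>&1 to the command, so that both normal output and error messages end up in the file instead of being mailed or discarded.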
Kevin O'Donovan
Regular Advisor

Re: Hung Process

Hi Brian,

Sounds like something has changed on the box...

* Are all the filesystems that the rm script deletes from local, or are any of them NFS? Check whether there are any stale NFS file handles (bdf command). Maybe it's just tripping over one of those.

* Is it just a simple rm in the script, or does it do anything fancier? If the problem isn't the rm command itself, maybe it's one of the other commands.

* Can you run the script manually, or even better (depending on the script), run each line of the script manually? That should show you exactly where in the script it's having problems, and if you're lucky you might even get an error message.

* If you can't run the script manually, does it log its output to a file? Or can you get it to send its output to a log file?
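For the local-versus-NFS question, here is a sketch with two hypothetical helpers in POSIX sh. It uses df -P rather than the HP-UX bdf command for portability, and is_nfs is only a heuristic: NFS mounts are usually listed in df -P output with a host:/path source. Note that on a stale, hard-mounted NFS filesystem df itself may hang, which is an answer in itself.

```shell
# is_nfs DIR
# Hypothetical helper: succeeds if DIR appears to be on an NFS mount.
# Heuristic: the first field of df -P's data line looks like host:/path.
is_nfs() {
    df -P "$1" 2>/dev/null |
        awk 'NR == 2 { found = ($1 ~ /:/) } END { exit !found }'
}

# report_fs_types DIR...
# Prints "missing", "nfs" or "local" followed by each directory name,
# e.g. for every directory the cleanup script deletes from.
report_fs_types() {
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing $d"
        elif is_nfs "$d"; then
            echo "nfs $d"
        else
            echo "local $d"
        fi
    done
}
```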

hope that helps,
Kevin.
A. Clay Stephenson
Acclaimed Contributor
Accepted solution

Re: Hung Process

My best guess at this point is a corrupt filesystem; have your UNIX guys run a full fsck (not simply a log replay) on the filesystem in question. You might have a circularly linked directory structure, which could cause exactly this kind of problem.

I am making the assumption that this is a local filesystem rather than NFS/automount.
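A minimal sketch of that procedure as a shell function, under loud assumptions: the filesystem is VxFS on HP-UX (where fsck -F vxfs -o full forces a full structural check instead of the default intent-log replay), it has an /etc/fstab entry so that mount MOUNTPOINT works, and nothing is using it. The function and helper names are hypothetical. With DRY_RUN set, it only prints the commands it would run.

```shell
# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run_or_echo() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# full_fsck MOUNTPOINT RAWDEVICE
# Assumptions (HP-UX, VxFS): MOUNTPOINT is listed in /etc/fstab and is idle,
# otherwise umount fails and we stop before touching fsck.
# "-o full" requests a full check rather than a log replay;
# "-y" answers yes to every repair prompt.
full_fsck() {
    run_or_echo umount "$1" || return 1
    run_or_echo fsck -F vxfs -o full -y "$2" || return 1
    run_or_echo mount "$1"
}
```

For example, DRY_RUN=1 full_fsck /content /dev/vg01/rlvol5 prints the three commands; both names there are placeholders for your own mount point and raw logical volume. Running it for real needs root and a quiet window.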
If it ain't broke, I can fix that.
Bernhard Mueller
Honored Contributor

Re: Hung Process

Hi,

I would take a close look at your "file content directory". If you have not changed anything, then there is probably something in it that keeps rm from continuing: for example a subdirectory whose name consists only of non-printing characters, a file name containing a semicolon, circular soft links, or whatever other weirdness can happen.

Ideally you would create a new "file content directory" and copy over ONLY what you need, carefully examining each file and directory.

As preparation, I would cd into it and run
# find . > /tmp/check_names_list
# find . -type l -exec ls -l {} \; > /tmp/check_link_list

Go through those lists to rule out the things I mentioned above.
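To shorten those lists to just the suspicious entries, here is a sketch with two hypothetical functions in POSIX sh. list_odd_names runs find in the C locale so that [![:print:]] matches any byte outside printable ASCII (control characters, and also any non-ASCII byte). list_bad_links relies on test -e following symlinks, so dangling links and circular links both fail it.

```shell
# list_odd_names DIR
# Prints paths under DIR whose names contain a semicolon or any byte
# that is not printable ASCII.
list_odd_names() {
    LC_ALL=C find "$1" \( -name '*;*' -o -name '*[![:print:]]*' \) -print
}

# list_bad_links DIR
# Prints symbolic links under DIR that do not resolve: test -e fails for
# dangling links and for circular links (too many levels of symlinks).
list_bad_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Usage, from inside the directory: list_odd_names . and list_bad_links . . Names containing newlines will still span several output lines, so review the output by eye.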

Regards,
Bernhard
Scot Bean
Honored Contributor

Re: Hung Process

Check /var/adm/syslog/syslog.log for any local hard drive errors.

Run online diagnostics on local hard drives.
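A small filter for the syslog check, as a sketch: the hypothetical disk_error_lines function prints matching lines with their line numbers. The keyword list is an assumption about what HP-UX SCSI and LVM messages typically contain, not a definitive catalogue, so widen or narrow it as needed.

```shell
# disk_error_lines [LOGFILE]
# Prints (with line numbers) syslog lines that look disk-related.
# Defaults to the HP-UX syslog path; keyword list is an assumption.
disk_error_lines() {
    grep -n -i -E 'scsi|lbolt|lvm|disk|i/o error' \
        "${1:-/var/adm/syslog/syslog.log}"
}
```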
Krishna Prasad
Trusted Contributor

Re: Hung Process

You can also try running

sh -x <script name>

from the command line.

This should show you where it is hanging.
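Because the job hangs overnight under cron, it may help to send the trace to a file rather than the terminal. A minimal sketch with two hypothetical helpers: trace_to_log runs a script under sh -x with both the trace and normal output going to a log, and last_traced_command prints the last trace line, which is the command that was running when the script stopped.

```shell
# trace_to_log SCRIPT LOGFILE
# Runs SCRIPT under sh -x; trace lines (prefixed "+ ") and the script's
# own output both go to LOGFILE, in the order they were produced.
trace_to_log() {
    sh -x "$1" > "$2" 2>&1
}

# last_traced_command LOGFILE
# Prints the most recent trace line, i.e. where the script got to.
last_traced_command() {
    grep '^+' "$1" | tail -n 1
}
```

In a crontab entry the equivalent is simply sh -x <script name> > <trace file> 2>&1; the next morning, the last line starting with + shows where it stopped.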

Positive Results requires Positive Thinking