Operating System - HP-UX

JUP
Regular Advisor

C call wait () seems to wait forever

I am running HP-UX 11.00 E.

My C program starts a number of different unix scripts depending on the state of the system.

Before I kick off a unix script from my C program, I carry out a:

pid = wait(&status);

This is to make sure the previous script has completed before I kick off a new one (i.e. no zombies).

The problem is that most times (but not every time) the program just sits at the wait(&status) call forever, even though it may have just executed a very short unix script of maybe a few lines. I have tried:

pid = waitpid((pid_t)-1, &status, WNOHANG);

but that doesn't seem to work all the time either.

The scripts are run in POSIX shell (/bin/sh) and are all short - and I know they complete.

Does anyone have any ideas what may be causing my program to hang at the wait() call ?

Thanks in advance
PA
Steven Gillard_2
Honored Contributor

Re: C call wait () seems to wait forever

The wait() system call will sleep as long as you have child processes and none of them has terminated yet. If there are no child processes it should return straight away with an ECHILD error.
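A minimal sketch of that behaviour, for illustration only: the helper name reap_all_children() is made up here and is not from the original program. It keeps calling wait() until the call fails with ECHILD, which is the point at which there are no children left to wait for.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: block until every child of the caller has been
 * reaped. Returns the number of children reaped, or -1 on an unexpected
 * wait() failure. */
int reap_all_children(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        pid_t pid = wait(&status);
        if (pid > 0) {          /* one child terminated and was reaped */
            reaped++;
            continue;
        }
        if (errno == EINTR)     /* interrupted by a signal: try again */
            continue;
        if (errno == ECHILD)    /* no children left: wait() returns at once */
            return reaped;
        return -1;              /* anything else is unexpected */
    }
}
```

With no children at all the function returns 0 immediately, because the very first wait() fails with ECHILD instead of sleeping.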

While the process is hung can you see any child processes in "ps -ef" output?

How are you executing the shell scripts - are you using fork() / exec*() or another routine like system()?

Exactly what happens when you change the wait() call to waitpid() with the WNOHANG option?

Can you get a tusc trace of the problem or post some sample code?

Regards,
Steve
hein coulier
Frequent Advisor

Re: C call wait () seems to wait forever

PA,

You must make sure that you're not ignoring the SIGCLD signal. See the man page:

WARNINGS
The behavior of wait(), waitpid(), and wait3() is affected if the SIGCLD signal is set to SIG_IGN.
JUP
Regular Advisor

Re: C call wait () seems to wait forever

Steve, thanks for your response. I am using fork and exec (not the system() call).
The ps -ef output doesn't show the script running. It does show the C program running, as it should, since the C program is a daemon running in the background.
The script is only a few lines of basic unix commands (i.e. echos, df's etc.) with an exit(0) as the last line.

I have found a common theme causing my problem:
The problem mainly (nearly every time) occurs when the C program starts up after a reboot of the server (from a startup script). If I kill the task and then restart it manually, it seems to work OK.

When I use waitpid(... WNOHANG) the program does not halt at the call, and the return value from waitpid is 0, which seems OK. However, I'm not confident that WNOHANG is the correct option, as it does not actually wait for the process to complete before proceeding.
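For reference, a return value of 0 from waitpid() with WNOHANG means that children exist but none of them has changed state yet, so it says nothing about whether the script has finished. One way to wait for exactly the script that was just started is to pass its pid to waitpid() with no options. A rough sketch follows; run_script_and_wait() is a made-up name, and running the command through "/bin/sh -c" is an assumption about how the scripts are launched.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` via /bin/sh and block until that
 * particular child has finished. Returns its exit status (0-255), or -1
 * if fork()/waitpid() failed or the child was killed by a signal. */
int run_script_and_wait(const char *command)
{
    int status;
    pid_t r;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);             /* only reached if exec failed */
    }

    /* Wait for this child only, retrying if a signal interrupts us. */
    do {
        r = waitpid(child, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the pid is given explicitly, an unexpected extra child (for example one inherited from a pipeline) cannot satisfy or block this call.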

Haven't used tusc as yet - but plan to download it and use it.

Hein, thanks for your advice too. I have put the extra line sigset(SIGCLD, FnCall) just before the wait(&stat) call. However this did not help. Was that what you meant by your suggestion of not ignoring SIGCLD ?

Any other suggestions would be much appreciated.
PA
Steven Gillard_2
Honored Contributor

Re: C call wait () seems to wait forever

PA,

Ignoring SIGCLD will prevent zombie processes, but shouldn't alter the behaviour of wait(). If you have child processes, wait() will go to sleep waiting for them to terminate, otherwise it will return an ECHILD error.

Have a read of the signal(5) man page - there is a WARNINGS section that goes into detail about this. The main warning is regarding the use of signal handlers - if you install a signal handler for SIGCLD, you must ensure that in your handler you call wait() BEFORE re-installing the handler otherwise you will get a hang.

Interesting that this problem only happens on system start. Make sure that your start script is written correctly (i.e. it accepts the "start_msg" and "start" args) and that your daemon process correctly detaches itself with setpgrp() and ignores/catches SIGHUP. Also be careful if you are running the program in a pipeline - you might find you have child processes that you didn't expect.

A tusc trace would definitely help!

Regards,
Steve