SIG_IGN does not work

sukhanya
Occasional Advisor

SIG_IGN does not work

We are using HP-UX ia64 11i v3.
compiler:
aCC:
HP aC++/C for Integrity Servers B3910B A.06.10 [Mar 22 2006]

We spawn multiple child processes from a parent process
and use the code below to avoid zombies:

struct sigaction sa;
sa.sa_handler = SIG_IGN;
sa.sa_flags = SA_NOCLDWAIT;
if (sigaction(SIGCHLD, &sa, NULL) == -1) {
perror("sigaction");
exit(1);
}

Our application is POSIX compliant.

The above code cleaned up child process entries from the process table when the children terminated.
This was working all along.
Now we had an occurrence where it stopped working, leaving behind many defunct processes (zombies).

Is there a way to work around this?
The parent process cannot wait until a child returns, since many clients depend on the parent server process and waiting would hold up additional requests to the parent.
The defunct processes were owned by the same user as the parent process.

Please suggest.




12 REPLIES
sukhanya
Occasional Advisor

Re: SIG_IGN does not work

Parent process forks to start the child.

There is a script which terminates the parent and child; they are killed using kill -9.
We use Orbix, and the parent and child are Orbix services. The Orbix daemon starts a new parent process and child.
The creation of defunct processes started after the above script was executed and continued until the parent was killed forcefully.

All the defunct processes were owned by the parent.
Killing parent killed all the defunct processes
Dennis Handly
Acclaimed Contributor

Re: SIG_IGN does not work

>struct sigaction sa;

Is this a global? If a local, are you clearing the sa_mask field with sigemptyset(3)?

>Killing parent killed all the defunct processes

Yes, killing the zombie master allows init(1m) to reap the zombies.

Re: SIG_IGN does not work

...and I've noticed that resetting the siginterrupt() flag for SIGCHLD does not prevent some system calls from returning EINTR on child exit.
sukhanya
Occasional Advisor

Re: SIG_IGN does not work

struct sigaction sa is global.

We modified the code above (from the first post) as below:

struct sigaction sa;
sa.sa_flags = SA_NOCLDWAIT;
if (sigaction(SIGCHLD, &sa, NULL) == -1) {
perror("sigaction");
exit(1);
}

We removed SIG_IGN, since we suspected that SIG_IGN is not supported by the POSIX standard.

It works as long as we do not call system().
If we call system(command), it executes the command and on completion the program fails with:

Program terminated with signal 11, Segmentation fault.

SEGV_MAPERR - Address not mapped to object

As mentioned in the original post, we don't want the parent process to wait on the child, and we do not want to use waitpid() or wait().

We want the parent to start the child processes, and from then on each child is on its own. The parent should not care about the exit or termination of the child. The child should exit on its own, and if it does not, init will reap it. How to make this work?




Dennis Handly
Acclaimed Contributor

Re: SIG_IGN does not work

>it executes the system command and on completion returns
>Program terminated with signal 11

Right, system(3) fiddles with SIGCHLD and wait.

What does stack trace show for your signal 11?

>We want parent to start the child processes and from then on child is on its own. How to make this work?

You probably need to have each child daemonize themselves.
sukhanya
Occasional Advisor

Re: SIG_IGN does not work

Here's the stack trace.

$ gdb ServiceFactory_Service core
HP gdb 5.4.0 for HP Itanium (32 or 64 bit) and target HP-UX 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.4.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `ServiceFactory'.

warning: ServiceFactory is 14 characters in length. Due to a limitation
in the HP-UX kernel, core files contain only the first 14 characters
of an executable's name. Check if ServiceFactory is a truncated name.
If it is so, core-file, packcore and other commands dealing with
core files will exhibit incorrect behavior. To avoid this, issue
exec-file and symbol-file commands with the full name of the executable
that produced the core; then issue the core-file, packcore or other
core file command of interest.

Program terminated with signal 11, Segmentation fault.
SEGV_MAPERR - Address not mapped to object

warning: Load module /u01/app/oracle10g/product/10.2.0/lib/libclntsh.so.10.1 has been stripped.
Debugging information is not available.

warning: Load module /u01/app/oracle10g/product/10.2.0/lib/libnnz10.so has been stripped.
Debugging information is not available.

#0 0xffffffff80000000 in ()
(gdb) where
#0 0xffffffff80000000 in ()
warning: Attempting to unwind past bad PC 0xffffffff80000000
#1 0xe000000120002620 in ()
#2 0xc0000000003267f0:0 in sigprocmask+0x30 () from /usr/lib/hpux64/libc.so.1
#3 0xc000000000319d20:0 in _system_sys+0x330 () from /usr/lib/hpux64/libc.so.1
#4 0xc000000000339ea0:0 in system+0xa0 () from /usr/lib/hpux64/libc.so.1
#5 0x4000000000259710:0 in main () at src/main.cpp:145

(2) How to daemonize the child?
What would be the effect of this?
Dennis Handly
Acclaimed Contributor

Re: SIG_IGN does not work

#1 0xe000000120002620 in
#2 0xc0000000003267f0:0 in sigprocmask+0x30 libc.so.1
#3 0xc000000000319d20:0 in _system_sys+0x330 libc.so.1
#4 0xc000000000339ea0:0 in system+0xa0 libc.so.1

It looks like the signal 11 has been blocked and only occurs when sigprocmask unblocks it in the kernel, frame #1.
You would need to subvert all calls to sigprocmask to never block signal 11 if you want to see where it is occurring.
Do you have other threads?

>(2) How to daemonize the child?

Just google daemonize.

>What would be the effect of this?

Its parent will already be init(1m). It will be in a different process group.

Of course you must wait for that intermediate child process, or use SIG_IGN.
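
The "intermediate child" mentioned above refers to the usual double-fork pattern. The following is a rough sketch of it, assuming plain POSIX calls. The names spawn_detached and example_work are hypothetical, and a real daemon would normally also redirect its standard file descriptors and change directory, which is left out here.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder for whatever the detached process should do. */
void example_work(void)
{
    /* real work would go here */
}

/* Hypothetical helper: run work() in a grandchild that is orphaned at once,
 * so the system (init) reaps it instead of the caller.
 * Returns 0 on success, -1 on failure. */
int spawn_detached(void (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Intermediate child: start a new session and process group. */
        if (setsid() < 0)
            _exit(1);
        pid_t gpid = fork();
        if (gpid < 0)
            _exit(1);
        if (gpid == 0) {
            work();             /* grandchild does the real work */
            _exit(0);
        }
        _exit(0);               /* exit at once, orphaning the grandchild */
    }

    /* Parent: reap the intermediate child; it exits almost immediately. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno == ECHILD)
            return 0;           /* SIGCHLD is ignored, nothing left to reap */
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

One consequence of this design is that the caller only ever sees the PID of the intermediate child. If the parent needs to signal the detached process later, the grandchild's PID has to be passed back some other way, for example through a pipe or a PID file.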
kobylka
Valued Contributor

Re: SIG_IGN does not work

Hi!

> (2) How to daemonize the child?

There is a nice, short and simple example:

http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801?ciid=ea08852bcbe02110852bcbe02110275d6e10RCRD


Kind regards,

Kobylka
sukhanya
Occasional Advisor

Re: SIG_IGN does not work

1. We cannot daemonize our child process, since the parent process has an option to kill the child process in case of a timeout.

2. how to subvert the calls to sigprocmask?

3. Yes we have one thread in parent process.
Dennis Handly
Acclaimed Contributor

Re: SIG_IGN does not work

>We cannot daemonize our child process, since the parent process has an option to kill the child process in case of a timeout.

Just hunt down the demon and kill it?
Of course if you can do this, you can also hunt down your zombies and kill them too. :-)

Back to your original problem. You could call waitpid with WNOHANG to do a poll of your child processes every hour or so.
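
A minimal sketch of such a poll, assuming SIGCHLD is left at its default disposition so that exited children remain as zombies until collected. The function name reap_children is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: collect every child that has already exited.
 * Never blocks. Returns the number of children reaped. */
int reap_children(void)
{
    int reaped = 0;

    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0) {
            reaped++;           /* one zombie collected, look for more */
            continue;
        }
        if (pid == 0)
            break;              /* children exist, but none has exited yet */
        if (errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        break;                  /* ECHILD (no children) or another error */
    }
    return reaped;
}
```

Because WNOHANG makes waitpid() return immediately, this can be called from the server's main loop or from a timer without holding up client requests.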

>2. how to subvert the calls to sigprocmask?

Set a breakpoint there and analyze the mask and then change it to not block signal 11.

Or write a function called sigprocmask that analyzes and changes the parameters and then calls _sigprocmask to do the work.

>3. Yes we have one thread in parent process.

You have created one additional thread?
sukhanya
Occasional Advisor

Re: SIG_IGN does not work

1. Yes, we have one more thread in our parent process.

2. I read an article which mentioned that sigprocmask() can have undesired results in a multithreaded application. So I think we can't use sigprocmask() in our application.

Is there any other way to resolve this problem?

Dennis Handly
Acclaimed Contributor

Re: SIG_IGN does not work

>1. Yes we have one more thread in our parent process.

So you are multithreaded.

>2. in that they mentioned that sigprocmask will have undesired results in multithreaded application. So I think we can't use sigprocmask in our application.

The purpose of fiddling with sigprocmask(2) is to prevent the blocking of signal 11 so you can debug.

After 4 months and no points, it's probably time to contact the Response Center.