Operating System - HP-UX

Re: fork error !! HP-UX 11.31

 
guojianlee
New Member

fork error !! HP-UX 11.31

I have a multi-process socket server program running on HP-UX B.11.11 U 9000/800, where it works fine.
When I deployed it on a new host (HP-UX B.11.31 U ia64), fork started failing after four hours
(errno 11, "No more processes").
"ps -ef | wc -l" shows no more than 300 processes.
I also checked some kernel parameters, and they look OK:

maxuprc 3780
nproc 4200

The following is the output of sar -v. It also looks OK.
$ sar -v -f /var/adm/sa/sa28

HP-UX B.11.31 U ia64 08/28/09
00:00:00 text-sz  ov  proc-sz ov    inod-sz ov         file-sz ov
21:50:00     N/A N/A 323/4200  0 1135/35648  0 4077/2147483647  0
21:55:01     N/A N/A 312/4200  0 1115/35648  0 4079/2147483647  0
22:00:01     N/A N/A 311/4200  0 1112/35648  0 4096/2147483647  0
22:05:01     N/A N/A 316/4200  0 1122/35648  0 4106/2147483647  0
22:10:00     N/A N/A 310/4200  0 1110/35648  0 4097/2147483647  0
22:15:01     N/A N/A 311/4200  0 1113/35648  0 4096/2147483647  0
22:20:01     N/A N/A 313/4200  0 1118/35648  0 4104/2147483647  0
22:25:00     N/A N/A 312/4200  0 1114/35648  0 4096/2147483647  0
22:30:00     N/A N/A 311/4200  0 1111/35648  0 4085/2147483647  0
22:35:00     N/A N/A 311/4200  0 1112/35648  0 4087/2147483647  0

I tried rebooting the server, but the fork failure occurred again after some hours.

Finally, I checked my program.

I found that the action for signal SIGCHLD was set to SIG_IGN:
signal(SIGCHLD, SIG_IGN);

After I changed it to the following, the fork failure no longer occurred:

signal(SIGCHLD, wait_child);

void wait_child(int sig)
{
    wait(NULL);
    signal(SIGCHLD, wait_child);
}
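
For comparison, here is a minimal sketch of a more defensive variant of the handler above. It is not part of the original program: the names reap_children and install_sigchld_reaper are hypothetical. Because pending SIGCHLD signals are not queued, a single wait() per signal can leave a child unreaped when several children exit close together, so the sketch loops on waitpid() with WNOHANG and installs the handler once with sigaction(). Whether this matters for the failure described in this thread is not established.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical handler: reap every child that has already exited. */
static void reap_children(int sig)
{
    int saved_errno = errno;    /* waitpid() may overwrite errno */

    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* keep collecting until no exited child is left */
    }
    errno = saved_errno;
}

/* Hypothetical helper: install the handler once.
 * Returns 0 on success, -1 on error (as sigaction() does). */
int install_sigchld_reaper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART: restart interrupted system calls such as accept().
     * SA_NOCLDSTOP: no signal when a child is merely stopped. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

With sigaction() the handler stays installed, so it does not need to call signal() again on every delivery.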

Here is my question.
If fork failed because the parent process did not wait for its children, why did the output of 'ps -ef' and 'sar -v' look normal?
I know zombie processes are shown by ps. Are there hidden processes in the system?

Thanks in advance
Hein van den Heuvel
Honored Contributor

Re: fork error !! HP-UX 11.31

Perhaps a runaway condition in the program makes it fork like crazy under certain circumstances, and within a second it eats up all the process slots. With the old signal setting, the subsequent error takes down the parent and all the children, removing the evidence of the glitch. With the new code, the parent waits, and the bastard child processes go away on their own behind your back with nothing to do?

When faced with this problem I would probably first make a little program to fork 'n' children which all go to sleep, to test how many processes the system allows you to create.
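
A minimal sketch of such a test program might look like the following. The function name count_forkable and its interface are hypothetical, not taken from any existing tool. It forks up to a given number of children that only sleep, stops at the first fork() failure, and then kills and reaps every child it created. Run it as the same user as the server, and choose the limit with care, since a large value can temporarily use up the process slots for that user.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: try to fork up to `limit` sleeping children.
 * Returns the number of forks that succeeded, or -1 if the pid table
 * could not be allocated. All children are killed and reaped before
 * the function returns. */
int count_forkable(int limit)
{
    pid_t *pids;
    int created = 0;
    int i;

    if (limit <= 0)
        return 0;
    pids = malloc((size_t)limit * sizeof *pids);
    if (pids == NULL)
        return -1;

    while (created < limit) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* fork failed, e.g. with EAGAIN */
        if (pid == 0) {
            for (;;)
                pause();        /* child: sleep until it is killed */
        }
        pids[created++] = pid;  /* parent: remember the child */
    }

    /* Clean up: terminate and reap every child that was created. */
    for (i = 0; i < created; i++)
        kill(pids[i], SIGKILL);
    for (i = 0; i < created; i++)
        waitpid(pids[i], NULL, 0);

    free(pids);
    return created;
}
```

If count_forkable(4000) returns a number well below maxuprc, the limit being hit is probably something other than the per-user process count.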

I might try looking at the process IDs to see if there was a jump or hole.

Instrument the application to log a timestamp and pid for each fork, or use truss for that.
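
As a rough sketch of that kind of instrumentation, the application could route its fork() calls through a small wrapper. The name logged_fork and the log line format are assumptions made for illustration only. The wrapper writes one line in the parent for each fork, containing a timestamp and either the child's pid or the errno of the failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper: call fork() and, in the parent only, append one
 * line with a timestamp and either the child's pid or the errno.
 * Returns exactly what fork() returned, with errno preserved. */
pid_t logged_fork(FILE *log)
{
    pid_t pid;
    int saved_errno;

    fflush(log);                /* do not duplicate buffered data in the child */
    pid = fork();
    saved_errno = errno;

    if (pid != 0) {             /* parent, or fork() failed */
        char stamp[32];
        time_t now = time(NULL);
        struct tm *tm = localtime(&now);

        if (tm == NULL ||
            strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm) == 0)
            strcpy(stamp, "unknown-time");

        if (pid > 0)
            fprintf(log, "%s fork ok parent=%ld child=%ld\n",
                    stamp, (long)getpid(), (long)pid);
        else
            fprintf(log, "%s fork failed errno=%d (%s)\n",
                    stamp, saved_errno, strerror(saved_errno));
        fflush(log);
        errno = saved_errno;    /* the caller still sees fork()'s errno */
    }
    return pid;
}
```

Comparing consecutive child pids and timestamps in such a log would show both a sudden burst of forks and any jump in the pid sequence.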

Good luck!
Hein.
Hein van den Heuvel
Honored Contributor

Re: fork error !! HP-UX 11.31

Did you get messages in syslog/dmesg?

How many processes is this application designed to fork?

Could it just be exceeding maxuprc ?

What is the error on the fork?

"If fork() fails with an error value of EAGAIN, it could be an indication that maxuprc was reached by that particular user."

http://docs.hp.com/en/B3921-90010/maxuprc.5.html


See also this related, recent, topic:

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=1368384

Also check parameter nkthread.
It is typically automatically adjusted with nproc, but maybe someone was too smart?

Hein.
Dennis Handly
Acclaimed Contributor

Re: fork error !! HP-UX 11.31

>"ps -ef | wc -l" no more than 300 process,
>I checked some parameters also. they saw ok.
>maxuprc 3780 nproc 4200

It looks like it. My ps will return "<defunct>" for zombie processes. So your 300 looks less than 3780.

Perhaps Hein is correct in that you have nkthread < nproc?

But kctune(1m) implies it won't let you make that mistake.
guojianlee
New Member

Re: fork error !! HP-UX 11.31

Maybe I am not lucky.
I found that /var/adm/syslog has not been updated for some months; syslogd dumped core some months ago. So I can't find any information in syslog.

I checked the nkthread parameter with kctune, and it shows no problem:

nkthread 7184 (((nproc*7)/4)+16) Immed