
Fork function failed.

 
Kirk Solano
Occasional Advisor

My HP-UX 11 machine gives me this message every now and then - often enough that it needs to be dealt with. It occurs on a Monday (after the weekend or after a long vacation).
"The fork function failed. Too many processes already exist."

I've called HP support and they've asked me to change a few configurable kernel parameters. As of now, nproc is 1024, maxusers is 124, and maxuprc is 200. These were changed a week ago, but today I had the same problem again. When this happens, my only choice is to do a transfer of control to reboot everything; once that's done, everything seems to be fine. Any input will be appreciated. Thanks.
Kirk
"You're never too young. It's never too early."
Ken Hubnik_2
Honored Contributor

Re: Fork function failed.

Have you tried increasing nproc in the kernel?
Kirk Solano
Occasional Advisor

Re: Fork function failed.

Yes, the value was doubled from last time.
"You're never too young. It's never too early."
Ken Hubnik_2
Honored Contributor

Re: Fork function failed.

You may also need to bump up your nfile parameter. If you have glance, you should be able to see how much of each table is in use.
Chris Wilshaw
Honored Contributor

Re: Fork function failed.

If you look at the values in sar -v 1 10, you should see something similar to

16:57:54 text-sz ov proc-sz ov inod-sz ov file-sz ov
16:57:55 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:57:56 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:57:57 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:57:58 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:57:59 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:58:00 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:58:01 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:58:02 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:58:03 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0
16:58:04 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0

The proc-sz column shows the system-wide number of processes that are running. Is this close to your limit of 1024?
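
As a rough illustration of reading that column, here is a minimal Python sketch that parses one sar -v data line in the layout shown above and reports how full the process table is. The function name `proc_table_usage` is made up for this example, and the field positions are assumed from the sample output, not taken from any sar specification.

```python
def proc_table_usage(sar_line):
    """Return (used, size, percent) for the proc-sz field of one sar -v data line.

    Assumes the column layout of the sample output above:
    time, text-sz, ov, proc-sz, ov, inod-sz, ov, file-sz, ov.
    """
    fields = sar_line.split()
    used, size = (int(n) for n in fields[3].split("/"))
    return used, size, 100.0 * used / size


# Data line copied from the sar -v output in this post.
sample = "16:57:55 N/A N/A 556/1024 0 2800/2800 0 1532/5110 0"
used, size, pct = proc_table_usage(sample)
print(f"process table: {used}/{size} entries in use ({pct:.1f}%)")
```

A reading taken while the fork errors are actually occurring is the one that matters; a quiet-time sample only shows the baseline.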

Kirk Solano
Occasional Advisor

Re: Fork function failed.

This is what mine looks like.

12:08:50 text-sz ov proc-sz ov inod-sz ov file-sz ov
12:08:51 N/A N/A 143/1064 0 1356/1356 0 424/2328 0
12:08:52 N/A N/A 143/1064 0 1356/1356 0 424/2328 0
12:08:53 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:08:54 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:08:55 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:08:56 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:08:57 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:08:58 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:08:59 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
12:09:00 N/A N/A 143/1064 0 1354/1356 0 424/2328 0
"You're never too young. It's never too early."
erics_1
Honored Contributor

Re: Fork function failed.

Kirk,

The important thing to remember about your output is whether it was taken while the system was producing the errors or at a time when the system was less busy. Based on your error, I'd say that nproc still needs to be increased.

Hope this helps!
Eric
Byron Myers
Trusted Contributor

Re: Fork function failed.

Kirk, also take a peek at the maxuprc kernel parameter - max processes per user. I believe the default is 50. Also, you will probably get many more responses if you assign points more often to those that respond to your questions.
If you can focus your eyes far and straight enough ahead of yourself, you can see the back of your head.
Kirk Solano
Occasional Advisor

Re: Fork function failed.

Are you saying to decrease my maxuprc from 200 to 50? Also, just want to let you know that there are no processes that run during the weekend, and nobody usually works during those days.
"You're never too young. It's never too early."
Byron Myers
Trusted Contributor

Re: Fork function failed.

Kirk, NO - sorry, I did not notice that you already stated that maxuprc=200.
If you can focus your eyes far and straight enough ahead of yourself, you can see the back of your head.
Kirk Solano
Occasional Advisor

Re: Fork function failed.

Also, just fyi...nfile = 2232
"You're never too young. It's never too early."
Brian Watkins
Frequent Advisor

Re: Fork function failed.

Give this a try when the error message is occurring on the problem system:

1. Start glance
2. Look at the System Tables Report (shortcut to this is the "t" key)

Are any of these parameters at 80% utilization or higher? If so, you should consider making a backup of your current kernel, increasing the affected values by at least 10 or 20%, and rebooting.

After the reboot, repeat the steps above to see what the utilization levels are. Continue to monitor them, especially during peak processing hours when the errors have been happening.
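
If glance is not available, the same rule of thumb can be applied by hand to the used/size pairs that sar -v prints. The sketch below is only an illustration: `review_tables` is a made-up helper, the table names are the sar column headings, and the 80% threshold and 20% growth are simply the figures suggested above.

```python
import math


def review_tables(usage, threshold=0.80, growth=0.20):
    """Map each table at or above `threshold` utilization to a suggested new size.

    `usage` maps a table name to a (used, configured) pair.
    """
    suggestions = {}
    for name, (used, configured) in usage.items():
        if used / configured >= threshold:
            suggestions[name] = math.ceil(configured * (1 + growth))
    return suggestions


# Figures taken from the sar -v output posted earlier in this thread.
observed = {"proc-sz": (143, 1064), "inod-sz": (1354, 1356), "file-sz": (424, 2328)}
print(review_tables(observed))
```

Treat a flagged table as a prompt to look closer, not as proof of a problem. An inode table that reads full, for instance, may just be a cache that has filled up.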

Hope this helps!
Kirk Solano
Occasional Advisor

Re: Fork function failed.

Mmm...I'm not familiar with glance, but do I start it by typing "glance" at the command line? If that's the way, then I don't have it. Anyway, I would like to increase my nproc. What is the maximum nproc that I can use? I don't want to go over it and cause my system to crash. Thanks again.
"You're never too young. It's never too early."
Byron Myers
Trusted Contributor

Re: Fork function failed.

Kirk, also watch for zombie processes. Run "ps -ef" and look for any processes that have "<defunct>" under the "COMMAND" column. Sometimes a program starts cranking out these defunct processes until the UNIX process table fills up. If you see a lot of these defunct processes, say in the hundreds, then look for their parent process in the "ps -ef" output. Kill the parent and the defunct processes go away, freeing up the slots in the process table. The process table size is defined by nproc. If you see this behavior, that parent program has a bug in it that needs to be addressed.
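
With hundreds of entries, counting by eye gets tedious. A minimal Python sketch of the same search: it groups defunct entries by parent PID. The helper name `defunct_by_parent` and the sample listing are made up for illustration, and the code assumes the usual "ps -ef" layout in which PID and PPID are the second and third fields.

```python
from collections import Counter


def defunct_by_parent(ps_output):
    """Count <defunct> entries per parent PID in `ps -ef` text."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 3 and "<defunct>" in line:
            counts[fields[2]] += 1  # third field is the PPID
    return counts


# Made-up sample in the ps -ef layout; the user, PIDs and command are hypothetical.
sample = """\
     UID   PID  PPID  C    STIME TTY       TIME COMMAND
    root     1     0  0 08:00:01 ?         0:00 init
 appuser  2001     1  0 08:05:10 ?         0:02 /opt/app/bin/worker
 appuser  2002  2001  0 08:05:11 ?         0:00 <defunct>
 appuser  2003  2001  0 08:05:12 ?         0:00 <defunct>
"""
for ppid, n in defunct_by_parent(sample).most_common():
    print(f"parent PID {ppid} has {n} defunct children")
```

On a live system the text could come from `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` instead of the sample string.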
If you can focus your eyes far and straight enough ahead of yourself, you can see the back of your head.
Kirk Solano
Occasional Advisor

Re: Fork function failed.

I do seem to have a lot of those defunct processes...more than 100. What command do I use to kill them? Also, does rebooting the system clear up the process table? I ask because I rebooted it this morning, and the output I'm looking at already has these defunct processes. Does this mean I have a bug somewhere?
"You're never too young. It's never too early."
Frank Slootweg
Honored Contributor

Re: Fork function failed.

*Where* (i.e. on your terminal? in a file?) do you get that message ("The fork function failed. Too many processes already exist."), and from which command(s)?

If you get it on your terminal, then also look in your /var/adm/syslog/syslog.log file for a similar/related message from the *same time*.

If syslog.log says "... proc: table is full", then the cause is an nproc value that is too low. If it says something else (please post the message), then the cause is probably a maxuprc value that is too low.
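
That check can be written as a small Python sketch. The function name `likely_limit` and the sample log lines are invented for illustration (only the "proc: table is full" text comes from this post), and the fallback answer is no more than the "probably" given above.

```python
def likely_limit(syslog_lines):
    """Guess which kernel limit was hit, following the rule of thumb above."""
    for line in syslog_lines:
        if "proc: table is full" in line:
            return "nproc"
    return "maxuprc (probably)"


# Hypothetical log lines; the timestamp, host name and prefix are made up.
sample = [
    "Jan  6 08:01:10 host1 inetd[612]: some unrelated message",
    "Jan  6 08:01:12 host1 vmunix: proc: table is full",
]
print(likely_limit(sample))
```

For a real check, pass in only the lines of /var/adm/syslog/syslog.log that fall around the time of the fork error, since an older entry would give a misleading answer.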
Trond Haugen
Honored Contributor

Re: Fork function failed.

It seems your problem is the defunct processes. You get a "zombie" when a parent process doesn't "wait" for its child, usually a result of bad programming. The defunct process is nothing more than an entry in the process list, but it holds that entry, and as you have experienced, the list will eventually fill up. The only way to clear it is to reboot.
Finding the program that leaves the zombies will be hard "detective" work. Running 'ps -ef' periodically and saving the output to a file may help you backtrack the PID and PPID when you find the zombies.
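
To make the "wait" part concrete, here is a minimal Python sketch for a POSIX system, not taken from any program on the affected machine. The function `spawn_and_reap` is a made-up name: it forks a child that exits at once, leaves it unreaped for a moment, and then collects it with `os.waitpid`, which is the step a zombie-producing parent never performs.

```python
import os
import time


def spawn_and_reap(exit_code=7, linger=0.2):
    """Fork a child that exits at once, leave it unreaped briefly, then wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately
    # While the parent sleeps, the exited child stays in the process
    # table as a zombie, because nobody has collected its status yet.
    time.sleep(linger)
    _, status = os.waitpid(pid, 0)  # reaping releases the process table entry
    return pid, os.WEXITSTATUS(status)


pid, code = spawn_and_reap()
print(f"child {pid} reaped with exit code {code}")
```

A parent that loops on the fork without ever reaching the wait call accumulates one such entry per child.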
Good luck and do assign points.

Regards,
Tron
Regards,
Trond Haugen
Raynald Boucher
Super Advisor

Re: Fork function failed.

First, review your kernel parameters as suggested above.

Then try to identify what causes the "fork bomb". It could be a script that restarts itself, a bad loop or even a typo.

We experienced this a little while ago: one of our developers cd'd to a directory containing copybooks and typed "*" instead of "cat *".
Unfortunately, all the files in the directory contained many asterisks (comments) and had execute permissions. So the shell expanded the first "*" into "execute all files in this directory", and so on.

We removed execute permissions on the files in text directories and haven't had the problem since.
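
A cleanup like that can be sketched in Python as follows. The helper name `clear_execute_bits` and the demo file name are made up; the code only touches regular files directly inside the given directory and uses a temporary directory for the demonstration.

```python
import os
import stat
import tempfile

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def clear_execute_bits(directory):
    """Remove execute permission from regular files in `directory`; return the names changed."""
    changed = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        mode = os.lstat(path).st_mode  # lstat so symlinks are skipped, not followed
        if stat.S_ISREG(mode) and mode & EXEC_BITS:
            os.chmod(path, stat.S_IMODE(mode) & ~EXEC_BITS)
            changed.append(entry)
    return changed


# Demonstration on a throwaway directory with one executable text file.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = os.path.join(demo_dir, "copybook.cpy")
    open(demo_file, "w").close()
    os.chmod(demo_file, 0o755)
    print(clear_execute_bits(demo_dir))
```

Subdirectories are left alone on purpose, since a directory needs its execute bit to be searchable.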

Hope this helped in your diagnostic efforts.