Operating System - HP-UX

Re: System hang with this message:"The fork function failed. Too many processes already exist."

 
Hoang Chi Cong_1
Honored Contributor

System hang with this message:"The fork function failed. Too many processes already exist."

Dear all,
One of my servers (an rp5470) has a strange problem:
I could not use any network service (telnet, rlogin, ftp, etc.) when connecting to the server. (Please see the attached syslog.log file.)
I found a thread describing similar trouble:
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=667961

I followed it and killed all of the nfs processes, after which I could connect to the server again (very lucky, I didn't have to reboot the server).
But after that I found that the server was performing poorly!
Can anyone explain to me what is in the "proc table"? And what is a TOC?
I only know that the proc table is in fact an amount of memory, but I don't know how big it is. Can I create it or modify it?
In this case, I think I have to reboot the server to free up the "proc table"...
Does anyone know about the "Dumping" process during a server reboot?
I checked the GSP log and saw:
***/***

plete (160 of 601 MB) (device 64:0x2)
*** Dumping: 27% complete (168 of 601 MB) (device 64:0x2)
*** Dumping: 29% complete (176 of 601 MB) (device 64:0x2)
*** Dumping: 30% complete (184 of 601 MB) (device 64:0x2)
*** Dumping: 31% complete (192 of 601 MB) (device 64:0x2)
*** Dumping: 33% complete (200 of 601 MB) (device 64:0x2)
*** Dumping: 34% complete (208 of 601 MB) (device 64:0x2)
*** Dumping: 35% complete (216 of 601 MB) (device 64:0x2)
*** Dumping: 37% complete (224 of 601 MB) (device 64:0x2)
*** Dumping: 38% complete (232 of 601 MB) (device 64:0x2)
*** Dumping: 39% complete (240 of 601 MB) (device 64:0x2)
*** Dumping: 41% complete (248 of 601 MB) (device 64:0x2)
*** Dumping: 42% complete (256 of 601 MB) (device 64:0x2)
*** Dumping: 43% complete (264 of 601 MB) (device 64:0x2)
*** Dumping: 45% complete (272 of 601 MB) (device 64:0x2)
*** Dumping: 46% complete (280 of 601 MB) (device 64:0x2)
*** Dumping: 47% complete (288 of 601 MB) (device 64:0x2)
*** Dumping: 49% complete (296 of 601 MB) (device 64:0x2)
*** Dumping: 50% complete (304 of 601 MB) (device 64:0x2)
*** Dumping: 51% complete (312 of 601 MB) (device 64:0x2)
*** Dumping: 53% complete (320 of 601 MB) (device 64:0x2)
*** Dumping: 54% complete (328 of 601 MB) (device 64:0x2)
*** Dumping: 55% complete (336 of 601 MB) (device 64:0x2)
*** Dumping: 57% complete (344 of 601 MB) (device 64:0x2)
*** Dumping: 58% complete (352 of 601 MB) (device 64:0x2)
*** Dumping: 59% complete (360 of 601 MB) (device 64:0x2)
*** Dumping: 61% complete (368 of 601 MB) (device 64:0x2)
*** Dumping: 62% complete (376 of 601 MB) (device 64:0x2)
*** Dumping: 63% complete (384 of 601 MB) (device 64:0x2)
*** Dumping: 65% complete (392 of 601 MB) (device 64:0x2)
*** Dumping: 66% complete (400 of 601 MB) (device 64:0x2)
*** Dumping: 67% complete (408 of 601 MB) (device 64:0x2)
*** Dumping: 69% complete (416 of 601 MB) (device 64:0x2)
*** Dumping: 70% complete (424 of 601 MB) (device 64:0x2)
*** Dumping: 71% complete (432 of 601 MB) (device 64:0x2)
*** Dumping: 73% complete (440 of 601 MB) (device 64:0x2)
*** Dumping: 74% complete (448 of 601 MB) (device 64:0x2)
*** Dumping: 75% complete (456 of 601 MB) (device 64:0x2)
*** Dumping: 77% complete (464 of 601 MB) (device 64:0x2)
*** Dumping: 78% complete (472 of 601 MB) (device 64:0x2)
*** Dumping: 79% complete (480 of 601 MB) (device 64:0x2)
*** Dumping: 81% complete (488 of 601 MB) (device 64:0x2)
*** Dumping: 82% complete (496 of 601 MB) (device 64:0x2)
*** Dumping: 83% complete (504 of 601 MB) (device 64:0x2)
*** Dumping: 85% complete (512 of 601 MB) (device 64:0x2)
*** Dumping: 86% complete (520 of 601 MB) (device 64:0x2)
*** Dumping: 87% complete (528 of 601 MB) (device 64:0x2)
*** Dumping: 89% complete (536 of 601 MB) (device 64:0x2)
*** Dumping: 90% complete (544 of 601 MB) (device 64:0x2)
*** Dumping: 91% complete (552 of 601 MB) (device 64:0x2)
*** Dumping: 93% complete (560 of 601 MB) (device 64:0x2)
*** Dumping: 94% complete (568 of 601 MB) (device 64:0x2)
*** Dumping: 95% complete (576 of 601 MB) (device 64:0x2)
*** Dumping: 97% complete (584 of 601 MB) (device 64:0x2)
*** Dumping: 98% complete (592 of 601 MB) (device 64:0x2)
*** Dumping: 99% complete (600 of 601 MB) (device 64:0x2)
*** Dumping: 100% complete (601 of 601 MB) (device 64:0x2)
*** Dumping: 100% complete (601 of 601 MB).
*** System rebooting.

****/****
Any ideas are much appreciated!
Regards,
Hoang Chi Cong

Looking for a special chance.......
11 REPLIES
Mel Burslan
Honored Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

The proc table is a chunk of kernel memory that holds one entry for each process on the system, keyed by PID. This table's size is controlled by the kernel parameter nproc. In default system installations, nproc is tied to an artificial formula like this:

(20+8*MAXUSERS)

Considering that the maxusers kernel parameter is set to 32 or 64, the resulting nproc value (276 or 532) is a very low number indeed.
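
To put numbers on the formula above, here is a minimal shell sketch. The function name nproc_formula is made up for illustration; it only does the arithmetic and does not query the kernel.

```shell
#!/bin/sh
# Evaluate the default nproc formula (20 + 8 * MAXUSERS) for a given maxusers.
# nproc_formula is a hypothetical helper name, not an HP-UX command.
nproc_formula() {
    echo $(( 20 + 8 * $1 ))
}

nproc_formula 32    # prints 276
nproc_formula 64    # prints 532
```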

First off, you can forget about the formula and set nproc to a fixed value. The formula is just a guideline, not a golden rule to live by.

Run the command:

kmtune | grep nproc

to see what the value currently is. Then raise this parameter, either manually or, if you do not feel comfortable with manual kernel builds, using sam (two to three times the current value is usually a good starting point, depending on what it is now). Rebuild the kernel and reboot your system. Keep in mind that if you increase the value to obscenely high numbers, your memory performance will suffer instead, because every proc table slot costs kernel memory. So, be reasonable.

After the reboot, check the proc table utilization using glance or, if you do not have a glance license, using the command

sar -v 5 10

and note the ratio under the column header proc-sz, which shows used entries over table size. If the ratio is coming close to 1, it may be time to bump nproc up a little more. Finding the right value is an iterative process, because it depends on what your system does, day in and day out.
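
If you would rather not eyeball the ratio, something like the sketch below may help. It assumes the HP-UX sar -v column layout, where proc-sz is the fourth field in used/size form. The name proc_usage and the sample line are only for illustration.

```shell
#!/bin/sh
# Print proc table utilization from `sar -v` output read on stdin.
# Assumes the layout: time, text-sz, ov, proc-sz, ov, ... so that proc-sz
# ("used/size") is field 4. proc_usage is a hypothetical helper name.
proc_usage() {
    awk '$4 ~ /^[0-9]+\/[0-9]+$/ {
        split($4, p, "/")
        printf "%s %d/%d %.1f%%\n", $1, p[1], p[2], 100 * p[1] / p[2]
    }'
}

# Sample input in the assumed layout (values for illustration only).
printf '%s\n' \
    '08:56:05 text-sz ov proc-sz ov inod-sz ov file-sz ov' \
    '08:56:07 N/A N/A 131/4116 0 0/5076 0 1204/62358 0' | proc_usage
```

On a live system the same function would be fed with: sar -v 5 10 | proc_usage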

hope this helps
________________________________
UNIX because I majored in cryptology...
Hoang Chi Cong_1
Honored Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

Thank you for the answer, but I have checked this kernel parameter and I think it is high enough:
nproc 4116 - (20+8*MAXUSERS)

One more thing to note:
No application is running on this server! It is a standby server in the cluster! And the connections to this server are regular! It is strange, isn't it?

Any idea?
Looking for a special chance.......
Mel Burslan
Honored Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

In that case, it is actually very strange. It suggests there are some runaway processes on your system that do not release their proc table slots when they die, or that just plain refuse to die.

If I were you, the next time the system hangs I would go to the console, hit ctrl-B, and force a TOC by typing

TC

at the GSP prompt. Then send the crash dump to HP for further analysis. They should be able to pinpoint the problem pretty quickly.
________________________________
UNIX because I majored in cryptology...
CAS_2
Valued Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

run

ps -ef | grep defunct

to check the number of zombie processes that might be filling the proc table.
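
A plain grep also matches the grep process itself, and it does not say who the parents are. The sketch below is one way around both, assuming the usual ps -ef layout (UID PID PPID ...) with zombies showing <defunct> as the last field. The name zombies_by_parent is made up for illustration.

```shell
#!/bin/sh
# Count zombie entries in `ps -ef` output read on stdin, grouped by parent PID.
# Assumes field 3 is the PPID and that zombies show "<defunct>" as the last
# field. zombies_by_parent is a hypothetical helper name.
zombies_by_parent() {
    awk '$NF == "<defunct>" { n[$3]++ }
         END { for (ppid in n) print n[ppid], ppid }' | sort -rn
}

# On a live system: ps -ef | zombies_by_parent
```

Each output line is a zombie count followed by the parent PID, busiest parent first. The parent is the process that is failing to reap its children.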
Colin Topliss
Esteemed Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

Hi,

A TOC is a Transfer Of Control - basically the system was forced to restart.

If you didn't initiate it, then your cluster software initiated it on your behalf (you say it's clustered - is this ServiceGuard?).

My guess is that you had a network glitch. That could have caused the cluster to go split-brain, at which point SG would issue a TOC to try to resolve the issue.

Check your syslog for any unusual messages.

Look for any crash files in /var/adm/crash. Look for a file called INDEX - it may give you a clue as to what happened (especially if it was a TOC that caused your system to reboot). If not, ask HP to send you crashinfo - it's a really useful tool that analyses the crash dump and tells you why the system crashed.
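
A quick way to look at every INDEX file in one go might be the sketch below. It assumes each saved dump sits in its own subdirectory (for example crash.0) containing an INDEX file. The name show_crash_index is made up for illustration.

```shell
#!/bin/sh
# Print the INDEX file of every crash dump directory under a given path.
# Assumes dumps are saved in subdirectories (e.g. crash.0) that each hold
# an INDEX file. show_crash_index is a hypothetical helper name.
show_crash_index() {
    dir=${1:-/var/adm/crash}
    for index in "$dir"/*/INDEX; do
        [ -f "$index" ] || continue    # skip when nothing matches the glob
        echo "== $index"
        cat "$index"
    done
}

show_crash_index /var/adm/crash
```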

Regards

Col.
Jarle Bjorgeengen
Trusted Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

This is indeed a Serviceguard cluster, otherwise you wouldn't have the hacl entries in syslog. (These are the HA cluster daemons of the nodes in the cluster talking to each other.)

I bet Serviceguard TOC'ed the node, and that it is not necessarily related to the proc table full messages.

Look at the syslog of the other node(s) in the cluster to better understand what happened.

RAC_1
Honored Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

sar -v 2 5

Check proc-sz: if the table is almost fully used and the ov column shows overflows, then you need to bump up nproc.
There is no substitute to HARDWORK
Alessandro Pilati
Esteemed Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

Hoang,
to increase nproc, run
kmtune -s nproc=new_bigger_value

and then you should regenerate the kernel:
cd /stand
mkdir bak
cp system bak
mk_kernel -o /stand/vmunix
kmupdate
cd /
shutdown -r -y 0


Regards,
Alex
if you don't try, you'll never know if you are able to
Hoang Chi Cong_1
Honored Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

First of all, I would like to thank you all for your replies.

To Jarle Bjoergeengen: I agree with you on one point: it is not related to the proc table.
As in my previous post, the nproc parameter is set to 4116. That is enough!
But I don't think this error was caused by MC/SG.
On the other node of the cluster, there was no related log information!

To RAC:
Here is the output of the sar -v 2 5 command:
HP-UX ipcasba B.11.11 U 9000/800 09/09/05

08:56:05 text-sz  ov  proc-sz  ov  inod-sz  ov     file-sz  ov
08:56:07     N/A N/A 131/4116   0   0/5076   0  1204/62358   0
08:56:09     N/A N/A 131/4116   0   0/5076   0  1205/62358   0
08:56:11     N/A N/A 130/4116   0   0/5076   0  1200/62358   0
08:56:13     N/A N/A 130/4116   0   0/5076   0  1200/62358   0
08:56:15     N/A N/A 130/4116   0   0/5076   0  1200/62358   0

Do you have any idea? The proc table size doesn't seem to be the trouble, does it?

To Alessandro Pilati:

In fact, I don't want to increase this parameter. I remember that 1024 is a good value for it, and on my system the value is already much bigger!


Regards,
Hoang Chi Cong
Looking for a special chance.......
RAC_1
Honored Contributor

Re: System hang with this message:"The fork function failed. Too many processes already exist."

sar -v does not show the maximum usage of nproc, though. You can check a few more things.

What is the setting for maxuprc? This is the per-user process limit.

Also, check what the highest utilization of nproc was with glance.

glance -t

It will show the maximum % value that was hit.
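
To see whether any single user is getting near maxuprc, a per-user process count can be compared against it. Here is a minimal sketch, assuming the user name is the first column of ps -ef output. The name procs_per_user is made up for illustration.

```shell
#!/bin/sh
# Count processes per user from `ps -ef` output read on stdin, busiest first.
# Assumes the first line is the header and field 1 is the user name.
# procs_per_user is a hypothetical helper name.
procs_per_user() {
    awk 'NR > 1 { n[$1]++ } END { for (u in n) print n[u], u }' | sort -rn
}

# On a live system: ps -ef | procs_per_user
```

The current limit can be read the same way as nproc, with kmtune | grep maxuprc. A user whose count sits at that limit will get fork failures even when the proc table itself has plenty of room.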
There is no substitute to HARDWORK
Bill Singletary
New Member

Re: System hang with this message:"The fork function failed. Too many processes already exist."

Hoang, et al.,

Any word on what happened with this? We started seeing the same problem here too. Assuming the nproc issue is a red herring, then what could be the cause of all these zombie processes?

Thanks,
Bill