pthread_create()

 
Krishnan_14
Occasional Advisor

Hi,

We are intermittently getting a "pthread_create() failed" error from a standard vendor-supplied application, and only in one of our environments. We have not been able to reproduce the error. The same application runs fine in our other environments, and all the environments are configured alike.

We have the following kernel parms
max_thread_proc = 1024
nkthread = 20000
nproc = 1461

We have monitored the system and have not seen any significant increase in pthreads. The process thread count monitored via gpm does not go above 70 for this application. Memory utilization is not high either.

What could be the probable cause of the failure?

Detailed error message is given below.

(cthreadpool.cxx:424): cThreadPool_::Add(): pthread_create failed. (from OS) Resource temporarily unavailable (0x0000000B)
(cthreadpool.cxx:461): CreateThreadFailed E:0x20200000 (not open)
(svcmain.cxx:487): ServiceMain(): pThreads->Add() Failed

Thanks
Krishnan
16 REPLIES
A. Clay Stephenson
Acclaimed Contributor

Re: pthread_create()

Whenever I code any threaded application, I always add a loop to retry after errno is set to EAGAIN on pthread_create(). Since you can't alter your application, I suggest that you carefully compare all the tunables between "good" and "bad" systems. For example, surprisingly, values like maxssiz can impact the number of threads that can be spawned on some flavors of UNIX. Are you watching your number of processes? Are you confident that you have enough swap? Have you compared the patch levels and shared libraries?
If it ain't broke, I can fix that.
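
For anyone who can change the calling code, the retry loop described above might look roughly like the sketch below. The helper name create_thread_with_retry, the placeholder worker retry_demo_worker, the attempt count and the delays are all assumptions made for illustration, not anything taken from the vendor application. Note that a POSIX pthread_create() returns the error number directly rather than setting errno, so the sketch tests the return value.

```c
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Placeholder thread body (hypothetical): simply hands its argument back. */
static void *retry_demo_worker(void *arg)
{
    return arg;
}

/*
 * Hypothetical helper: call pthread_create() and retry while it reports
 * EAGAIN, sleeping a little longer between attempts. Returns 0 on success
 * or the last error number returned by pthread_create().
 */
static int create_thread_with_retry(pthread_t *tid, const pthread_attr_t *attr,
                                    void *(*start)(void *), void *arg,
                                    int max_attempts)
{
    struct timespec delay = { 0, 10L * 1000 * 1000 };   /* start at 10 ms */
    int rc = EAGAIN;
    int attempt;

    for (attempt = 0; attempt < max_attempts; attempt++) {
        rc = pthread_create(tid, attr, start, arg);
        if (rc != EAGAIN)
            break;                      /* success, or a non-transient error */
        nanosleep(&delay, NULL);        /* give the system time to free resources */
        if (delay.tv_nsec < 320L * 1000 * 1000)
            delay.tv_nsec *= 2;         /* back off, capped at 320 ms */
    }
    return rc;
}
```

A caller would use create_thread_with_retry(&tid, NULL, retry_demo_worker, NULL, 5) in place of a bare pthread_create() and only treat the failure as fatal once the attempts are exhausted.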
Sandman!
Honored Contributor

Re: pthread_create()

Info on server type and OS version might help a bit. Post uname -a output.

thanks!
Krishnan_14
Occasional Advisor

Re: pthread_create()

Thanks for the replies.

The HPUX version is given below.
HP-UX hp_prod B.11.11 U 9000/800 651359373 unlimited-user license

Our unix admins have verified the kernel parms on both systems and claim they are similar.

The kernel parms that differ between the 2 systems are given below.

Parm             Bad          Good
bufpages         125000       0
dbc_max_pct      0            4
dbc_min_pct      0            4
ksi_alloc_max    11688        22352
maxdsiz_64bit    8808038400   4294967296
maxfiles         2048         4096
maxfiles_lim     2048         4096
maxuprc          1168         2235
maxusers         400          800
msgmni           1461         2794
nclist           6500         12900
ncsize           18856        29520
nfile            23963        757760
nflocks          1461         2794
ninode           13736        24400
nproc            1461         2794
nstrtel          400          800
nsysmap          2922         5588
nsysmap64        2922         5588
semmap           2924         5590
semmni           2922         5588
semmnu           1457         2790
swchunk          2048         65536
unlockable_mem   4000         8000

The bad system is our production server.

Please let me know if changing any of the kernel parms above could help prevent this problem.

Thanks
Krishnan
Banibrata Dutta
Frequent Advisor

Re: pthread_create()

Is this an MP system or a single-processor system?

BTW, from the pthread_create() manpage...

[EAGAIN]
The necessary resources to create another thread are not available, or the number of threads in the calling process already equals PTHREAD_THREADS_MAX.

And this happens to be the only error which can happen intermittently; all the others are related to invalid args (though those can happen too if you keep changing args).

When this problem does occur, does it occur several times in a burst?

BTW, the manpage also has a warning saying that if threads are joinable but have not been joined, then whether or not they count towards PTHREAD_THREADS_MAX is undefined. That may mean that they do count, and "gpm" may not show them (not sure)!

- bd
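
To illustrate the joinable-thread point: a thread that has exited but was never joined or detached can keep holding its resources. The sketch below shows the two usual ways of avoiding that in code you control. The function names start_detached and run_and_join and the worker join_demo_worker are made up for this example; whether the vendor application does either is unknown.

```c
#include <pthread.h>

/* Placeholder thread body (hypothetical): simply hands its argument back. */
static void *join_demo_worker(void *arg)
{
    return arg;
}

/*
 * Option 1: create the thread detached, so the system can release its
 * resources as soon as it exits. Nobody may join it afterwards.
 */
static int start_detached(void *(*start)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, start, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/*
 * Option 2: keep the thread joinable and always reap it with
 * pthread_join(), which is what releases an exited joinable thread.
 */
static int run_and_join(void *(*start)(void *), void *arg, void **result)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, start, arg);

    if (rc != 0)
        return rc;
    return pthread_join(tid, result);
}
```

If a thread pool creates joinable workers and never joins the ones that exit, the process could creep towards its thread limit even though a monitor only shows a few dozen running threads.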

Krishnan_14
Occasional Advisor

Re: pthread_create()

Yep. It has 8 processors.

We have closely monitored the system and don't believe the PTHREAD_THREADS_MAX is being reached. This is currently set to 1024.

Since the log does not provide any more details how can we determine what other resources could be maxed out?

I also noticed that warning about joinable threads and was not sure whether they count towards PTHREAD_THREADS_MAX. If so, is there any way to tell if this value is being exceeded?
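
One thing a small test program can do is ask the system what per-process thread limit it actually reports at run time. This is only a sketch: thread_limit and print_thread_limit are hypothetical names, and the program shows the limit itself, not how close a running process is to it (that still needs a monitoring tool such as glance). sysconf() with _SC_THREAD_THREADS_MAX is standard POSIX, and a return of -1 means the system reports no fixed limit.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: return the per-process thread limit reported by
 * sysconf(), or -1 if the system reports no fixed limit (or the name is
 * not supported on this platform).
 */
static long thread_limit(void)
{
#ifdef _SC_THREAD_THREADS_MAX
    return sysconf(_SC_THREAD_THREADS_MAX);
#else
    return -1;
#endif
}

/* Print the limit in a form that is easy to compare between two hosts. */
static void print_thread_limit(void)
{
    long limit = thread_limit();

    if (limit == -1)
        printf("thread limit: none reported\n");
    else
        printf("thread limit: %ld\n", limit);
}
```

Running the same check on the good and the bad system would confirm whether both really see the same effective limit.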
Sandman!
Honored Contributor

Re: pthread_create()

What is the value of the max_thread_proc parameter in the kernel? Run the command below on both systems and post its output:

# kmtune -q max_thread_proc

thanks!
Krishnan_14
Occasional Advisor

Re: pthread_create()

max_thread_proc 1024
nkthread 20000
nproc 1461
maxuprc 1168

Currently we have the above values on the production server which is the bad system. The good system also has the same values and we don't get this problem there.

Sandman!
Honored Contributor

Re: pthread_create()

Try increasing the value of max_thread_proc beyond 1024 and see if the problem disappears. Also what are the values of the following parameters on both systems:

maxdsiz
maxtsiz
maxssiz
A. Clay Stephenson
Acclaimed Contributor

Re: pthread_create()

There are simply too many unknowns here, and you have not looked at differences in patches or shared libraries. I also see nothing about swap space. I will say that you have really done one thing that is state-of-the-art dumb. Essentially all of the resources on your production ("bad") box are smaller than their counterparts on the test/development ("good") box. The opposite should be true, because you want the failures to occur in the test and/or development environment. I would make the test box like the production box, make sure that equivalent patches are applied, and then try to reproduce the problem. You then increase one tunable at a time until the problem goes away -- in test.

Again, EAGAIN is not a completely unexpected event during pthread_create and the software should cope with what is normally a transient situation --- that is the real fix.
If it ain't broke, I can fix that.
Krishnan_14
Occasional Advisor

Re: pthread_create()

Thanks for all the responses. I have passed the suggestions to our unix admins so they can change the kernel parms this weekend. If we still have this issue I will look for help in this forum again.
Emil Velez
Honored Contributor

Re: pthread_create()

Do you have measureware on the system? If so, I would lower the thresholds so that you capture process information every minute. There are metrics that tell you the total number of threads for the OS and the number of threads per process. Then, when the process dies, you can go back into the measureware data and see how many threads the process had in the minutes before.

If you don't have glance and measureware on the system, you can load them once and use them for 60 days to "evaluate" the product.

Good luck..
Banibrata Dutta
Frequent Advisor

Re: pthread_create()

Hi Krishnan,

Could you please answer one of the questions from my previous post:

When the pthread_create() error does happen, do you see a bunch of these (and/or other errors), or is it a single error for some time? Also, what is the frequency of this error?

thanks,
banibrata
Krishnan_14
Occasional Advisor

Re: pthread_create()

Hi banibrata

The pthread_create() error is noticed in the application's log. There are no associated problems seen either at the OS level or with any other application. As I had indicated, this is quite intermittent and we have not been able to reproduce it. The frequency is not consistent either: a couple of times per week.

Our unix admins have been monitoring the systems closely with gpm, glance etc. and are not able to find any resource issues that might trigger this.
Sandman!
Honored Contributor

Re: pthread_create()

Krishnan,

IMHO...the intermittency and frequency of the problem might correlate with load on the server. Your production server may be getting impacted heavily during those two or three times a week, spawning processes that create threads but don't destroy them.

As for your other system where this application runs fine, is that a development box? If so, try ramping up the load on your development box so that it matches production and see if you come across this error again. I say this because development systems are often not stress-tested to production's capacity.

BTW, did you try increasing the max_thread_proc parameter? And what about the kernel values for maxtsiz/maxdsiz/maxssiz on both systems?

cheers!
Banibrata Dutta
Frequent Advisor

Re: pthread_create()

Well Krishnan, I think you are on the right track, but there are a few things you need to note and do.

1) max_thread_proc is a per-process limit, not a cumulative system limit.

2) nkthread OTOH is a cumulative system limit. So if other processes momentarily create too many threads, or too many processes get spawned with a few threads each, nkthread may be hit even though max_thread_proc is not hit for your application.

So while monitoring, you need to watch both the number of threads in your application and the total number of threads in the system.

One thing you can do, though, is to reduce max_thread_proc to, say, about 80 (since you said your application runs about 70-odd threads at most), and then see if the frequency of the error increases. This may help in faster and more controllable reproduction of the problem.

If you are able to reproduce your problem more often, debugging becomes easier. I'd recommend this as the first step, rather than increasing any of the limits.

my 2 cents,
bd
Alexey Roytman
Frequent Advisor

Re: pthread_create()

Creating a thread allocates additional memory (either by sbrk() or by mmap()) for its stack, so when that call fails, pthread_create() fails too.

Try installing "tusc" to trace system calls, and check the mmap()/sbrk() calls for failures.