Operating System - HP-UX

Re: Process running at 100% CPU Utilization

 
Sourav Sain
Occasional Advisor

Process running at 100% CPU Utilization

Hi All
I have a daemon process which listens on a socket and forks a child process when a request is received. My problem is that when a large number of requests are sent and, say, 30-40 children are forked, after some time a child (sometimes 2 children) starts using 100% CPU [as observed with the top command].
More information:
1) the child process runs execl and loads another process.

2) The situation cannot be reproduced with tusc attached, because tusc then takes most of the CPU and the process concerned does not reach 100% CPU utilization.

3) The problem occurs at random and is not always seen.

4) The problem is being observed after applying the 2 bundles
GOLDAPPS11i B.11.11.0606.446 Applications Patches for HP-UX 11i v1, June 2006

GOLDBASE11i B.11.11.0606.446 Base Patches for HP-UX 11i v1, June 2006
[But I don't know if this has anything to do with the problem]

5) A core file was generated by sending SIGTRAP, and running "what" on the core gives this output:

HP-UX libm shared PA1.0 C math library 970220 (133940) UX 10.20
PATCH-PHCO_32718 for 10.20; for 10.30, 11.x compatibility libc.1_ID@@/main/r10dav/libc_dav/libc_dav_cpe//1
/ux/core/libs/libc/shared_pa1/libc.1_ID
Feb 4 2005 10:03:03
OpenGL 1.1 Revision 1.45 on HP-UX 11.00 $Date: 07-Jul-05.17:25:04 $ $Revision: 20050707.17729 $ libogltls.2
SMART_BIND
92453-07 dld dld dld.sl B.11.37 030909

Please guide me as to what the cause could be.

Thanks in advance.
7 REPLIES
Dennis Handly
Acclaimed Contributor

Re: Process running at 100% CPU Utilization

>2) The situation cannot be reproduced with tusc attached, because tusc then takes most of the CPU and the process concerned does not reach 100% CPU utilization.

This duplicates it. You need to see what tusc is showing the child is doing.

>5) A core file was generated by sending SIGTRAP, and running "what" on the core gives this output:

Using what(1) on the core is near useless. You need to get gdb and attach to the process and do a "bt" to get a stack trace. If you want to isolate where the loop is, you may want to use "finish". If it returns to the caller, do it again, until it doesn't. Then it is that function that has the loop.

Or you can use gdb on that core file you generated but you won't be able to use "finish".
Sourav Sain
Occasional Advisor

Re: Process running at 100% CPU Utilization

Thanks Dennis,
I will do as asked and get back with the results if things still seem confusing.

bye :)
Sourav
Sourav Sain
Occasional Advisor

Re: Process running at 100% CPU Utilization

Hi Dennis and all there
I have investigated this problem further and found what is actually happening.

I found that the problem occurs when allocating memory using "new" for a String. The process goes to "gcc-3.4.2/libstdc++-v3/libsupc++/new_op.cc: line 48" and then calls malloc from /usr/lib/libc.1. If my process receives a signal [SIGALRM] at this time, and the signal handler tries to allocate a string to give a proper message, this again calls "gcc-3.4.2/libstdc++-v3/libsupc++/new_op.cc: line 48" and then malloc() in /usr/lib/libc.1, and the process starts taking 100% CPU and keeps running.

Please guide urgently as to how to solve this.

I have attached part of the stack trace.

Thanks in advance
Sourav
Dennis Handly
Acclaimed Contributor

Re: Process running at 100% CPU Utilization

>I found that the problem is when allocating memory using "new" for String....on a signal handler I try to allocate a string

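This is illegal: malloc (and therefore operator new) is not async-signal-safe, so a handler that interrupts malloc and calls it again can leave the heap routines looping. You can do next to nothing in a signal handler.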
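To illustrate "next to nothing", here is a minimal sketch (assuming C++11 and a POSIX sigaction) of a handler that only sets a flag and leaves the string building to the normal code path. The names g_alarm_pending, on_alarm, install_alarm_handler and consume_alarm are invented for illustration and are not from the original application.

```cpp
#include <signal.h>
#include <string>

// Set by the handler, read by normal code. volatile sig_atomic_t is the
// object type the standards guarantee can be written from a handler.
static volatile sig_atomic_t g_alarm_pending = 0;

// The handler allocates nothing and calls no library routine:
// it only records that the signal arrived.
extern "C" void on_alarm(int) {
    g_alarm_pending = 1;
}

// Hypothetical helper: install the handler for SIGALRM.
// Returns true on success.
bool install_alarm_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, nullptr) == 0;
}

// Called from the normal control flow, for example once per loop
// iteration. Building a std::string is fine here because we are no
// longer inside the handler. Returns an empty string if no alarm is
// pending.
std::string consume_alarm() {
    if (!g_alarm_pending) {
        return std::string();
    }
    g_alarm_pending = 0;
    return std::string("alarm received");
}
```

The main loop would call consume_alarm() at a convenient point and log whatever it returns, so the heap is only ever touched outside the handler.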

> ... then malloc() in /usr/lib/libc.1 and the process starts taking 100% CPU and keeps running.

>Please guide urgently as to how to solve this.

As I said, this programming style is not valid. You could work around it by using an emergency buffer and only using char*.
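A rough sketch of the emergency-buffer idea, assuming the handler only needs to emit a fixed message: the buffer is static, the copy is done by hand, and output goes through write(2), which POSIX lists as async-signal-safe. The names g_emergency_buf, copy_message and on_alarm_report, and the message text, are hypothetical.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

// Reserved at load time, so the handler never touches the heap.
static char g_emergency_buf[128];

// Copy a NUL-terminated message into buf without calling any library
// routine. Truncates if needed and always terminates the result.
// Returns the number of bytes copied, excluding the terminator.
std::size_t copy_message(char* buf, std::size_t cap, const char* msg) {
    if (cap == 0) {
        return 0;
    }
    std::size_t n = 0;
    while (msg[n] != '\0' && n + 1 < cap) {
        buf[n] = msg[n];
        ++n;
    }
    buf[n] = '\0';
    return n;
}

// Handler: no new, no malloc, no std::string. Only char* and write(2).
extern "C" void on_alarm_report(int) {
    int saved_errno = errno;  // write() may change errno
    std::size_t n = copy_message(g_emergency_buf, sizeof g_emergency_buf,
                                 "SIGALRM: request timed out\n");
    ssize_t rc = write(STDERR_FILENO, g_emergency_buf, n);
    (void)rc;                 // nothing useful can be done on failure here
    errno = saved_errno;
}
```

The handler would be installed with sigaction() for SIGALRM in the usual way. Anything that needs real formatting is better deferred to code running outside the handler.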

Since the signal is only a trivial SIGALRM, you could block signals while in heap routines (this will kill performance).

See mallopt(3) and M_BLOCK.
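The thread does not show the exact mallopt call, so the sketch below illustrates the same idea with the portable sigprocmask(2) instead: SIGALRM is blocked for the duration of an allocation and the previous mask is restored afterwards. It assumes C++11 and a single-threaded process. AlarmBlocker, alarm_is_blocked and make_message are invented names.

```cpp
#include <signal.h>
#include <string>

// RAII guard: blocks SIGALRM on construction and restores the previous
// signal mask on destruction. While it is alive, the SIGALRM handler
// cannot interrupt the heap routines.
class AlarmBlocker {
public:
    AlarmBlocker() {
        sigset_t block;
        sigemptyset(&block);
        sigaddset(&block, SIGALRM);
        sigprocmask(SIG_BLOCK, &block, &saved_);
    }
    ~AlarmBlocker() { sigprocmask(SIG_SETMASK, &saved_, nullptr); }
    AlarmBlocker(const AlarmBlocker&) = delete;
    AlarmBlocker& operator=(const AlarmBlocker&) = delete;

private:
    sigset_t saved_;
};

// Hypothetical helper: true if SIGALRM is currently blocked.
bool alarm_is_blocked() {
    sigset_t cur;
    sigemptyset(&cur);
    sigprocmask(SIG_BLOCK, nullptr, &cur);  // null set: query only
    return sigismember(&cur, SIGALRM) == 1;
}

// Example of a guarded allocation: the std::string is built while
// SIGALRM is held pending.
std::string make_message(const char* text) {
    AlarmBlocker guard;
    return std::string(text);
}
```

A SIGALRM that arrives while the guard is alive stays pending and is delivered once the mask is restored. The weakness of doing this by hand is that every allocation site has to be guarded, and the two extra system calls per allocation are the performance cost mentioned above.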

I also noticed that you have libc.1 in your stack trace. Are you developing on 10.20 and running on 11.11? If not, you should be linking against -lc or libc.2.

Sourav Sain
Occasional Advisor

Re: Process running at 100% CPU Utilization

Hey Dennis
Thanks. This helped a lot in understanding what is happening, and I see that I need to rework my signal handling.

As for the stack showing libc.1, you are right. The application was built on 10.20 and is running on 11.11.

I have one question, regarding M_BLOCK. Is there a way to set it through the environment? I read about M_ARENA_OPTS and _M_SBA_OPTS but they don't seem to provide a way to block signals during malloc. So how do we block signal handling during malloc? Is it by calling mallopt during our application initialization?

Thanks again for your time and help!
Sourav Sain
Occasional Advisor

Re: Process running at 100% CPU Utilization

Hi Dennis and all

I found out that the issue is not due to a signal being handled while the code is trying to allocate memory; it occurs even after we removed all memory allocation from our signal handlers. It's just during string allocation done by libstdc++ through malloc in libc.1.

I have attached the stack trace for this time. Can you give some direction as to what the cause is, as we now do the bare minimum in the signal handler?

Thanks again
Dennis Handly
Acclaimed Contributor

Re: Process running at 100% CPU Utilization

>I found out that the issue is not due to a signal being handled while the code is trying to allocate memory; it occurs even after we removed all memory allocation from our signal handlers.

How have you proven that? The only way to do this is to not call anything. Or set a breakpoint in malloc/free etc and make sure you don't hit anything. (Or visual machine code inspection. ;-)

>It's just during string allocation done by libstdc++ through malloc in libc.1

Even if you are not in a signal handler?

>I have attached the stack trace for this time. Can you give some direction as to what the cause is, as we now do the bare minimum in the signal handler?

If this is not a signal handler issue, then you have corrupted the heap. Typically these cause signal 11 but I guess loops are possible.

Do you link your application with -z to abort on NULL pointer dereferences? That may find bugs more easily, unless you have sloppy code that may trigger it.

You will have to use a heap corruption detection tool. I'm not sure if gdb will work with libc.1.