
Re: Why some POSIX threads do not have their stacktrace starting from __pthread_bound_body?

 
blackwater
Regular Advisor

Why some POSIX threads do not have their stacktrace starting from __pthread_bound_body?

I have been analyzing a core file of a multithreaded C++ program.
As a first step I ran 'thread apply all bt' in gdb (gdb 6.1, HP-UX 11.31).

Most of the threads have a stack trace that starts from __pthread_bound_body().
This is an example:

#52 0xc000000021f98800:0 in selfcare::operations_execute (ctx=@0x60000002f364a140, from=<not available>, to={_M_current = 0x6000000751306090}, snmp_on=false) at has_source/request_handler.cpp:1002
#53 0xc000000021fa1d80:0 in selfcare::request_handler::process_request (this=0x60000002f364a000, cmd=0x60000007b06ec200, parameters=@0x6000000727499fe8) at has_source/request_handler.cpp:1609
#54 0xc000000021fa8920:0 in selfcare::request_handler::execute (this=0x60000002f364a000) at has_source/request_handler.cpp:2042
#55 0xc000000018b0c460:0 in threads::thread_proc (thr_ptr=0x60000002f364a000) at has_common_source/source/cpp/threads.cpp:225
#56 0xc000000000117b20:0 in __pthread_bound_body () at /ux/core/libs/threadslibs/src/common/pthreads/pthread.c:4875

 

However, a few threads have a stack trace that starts from other functions.
These are examples:
1) frame #25 is the last frame in this thread

#23 0xc000000021f98800:0 in selfcare::operations_execute (ctx=@0x60000002f3ac8140, from=<not available>, to={_M_current = 0x6000000710d13490}, snmp_on=false) at has_source/request_handler.cpp:1002
#24 0xc000000021fa1d80:0 in selfcare::request_handler::process_request (this=0x60000002f3ac8000, cmd=0x600000072a2fea00, parameters=@0x60000002eb5b48c8) at has_source/request_handler.cpp:1609
#25 0xc000000021fa8920:0 in selfcare::request_handler::execute (this=0x60000002f3ac8000) at has_source/request_handler.cpp:2042

 

2) frame #29 is the last frame in this thread

#24 0xc000000021f94030:0 in selfcare::operation_execute (ctx=@0x60000002f5d6a140, o=@0x6000000870219d80) at has_source/request_handler.cpp:781
#25 0xc000000021f98800:0 in selfcare::operations_execute (ctx=@0x60000002f5d6a140, from=<not available>, to={_M_current = 0x6000000870219e10}, snmp_on=false) at has_source/request_handler.cpp:1002
#26 0xc000000021f99680:0 in selfcare::execute_operations (root_operation_name=<not available>, ec=@0x60000002f5d6a140, snmp_on=false) at has_source/request_handler.cpp:1085
#27 0xc000000021dd03e0:0 in selfcare::operation_context_wrapper::exec_operation (this=0x9fffffff39cc35a8, name=@0x9fffffff39cc2a48, args=<not available>, ptr_os=<not available>, ptr_psf=0x60000002f5d6eb30) at has_source/operation_context_wrapper.cpp:330
#28 0xc000000021dca0a0:0 in selfcare::operation_context_wrapper::exec_operation (this=0x9fffffff39cc35a8, name=<not available>, args=@0x9fffffff39cc2df8, ptr_os=0x9fffffff39cc3350, ptr_psf=0x9fffffff39cc2cb0) at has_source/operation_context_wrapper.cpp:366
#29 0xc000000023873190:0 in serviceguide::operations::call_credit_subs_brt_get::operator() (this=<not available>, ctx=@0x9fffffff39cc35a8) at sc_source/get_brt_cc_volumes.cpp:290

 

All POSIX threads in the program are created in the same way.
I don't understand why some threads do not have their stacktrace starting from __pthread_bound_body.
Does the absence of __pthread_bound_body() mean there is some kind of corruption in the stack or data?
If it is indeed corruption, how can this particular corruption be found or detected at runtime?
Or is it just that gdb can't show the stack traces properly?

 

 

 

 

blackwater
Regular Advisor

Re: Why some POSIX threads do not have their stacktrace starting from __pthread_bound_body?

For example #1, where frame #25 is the last frame in this thread:

 

(gdb) f 25
#25 0xc000000021fa8920:0 in selfcare::request_handler::execute (
    this=0x60000002f3ac8000) at has_source/request_handler.cpp:2042
2042    in has_source/request_handler.cpp
(gdb) info frame
Stack level 25, frame at 0x9fffffff5731a390:
 ip = 0xc000000021fa8920:0 in selfcare::request_handler::execute()
    (has_source/request_handler.cpp:2042); saved ip 0x0
 caller of frame at 0x9fffffff57315560
 source language c++.
 Size of frame is 93, Size of locals is 88, Size of rotating is 0.
 NAT collections saved at 0x9fffffff56f1c1f8.
 Arglist at 0x9fffffff56f1c0e8, args: this=0x60000002f3ac8000
 Locals at 0x9fffffff56f1c0e8, Previous frame's sp is 0x9fffffff5731a390


So for frame #25

1) Stack level 25, frame at 0x9fffffff5731a390

2) Previous frame's sp is 0x9fffffff5731a390

 

It means that "this frame" == "Previous frame" and there are no more frames above. Am I right? But where is the frame for __pthread_bound_body()?

 

Dennis Handly
Acclaimed Contributor

Re: Why some POSIX threads do not have their stacktrace starting from __pthread_bound_body?

>If it is indeed corruption how can this particular corruption be found or detected at runtime?

 

Well, if you have corruption, it could be one thread's stack destroying another's by jumping over the guard pages.

With pthread_attr_setguardsize you can make them bigger.
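A minimal sketch of what that could look like, using only the standard POSIX attribute calls. The helper name init_guarded_attr and the sizes passed to it are made up for illustration; pick values that fit your program. Note that POSIX lets the implementation round the guard size up to a multiple of the page size, and the guard size is ignored if you supply your own stack memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical helper (not from the original program): prepare thread
 * attributes with an explicit stack size and guard size.
 * Returns 0 on success, or the error code of the call that failed. */
int init_guarded_attr(pthread_attr_t *attr, size_t stack_size, size_t guard_size)
{
    int rc;

    assert(attr != NULL);

    rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    /* Stack size first, then the guard area that sits beyond it. */
    rc = pthread_attr_setstacksize(attr, stack_size);
    if (rc == 0)
        rc = pthread_attr_setguardsize(attr, guard_size);

    /* Do not leave a half-configured attribute object behind. */
    if (rc != 0)
        pthread_attr_destroy(attr);
    return rc;
}
```

The resulting attribute object is then passed as the second argument of pthread_create, and released with pthread_attr_destroy once the threads have been started.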

Have you checked the sizes of your stacks?

The following will give you the sizes of all of your memory segments:

# elfdump -o -S core-file

Type     Offset           Vaddr            FSize            Memsz

CoreStck 0000000001228bd0 9fffffffbf7ff000 0000000000001000 0000000000001000 RSE stack
CoreStck 0000000001229bd0 9ffffffffffd0000 0000000000030000 0000000000030000 user stack

Thread stacks would be MMF regions; unfortunately, shlib data shows up as the same type.  Look for regions of the right size.

Or use gdb's "info shared" and exclude those ranges.

 


>Or is it just that gdb can't show the stack traces properly?

 

Can you repeatedly duplicate the problem?

Can you call U_STACK_TRACE in some of those unusual threads?

gdb should be calling libunwind and that should get it right.  U_STACK_TRACE would prove it one way or the other.
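A rough sketch of how such a call could be wired in. The prototype shown for U_STACK_TRACE and the use of the __hpux macro are assumptions on my part; check the unwind library documentation on your system for the exact declaration and link options. The helper name dump_current_stack is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef __hpux
/* Assumed prototype; on HP-UX the routine comes from the unwind library. */
extern void U_STACK_TRACE(void);
#endif

/* Hypothetical helper: print a label, then ask the unwinder to dump the
 * calling thread's stack. Returns 1 if a trace was requested, 0 if the
 * routine is not available on this platform. */
int dump_current_stack(const char *label)
{
    assert(label != NULL);
    fprintf(stderr, "stack trace requested at: %s\n", label);
#ifdef __hpux
    U_STACK_TRACE();
    return 1;
#else
    fprintf(stderr, "(U_STACK_TRACE is HP-UX specific, nothing printed)\n");
    return 0;
#endif
}
```

Calling it from request_handler::execute, or wherever the unusual threads spend their time, would show whether the unwinder itself reaches __pthread_bound_body when gdb does not.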

 

>saved ip 0x0

 

This is why it stopped.

 

You need to disassemble the instructions at the start of this function and figure out where it saved the PC.  If it is in a register, find where in the RSE stack that register ended up.

What does "info reg" show for that frame?

 

>1) Stack level 25, frame at 0x9fffffff5731a390

>2) Previous frame's sp is 0x9fffffff5731a390

 >It means that "this frame" == "Previous frame"

 

Unfortunately gdb has a weird idea of how frames should be described, and I need to draw a picture each time.  The compiler only deals with SP and BSP, and another dedicated register for alloc frames.

 

RSE stack:
0x9fffffff56f1c0e8  arg list/locals == BSP
0x9fffffff56f1c1f8  NAT reg
  V V
guard page
  ^ ^
User stack:
0x9fffffff57315560  caller of frame  This doesn't make sense, should be after!
0x9fffffff5731a390  frame at
0x9fffffff5731a390  prev frame's SP

 

gdb thinks the frame starts on the high address.
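To make the distances in the picture concrete, here is a small arithmetic sketch. The addresses are copied from the "info frame" output for frame #25; the helper name addr_gap is made up, and the code only subtracts addresses without claiming anything about what sits in between.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses copied from the "info frame" output for frame #25. */
#define FRAME_AT        UINT64_C(0x9fffffff5731a390) /* "frame at", "Previous frame's sp" */
#define CALLER_OF_FRAME UINT64_C(0x9fffffff57315560) /* "caller of frame at" */
#define BSP_ARGLIST     UINT64_C(0x9fffffff56f1c0e8) /* "Arglist at", "Locals at" */
#define NAT_SAVED_AT    UINT64_C(0x9fffffff56f1c1f8) /* "NAT collections saved at" */

/* Hypothetical helper: byte distance between two addresses, high minus low. */
uint64_t addr_gap(uint64_t high, uint64_t low)
{
    assert(high >= low);
    return high - low;
}
```

With these values, addr_gap(FRAME_AT, CALLER_OF_FRAME) is 0x4e30 (20016 bytes), addr_gap(NAT_SAVED_AT, BSP_ARGLIST) is 0x110 (272 bytes), and addr_gap(FRAME_AT, NAT_SAVED_AT) is 0x3fe198, just under 4 MB between the RSE addresses and the user stack addresses.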