Re: Why some POSIX threads do not have their stacktrace starting from __pthread_bound_body?

blackwater · ‎08-01-2013

I have been analyzing a core file from a multithreaded C++ program.
As a first step I ran 'thread apply all bt' in gdb (gdb 6.1, HP-UX 11.31).

Most of the threads have a stack trace that starts from __pthread_bound_body().
This is an example:

#52 0xc000000021f98800:0 in selfcare::operations_execute (ctx=@0x60000002f364a140, from=<not available>, to={_M_current = 0x6000000751306090}, snmp_on=false) at has_source/request_handler.cpp:1002
#53 0xc000000021fa1d80:0 in selfcare::request_handler::process_request (this=0x60000002f364a000, cmd=0x60000007b06ec200, parameters=@0x6000000727499fe8) at has_source/request_handler.cpp:1609
#54 0xc000000021fa8920:0 in selfcare::request_handler::execute (this=0x60000002f364a000) at has_source/request_handler.cpp:2042
#55 0xc000000018b0c460:0 in threads::thread_proc (thr_ptr=0x60000002f364a000) at has_common_source/source/cpp/threads.cpp:225
#56 0xc000000000117b20:0 in __pthread_bound_body () at /ux/core/libs/threadslibs/src/common/pthreads/pthread.c:4875

However, a few threads have a stack trace that starts from other functions.
These are examples:
1) frame #25 is the last frame in this thread

#23 0xc000000021f98800:0 in selfcare::operations_execute (ctx=@0x60000002f3ac8140, from=<not available>, to={_M_current = 0x6000000710d13490}, snmp_on=false) at has_source/request_handler.cpp:1002
#24 0xc000000021fa1d80:0 in selfcare::request_handler::process_request (this=0x60000002f3ac8000, cmd=0x600000072a2fea00, parameters=@0x60000002eb5b48c8) at has_source/request_handler.cpp:1609
#25 0xc000000021fa8920:0 in selfcare::request_handler::execute (this=0x60000002f3ac8000) at has_source/request_handler.cpp:2042

2) frame #29 is the last frame in this thread

#24 0xc000000021f94030:0 in selfcare::operation_execute (ctx=@0x60000002f5d6a140, o=@0x6000000870219d80) at has_source/request_handler.cpp:781
#25 0xc000000021f98800:0 in selfcare::operations_execute (ctx=@0x60000002f5d6a140, from=<not available>, to={_M_current = 0x6000000870219e10}, snmp_on=false) at has_source/request_handler.cpp:1002
#26 0xc000000021f99680:0 in selfcare::execute_operations (root_operation_name=<not available>, ec=@0x60000002f5d6a140, snmp_on=false) at has_source/request_handler.cpp:1085
#27 0xc000000021dd03e0:0 in selfcare::operation_context_wrapper::exec_operation (this=0x9fffffff39cc35a8, name=@0x9fffffff39cc2a48, args=<not available>, ptr_os=<not available>, ptr_psf=0x60000002f5d6eb30) at has_source/operation_context_wrapper.cpp:330
#28 0xc000000021dca0a0:0 in selfcare::operation_context_wrapper::exec_operation (this=0x9fffffff39cc35a8, name=<not available>, args=@0x9fffffff39cc2df8, ptr_os=0x9fffffff39cc3350, ptr_psf=0x9fffffff39cc2cb0) at has_source/operation_context_wrapper.cpp:366
#29 0xc000000023873190:0 in serviceguide::operations::call_credit_subs_brt_get::operator() (this=<not available>, ctx=@0x9fffffff39cc35a8) at sc_source/get_brt_cc_volumes.cpp:290

All POSIX threads in the program are created in the same way.
I don't understand why some threads do not have their stack trace starting from __pthread_bound_body().
Does a missing __pthread_bound_body() frame mean there is some kind of corruption in the stack or data?
If it is indeed corruption how can this particular corruption be found or detected at runtime?
Or is it just that gdb can't show the stack traces properly?

blackwater · ‎08-01-2013

For example #1, where frame #25 is the last frame in this thread:

(gdb) f 25
#25 0xc000000021fa8920:0 in selfcare::request_handler::execute (
    this=0x60000002f3ac8000) at has_source/request_handler.cpp:2042
2042    in has_source/request_handler.cpp
(gdb) info frame
Stack level 25, frame at 0x9fffffff5731a390:
ip = 0xc000000021fa8920:0 in selfcare::request_handler::execute()
    (has_source/request_handler.cpp:2042); saved ip 0x0
caller of frame at 0x9fffffff57315560
source language c++.
Size of frame is 93, Size of locals is 88, Size of rotating is 0.
NAT collections saved at 0x9fffffff56f1c1f8.
Arglist at 0x9fffffff56f1c0e8, args: this=0x60000002f3ac8000
Locals at 0x9fffffff56f1c0e8, Previous frame's sp is 0x9fffffff5731a390

So for frame #25

1) Stack level 25, frame at 0x9fffffff5731a390

2) Previous frame's sp is 0x9fffffff5731a390

It means that "this frame" == "Previous frame" and there are no more frames above. Am I right? But where is the frame for __pthread_bound_body()?

Dennis Handly · ‎08-01-2013

>If it is indeed corruption how can this particular corruption be found or detected at runtime?

Well if you have corruption, it could be one thread stack destroying another by jumping over the guard pages.

With pthread_attr_setguardsize you can make them bigger.

Have you checked the sizes of your stacks?

The following will give you the sizes of all of your memory segments:

# elfdump -o -S core-file

Type      Offset            Vaddr             FSize             Memsz

CoreStck  0000000001228bd0  9fffffffbf7ff000  0000000000001000  0000000000001000  RSE stack
CoreStck  0000000001229bd0  9ffffffffffd0000  0000000000030000  0000000000030000  user stack

Thread stacks would show up as MMF regions; unfortunately shlib data shows up the same way. Look for regions with the right sizes.

Or use gdb's "info shared" and exclude those ranges.

>Or it just gdb can't show stacktraces properly?

Can you repeatedly duplicate the problem?

Can you call U_STACK_TRACE in some of those unusual threads?

gdb should be calling libunwind and that should get it right. U_STACK_TRACE would prove it one way or the other.

>saved ip 0x0

This is why it stopped.

You need to disassemble the bunch of instructions at the start of this function. Figure out where it saved the return PC. And if it is in a register, figure out where on the RSE stack that register was put.

What does "info reg" show for that frame?

>1) Stack level 25, frame at 0x9fffffff5731a390

>2) Previous frame's sp is 0x9fffffff5731a390

>It means that "this frame" == "Previous frame"

Unfortunately gdb has some weird idea how frames should be described and I need to draw a picture each time. The compiler only deals with SP and BSP. And another dedicated register if alloc frames.

RSE stack:

    0x9fffffff56f1c0e8  arg list/locals == BSP
    0x9fffffff56f1c1f8  NAT reg

          V V
       guard page
          ^ ^

User stack:

    0x9fffffff57315560  caller of frame   (This doesn't make sense, should be after!)
    0x9fffffff5731a390  frame at
    0x9fffffff5731a390  prev frame's SP

gdb thinks the frame starts on the high address.
