Operating System - HP-UX

coredump in $$dyncall_external_20+0()

 
SOLVED
H.H.
Occasional Advisor

I have a C++ program which is dynamically linked with shared libraries. It had been working fine for months, but a few days ago it failed and generated a core dump with the following stack trace:

Core was generated by `serv'.
Program terminated with signal 10, Bus error.
#0 0xc49c08c4 in $$dyncall_external_20+0 () from /home/hh/lib/libservMain.16
(gdb) where
#0 0xc49c08c4 in $$dyncall_external_20+0 () from /home/hh/lib/libservMain.16
#1 0xc4a2b5ac in sm_tmr_process () at smgn_timers.C:387
#2 0xc4acb3d8 in sm_nm_process_timeouts (seconds_ptr=0x77b67190, micro_seconds_ptr=0x77b67194) at smnm_timeout.C:107
#3 0xc4a0ef44 in SM_ProcMgmt::msgQueueThread (arg=0x405d7570) at SM_ProcMgmt.C:1639
#4 0xc005b1c8 in __pthread_body+0x44 () from /lib/libpthread.1
#5 0xc006549c in __pthread_start+0x14 () from /lib/libpthread.1

Frames #1-#3 are from the libservMain.16 library. Frame #1 (smgn_timers.C:387) executes the following line:
if (timeoutList.empty())

The timeoutList object is defined at global scope in the libservMain.16 library, but its class is implemented in another library, libservUtil.16.

All libraries are linked dynamically; we don't use dlopen() or the shl_*() functions. What could be wrong with the application or with the libraries? What are the possible causes of a failure in $$dyncall_external_20+0 ()?

This happened on an HP-UX 11.11 PA-RISC machine.
9 REPLIES
Dennis Handly
Acclaimed Contributor

Re: coredump in $$dyncall_external_20+0()

>but a few days ago it failed

What changed?

What is the value of $r22 in $$dyncall_external_20? That's your plabel.

>if (timeoutList.empty())

This means your virtual table hasn't been set up? Or the object timeoutList hasn't been constructed yet. Where are you creating threads? After main is called?

>The timeoutList object is defined at global scope in the libservMain.16 library, but its class is implemented in another library, libservUtil.16.

If you are creating threads during static construction, timeoutList may not be constructed yet?

Or perhaps you overwrite that memory? A hardware watch point would catch it.
H.H.
Occasional Advisor

Re: coredump in $$dyncall_external_20+0()

>>but a few days ago it failed

>What changed?

It seems that nothing has changed. Also, this application runs fine on other customers' installations.

> What is the value of $r22 in $$dyncall_external_20? That's your plabel.

How do I check it in gdb?

>>if (timeoutList.empty())

>This means your virtual table hasn't been set up? Or the object timeoutList hasn't been constructed yet. Where are you creating threads? After main is called?

Yes, the threads are created after main() is called, so the timeoutList object should already be constructed.

But there is another interesting thing in the core file. It looks like another thread (thread 1) interrupted the initialization of a shared library (__shlInit) and called termination of that shared library (__shlTerm). Here is the information about thread 1:

(gdb) info threads
* 5 system thread 166153 0xc49c08c4 in $$dyncall_external_20+0 () from /home/hh/lib/libservMain.16
4 system thread 166154 0xc0210940 in __sigwait_sys+0x10 () from /lib/libc.2
3 system thread 166147 0xc0210940 in __sigwait_sys+0x10 () from /lib/libc.2
2 system thread 166100 0xc0210940 in __sigwait_sys+0x10 () from /lib/libc.2
1 system thread 165829 0xc4598ca8 in SM_nmDebug::blockExit (this=0x77bc97a0) at smnm_debug.C:1521
(gdb) thread 1
[Switching to thread 1 (system thread 165829)]
#0 0xc4598ca8 in SM_nmDebug::blockExit (this=0x77bc97a0) at smnm_debug.C:1521
1521 smnm_debug.C: No such file or directory.
in smnm_debug.C
(gdb) where
#0 0xc4598ca8 in SM_nmDebug::blockExit (this=0x77bc97a0) at smnm_debug.C:1521
#1 0xc458f5d0 in SM_nmDebug::~SM_nmDebug (this=0x77bc97a0, #free=2) at smnm_debug.C:490
#2 0xc35f6160 in __shlTerm+0x29c () from /lib/libCsup_v2.2
#3 0xc35f6238 in __shlInit+0x44 () from /lib/libCsup_v2.2
#4 0xc49c0918 in _shlInit+0x20 () from /home/hh/lib/libservMain.16
#5 0xc35f5ba8 in __shlinit+0xac () from /lib/libCsup_v2.2
#6 0xc35f5eac in __callInitFuncFromHandle+0xf0 () from /lib/libCsup_v2.2
#7 0xc35f7fac in _niam_body+0xc0 () from /lib/libCsup_v2.2
#8 0xc35f8080 in _niam+0x1c () from /lib/libCsup_v2.2

Could you please comment on this stack trace? Does it contain anything suspicious?

>>The timeoutList object is defined at global scope in the libservMain.16 library, but its class is implemented in another library, libservUtil.16.

>If you are creating threads during static construction, timeoutList may not be constructed yet?

Threads should be created after starting main().

>Or perhaps you overwrite that memory? A hardware watch point would catch it.

Unfortunately, it happened in a production environment, and it will be impossible to debug the application there.
Dennis Handly
Acclaimed Contributor

Re: coredump in $$dyncall_external_20+0()

>> What is the value of $r22 in $$dyncall_external_20?

>How do I check it in gdb?

p /x $r22

>the timeoutList object should already be constructed.
>called termination of that shared library (__shlTerm).
>#8 0xc35f8080 in _niam+0x1c /lib/libCsup_v2.2

Except this is during termination.

>Could you please comment on this stack trace? Does it contain anything suspicious?

Everything seems normal, you are exiting. But nobody told thread #5 that it should stop/wait or be killed by the main thread.

You will need to see what #5 is doing and see if it is reasonable when the main thread is exiting.
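
To make the point about telling thread #5 to stop more concrete, here is a minimal sketch of a worker thread that is asked to finish and then joined before the process exits. It uses standard C++11 std::thread rather than the pthread code in the application, and TimerWorker, shutdown() and ticks() are made-up names for illustration only, not anything from the real sources.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the timer thread seen as thread #5 in the trace.
class TimerWorker {
public:
    TimerWorker() : stop_(false), ticks_(0), thread_(&TimerWorker::run, this) {}

    // Ask the loop to finish, then wait until run() has really returned.
    void shutdown() {
        stop_.store(true);
        if (thread_.joinable()) {
            thread_.join();
        }
    }

    // Destroying the worker also stops and joins the thread.
    ~TimerWorker() { shutdown(); }

    bool running() const { return thread_.joinable(); }
    long ticks() const { return ticks_.load(); }

private:
    void run() {
        while (!stop_.load()) {
            ++ticks_;  // placeholder for the real work, e.g. processing timeouts
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::atomic<bool> stop_;
    std::atomic<long> ticks_;
    std::thread thread_;  // declared last so the flags exist before run() starts
};
```

The idea would be for the main thread to call shutdown() before it returns from main() or calls exit(), so that no other thread is still touching global objects when their destructors run.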
H.H.
Occasional Advisor

Re: coredump in $$dyncall_external_20+0()

> p /x $r22

I've checked the value you requested:

(gdb) p /x $r22
$1 = 0x0
Dennis Handly
Acclaimed Contributor

Re: coredump in $$dyncall_external_20+0()

>(gdb) p /x $r22 -> $1 = 0x0

This shows you have an uninitialized plabel and if it was initialized, you need to synchronize your process shutdown.
H.H.
Occasional Advisor

Re: coredump in $$dyncall_external_20+0()

> This shows you have an uninitialized plabel and if it was initialized, you need to synchronize your process shutdown.

Could you please explain what a plabel is? I'm not very familiar with low-level development.
Dennis Handly
Acclaimed Contributor
Solution

Re: coredump in $$dyncall_external_20+0()

>Could you please explain what a plabel is?

A plabel is a pointer to a function. It could also come from the virtual table. Or the virtual table pointer in an already destructed object is bad.

You are terminating and you have another thread that is still trying to use something that has already been destroyed.

I.e. you need to look at stack traces from each thread and see if it is reasonable for that state to be valid.
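
In C++ terms, a zero plabel in $r22 corresponds roughly to calling through a null function pointer. The sketch below only illustrates that idea; EmptyFn, always_empty and call_if_set are hypothetical names, and a check like this would hide the symptom rather than fix the shutdown ordering.

```cpp
// Hypothetical signature standing in for a call such as timeoutList.empty().
typedef bool (*EmptyFn)();

// A valid target for the pointer.
static bool always_empty() { return true; }

// Calls through the pointer only when it is set; otherwise returns fallback.
// An unguarded fn() with fn == nullptr is the kind of indirect call that
// would fault.
bool call_if_set(EmptyFn fn, bool fallback) {
    return fn != nullptr ? fn() : fallback;
}
```

For example, call_if_set(always_empty, false) goes through the pointer, while call_if_set(nullptr, false) never makes the indirect call.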
H.H.
Occasional Advisor

Re: coredump in $$dyncall_external_20+0()

Thank you for the explanation. As far as I understand, at line smgn_timers.C:387 I am calling code from a library that is already unloaded from the process by another (main) thread. I'll check how my threads interact.
Dennis Handly
Acclaimed Contributor

Re: coredump in $$dyncall_external_20+0()

>is already unloaded from the process

That or an object that has been destructed.