
Illegal instruction in pthread_mutex_lock

 
Occasional Advisor

Illegal instruction in pthread_mutex_lock


We are trying to port a huge DCE/Encina/Sybase-based application from HP-UX 10.20 (DCE threading) to 11.11 (kernel threading, 32-bit). One of our applications dumps core every time it makes a certain RPC call, and we don't understand why. We hope you can give us some pointers or solutions. Below is some key information.
We use aCC for the final linking stage; some of the static libraries linked in were built with cc.
Thanks in advance,
Rob


Gdb backtrace:
=============
Core was generated by `Our_app'.
Program terminated with signal 4, Illegal instruction.

warning: The shared libraries were not privately mapped; setting a
breakpoint in a shared library will not work until you rerun the program.

#0 0xc003de64 in pthread_mutex_lock+0x48 () from /usr/lib/libpthread.1
(gdb) bt
#0 0xc003de64 in pthread_mutex_lock+0x48 () from /usr/lib/libpthread.1
#1 0xc021c438 in __thread_mutex_lock+0x70 () from /usr/lib/libc.2
#2 0xc0196e60 in _sigfillset+0xa0 () from /usr/lib/libc.2
#3 0xc0195254 in _sscanf+0x67c () from /usr/lib/libc.2
#4 0xc019a3ac in malloc+0x18c () from /usr/lib/libc.2
#5 0xf64ed4 in DHMR118V_GetCl1ConQualNINOS+0x2b4 ()
#6 0x755d54 in tx_DHMR118V_GetCl1ConQualNINOS+0x434 ()
#7 0x61b504 in tx_DHMR118V_GetCl1ConQualNINOS_msr+0x310 ()
#8 0x26d038 in op541_ssr+0x4a0 ()
#9 0xc28fcfe4 in rpc__cn_call_executor+0x464 () from /usr/lib/libdcekt.1
#10 0xc28d2c44 in cthread_call_executor+0x16c () from /usr/lib/libdcekt.1
#11 0xc003b168 in __pthread_body+0x44 () from /usr/lib/libpthread.1
#12 0xc00449ec in __pthread_start+0x14 () from /usr/lib/libpthread.1


Link line:
=========
aCC -o Our_app -Aa -w -Dhpux +DAportable +inst_implicit_include +inst_all +inst_used +inst_v
-D_HPUX_SOURCE -D_REENTRANT -D_POSIX_D10_THREADS +DD32 -DRWDEBUG=1 -DGEN_TEST_LEVEL=4
-I.
-I[bunch of other include dirs]
-L.
-L../lib
-L[bunch of other library paths]
-Wl,-Bimmediate -Wl,-Bnonfatal
-Wl,+allowdups
sybesql.o ts_DhmUpcall.o cdo_dhm.o \
/cm_rel12/product/TIS/TISBUILD/obj/ts/dis_DhmServer.o
../idl/_dhm_svr_sstub.o
../idl/_dhm_svr_cstub.o
../idl/dhm_svr_client.o
../idl/dhm_svr_manager.o
libgen_dhm.a \
liblcdo.a
-l[list of application specific archives]
-lrwtool_mt_posix -lrwmny_mt_posix -lqabtnca
-lm -l++ -ldcekt -lxadtm -lct_r -lcs_r -lcomn_r
-lkms -lpthread -lwrapper -lintl_r -ltcl_r
-lEncServer -lEncClient -lEncina



Compiler versions
=================
aCC: HP ANSI C++ B3910B A.03.52
cc: product-id C-ANSI-C B.11.11.08



Ldd output
==========
/usr/lib/libdld.2 => /usr/lib/libdld.2
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/libdld.2 => /usr/lib/libdld.2
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/libcl.2 => /usr/lib/libcl.2
/usr/lib/libisamstub.1 => /usr/lib/libisamstub.1
/usr/lib/libdld.2 => /usr/lib/libdld.2
/usr/lib/libCsup.2 => /usr/lib/libCsup.2
/usr/lib/libstream.2 => /usr/lib/libstream.2
/usr/lib/libstd.2 => /usr/lib/libstd.2
/opt/encina/lib/libEncina.sl => /opt/encina/lib/libEncina.sl
/opt/encina/lib/libEncClient.sl => /opt/encina/lib/libEncClient.sl
/opt/encina/lib/libEncServer.sl => /opt/encina/lib/libEncServer.sl
/opt/sybase/sybase1251/OCS-12_5/lib/libtcl_r.sl => /opt/sybase/sybase1251/OCS-12_5/lib/libtcl_r.sl
/opt/sybase/sybase1251/OCS-12_5/lib/libintl_r.sl => /opt/sybase/sybase1251/OCS-12_5/lib/libintl_r.sl
/usr/lib/libpthread.1 => /usr/lib/libpthread.1
/opt/sybase/sybase1251/OCS-12_5/lib/libcomn_r.sl => /opt/sybase/sybase1251/OCS-12_5/lib/libcomn_r.sl
/opt/sybase/sybase1251/OCS-12_5/lib/libcs_r.sl => /opt/sybase/sybase1251/OCS-12_5/lib/libcs_r.sl
/opt/sybase/sybase1251/OCS-12_5/lib/libct_r.sl => /opt/sybase/sybase1251/OCS-12_5/lib/libct_r.sl
/opt/sybase/sybase1251/OCS-12_5/lib/libxadtm.sl => /opt/sybase/sybase1251/OCS-12_5/lib/libxadtm.sl
/usr/lib/libdcekt.1 => /usr/lib/libdcekt.1
/vob/dce/external/hp700_ux1111/usr/lib/libpthread.1 => /usr/lib/libpthread.1
/usr/lib/libm.2 => /usr/lib/libm.2


Patch levels
============
June 2003 patch bundle is installed.
DCE patch PHSS_28387 is NOT yet installed.
[can send full swlist output if required]

Honored Contributor

Re: Illegal instruction in pthread_mutex_lock

That stack backtrace sure looks innocent. It is difficult to imagine how you could get an illegal instruction at a read-only address inside pthread_mutex_lock. Perhaps gdb is showing you a different thread than the one that caused the SIGILL. You could use "info threads", then "thread N" and "bt" for each of the N threads, to see if one of the other threads has a more suspicious backtrace.
Esteemed Contributor

Re: Illegal instruction in pthread_mutex_lock

How about checking what instruction is actually at the faulting address?
 
--
ranga
(i work for hpe)

Occasional Advisor

Re: Illegal instruction in pthread_mutex_lock

Some more debug info:

When I attach gdb to the process while it is still running, it crashes with SIGBUS.

Some information on the last stack frame:

(gdb) info frame
Stack level 0, frame at 0x7ad2df80:
pcoqh = 0xc003de64 in pthread_mutex_lock; saved pcoqh 0xc021c438
called by frame at 0x7ad2df40
Arglist at 0x7ad2df80, args:
Locals at 0x7ad2df80, Previous frame's sp is 0x7ad2df80
Saved registers:
rp at 0x7ad2df6c, r3 at 0x7ad2df98, r4 at 0x7ad2df9c, r5 at 0x7ad2dfa0, r6 at 0x7ad2dfa4, r7 at 0x7ad2dfa8, r8 at 0x7ad2dfac, r9 at 0x7ad2dfb0, r10 at 0x7ad2dfb4, r11 at 0x7ad2dfb8, r12 at 0x7ad2dfbc, r13 at 0x7ad2dfc0, r14 at 0x7ad2dfc4, r15 at 0x7ad2dfc8, r16 at 0x7ad2dfcc, fr8R at 0x7ad2df88, fr9 at 0x7ad2df90, fr12 at 0x7ad2df80
(gdb) x/x 0x7ad2df80
0x7ad2df80: 0x00000000
(gdb) x/x 0x7ad2dfff
0x7ad2dfff: Error accessing memory address 0x7ad2dfff: Bad address.
(gdb) x/x $sp
0x7ad2e040: Error accessing memory address 0x7ad2e040: Bad address.
(gdb) x/x 0x7ad2df40
0x7ad2df40: 0x00000000
(gdb) x/x 0x7ad2df80
0x7ad2df80: 0x00000000

So the last stack pointer is not correct anymore.

That leads me to think I ran out of stack: either the process stack size (maxssiz) or the pthread stack size (pthread_setattr or something similar).
Are there any checks that would confirm this before I ask the system admins to tweak the kernel?
Honored Contributor

Re: Illegal instruction in pthread_mutex_lock

A SIGBUS makes perfect sense for a stack overflow. You get one when a thread touches the guard page past the end of its stack. The default thread stack size is only 64K. Some of the functions in the fatal stack trace may have enough local variables to use up that space.
You can check that by adding a call to pthread_attr_setstacksize to set the thread attributes passed into pthread_create.
You can take a shortcut by calling the non-portable pthread_default_stacksize_np function in main before you start any new threads. That way you don't have to find and change every place that calls pthread_create.
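
As a rough illustration of the first suggestion, here is a minimal sketch in C that creates one thread with an explicit stack size through pthread_attr_setstacksize. The helper names run_with_stack and big_frame_worker, the 256 KiB buffer, and the 1 MiB figure in the usage note are made up for this example and are not from the application discussed here.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical worker: keeps a large buffer on its stack, the way a deep
 * call chain with big locals might. 256 KiB would not fit in a 64K stack. */
static void *big_frame_worker(void *arg)
{
    char scratch[256 * 1024];

    assert(arg != NULL);
    memset(scratch, 0x5a, sizeof scratch);
    /* Report back that the whole buffer was usable. */
    *(int *)arg = (scratch[sizeof scratch - 1] == 0x5a);
    return NULL;
}

/* Create one thread with an explicit stack size and wait for it to finish.
 * Returns 0 on success, or the error code of the first pthread call that
 * failed. */
static int run_with_stack(size_t stack_bytes, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);   /* the attribute object is no longer needed */
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

Calling run_with_stack(1024 * 1024, big_frame_worker, &ok) with an int ok should return 0 and set ok to 1. Note that this only affects threads you create yourself with that attribute object.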

I wouldn't bother the system admins with changing the maxssiz parameter for this. The maxssiz limit only applies to the first thread. Its default is much larger than the default stack limit for other threads. Exceeding it would produce an error message about stack growth failure rather than a SIGBUS.
Occasional Advisor

Re: Illegal instruction in pthread_mutex_lock

The problem was in the area suggested by your posts: We were running out of stack space.

I originally tried to solve this by doing a call to pthread_attr_setstacksize before each pthread_create() call, but that didn't solve it.

It turned out that there were not only explicit calls to pthread_create(), but also calls to rpc_server_listen(), which is a DCE API. Apparently, rpc_server_listen() makes its own calls to pthread_create().
I added a call to pthread_default_stacksize_np() right at the beginning of my program, and now it runs through.

Thanks for everybody's help with this.
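
For anyone hitting the same thing, a minimal sketch in C of the process-wide approach follows. The helper names are hypothetical, and the pthread_default_stacksize_np(new_size, &old_size) signature is taken from my reading of the HP-UX man page, so please check it on your system. On platforms other than HP-UX the second helper just returns ENOSYS.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stack size of a freshly initialised attribute object, i.e. what a thread
 * created with default attributes (for example by a library on our behalf)
 * would presumably get. Returns 0 if it cannot be queried. */
static size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t bytes = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &bytes) != 0)
        bytes = 0;
    pthread_attr_destroy(&attr);
    return bytes;
}

/* Raise the process-wide default thread stack size. Meant to be called at
 * the start of main(), before any thread exists. Returns 0 on success or an
 * error code. */
static int raise_default_thread_stack(size_t bytes)
{
#if defined(__hpux)
    /* HP-UX only, non-portable; signature assumed from the man page. */
    size_t previous = 0;
    return pthread_default_stacksize_np(bytes, &previous);
#else
    (void)bytes;
    return ENOSYS;   /* no equivalent attempted in this sketch */
#endif
}
```

Because the default is changed for the whole process, threads that a library starts internally should pick it up as well, which is what per-call pthread_attr_setstacksize() could not do for us.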
Occasional Advisor

Re: Illegal instruction in pthread_mutex_lock

See my last message