Operating System - HP-UX

Re: JVM crashing when calling pthread_num_processors_np

 
SOLVED
Jose M. del Rio
Frequent Advisor

JVM crashing when calling pthread_num_processors_np

Hi, JVM (1.8) is getting signal 11 on startup when calling pthread_num_processors_np.

No core file is written. I only have a stack trace:

----- Call Stack Trace -----
Columns: calling location, call type, entry point; argument values in hex on the following line (? means dubious value)

siehDumpStackTrace()+192    call    kgdsdst()
    9FFFFFFFFFFEAD50 ? 000000001 ? 9FFFFFFFBF718B58 ?
siehWriteExceptionReport()+336    call    siehDumpStackTrace()
    C000000000000713 ? 9FFFFFFFBF772C80 ? 600000000002B650 ? 40000000006C6FD0 ? 9FFFFFFFFFFEAD78 ?
siehjmpterm()+1216    call    siehWriteExceptionReport()
    60000000000815E0 ? 000000000 ? 00000510B ? 600000000002B650 ? C000000000000A19 ?
<kernel>    call    siehjmpterm()
    00000000B ? 600000000002B650 ? C000000000000187 ? E000000120757480 ?
ENTER_PTHREAD_LIBRARY_FUNC()+353    signal    <kernel>
    9FFFFFFFFFFEB000 ? 20000000B ? 000000000 ?
__pthread_num_processors_np()+160    call    ENTER_PTHREAD_LIBRARY_FUNC()
    9FFFFFFFBF5FAFB8 ? C000000000000207 ? C0000000002A4EA0 ?
_ZN2os22active_processor_countEv()+64    call    __pthread_num_processors_np()
_ZN2os23is_server_class_machineEv()+192    call    _ZN2os22active_processor_countEv()
_ZN9Arguments23select_gc_ergonomicallyEv()+32    call    _ZN2os23is_server_class_machineEv()
    9FFFFFFFBF1A95E0 ? C000000000000207 ? C00000002FB612C0 ?
_ZN9Arguments9select_gcEv()+160    call    _ZN9Arguments23select_gc_ergonomicallyEv()
_ZN9Arguments20set_ergonomics_flagsEv()+64    call    _ZN9Arguments9select_gcEv()
    000000288 ? C00000000000058D ?
_ZN9Arguments10apply_ergoEv()+64    call    _ZN9Arguments20set_ergonomics_flagsEv()
    9FFFFFFFBF1A95E0 ? C00000000000060E ?
_ZN7Threads9create_vmEP14JavaVMInitArgsPb()+496    call    _ZN9Arguments10apply_ergoEv()
    9FFFFFFFBF1A95E0 ? C000000000000B9B ? C00000003189C3F0 ? 00001C837 ?
JNI_CreateJavaVM()+192    call    _ZN7Threads9create_vmEP14JavaVMInitArgsPb()
    9FFFFFFFBF0DDE80 ? 9FFFFFFFFFFF0110 ? 9FFFFFFFBF1A95E0 ? C000000000000A99 ?

...

And tusc shows the same issue:

10:46:44 [/weblogic12/][20747]{334471} mpctl(MPC_GETNUMSPUS, 0, 0) .................................................................... [entry]
10:46:44 [/weblogic12/][20747]{334471} <0.000009> mpctl(MPC_GETNUMSPUS, 0, 0) ......................................................... = 4
10:46:44 [/weblogic12/][20747]{334471} Received signal 11, SIGSEGV, in user mode, [caught], partial siginfo
10:46:44 [/weblogic12/][20747]{334471} Siginfo: si_code: SEGV_ACCERR, faulting address: 0x30, si_errno: 0
10:46:44 [/weblogic12/][20747]{334471} PC: 00000001000000a0.0 break.m 0x16000

I've found this thread:

http://community.hpe.com/t5/Languages-and-Scripting/JVM-Crashes-on-HPPA-64-Bit-and-HPIA-64-Bit/td-p/4411689

but I'm not sure what the solution was, or whether it helps in my case.

Any idea would be greatly appreciated.

Regards.

11 REPLIES
ranganath ramachandra
Esteemed Contributor

Re: JVM crashing when calling pthread_num_processors_np

Is this an application you compiled/built yourself? I see a similar problem reported here, which I could reproduce very easily:
http://community.hpe.com/t5/Languages-and-Scripting/received-SIGSEGV-while-executing-an-exe/td-p/4785614
In that case, using gcc, the problem goes away when the program is linked with libpthread. If you're compiling and linking with aCC, the recommended way is to use the "-mt" compiler flag.

If this is not an application you built yourself, you might have to contact the vendor about it.

In the post you linked, Dennis seems to suggest that the thread stack size be increased.

 
--
ranga
[i work for hpe]


Jose M. del Rio
Frequent Advisor

Re: JVM crashing when calling pthread_num_processors_np

Hi ranganath,
very interesting.
I've tried the testcase (with and without linking against libpthread) and could reproduce the SIGSEGV issue.
So we know the implication holds in this direction:
not linked with libpthread => pthread_num_processors_np SIGSEGV
As far as I know, footprints or ldd shows whether the testcase binary was linked with libpthread. When linked:

footprints crea_JVM
Shared library: /lib/hpux64/libpthread.so.1 (system library - not scanned) 

ldd -v crea_JVM | grep pthread
find library=libpthread.so.1; required by crea_JVM
libpthread.so.1 => /lib/hpux64/libpthread.so.1

The program failing in my case is frmweb (Oracle Forms 12c). We know it is raising:
pthread_num_processors_np SIGSEGV
but we can't be sure if the reverse implication
pthread_num_processors_np SIGSEGV => not linked with libpthread
applies in this case.

If my assumption about footprints / ldd is true, I can see frmweb IS linked with libpthread:

footprints frmweb
Shared library: /usr/lib/hpux64/libpthread.so.1 (system library - not scanned)

ldd -v frmweb | grep pthread
find library=libpthread.so.1; required by frmweb
libpthread.so.1 => /usr/lib/hpux64/libpthread.so.1

So should we look for another cause? (E.g., how could I determine whether a stack overflow is happening?)
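
The ldd check above can be made repeatable with a small filter. This is only a minimal sketch in POSIX shell: the helper name pthread_required_by is hypothetical, and it assumes that ldd -v prints lines of the form "find library=NAME; required by OBJECT", as in the output quoted in this post. It reads that output from standard input and prints which object pulls in libpthread.

```shell
# Hypothetical helper: print every object that ldd -v reports as requiring
# libpthread. Assumes lines of the form
#   find library=libpthread.so.1; required by OBJECT
# as shown in the ldd -v output quoted in this thread.
pthread_required_by() {
    awk '/find library=libpthread\.so/ && /required by / {
        sub(/^.*required by /, "")   # keep only the requiring object
        print
    }'
}

# Usage (on the affected system): ldd -v frmweb | pthread_required_by
```

If the only object printed is another shared library rather than the executable itself, libpthread is being pulled in indirectly, which may matter for the order in which it is loaded.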

Thanks a lot.

P.S.:
I have opened a Service Request with Oracle, but the support engineer looks quite lost at the moment.

ranganath ramachandra
Esteemed Contributor

Re: JVM crashing when calling pthread_num_processors_np

Curiouser, etc.

Here's another report of a crash in ENTER_PTHREAD_LIBRARY_FUNC:
http://community.hpe.com/t5/Languages-and-Scripting/Problem-with-lpthread/td-p/3273067

To find out whether a stack overflow is happening, you can examine the disassembly of ENTER_PTHREAD_LIBRARY_FUNC up to the point where SIGSEGV is received. Doing that with the previously mentioned testcase seems to show a problem in accessing thread local storage (TLS), which matches the last post on the thread linked above. Also, building that testcase with "-lc" before "-lpthread" reproduces the crash. So there is some vague indication that the problem may be caused by bad link order or memory corruption.
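
To check a link command for the "-lc" before "-lpthread" pattern mentioned above, something like the following could be used. It is a minimal sketch in POSIX shell; the helper name link_order is hypothetical, and it only inspects the arguments it is given (anything expanded from files on the real command line would have to be passed in as well).

```shell
# Hypothetical helper: report which of -lc and -lpthread comes first among
# the arguments of a link command. Prints "libc-first", "pthread-first",
# or "neither".
link_order() {
    for arg in "$@"; do
        case $arg in
            -lc)       echo "libc-first";    return 0 ;;
            -lpthread) echo "pthread-first"; return 0 ;;
        esac
    done
    echo "neither"
}

# Usage: link_order -lm -lc -lclntsh -ldl -lpthread   (prints "libc-first")
```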

For Oracle or HPE to be able to triage/debug the problem, you can help by providing a packcore tarball created with gdb. Launch the application from within gdb and, once this SIGSEGV occurs, use the "dump" command to generate a core dump (make sure that it is this instance of SIGSEGV). Then load the core file with the "core-file" command and use the "packcore" command to create the packcore tarball.

bash-2.05b# gdb testJNI 
...
(gdb) r
Starting program: /tmp/ranga/itrc1/testJNI 

Program received signal SIGSEGV, Segmentation fault
  si_code: 2 - SEGV_ACCERR - Invalid Permissions for object.
0x9fffffffebe39f60:1 in ENTER_PTHREAD_LIBRARY_FUNC+0x61 ()
   from /usr/lib/hpux64/libpthread.so.1
(gdb) dump
Dumping core to the core file core.28689
(gdb) core-file core.28689 
A program is being debugged already.  Kill it? (y or n) y

Core was generated by `testJNI'.

#0  0x9fffffffebe39f60:1 in ENTER_PTHREAD_LIBRARY_FUNC+0x61 ()
   from /usr/lib/hpux64/libpthread.so.1
(gdb) packcore 
The core file has been added to '/tmp/ranga/itrc1/packcore.tar'
Do you want to remove the original core file?(y or n) y
The core file has been removed.
(gdb) quit

 

 
--
ranga
[i work for hpe]


ranganath ramachandra
Esteemed Contributor
Solution

Re: JVM crashing when calling pthread_num_processors_np

I was able to reproduce a somewhat similar crash with a much simpler case, not involving java/JVM.

[ crust 1 ] $ cat pnp.c
#include <pthread.h>
int main ()
{
  return pthread_num_processors_np();
}
[ crust 1 ] $ cat lib.c
void func() {}
[ crust 1 ] $ cc lib.c -b -o liblib.so -lpthread
[ crust 1 ] $ cc pnp.c -L. -llib -o pnp
[ crust 1 ] $ ./pnp
Segmentation fault (core dumped)
[ crust 1 ] $ pstack pnp core
core:   pnp
--------------------------------  lwpid : 932124   -------------------------------
 0: 60000000c0155961 : ENTER_PTHREAD_LIBRARY_FUNC() + 0x161 (/usr/lib/hpux32/libpthread.so.1)
 1: 60000000c01a0d20 : pthread_num_processors_np() + 0xa0 (/usr/lib/hpux32/libpthread.so.1)
 2: 00000000040008a0 : main() + 0x40 (pnp)
 3: 60000000c008b920 : main_opd_entry() + 0x50 (/usr/lib/hpux32/dld.so)
[ crust 1 ] $ ldd -v pnp

pnp:

  find library=liblib.so; required by pnp
        liblib.so =>    ./liblib.so

  find library=libc.so.1; required by pnp
        libc.so.1 =>    /usr/lib/hpux32/libc.so.1

  find library=libpthread.so.1; required by ./liblib.so
        libpthread.so.1 =>      /usr/lib/hpux32/libpthread.so.1

  find library=libdl.so.1; required by /usr/lib/hpux32/libc.so.1
        libdl.so.1 =>   /usr/lib/hpux32/libdl.so.1
[ crust 1 ] $ cc pnp.c -L. -llib -o pnp -lpthread
[ crust 1 ] $ ./pnp ; echo $?
8
[ crust 1 ] $ ldd -v pnp
pnp:
  find library=liblib.so; required by pnp
        liblib.so =>    ./liblib.so

  find library=libpthread.so.1; required by pnp
        libpthread.so.1 =>      /usr/lib/hpux32/libpthread.so.1

  find library=libc.so.1; required by pnp
        libc.so.1 =>    /usr/lib/hpux32/libc.so.1

  find library=libdl.so.1; required by /usr/lib/hpux32/libc.so.1
        libdl.so.1 =>   /usr/lib/hpux32/libdl.so.1

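In this case, at least, the problem was caused by library load order. In the failing build, libpthread is pulled in only as a dependency of liblib.so and is listed after libc; in the working build, pnp requires libpthread directly and it is listed before libc. The exit status 8 is the processor count returned by pthread_num_processors_np().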
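
The comparison of the two ldd -v listings can be automated along these lines. This is a minimal sketch in POSIX shell: the helper name load_order is hypothetical, and it assumes that ldd -v lists libraries in the order they are loaded, which is how the two listings in this post are being read.

```shell
# Hypothetical helper: read ldd -v output on standard input and report
# whether libc or libpthread is listed first. Assumes ldd lists libraries
# in load order, which is how the two listings in this post are read.
load_order() {
    awk '/find library=libc\.so/       { print "libc-first";    found = 1; exit }
         /find library=libpthread\.so/ { print "pthread-first"; found = 1; exit }
         END { if (!found) print "neither" }'
}

# Usage (on the affected system): ldd -v pnp | load_order
```

Fed the first listing in this post it reports libc first (the crashing build); fed the second it reports libpthread first (the working build).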

 
--
ranga
[i work for hpe]


Jose M. del Rio
Frequent Advisor

Re: JVM crashing when calling pthread_num_processors_np

Aha! This is getting more and more interesting!

The failing program was indeed linked with -lc before -lpthread:

cc +DD64 -o frmweb -L/weblogic12/Oracle_Home/lib/ -L/usr/lib/hpux64 -L/weblogic12/Oracle_Home/jdk//jre/lib/IA64W/ -L/weblogic12/Oracle_Home/jdk//jre/lib/IA64W/server/ -L/weblogic12/Oracle_Home/jdk//jre/lib/IA64W/native_threads/ \
/weblogic12/Oracle_Home/lib/s0nnmain.o \
/weblogic12/Oracle_Home/forms/lib/ssliftabw.o \
/weblogic12/Oracle_Home/forms/lib/ifzxtb.o \
/weblogic12/Oracle_Home/forms/lib/sixn.o \
/weblogic12/Oracle_Home/forms/lib/sixp.o \
\
-liplsn \
/weblogic12/Oracle_Home/forms/lib/istiif.o \
-u iiflmp -u iifget -u iiffcb -u iiflog -u iiflov -u iifgdl -u iifatr -u iifeue -u iifwcnew -u iifwru \
-libfrmw -liffw -lifcw -lijcw -liifw -lipc -lipfw -lipc -limffrmw -limc -liwfw -liwcw -litw -licw -lirm -lsosdw -lihm -libfrmw -lixw -liffw -lifcw -lijcw -lipfw -lipc -liicw -liiiw -liwfw -liwcw -liqw -litw -licw -lipc -lsosdw -limffrmw -limc -liplsn \
-lnn -lzrc -lvgsw -ldeb -lca -lmmoi -lmmcm -luicm -lrem -luiimg -ltio -luc -lutt -luicm -lrod -lror -lros -lrod -lror -lros -lrod -lutc -lutj -lutl -lutsl -lpls11 -lplp11 -lplc11 -lpls11 -lplp11 -lslax11 -lsql11 -dynamic /weblogic12/Oracle_Home/lib/nautab.o /weblogic12/Oracle_Home/lib/naeet.o /weblogic12/Oracle_Home/lib/naect.o /weblogic12/Oracle_Home/lib/naedhs.o -lm -lc -lclntsh `cat /weblogic12/Oracle_Home/lib/ldflags` -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /weblogic12/Oracle_Home/lib/ldflags` -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnnz11 -lzt11 -lztkg11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lmm -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /weblogic12/Oracle_Home/lib/ldflags` -lnsgr11 -lnzjs11 -ln11 -lnl11 -lnro11 `cat /weblogic12/Oracle_Home/lib/ldflags` -lnsgr11 -lnzjs11 -ln11 -lnl11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 -lclient11 -lnnetd11 -lvsn11 -lcommon11 -lgeneric11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lcore11 -lsnls11 -lnls11 -lxml11 -lcore11 -lunls11 -lsnls11 -lnls11 -lcore11 -lnls11 `cat /weblogic12/Oracle_Home/lib/sysliblist` -lm `cat /weblogic12/Oracle_Home/lib/sysliblist` -ldl -lpthread -lm -lpthread -lc -lrt -lpthread -lc -lsnls11 -lpthread -lc

 

Also:

ppoas1:/root#ldd -v /weblogic12/Oracle_Home/bin/frmweb

/weblogic12/Oracle_Home/bin/frmweb:

...
find library=libc.so.1; required by /weblogic12/Oracle_Home/bin/frmweb
libc.so.1 => /usr/lib/hpux64/libc.so.1

...
find library=libpthread.so.1; required by /weblogic12/Oracle_Home/lib//libclntsh.so.11.1
libpthread.so.1 => /usr/lib/hpux64/libpthread.so.1

So, as per the link you mentioned and your tests, should it fail systematically when calling pthread_num_processors_np?

 

ranganath ramachandra
Esteemed Contributor

Re: JVM crashing when calling pthread_num_processors_np


@Jose M. del Rio wrote:

Aha! This is getting more and more interesting!

The failing program was indeed linked with -lc before -lpthread:

It should have been linked with the "-mt" option and without "-lpthread"; libc should never be linked explicitly.
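
As a rough illustration of that advice, the link arguments could be rewritten like this before relinking. It is a minimal sketch in POSIX shell: the helper name use_mt is hypothetical, placing "-mt" first is an assumption, and the function only rewrites text (it does not run the compiler or look inside files expanded on the command line).

```shell
# Hypothetical helper: print a link argument list with every explicit -lc
# and -lpthread removed and -mt placed first, one argument per line.
# It only rewrites text; it does not run the compiler.
use_mt() {
    printf '%s\n' "-mt"
    for arg in "$@"; do
        case $arg in
            -lc|-lpthread) ;;          # dropped, per the advice above
            *) printf '%s\n' "$arg" ;;
        esac
    done
}

# Usage: use_mt -lm -lc -lclntsh -ldl -lpthread
```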

So, as per the link you mentioned and your tests, should it fail systematically when calling pthread_num_processors_np?

Yes, I expect so.


 

 
--
ranga
[i work for hpe]


Jose M. del Rio
Frequent Advisor

Re: JVM crashing when calling pthread_num_processors_np

Thanks a lot for your help.

I'll let Oracle Support know and, if this proves to be the solution, I'll write it here.

Jose M. del Rio
Frequent Advisor

Re: JVM crashing when calling pthread_num_processors_np

Bingo!
I relinked the frmweb executable in the right order and it works!
I've marked your answer as solution.
Thanks a lot for your help!
ranganath ramachandra
Esteemed Contributor

Re: JVM crashing when calling pthread_num_processors_np


I relinked the frmweb executable in the right order and it works!

Using "-mt" is recommended and supported; I suggest you do that instead.

 

 
--
ranga
[i work for hpe]
