
Core Dump with multi-threaded application...

Sanjay Sutar
Frequent Advisor

Re: Core Dump with multi-threaded application...

>>(I hope you were given all of the user's shlibs? See gdb's packcore command.)

I am analyzing the core file on my test machine, where the application works fine, so it certainly has all the shlib files.

>>You could look at stack traces for the other threads to see if they are fiddling with NIS_PWDIR_

Forgive my ignorance, but how do I do that?

>>to see if there is a pattern for that $ nm -pxv HpuxAgent

I don't see anything like a pattern of zeros.


>>(gdb) x /128x 0x400014b0-64*4

Attached is the dump of raw memory...

This application works fine on most of our test machines but dumps core on the customer's machine. Is there something specific to that machine I should look for?

When I run the same application, both NIS_PWDIR_ and SHADOW are assigned their default values:

string Context::SHADOW = "shadow";
string Context::NIS_PWDIR_ = "/etc/";

This is not conditional code, so it should always initialize both variables. How could they have been zeroed on the customer's machine?

Is there anything else I should look at?

As I mentioned earlier, when we had a similar problem before, we had to rebuild the ICU libraries because they had been built with an older gcc.

But this time all components of the application are built with the same gcc 3.4.2.

Any pointers/info is much appreciated.
Dennis Handly
Acclaimed Contributor

Re: Core Dump with multi-threaded application...

>>(I hope you were given all of the user's shlibs? See gdb's packcore command.)

>I am analyzing the core file on my test machine where it works fine. So it for sure has all the shlib files.

That's not what I mean. You must have an EXACT copy of every shlib your customer has. Including libc, etc. It is easier for your customer to just give you what they have with packcore, rather than play guessing games. That may be why your stack traces aren't complete.

>>You could look at stack traces for the other threads to see if they are fiddling with NIS_PWDIR_

>Forgive my ignorance, but how do I do that?

To dump all threads in gdb:
(gdb) info thread
(gdb) thread apply all bt

Then look at the functions in each trace to see where they all are. See if there is a pattern the next time it happens. See if any function is accessing variables near NIS_PWDIR_ in memory.

>>to see if there is a pattern for that $ nm -pxv HpuxAgent

This gives the mangled names of variables in memory order.
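The names nm(1) prints here are g++ (Itanium C++ ABI) mangled names; for example `_ZN7Context10NIS_PWDIR_E` is `Context::NIS_PWDIR_`, and `_ZNSsC1ERKSs` in the later backtrace is the std::string copy constructor. As a hedged sketch, assuming a g++/libstdc++ toolchain that provides `<cxxabi.h>`, the ABI demangler can translate them; `demangle` is a hypothetical helper, and GNU c++filt does the same from the shell if it is installed.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle a g++ symbol name as printed by nm(1) or gdb.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer (or nullptr on failure).
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string out = (status == 0 && raw != nullptr) ? std::string(raw)
                                                      : std::string(mangled);
    std::free(raw);  // free(nullptr) is a no-op
    return out;
}
```

Feeding it each `<_ZN7Context...E>` label from the gdb `x` output gives the matching `Context::...` member names.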

>nothing like pattern of zero
>Attached is the dump of raw memory...

There are a bunch of variables that are 0. If they are strings, they are also bad:
0x40001404 0x00000000 0x00000000 0x00000000
0x40001410 <_ZN7Context8DEV_PTS_E>: 0x00000000 0x0 0x0 0x0
0x40001420 <_ZN7Context5PROC_E>: 0x00000000 0x0 0x0 0x0
0x40001430 <_ZN7Context9USR_SBIN_E>: 0x00000000 0x0 0x0 0x0
0x40001440 <_ZN7Context8VAR_RUN_E>: 0x00000000 0x0 0x0 0x0
0x40001450 <_ZN7Context9CRON_DENYE>: 0x00000000 0x0 0x0 0x0
0x40001460 <_ZN7Context8CHPWDAGEE>: 0x00000000 0x0 0x0 0x0
0x40001470 <_ZN7Context11SYSLOG_CONFE>: 0x00000000 0x0 0x0 0x0
0x40001480 <_ZN7Context4GREPE>: 0x00000000 0x0 0x0 0x0
0x40001490 <_ZN7Context7USERMODE>: 0x00000000 0x0 0x0 0x0
0x400014a0 <_ZN7Context22NISMAP_NETGROUP_BYUSERE>: 0x00000000 0x0 0x0 0x0

Then your variable and 2 more words of 0:
0x400014b0 <_ZN7Context10NIS_PWDIR_E>: 0x00000000 0x0 0x0 0xffff0112

(Other variables in this area aren't labeled if they are static or don't start at the first address of a line. You need to match the addresses against the nm(1) output.)

>Is there something specific to that machine I should look for?

Nothing obvious; you'll see it when you find it. ;-) (How many CPUs?)

>So how come they got zeroed on the customer machine.

A bad pointer? Something not locking a critical resource, such as a pointer?
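To illustrate the kind of locking meant here, a minimal sketch of a string shared between threads where every read and write goes through one mutex. `GuardedString` is hypothetical, and std::mutex is C++11, so with the gcc 3.4.2 in this thread the same idea would have to be written with pthread_mutex_t.

```cpp
#include <mutex>
#include <string>

// Hypothetical holder for a string that several threads read and write.
// All access is serialized by one mutex, so no thread can observe a
// std::string that another thread is halfway through modifying.
class GuardedString {
public:
    void set(const std::string& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = value;
    }

    // Returns a copy taken while holding the lock, rather than a reference
    // that could change after the lock is released.
    std::string get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    std::string value_;
};
```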

>As I mentioned earlier, when we had a similar problem before, we had to rebuild the ICU libraries because they had been built with an older gcc.

A mismatch in layouts is a possible cause.
What does "info shared" show?

>Any pointers/info is much appreciated.

There isn't much I can suggest except building debugging tools/tracing into your product, or seeing if you can get the customer to reproduce it. From your original message, it seems you were able to do that??

As I said, linking with -z (which makes dereferencing a null pointer raise SIGSEGV) may catch the error sooner.

You are asking questions that are getting into real time and money.
Sanjay Sutar
Frequent Advisor

Re: Core Dump with multi-threaded application...

Thanks Dennis for your continued assistance.

I managed to reproduce the issue on one of my HP-UX boxes.

I started debugging and kept watching the static global variables; they were "zero" from the very beginning, i.e. from the start of the program in main.

Then I did the same thing on the box where the application works fine, and there all the static global variables have proper non-zero values.

All these static variables are initialized with hard-coded values at global scope (above all function definitions)...


Dennis Handly
Acclaimed Contributor

Re: Core Dump with multi-threaded application...

>I started debugging and was constantly watching the static global variables and they were "zero" from the beginning, I mean from start of the program, in the main function.

These variables should be zero at the very start of execution, since they are initialized at run time. But some time before main, they should be initialized to their proper values.

For PA32, aC++ used to initialize them on a call to _main inside of main. I think g++ does it sooner.
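A small sketch of the two phases described above, compiled as a single translation unit. `Context` mirrors the two definitions from the first post, and `InitProbe` is hypothetical. All objects with static storage start zero-filled; the `= "..."` initializers of std::string are then run by compiler-generated code before main. If that second phase never runs, the strings stay in the all-zero state seen in gdb.

```cpp
#include <string>

// Hypothetical stand-in for the Context class from the first post.
struct Context {
    static std::string SHADOW;
    static std::string NIS_PWDIR_;
};

// Phase 1: the storage for both members is zero-initialized before any code runs.
// Phase 2: with the gcc 3.4 of this thread, std::string has a non-trivial
// constructor, so these initializers run as dynamic initialization before main.
std::string Context::SHADOW = "shadow";
std::string Context::NIS_PWDIR_ = "/etc/";

// Within one translation unit, dynamic initialization follows definition
// order, so this probe (defined later) sees the already-constructed strings.
struct InitProbe {
    std::string seen;
    InitProbe() : seen(Context::NIS_PWDIR_ + Context::SHADOW) {}
};
InitProbe probe;
```

Across separate translation units the order is unspecified, which is a different failure mode from the one here, where the initializers appear not to run at all.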
Sanjay Sutar
Frequent Advisor

Re: Core Dump with multi-threaded application...

Dennis,
I have written a small sample program that simulates my situation, and it dumps core in the same way.

Attached is the sample program.
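The attachment itself isn't reproduced here. As a hypothetical reconstruction, inferred only from the gdb session later in the thread (symbols `static1::name` and `static1::surname`, `new static1()` at test.cpp line 29, and a crash in the std::string copy constructor called from `operator+`), the program has roughly this shape. The exact body of `getfullname` is an assumption.

```cpp
#include <string>

// Hypothetical reconstruction of the attached test.cpp; not the original file.
class static1 {
public:
    static std::string name;
    static std::string surname;

    // operator+ copies its left operand first; if the static strings were
    // never initialized, that copy (_ZNSsC1ERKSs in the backtrace) is
    // where a zeroed std::string would fault.
    std::string getfullname() const { return name + surname; }
};

std::string static1::name = "sanjay";
std::string static1::surname = "sutar";
```

On a machine where static initialization works, `getfullname()` returns "sanjaysutar"; any separator in the real test.cpp is unknown.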

Here are the commands used to build and link:

g++ -g -pthread -c test.cpp

g++ -o TestStaticString test.o -pthread -L /var/home/HPUXAGENT/lib -L/usr/local/lib -lpthread -lcrypt -lxnet -lnsl -lsec -lstdc++ -Xlinker +b -Xlinker ../lib:. -Xlinker +s -z


Here is the ldd output of the sample executable on the build machine:

$ ldd TestStaticString
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/libdld.2 => /usr/lib/libdld.2
/usr/lib/libc.2 => /usr/lib/libc.2
/var/home/HPUXAGENT/lib/libgcc_s.sl => /var/home/HPUXAGENT/lib/libgcc_s.sl
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/libm.2 => /usr/lib/libm.2
/usr/local/lib/libstdc++.sl.6 => /usr/local/lib/libstdc++.sl.6
/usr/lib/libc.2 => /usr/lib/libc.2
/scratch/njs/pkgbuild/3.3.1/hpux-11/gcc-3.4.2-b/gcc/libgcc_s.sl => /usr/local/lib/libgcc_s.sl
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/libm.2 => /usr/lib/libm.2
/usr/lib/libsec.2 => /usr/lib/libsec.2
/usr/lib/libm.2 => /usr/lib/libm.2
/usr/lib/libnsl.1 => /usr/lib/libnsl.1
/usr/lib/libxti.2 => /usr/lib/libxti.2
/usr/lib/libxnet.2 => /usr/lib/libxnet.2
/usr/lib/libxti.2 => /usr/lib/libxti.2
/usr/lib/libpthread.1 => /usr/lib/libpthread.1


And here is the ldd output of the same sample executable on the test machine where it dumps core:

bash-2.04# ldd TestStaticString
=>
/usr/lib/libc.2 => ../lib/libc.2
/usr/lib/libdld.2 => ../lib/libdld.2
/usr/lib/libc.2 => ../lib/libc.2
/var/home/HPUXAGENT/lib/libgcc_s.sl => /home/sanjay/agent/lib/libgcc_s.sl
/usr/lib/libc.2 => ../lib/libc.2
/usr/lib/libm.2 => ../lib/libm.2
/usr/local/lib/libstdc++.sl.6 => /home/sanjay/agent/lib/libstdc++.sl.6
/usr/lib/libc.2 => ../lib/libc.2
/scratch/njs/pkgbuild/3.3.1/hpux-11/gcc-3.4.2-b/gcc/libgcc_s.sl => /home/sanjay/agent/lib/libgcc_s.sl
/usr/lib/libm.2 => ../lib/libm.2
/usr/lib/libsec.2 => ../lib/libsec.2
/usr/lib/libm.2 => ../lib/libm.2
/usr/lib/libnsl.1 => ../lib/libnsl.1
/usr/lib/libxti.2 => ../lib/libxti.2
/usr/lib/libxnet.2 => ../lib/libxnet.2
/usr/lib/libxti.2 => ../lib/libxti.2
/usr/lib/libpthread.1 => ../lib/libpthread.1

bash-2.04# ./TestStaticString
Main started
initialized string variable
Calling getfullname
Segmentation fault (core dumped)
bash-2.04#


And the core dump analysis:

bash-2.04# gdb TestStaticString core
HP gdb 5.2 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00
and target hppa1.1-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.2 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..

warning: core file may not match specified executable file.
Core was generated by `TestStaticStr'.
Program terminated with signal 11, Segmentation fault.
SEGV_UNKNOWN - Unknown Error
#0 _ZNSsC1ERKSs ()
at /scratch/njs/pkgbuild/3.3.1/hpux-11/gcc-3.4.2-b/hppa2.0w-hp-hpux11.11/libstdc++-v3/include/bits/basic_string.h:182
182 /scratch/njs/pkgbuild/3.3.1/hpux-11/gcc-3.4.2-b/hppa2.0w-hp-hpux11.11/libstdc++-v3/include/bits/basic_string.h: No such file or directory.
in /scratch/njs/pkgbuild/3.3.1/hpux-11/gcc-3.4.2-b/hppa2.0w-hp-hpux11.11/libstdc++-v3/include/bits/basic_string.h
(gdb) where
#0 _ZNSsC1ERKSs ()
at /scratch/njs/pkgbuild/3.3.1/hpux-11/gcc-3.4.2-b/hppa2.0w-hp-hpux11.11/libstdc++-v3/include/bits/basic_string.h:182
#1 0x3b84 in _ZStplIcSt11char_traitsIcESaIcEESbIT_T0_T1_ERKS6_S8_ ()
at /usr/local/include/c++/3.4.2/bits/basic_string.h:1984
(gdb) x/4x *(void**)($sp-0x12c)
0x40001078 <_ZN7static14nameE>: 0x00000000 0x00000000 0xffff0106 0x00000100

Dennis Handly
Acclaimed Contributor

Re: Core Dump with multi-threaded application...

>I have written a small sample program that simulates my situation, and it dumps core in the same way.

Then set up your watchpoint.

>And here is the ldd output of same sample exe on the test machine where it gives core dump.

Why don't you point to where it is different? libstdc++.sl.6 & libgcc_s.sl?

(gdb) x/4x *(void**)($sp-0x12c)
0x40001078 <_ZN7static14nameE>: 0x00000000 0x00000000 0xffff0106 0x00000100

So, just put a watchpoint here.
You can also do that nm -pxv to see what variables are around yours.

If you are correct that these string variables are never runtime initialized, you have found a g++ problem. In my version 3.3.2, I have no problems.
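One possible workaround, not proposed in this thread and sketched here under stated assumptions: replace each namespace-scope std::string with an accessor returning a function-local static. Such a static is initialized the first time control passes through its declaration, so it does not depend on the global constructor list having run. The accessor names are hypothetical. First-call initialization is thread-safe in C++11, but that is not guaranteed with the gcc 3.4.2 used here, so concurrent first calls would need external locking there.

```cpp
#include <string>

// Hypothetical "construct on first use" version of the Context strings.
class Context {
public:
    static const std::string& NIS_PWDIR() {
        // Initialized on the first call, not by a global constructor.
        static const std::string value = "/etc/";
        return value;
    }
    static const std::string& SHADOW() {
        static const std::string value = "shadow";
        return value;
    }
};
```

Callers write `Context::NIS_PWDIR()` instead of `Context::NIS_PWDIR_`; every call returns a reference to the same object.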
Sanjay Sutar
Frequent Advisor

Re: Core Dump with multi-threaded application...

Here is the output when I print those variables right after starting the program:


bash-2.04# gdb TestStaticString
HP gdb 5.2 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00
and target hppa1.1-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.2 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) watch static1::name
warning: can't do that without a running program; try "break main", "run" first
(gdb) b main
Breakpoint 1 at 0x36f4: file test.cpp, line 29 from /test/TestStaticString.
(gdb) r
Starting program: /test/TestStaticString

Breakpoint 1, main () at test.cpp:29
warning: Source file is more recent than executable TestStaticString.

29 static1* s = new static1();
(gdb) watch static1::name
Hardware watchpoint 2: static1::name
(gdb) p static1::name
$1 = {static npos = 4294967295,
_M_dataplus = {> = {> = {}, }, _M_p = 0x0}}
(gdb) p static1::surname
$2 = {static npos = 4294967295,
_M_dataplus = {> = {> = {}, }, _M_p = 0x0}}


Sanjay Sutar
Frequent Advisor

Re: Core Dump with multi-threaded application...

And on the machine where it works:

(gdb) b main
Breakpoint 1 at 0x36f4: file test.cpp, line 29 from /test/TestStaticString.
(gdb) r
Starting program: /test/TestStaticString
warning: Load module /usr/local/lib/libstdc++.sl.6 has been stripped


Breakpoint 1, main () at test.cpp:29
29 static1* s = new static1();
(gdb) p static1::name
$1 = {static npos = ,
_M_dataplus = {> = {> = {}, },
_M_p = 0x40004aec "sanjay"}}
(gdb) p static1::surname
$2 = {static npos = ,
_M_dataplus = {> = {> = {}, },
_M_p = 0x40004b14 "sutar"}}
(gdb)


So the static variables are definitely not being initialized.

>>If you are correct that these string variables are never runtime initialized, you have found a g++ problem

Then I wonder how it works on some other machines.


>>Why don't you point to where it is different? libstdc++.sl.6 & libgcc_s.sl?

I tried hard to link with my versions of both files, but every time the linker picks up libstdc++ and libgcc from /usr/local/lib. :(



Sanjay Sutar
Frequent Advisor

Re: Core Dump with multi-threaded application...

Thanks for your help, Dennis.
I am going to post this on the language forum...

Dennis Handly
Acclaimed Contributor

Re: Core Dump with multi-threaded application...

>>Why don't you point to where it is different? libstdc++.sl.6 & libgcc_s.sl?

>I tried hard to link with my version of both files, but every time the linker links with libstdc++ and libgcc from /usr/local/lib :(

(By point, I meant add a comment saying "look here <<<".)

You can use chatr(1) to see what the paths are. If you embed a path, it can point to somewhere different. And you can also use SHLIB_PATH to change it.

>I am going to post this on the language forum...

I talked with our expert here and he didn't recall any static init errors lately. I sent him a pointer to this thread.