Operating System - HP-UX

application dumping core during system reboot

 
Srimalik
Valued Contributor


Hi,

We have an application on 11.23 which dumps core when it is started during system reboot by rc scripts.

The core file shows:

Program terminated with signal 10, Bus error.

warning: The shared libraries were not privately mapped; setting a
breakpoint in a shared library will not work until you rerun the program.

(no debugging symbols found)...(no debugging symbols found)...#0 0xc002f65c in stat64+0xbfffd8f4 () from /usr/lib/dld.sl
(gdb) bt
#0 0xc002f65c in stat64+0xbfffd8f4 () from /usr/lib/dld.sl
warning: Attempting to unwind past bad PC 0xc002f65c
#1 0xc002f658 in stat64+0xbfffd8f0 () from /usr/lib/dld.sl
#2 0xc002f658 in stat64+0xbfffd8f0 () from /usr/lib/dld.sl
(gdb)


The interesting thing is that starting the application manually after the system has rebooted succeeds.

Any clues?

Sri
abandon all hope, ye who enter here..
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

This stack trace isn't helpful because of the bogus offsets. What gdb version are you using?
F Verschuren
Esteemed Contributor

Re: application dumping core during system reboot

Because you can start it up manually after the reboot, the problem can be caused by two things:
1. You start it too early in the boot process.
2. The script is written so that it can only be started manually; if so, you probably also cannot start it from cron.
The usual reason is that the script does not start with something like #!/sbin/sh.
Srimalik
Valued Contributor

Re: application dumping core during system reboot


####gdb version is 3.1

ccxrthp1# /tmp/gdb/gdb -v
HP gdb 3.1 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
####
gdb shows a proper stack if I start the application manually after the reboot with the same command and force it to dump core with "kill -ABRT 12345".

####

The same application is started by rc scripts without problems on another machine.
This is the first time we are seeing this issue.

Also, our application changes its CWD after starting, so if it dumps core, the core should be in that directory. That has always been the case when the application dumped core for other reasons.

In this case the core file is created in the root directory. So it makes me think that it dumps core before it actually starts executing.

Thanks
Sri
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>gdb version is 3.1

The latest is 5.7. You need to download the latest. http://www.hp.com/go/wdb

>gdb is showing a proper stack if I start the application manually

But you don't want to debug that one. :-)
What shlibs does it use?

>it make me to think that it dumps core before it actually starts executing.

That's possible. If there was a dld problem, you would expect a message on stderr. And /usr/lib is mounted.
Srimalik
Valued Contributor

Re: application dumping core during system reboot

With the new gdb, I am getting a proper stack trace.

################
Program terminated with signal 10, Bus error.

(no debugging symbols found)...#0 0xc002f65c in get_origin+0x40 () from /usr/lib/dld.sl
(gdb) bt
#0 0xc002f65c in get_origin+0x40 () from /usr/lib/dld.sl
#1 0xc001ae4c in map_shlib+0x11ac () from /usr/lib/dld.sl
#2 0xc0018a78 in form_load_graph+0x1a4 () from /usr/lib/dld.sl
#3 0xc001958c in form_load_graph+0xcb8 () from /usr/lib/dld.sl
#4 0xc0028020 in finish_dld_main+0x1024 () from /usr/lib/dld.sl
#5 0xc002b9d4 in _dld_main+0x1c8 () from /usr/lib/dld.sl
#6 0xba8c in __map_dld+0x4e4 ()
#7 0xb0cc in $START$+0xd4 ()
#8 0xc002f658 in get_origin+0x3c () from /usr/lib/dld.sl
(gdb) infor threads
Undefined command: "infor". Try "help".
(gdb) info threads
* 1 system thread 2263 0xc002f65c in get_origin+0x40 () from /usr/lib/dld.sl
(gdb)
############

But it is still failing before it enters our code.

########

I added a trace to find whether /usr/lib is mounted before the command is run...but everything seems to be OK.

Any ideas what may be happening ?

Regards
Sri
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>With the new gdb, I am getting a proper stack trace.

Right, much better.

#0 0xc002f65c in get_origin+0x40 /usr/lib/dld.sl

Do you use $ORIGIN in your shlib paths?

>I added a trace to find whether /usr/lib is mounted before the command is run...but everything seems to be OK.

Well /usr/lib/dld.sl is mounted. What does ldd or chatr show on your executable?

>Any ideas what may be happening?

What version of dld.sl do you have? Maybe you need a patch?
JAGag07378 $ORIGIN in filename cause dlgetfileinfo to dump core
Srimalik
Valued Contributor

Re: application dumping core during system reboot

>Do you use $ORIGIN in your shlib paths?

We do not use $ORIGIN in SHLIB_PATH, but we are using it in the embedded path.

>Well /usr/lib/dld.sl is mounted. What does ldd or chatr show on your executable?
chatr shows (please let me know if you want the full output):

SHLIB_PATH enabled second
embedded path enabled first
and every path is prefixed with $ORIGIN

ldd resolves all the dependencies without problems.
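For reference, $ORIGIN in an embedded path stands for the directory containing the object being loaded, so a path such as "$ORIGIN/../lib" can only be resolved once dld knows where the executable lives. A minimal C sketch of the substitution step, assuming the origin directory has already been computed; `expand_origin` is a hypothetical helper, not part of dld, and the paths are illustrative:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of dld): copy tmpl into out, replacing
 * every "$ORIGIN" token with origin_dir.
 * Returns 0 on success, -1 if the result does not fit in out. */
int expand_origin(const char *tmpl, const char *origin_dir,
                  char *out, size_t outsz)
{
    static const char token[] = "$ORIGIN";
    const size_t toklen = sizeof token - 1;
    size_t used = 0;

    assert(tmpl != NULL && origin_dir != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *piece;
        size_t len;

        if (strncmp(tmpl, token, toklen) == 0) {
            piece = origin_dir;          /* substitute the directory */
            len = strlen(origin_dir);
            tmpl += toklen;
        } else {
            piece = tmpl;                /* copy one literal character */
            len = 1;
            tmpl++;
        }
        if (used + len >= outsz)         /* keep room for the final NUL */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With origin "/opt/VRTSob/bin", the template "$ORIGIN/../lib" becomes "/opt/VRTSob/bin/../lib". The whole scheme depends on the origin directory being computable in the first place.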

>What version of dld.sl do you have? Maybe you need a patch?
>JAGag07378 $ORIGIN in filename cause dlgetfileinfo to dump core

1# what /usr/lib/dld.sl
/usr/lib/dld.sl:
SMART_BIND
92453-07 dld dld dld.sl B.11.62 070917


This JAG is fixed in patch PHSS_37201. I have already installed this patch, but it was of no help. :(


Thanks
Sri
Srimalik
Valued Contributor

Re: application dumping core during system reboot

The app starts without problems during reboot on another machine, and dld.sl on that machine seems to be older than the one on the machine where we are facing problems.

ccxrthp2# ls -l /usr/lib/dld.sl
-r-xr-xr-x 1 bin bin 274432 Sep 14 2006 /usr/lib/dld.sl
ccxrthp2# what /usr/lib/dld.sl
/usr/lib/dld.sl:
SMART_BIND
92453-07 dld dld dld.sl B.11.57 060914
ccxrthp2#
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>but we are using it in embedded path.

Yes, that's what I meant.

>and every path is prefixed with $ORIGIN

What are those paths and what does ldd show for them? Is this on a file system that isn't mounted until later?
Is there any way you can run ldd just before you start your application?

>This CR is fixed in patch PHSS_37201, I have already installed this patch but it was of no help. :(

I hate to think it broke it. :-(
Srimalik
Valued Contributor

Re: application dumping core during system reboot


>What are those paths
see chatr.txt in attached gz file

> and what does ldd show for them?

ldd_after_start.txt in the attached gz

>Is this on a file system that isn't mounted until later?
>Is there anyway you can run ldd just before you start your application up??

see info_during_reboot.txt in attached gz

Sri
Srimalik
Valued Contributor

Re: application dumping core during system reboot

Hi, Dennis

Was the data useful?

Thanks
Sri
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>Was the data useful?

Nothing obvious.
If you still have your core file can you do:
(gdb) x /s *(void**)($sp-0x40)
(gdb) x /s *(void**)($sp-0x3c)
(gdb) p /x $ret0

This should print the input string to get_origin, the result of dld_realpath, and the result of dld_strrchr.

In your chatr(1) output I see:
dynamic /usr/lib/libstd.2
dynamic /usr/lib/libstream.2
dynamic /usr/lib/libCsup.2

But some of the ldd output has the wrong order:
/usr/lib/libCsup.2 => /usr/lib/libCsup.2
/usr/lib/libstream.2 => /usr/lib/libstream.2
/usr/lib/libstd.2 => /usr/lib/libstd.2

You may have to use "ldd -v" or chatr(1) on every shlib until you find this bad order.
(Or perhaps ldd doesn't have any ordering in its output??)

I do see some strange relative paths:
/usr/lib/libdld.2 => ../lib/libdld.2

But this is in the "after" list too.
Comparing the RHS of each "=>" I get matches for before and after.

Your files that use $ORIGIN all appear to be in /opt/VRTSob, which appears to be:
/opt on /dev/vg00/lvol5 ioerror=mwdisable,largefiles,delaylog,dev=40000005

I did find out that one of your shlibs is illegally being built with -AA. You can't mix -AA and -AP. You have one with libCsup_v2.2 and the rest with libCsup.2.
rajdev
Valued Contributor

Re: application dumping core during system reboot

Hi,

Not sure what the issue is, but since you say that it works after the system has booted, I have a couple of questions:
---> How are you starting it, i.e. from the command prompt?
---> Does it require a terminal (what are stdin/stdout/stderr)?
---> Or is it a daemon program?
---> Have you tried running it with nohup?

Regards,
RD
Srimalik
Valued Contributor

Re: application dumping core during system reboot

Thanks, Dennis

Output of gdb commands given by you:
(gdb) x /s *(void**)($sp-0x40)
0x77ff0000: "vxsvc"
(gdb) x /s *(void**)($sp-0x3c)
0x77fce36c: ""
(gdb) p /x $ret0
$1 = 0x0
(gdb)

I can not make out much from the output. :(

I will only be able to work on the other points on Monday. :(


Rajdev,
This exe does not use a terminal for stdin/stdout. We don't have to use nohup.
The command used is exactly the same as the one used after reboot, and as I mentioned earlier, it works without problems on most machines.

Both of you have a great weekend. :)
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>Output of gdb commands given by you:
(gdb) x /s *(void**)($sp-0x40)
0x77ff0000: "vxsvc"
(gdb) x /s *(void**)($sp-0x3c)
0x77fce36c: ""

Basically this is saying that the input string (argv[0]?) is not an absolute path, so dld's realpath has to sweat. And something is going wrong there.
And unfortunately dld doesn't have any checking for this case.

Years ago I found a problem using dirname/basename on argv[0]: I had a case of a non-absolute path and the function aborted. Later I was never able to duplicate it, because ksh always provided the absolute path every time I tried. Perhaps that's what differs for you before and after the boot? Something to do with shells and exec(2)?

So the workaround may be simple, provide an absolute path to vxsvc. (I assume that's what you were running?)

In the meantime, you should contact the Response Center with your problem so they can put more checking into dld. Unfortunately if realpath doesn't work or getcwd(3) fails, dld may give you an error that it can't compute $ORIGIN and you'll just abort with a nicer message.
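The get_origin steps described earlier (realpath on the input string, then strrchr for the last '/') can be modeled in C to show where the missing checks would go. This is a simplified, hypothetical sketch using the standard realpath(3) and strrchr, not dld's code (dld has its own dld_realpath/dld_strrchr); `get_origin_sketch` is an illustrative name. The core above shows an empty realpath result and a strrchr result of 0x0, so the unchecked step appears to be the use of that NULL:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical model of get_origin: resolve exe_path, then cut
 * at the last '/'. Unlike the crashing case, every step is checked.
 * Returns 0 and writes the origin directory to out, or -1 on failure. */
int get_origin_sketch(const char *exe_path, char *out, size_t outsz)
{
    char resolved[PATH_MAX];
    const char *slash;
    size_t len;

    assert(exe_path != NULL && out != NULL);

    /* A bare name such as "vxsvc" is resolved relative to the current
     * directory; if the file is not there, realpath fails. */
    if (realpath(exe_path, resolved) == NULL) {
        fprintf(stderr, "cannot compute $ORIGIN for \"%s\"\n", exe_path);
        return -1;
    }

    /* The step that appears to be unchecked in the core: strrchr returns
     * NULL if the string has no '/'. */
    slash = strrchr(resolved, '/');
    if (slash == NULL) {
        fprintf(stderr, "cannot compute $ORIGIN for \"%s\"\n", exe_path);
        return -1;
    }

    len = (size_t)(slash - resolved);
    if (len == 0)
        len = 1;                         /* the file sits directly under "/" */
    if (len >= outsz)
        return -1;
    memcpy(out, resolved, len);
    out[len] = '\0';
    return 0;
}
```

With PWD=/ and argv[0] = "vxsvc", the realpath step in this sketch fails and it reports an error instead of crashing, which is roughly the "nicer message" behavior suggested above.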

(You're not in a directory that gets removed out from under it?)

(It would have been helpful if your attachment was .tar.gz or .tgz instead of just .gz. It took me a while to realize it was a tarfile.)
Srimalik
Valued Contributor

Re: application dumping core during system reboot


>So the workaround may be simple, provide an absolute path to vxsvc. (I assume that's what you were running?)
Yes, I am running vxsvc.
Where do I need to give the absolute path? We are already using the absolute path in the rc script that starts vxsvc.



>(You're not in a directory that gets removed out from under it?)

I don't think so. The core is in /, and I think the working directory for an rc script is / by default.

>(It would have been helpful if your attachment was .tar.gz or .tgz instead of just .gz. It took me awhile to realize it was a tarfile.)

Will keep that in mind for future.

Sri
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>Where do I need to give the absolute path? we are already using the absolute path in the rc script which starts vxsvc.

Well, that's where I would expect it.
If not there, perhaps it forks and execs itself?
Srimalik
Valued Contributor

Re: application dumping core during system reboot

Hi Dennis,


> In the meantime, you should contact the Response Center with your problem so they can put more checking into dld. Unfortunately if realpath doesn't work or getcwd(3) fails, dld may give you an error that it can't compute $ORIGIN and you'll just abort with a nicer message.

Can't we fix this, i.e. can't we prevent the abort instead of just giving a nicer message, either in the application or in the OS?

Thanks
Sri
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>can't we prevent the abort instead of giving a nicer message? either in the application or the OS.

You can't fix something that hasn't been reported. Nor can you fix something if you can't duplicate. Currently we can only check for the symptoms of the lack of error detection in get_origin.

It might be interesting to see if exec(2) passed the absolute path to vxsvc. (Do you still have that corefile?) Or get the output of tusc to see what's different when it fails vs when it works.
Srimalik
Valued Contributor

Re: application dumping core during system reboot

Yes, I have the core file (fortunately). Please find the packcore attached.
The setup on the machine has changed, I will try to recreate the setup and try tusc.

Thanks
Sri
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>I have the core file(fortunately), Please find the packcore attached.

Thanks.

>I will try to recreate the setup and try tusc.

Basically there is something wrong with the kernel or the shell. There is no way dld can figure out the path of the executable:
argv[ 0]: vxsvc
argv[ 1]: -r
argv[ 2]: /etc/vx/isis/Registry
argv[ 3]: -e
argc: 4
envp[ 0]: _=/opt/VRTSob/bin/vxsvc
...
envp[ 5]: PATH=/usr/bin:/bin:/sbin

While "_" is set, this wouldn't be true or trusted in all shells. I have no idea why argv[0] doesn't have the absolute path, especially if it isn't in PATH.

Using tusc on the case where it works may tell you something: is argv[0] the full path there?
Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

Some other variables:
envp[14]: INIT_STATE=3
envp[22]: TERM=unknown
envp[23]: PWD=/
envp[24]: TZ=PST8PDT
envp[28]: SHLIB_PATH=/opt/VRTSob/lib/compat:/opt/VRTSob/lib:/opt/VRTSob/sig/bin:/opt/VRTSob/lib
Srimalik
Valued Contributor

Re: application dumping core during system reboot

Hi, Dennis

While I was waiting for an 11.23 machine to send you the logs,
I saw a similar problem on an 11.31 machine.

A few things to note:

Previously vxsvc dumped core only during a reboot. But now it dumps every time (even when we try to start it after the machine is up).

In our code (vxsvc) we call execvp with "/opt/VRTSob/bin/vxsvc" as the first argument, and the exec seems to be failing.

If we do not call this exec function, it runs without a dump.

If we add /opt/VRTSob/bin to $PATH, vxsvc runs without problems (tusc logs attached).

The core is same as the core we saw on 11.23.

(gdb) x /s *(void**)($sp-0x40) shows "vxsvc"

(gdb) x /s *(void**)($sp-0x3c) shows ""

(gdb) p /x $ret0 shows 0x0

detailed output
================
# gdb -c /opt/VRTSob/bin/core.vxsvc /opt/VRTSob/bin/vxsvc
Detected PA executable.
Invoking /usr/ccs/bin/gdbpa.
HP gdb 5.7 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00
and target hppa1.1-hp-hpux11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 5.7 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
Core was generated by `vxsvc'.
Program terminated with signal 11, Segmentation fault.
SEGV_UNKNOWN - Unknown Error
#0 0xc4d6365c in get_origin+0x40 () from /usr/lib/dld.sl

warning: core file might be corrupted.
(gdb) bt
#0 0xc4d6365c in get_origin+0x40 () from /usr/lib/dld.sl
#1 0xc4d4ee4c in map_shlib+0x11ac () from /usr/lib/dld.sl
#2 0xc4d4ca78 in form_load_graph+0x1a4 () from /usr/lib/dld.sl
#3 0xc4d4d58c in form_load_graph+0xcb8 () from /usr/lib/dld.sl
#4 0xc4d5c020 in finish_dld_main+0x1024 () from /usr/lib/dld.sl
#5 0xc4d5f9d4 in _dld_main+0x1c8 () from /usr/lib/dld.sl
#6 0xb73c in __map_dld+0x404 ()
#7 0xaf34 in $START$+0x114 ()
#8 0xc4d63658 in get_origin+0x3c () from /usr/lib/dld.sl
(gdb) x /s *(void**)($sp-0x40)
0x7b03b064: "vxsvc"
(gdb) x /s *(void**)($sp-0x3c)
0x7b008748: ""
(gdb) p /x $ret0
$1 = 0x0
(gdb)
#####################################

The first argument to execvp is the full absolute path, but the argument to get_origin is only "vxsvc". Is this the problem?

Is it a problem in exec or in our code?
The execvp man page says that the first argument to this function should be a file name, and the path prefix is found from the PATH environment variable.
But we are passing the full path of the exe.

Please confirm whether this is a problem with the dld code. I will also ask my colleagues about the channel for raising this with HP. Before that, I want to make sure that our code is clean. :)
####################
I am attaching the tusc logs for three cases

1. /opt/VRTSob/bin is not in PATH on an IA machine (core dump). Filename: tusc_logs/tusc_vxsvc_bad_NOPATH_IA.txt

2. /opt/VRTSob/bin is in PATH on an IA machine (no core). Filename: tusc_logs/tusc_vxsvc_good_PATH_IA.txt

3. The same binary on a PA machine where the problem is not seen. Filename: tusc_logs/tusc_vxsvc_good_PA.txt

I think the problem is independent of IA/PA as we saw it on PA on 11.23.

I have access to both machines now and am not going to release them until this issue is fixed.
Let me know if you want additional info; I will provide it in our morning.

I am still searching for an 11.23 machine to try with.

Thanks
Sri

Dennis Handly
Acclaimed Contributor

Re: application dumping core during system reboot

>But now it dumps every time

That's helpful to duplicate the problem.

>We are calling execvp with first arg as "/opt/VRTSob/bin/vxsvc" in our code (vxsvc). and the exec seems to be failing.

You aren't invoking it from the shell?
If you are calling execvp, you must provide the absolute path, both in "file" and in argv[0], if you expect $ORIGIN to work.

>if we include /opt/VRTSob/bin to $PATH vxsvc runs without problems.

Yes, that would make it work.

>The first argument to execvp is full absolute path. but the args to get origin is only vxsvc. Is this the problem?

Yes, you must pass that full path twice.
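Passing the full path twice means the argument vector given to execvp must carry the absolute path in argv[0], not the bare "vxsvc" the process was started with. A minimal C sketch of building that vector when a program re-execs itself; `build_exec_argv` is a hypothetical helper, and the path below is only the one from this thread used as an example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: build the argument vector for re-exec'ing a program
 * so that argv[0] is the same absolute path that is passed to execvp() as
 * the file argument. Arguments after argv[0] are kept unchanged.
 * new_argv must have room for argc + 1 pointers (at least 2). */
void build_exec_argv(const char *abs_path, int argc, char *const old_argv[],
                     char *new_argv[])
{
    int n = (argc > 0) ? argc : 1;

    assert(abs_path != NULL && abs_path[0] == '/');
    new_argv[0] = (char *)abs_path;      /* identical to the file argument */
    for (int i = 1; i < n; i++)
        new_argv[i] = old_argv[i];
    new_argv[n] = NULL;                  /* exec argument lists end with NULL */
}
```

The caller would then do something like execvp(new_argv[0], new_argv), so both the file argument and argv[0] are "/opt/VRTSob/bin/vxsvc". Because the file name contains a '/', execvp does not search PATH for it.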

>Is it a problem in exec or our code?

If you are calling execvp(2), it is your problem. (Or lack of documentation or that new error message I suggested. :-)

>As the man page of execvp says that the first argument to this function ...

You forgot to read more:
argv is ... By convention, argv must have at least one member, and must point to a string that is identical to path or path's last component.

Here the documentation is missing the fine print that says: But if you don't use the exact same string and that path isn't in PATH, then dld will abort if you use $ORIGIN. ;-)
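The convention quoted from the exec(2) man page allows two forms of argv[0], and according to the explanation above only the first is reliable with $ORIGIN when the directory is not in PATH. A hypothetical C check (`argv0_convention` is an illustrative name) that tells the two forms apart:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check of the exec(2) convention: argv[0] should be
 * identical to path or to path's last component.
 * Returns 2 if argv[0] is identical to path (the form $ORIGIN can rely on
 * when path is absolute), 1 if it only matches the last component (the
 * form that failed here when the directory was not in PATH), 0 if it
 * matches neither. */
int argv0_convention(const char *path, const char *argv0)
{
    const char *base;

    assert(path != NULL && argv0 != NULL);
    if (strcmp(argv0, path) == 0)
        return 2;
    base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;
    return strcmp(argv0, base) == 0 ? 1 : 0;
}
```

In the failing case from the thread, path was "/opt/VRTSob/bin/vxsvc" and argv[0] was "vxsvc", which this check classifies as the last-component form.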

>please confirm if this is a problem with the dld code, I will also ask my colleagues the channel to raise this with HP. Before that I want to make sure that that our code is clean. :)

The only problem with dld is a nicer error message. You need to fix your code.

But please contact HP so that message can be added.

>I am attaching the tusc logs for three cases

You need to add -a to get the exec args and -p to print the PID.

>1. /opt/VRTSob/bin is not in PATH on an IA machine: tusc_vxsvc_bad_NOPATH_IA.txt

It shows a good exec and a bad.

>2. /opt/VRTSob/bin is in PATH on an IA machine: tusc_vxsvc_good_PATH_IA.txt

It shows the stat calls for each PATH entry.

>3. same binary on a PA machine: tusc_vxsvc_good_PA.txt

It seems the same as 1. But the path is ./vxsvc and it knows how to look that up.
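The tusc observations (a lookup in each PATH entry when only the bare name "vxsvc" is available, while "./vxsvc" can be resolved directly) suggest a search roughly like the following. This is a hypothetical C model assuming a plain left-to-right PATH search that uses access(2) rather than whatever dld actually calls; `find_in_path` is an illustrative name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical model of locating argv[0]: a name containing '/'
 * (e.g. "./vxsvc") is used as-is; otherwise each PATH entry is tried in
 * order. Returns 0 and fills out on success, -1 if nothing executable is
 * found (out is clobbered on failure). Simplified: a directory with the
 * same name would also pass the X_OK test. */
int find_in_path(const char *name, const char *path_env,
                 char *out, size_t outsz)
{
    assert(name != NULL && out != NULL && outsz > 0);

    if (strchr(name, '/') != NULL) {     /* relative or absolute path */
        if (strlen(name) >= outsz)
            return -1;
        strcpy(out, name);
        return 0;
    }
    if (path_env == NULL)
        return -1;

    const char *dir = path_env;
    for (;;) {
        const char *end = strchr(dir, ':');
        size_t dirlen = end ? (size_t)(end - dir) : strlen(dir);
        /* An empty PATH entry traditionally means the current directory. */
        int n = snprintf(out, outsz, "%.*s/%s",
                         (int)(dirlen ? dirlen : 1), dirlen ? dir : ".", name);
        if (n > 0 && (size_t)n < outsz && access(out, X_OK) == 0)
            return 0;
        if (end == NULL)
            return -1;
        dir = end + 1;
    }
}
```

With PATH=/usr/bin:/bin:/sbin, as in the failing environment, a bare "vxsvc" is not found by this model; adding /opt/VRTSob/bin to PATH makes the lookup succeed, consistent with what Sri observed.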