Re: core dump from application on 11.00 system

Technical Support_2 · ‎04-16-2003

Hi
I have an in-house application which is not working.

I get the following message when I run it:
{hawk}% run_me
Executing default
Segmentation fault (core dumped)

I found some gdb instructions on this site and tried them. Here is what I got:

{hawk}% gdb run_me.hp1100
HP gdb 3.0 for PA-RISC 1.1 or 2.0 (narrow), HP-UX 11.00.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard Wildebeest 3.0 (based on GDB) is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
(gdb) run
Starting program: /rel/bin/run_me.hp1100

Program received signal SIGSEGV, Segmentation fault.
0x7f6fdaec in mallinfo () from /usr/lib/libc.2
(gdb) where
#0 0x7f6fdaec in mallinfo () from /usr/lib/libc.2
#1 0x7f6fb330 in __thread_callback_np () from /usr/lib/libc.2
#2 0x7f700e7c in malloc () from /usr/lib/libc.2
#3 0x7d7d8c58 in _XlcResolveLocaleName () from /usr/lib/X11R6/libX11.3
#4 0x7d7d36a8 in _XhpEucTWLoader () from /usr/lib/X11R6/libX11.3
#5 0x7d7a7900 in _XOpenLC () from /usr/lib/X11R6/libX11.3
#6 0x7d7a7b30 in _XrmInitParseInfo () from /usr/lib/X11R6/libX11.3
#7 0x7d78c74c in NewDatabase () from /usr/lib/X11R6/libX11.3
#8 0x7d78e958 in XrmCombineFileDatabase () from /usr/lib/X11R6/libX11.3
#9 0x7d6ccb44 in CombineUserDefaults () from /usr/lib/X11R6/libXt.3
#10 0x7d6cd944 in GetLanguage () from /usr/lib/X11R6/libXt.3
#11 0x7d6cdc5c in _XtDisplayInitialize () from /usr/lib/X11R6/libXt.3
#12 0x7d6c0e1c in XtOpenDisplay () from /usr/lib/X11R6/libXt.3
warning: /vega/xp/src/vision/apps/pace/pace.o: Unable to open file to read debug information. Use the "objectretry" command to try again.
#13 0x660944 in main ()
(gdb) quit

I am trying to run this on a V-Class. We have a number of V-Class systems and it works on all except one. I checked the libc patches and they are the same. I checked the ld and libpthread patches as well; they are identical.

One interesting thing, though: I have a .Xdefaults file in my home directory. If I remove that file, my application works. If the file is present in my home directory, even if it is empty, the application fails to start. Any help?

Jeff Schussele · ‎04-16-2003

Hi,

SIGSEGV or segment violations are *usually* an indicator of something trying to cross a memory boundary that it can't or shouldn't.
These can be due to missized maxXsiz kernel parameters, running out of swap so that a reservation can't be made, or even a 32-bit app trying to allocate memory beyond the normal 32-bit boundaries.

Rgds,
Jeff

PERSEVERANCE -- Remember, whatever does not kill you only makes you stronger!

Jeff Schussele · ‎04-16-2003

Oh, and another case could be where a 32-bit app requests a contiguous memory space and the space is so fragmented that the request cannot be filled.

Jeff

Paddy_1 · ‎04-16-2003

I am attaching a small program I use to read the core. See if it is helpful; otherwise I can send you the advanced version, which dumps more info from the core.

The sufficiency of my merit is to know that my merit is NOT sufficient

Adam J Markiewicz · ‎04-16-2003

Hi

Two thoughts.

1. I interpret a core dump inside mallinfo in one way: someone corrupted memory. For example, free() added a block to the list of memory available for reuse, then the program modified that memory, corrupting the list, and a later malloc() walked across the corrupted list. Or the program wrote data past the end of (or before the start of) a correctly allocated block, corrupting the list.

2. If .Xdefaults is a clue, try using 'tusc'. The last log entries before the crash may tell you something more.
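
To make point 1 concrete, here is a minimal sketch in Python of how an overflowing write can make a later, unrelated malloc() fail. It is a toy model only, not how the HP-UX libc allocator is actually implemented: the ToyHeap class, its 16-byte blocks and the link stored in the first 4 bytes of each free block are all assumptions made for the illustration.

```python
import struct

BLOCK = 16          # toy block size in bytes (assumption for this sketch)
END = 0xFFFFFFFF    # marker for "no next free block"

class ToyHeap:
    """Toy allocator: each free block stores the offset of the next free block."""

    def __init__(self, blocks=4):
        self.mem = bytearray(BLOCK * blocks)
        for i in range(blocks):
            nxt = (i + 1) * BLOCK if i + 1 < blocks else END
            struct.pack_into("<I", self.mem, i * BLOCK, nxt)
        self.head = 0

    def malloc(self):
        if self.head == END:
            raise MemoryError("toy heap exhausted")
        block = self.head
        # Follow the link stored inside the free block. If the link was
        # overwritten, this is where the failure finally shows up.
        (self.head,) = struct.unpack_from("<I", self.mem, block)
        return block

    def write(self, block, data):
        # No bounds check, like a raw pointer write in C.
        self.mem[block:block + len(data)] = data

def demo():
    heap = ToyHeap()
    first = heap.malloc()
    heap.write(first, b"A" * 20)   # 4 bytes too many: clobbers the next block's link
    second = heap.malloc()         # still succeeds, but loads the garbage link
    try:
        heap.malloc()              # walks the corrupted list and fails here
    except struct.error as exc:
        return (f"second malloc returned block {second}, "
                f"third malloc failed: {type(exc).__name__}")
    return "no failure"

print(demo())
```

The point of the sketch is that the failure appears two allocations after the bad write, inside the allocator, which is why a stack trace ending in malloc() or mallinfo() usually says little about where the real bug is.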

Good luck
Adam

I do everything perfectly, except from my mistakes

Technical Support_2 · ‎04-16-2003

Hi Jeff, the application is 64-bit and the OS is 64-bit. I didn't find any maxX kernel parameter.

Paddy, I will try to use your program. But I am not a programmer. I hope the output will make sense to me.

Technical Support_2 · ‎04-16-2003

Hi Adam,
How do I read this? It gives me more than I was getting before.

Last output when it is not working:

stat("/opt/graphics/OpenGL/lib//libXhp11.3", 0x7f7f4910) ...... ERR#2 ENOENT
mmap(NULL, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 4, 0x2000) ERR#12 ENOMEM
mmap(NULL, 90112, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 4, 0xa000) ERR#12 ENOMEM
access("/home/sachin/.Xauthority", R_OK) ...................... ERR#2 ENOENT
read(4, 0x7f7f21cc, 8) ........................................ ERR#246 EWOULDBLOCK
read(4, 0x7f7f2284, 32) ....................................... ERR#246 EWOULDBLOCK
read(4, 0x7f7f22ac, 32) ....................................... ERR#246 EWOULDBLOCK
read(4, 0x7f7f2480, 32) ....................................... ERR#246 EWOULDBLOCK
Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
Siginfo: si_code: I_NONEXIST, faulting address: 0x6e380000, si_errno: 0
PC: 0xc0184aef, instruction: 0x488a0000
exit(11) [implicit] ........................................... WIFSIGNALED(SIGSEGV)|WCOREDUMP

Output when it is working:

stat("/usr/lib/Motif2.1/libXhp11.3", 0x7f7f4910) .............. ERR#2 ENOENT
stat("/opt/graphics/OpenGL/lib//libXhp11.3", 0x7f7f4910) ...... ERR#2 ENOENT
mmap(NULL, 8192, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 4, 0x2000) ERR#12 ENOMEM
mmap(NULL, 90112, PROT_READ|PROT_EXEC, MAP_SHARED|MAP_SHLIB, 4, 0xa000) ERR#12 ENOMEM
access("/home/sachin/.Xauthority", R_OK) ...................... ERR#2 ENOENT
read(4, 0x7f7f21cc, 8) ........................................ ERR#246 EWOULDBLOCK
read(4, 0x7f7f2284, 32) ....................................... ERR#246 EWOULDBLOCK
read(4, 0x7f7f22ac, 32) ....................................... ERR#246 EWOULDBLOCK
read(4, 0x7f7f2480, 32) ....................................... ERR#246 EWOULDBLOCK
open("/home/sachin/.Xdefaults", O_RDONLY, 0) .................. ERR#2 ENOENT
open("/usr/lib/nls/C/iso88591/locale.inf", O_RDONLY, 0) ....... ERR#2 ENOENT
ioctl(5, TCGETA, 0x7f7f3420) .................................. ERR#25 ENOTTY
ioctl(5, TCGETA, 0x7f7f3be0) .................................. ERR#25 ENOTTY
read(4, 0x7f7f2778, 32) ....................................... ERR#246 EWOULDBLOCK
read(4, 0x7f7f2798, 32) ....................................... ERR#246 EWOULDBLOCK
open("/home/sachin/.Xdefaults-hawking", O_RDONLY, 0) .......... ERR#2 ENOENT
open("/home/sachin/.Xdefaults", O_RDONLY, 0) .................. ERR#2 ENOENT
access("/TANGO/3/rel/rdo/C.iso88591/VisionX", R_OK) ........... ERR#2 ENOENT
access("/TANGO/3/rel/rdo/C/VisionX", R_OK) .................... ERR#2 ENOENT

And many more lines...

This is using the -z option. Am I missing anything from OpenGL?

Jeff Schussele · ‎04-16-2003

Hi Tech,

By maxXsiz I meant the maxdsiz, maxssiz & maxtsiz kernel parameters. But in your case, since you're all 64-bit, they'd be maxdsiz_64bit, maxssiz_64bit & maxtsiz_64bit.
Verify what these are set to & ensure the program is not attempting to exceed these values.

Rgds,
Jeff

Technical Support_2 · ‎04-16-2003

Hi Jeff,
My name is Sachin; I was logged in as tech when I opened the thread.

maxdsiz_64bit 7516192768
maxssiz_64bit 1073741824
maxtsiz_64bit 1073741824
maxtsiz 67108864

Sachin

Jeff Schussele · ‎04-16-2003

Hi Sachin,

Values of those kernel params look fine. I doubt you're exceeding those.

As to understanding the earlier output, this may help:

ENOENT => err #2 => No such file or directory
ENOMEM => err #12 => Not enough core. (This is probably the cause of the segment violation)
EWOULDBLOCK => err #246 => Operation would block

Note these standard errors can be found in /usr/include/sys/errno.h
It is a plain-text header, so you can grep it for the error name you're looking for.
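
As an alternative to reading the header, here is a minimal sketch in Python that looks up the same symbolic names on whatever system it runs on. The helper name describe is made up for this example. Note that the numbers are platform-specific: the tusc output above shows EWOULDBLOCK as 246 on HP-UX, and other systems use a different value, so go by the symbolic name rather than the number.

```python
import errno
import os

def describe(name):
    """Return (number, message) for a symbolic errno name on this platform."""
    number = getattr(errno, name)
    return number, os.strerror(number)

for name in ("ENOENT", "ENOMEM", "EWOULDBLOCK"):
    number, message = describe(name)
    print(f"{name} => err #{number} => {message}")
```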

HTH,
Jeff

Adam J Markiewicz · ‎04-16-2003

Hi Sachin

You managed to deliver very good traces (same situation in both cases).

tusc writes an entry at the moment a function returns. The thing that puzzles me most is that in the non-working example I cannot see the expected
open("/home/sachin/.Xdefaults-hawking", O_RDONLY, 0)
and we know that this file is the difference in configuration.

This leads me to the conclusion that the function never actually returned, which would mean that the problem occurs somewhere during the system call. It would also mean that the core dump happens before the file is even opened.
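
The comparison described above can be sketched in a few lines of Python: take the last system call logged before the signal in the failing trace, find it in the working trace, and report what the working run did next. This is only an illustration under assumptions: the function names are invented, the matching is deliberately naive (it uses the first occurrence of the call), and the sample lines are copied from this thread with the dot padding shortened.

```python
import re

# A syscall line starts with name(args); tusc prints the result after it.
CALL = re.compile(r'^(\w+\(.*?\))(?:\s|$)')

def calls_before_signal(trace):
    """Return the syscalls logged before the first 'Received signal' line."""
    found = []
    for line in trace.splitlines():
        line = line.strip()
        if line.startswith("Received signal"):
            break
        match = CALL.match(line)
        if match:
            found.append(match.group(1))
    return found

def next_call_in_working(working, failing):
    """Return the call the working run made right after the failing run's last call."""
    failed = calls_before_signal(failing)
    worked = calls_before_signal(working)
    if not failed or failed[-1] not in worked:
        return None
    position = worked.index(failed[-1])
    return worked[position + 1] if position + 1 < len(worked) else None

FAILING = """\
read(4, 0x7f7f22ac, 32) ....... ERR#246 EWOULDBLOCK
read(4, 0x7f7f2480, 32) ....... ERR#246 EWOULDBLOCK
Received signal 11, SIGSEGV, in user mode, [SIG_DFL], partial siginfo
"""

WORKING = """\
read(4, 0x7f7f22ac, 32) ....... ERR#246 EWOULDBLOCK
read(4, 0x7f7f2480, 32) ....... ERR#246 EWOULDBLOCK
open("/home/sachin/.Xdefaults", O_RDONLY, 0) ....... ERR#2 ENOENT
"""

print(next_call_in_working(WORKING, FAILING))
```

With the excerpts from this thread, the call it reports is the open() of /home/sachin/.Xdefaults, which is consistent with the crash happening at or just before the point where that file is read.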

I also have to admit that an address like 0x6e380000 is not seen very often, although, to be honest, it can be valid.

But this has absolutely nothing in common with the trace from gdb, so that makes things harder.

Another point is that it looks like your application is multithreaded.
If you could, do some more tests. In gdb, at the point of the core dump, try:
info threads
I just wonder if the trace you showed is from the thread that actually causes the trouble.

You could also play a little more with tusc: options to investigate:
-u (show thread id)
-E (show the entry point of the functions, not only the exit)

You can check the options with just simple 'tusc' (no arguments).

Good luck
Adam

Steven E. Protter · ‎04-16-2003

I love tusc.

Great tool.

Before I use it, I analyze the core files according to the attached instructions.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Technical Support_2 · ‎04-16-2003

Thanks Jeff, I will check that .h file in a moment.

Thanks Adam, I will try that option.
Yes application is multithreaded.

Sachin

Adam J Markiewicz · ‎04-17-2003

Hi again

A few comments on Jeff's advice, to save you time:

ENOENT - trying to open files that do not exist. I wouldn't worry about it too much, as it happens in both cases (working and not working), so the process seems to be able to live without them.

ENOMEM - actually nothing to worry about. We checked this quite thoroughly not long ago. (Welcome SEP! Thanks again for all those points, which actually gave me this hat, although my suggestions turned out to be pretty useless after all...) It follows an unsuccessful mmap() call, and is an UNDOCUMENTED reaction to mmap()'ing the same file several times. It is observed quite often in the real world.
If you are interested check this link:
http://forums.itrc.hp.com/cm/QuestionAnswer/1,,0x0a49b82b2d63d71190080090279cd0f9,00.html

EWOULDBLOCK - I do not expect anything dangerous from this either. The file descriptor (a socket, maybe?) was simply configured not to wait for data to arrive, but to return immediately with this error when the incoming buffer is empty.
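
A minimal sketch of that behaviour in Python, assuming a Unix-like system: it uses an empty pipe in place of whatever descriptor 4 is in the trace, and the function name is made up for the example. On many systems EAGAIN and EWOULDBLOCK are the same number, so the check accepts either.

```python
import errno
import os

def read_empty_nonblocking_pipe():
    """Read from an empty non-blocking pipe and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    try:
        os.set_blocking(read_fd, False)   # do not wait for data to arrive
        try:
            os.read(read_fd, 32)
        except BlockingIOError as exc:
            return exc.errno              # nothing to read yet, not a fault
        return None
    finally:
        os.close(read_fd)
        os.close(write_fd)

code = read_empty_nonblocking_pipe()
print(code in (errno.EAGAIN, errno.EWOULDBLOCK))
```

The read fails only in the sense that there is nothing to read yet; a program that sets non-blocking mode normally expects this and retries later.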

For completeness:
ENOTTY - the file descriptor in question is connected to something other than a terminal. I have seen it plenty of times in traces.
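
The same kind of error can be reproduced with a short Python sketch, again assuming a Unix-like system. The trace shows the older TCGETA ioctl; this example uses termios.tcgetattr, the modern equivalent, on a pipe, and the function name is invented for the illustration.

```python
import errno
import os
import termios

def tcgetattr_on_pipe():
    """Ask for terminal attributes on a pipe and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    try:
        try:
            termios.tcgetattr(read_fd)    # a pipe is not a terminal
        except termios.error as exc:
            return exc.args[0]            # termios.error carries (errno, message)
        return None
    finally:
        os.close(read_fd)
        os.close(write_fd)

print(tcgetattr_on_pipe() == errno.ENOTTY)
```

Libraries often make this call just to find out whether a descriptor is a terminal, so an ENOTTY in a trace is usually a probe rather than a failure.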

And the last thing: The tusc options I gave you can be mixed (and the best would be if you specified both).

Good luck
Adam
