
How to read coredumps?

 
Philip Chan_1
Respected Contributor

How to read coredumps?

Is there any good source of documentation on how to read coredumps in HPUX? I've done this before for Tandem computers so perhaps some sort of quick start info would be very helpful to me.

Thanks in advance.

Regards,
Philip
Alex Glennie
Honored Contributor

Re: How to read coredumps?

If you are talking about core dumps from a crashed system, as opposed to software core dumps:

Before you can get a core, the customer must follow the instructions in the
"System Administration Tasks" manual (B2355-90051). In chapter 6, "Managing
Swap Space and Dump Areas", start with the section on "Setting Up Dump Areas".

Once you have a system core dump, use q4, adb, or scancore to analyze it on
the S800. Use q4 or adb on the S700.

> Is the `analyze' command available? If not, when?
The modern system-core-dump analysis program is q4, and everyone should use that.

The analyze program is shipped in the USRCONTRB fileset but is unsupported.

This means that people who used analyze before can still use it, but HP does
not explain it by supplying a man page, the program offers no new
functionality, and some old functionality was removed (parts that changed
drastically from 9.X to 10.0 were not updated -- for instance, analyze no
longer knows about the I/O system now that it is converged).

OS-Core.CMDS-MIN: /usr/bin/adb
OS-Core.USRCONTRB: /usr/contrib/bin/analyze
OS-Core.USRCONTRB: /usr/contrib/bin/q4

> Can I check the message buffer in core with `adb'? How?
Yes. Run adb (as in `adb /stand/vmunix {corefile}'), then type the adb command:
msgbuf+8/s
Stefan Farrelly
Honored Contributor

Re: How to read coredumps?


Are you talking about a crash dump (an HP-UX system crash) or a binary dumping core? The previous reply is about crash dumps. If you have a program (binary) that is dumping core, then you need to use xdb to read the core file. It really helps if you also have the source for the binary; then xdb can step through the source and core file together and show you where the error is.
Alternatively, you can use the GNU debugger, gdb (downloadable from http://hpux.cs.utah.edu/hppd/hpux/Gnu/gdb-4.18).

I'm from Palmerston North, New Zealand, but somehow ended up in London...
Cheryl Griffin
Honored Contributor

Re: How to read coredumps?

The ITRC Knowledge Base contains many articles on how to pre-process the information for HP using Q4.

For instance:
OZBEKBRC00000611 How do I pre-process my crash dump so HP can troubleshoot it?

Search keywords: Q4 and crash
"Downtime is a Crime."
Tom Danzig
Honored Contributor

Re: How to read coredumps?

Running the file command on the core will give you a quick explanation of what caused the software to dump core.

# file core

core: core file from 'exp' - received SIGSEGV

In this case, the program "exp" made an invalid memory reference. See "man 5 signal" for a list of signals and their explanations.
Philip Chan_1
Respected Contributor

Re: How to read coredumps?

Hi,

Thanks for everyone's replies. To make my question more specific: I'm actually referring to program (binary) core dumps, not HP-UX OS crash dumps. My core dump in fact came from a Perl script! Before I invest time in either xdb or gdb, can anyone tell me whether they are capable of debugging Perl dumps?
Stefan Farrelly
Honored Contributor

Re: How to read coredumps?


I don't think xdb or gdb will analyze Perl scripts from a dump.

You need the Perl debugger:
http://www2.linuxjournal.com/lj-issues/issue49/2484.html

or Devel::Trace; see the following web page for more:
http://perl.about.com/compute/perl/msub67.htm?iam=mt&terms=%22perl+debugger%22
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Philip Chan_1
Respected Contributor

Re: How to read coredumps?

Thanks to Tom's advice, I can now confirm that one of the problems causing our core dumps was an invalid memory reference (SIGSEGV). I still have to find out which statement caused it, which is why I need to load the core dump and see the program stack and instruction pointer at the point of failure.

Stefan, as far as I'm aware, the Perl debugger can only step through a program from beginning to end; it can do nothing with core dump images.
Stefan Farrelly
Honored Contributor

Re: How to read coredumps?


Philip,

Does your core dump come from a compiled Perl program? If so, which compiler was used? Running strings on the binary may help to answer this, as may running the type command or chatr on the binary file.
Or is the binary you are running a compiled C program that just happens to call a Perl script?
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Philip Chan_1
Respected Contributor

Re: How to read coredumps?

Hi Stefan,

Correct me if I'm wrong, but isn't Perl purely an interpreted language? I mean, there is no compiled version unless we turn the Perl source into C code and then compile it. (I'd love to hear that I'm wrong :-) )

For your info, the Perl program in trouble is invoked from a cron job. I've traced through the preceding commands, and it was indeed the Perl program that received a SIGSEGV signal and hence dumped core.
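
One way to confirm which step of a cron job died on a signal is to wrap the steps and decode each exit status. Below is a minimal Python sketch of that idea, not what was actually used here; the step commands and the helper names describe_exit and run_steps are made up for illustration.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by {name} ({signum})"
    return f"exited with status {returncode}"


def run_steps(steps):
    """Run each command in order and stop at the first one that fails."""
    results = []
    for argv in steps:
        rc = subprocess.run(argv).returncode
        results.append((argv, describe_exit(rc)))
        if rc != 0:
            break
    return results


if __name__ == "__main__":
    # Hypothetical job steps; the second shell sends SIGSEGV to itself.
    steps = [["sh", "-c", "exit 0"], ["sh", "-c", "kill -SEGV $$"]]
    for argv, outcome in run_steps(steps):
        print(" ".join(argv), "->", outcome)
```

Logging the outcome of every step this way shows which command received the signal, but not which statement inside it was responsible.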
Stefan Farrelly
Honored Contributor

Re: How to read coredumps?


Yes, I believe you can buy compilers for Perl now; I've seen one advertised for HP-UX in my System Administration magazine. But in your case you're not compiling it, so it should be pretty easy to trace the error causing the core dump by simply stepping through the script until it dumps. Once you see what was being executed just before the dump, that should give you clues as to why. Let me do some more checking on reading the core dump.
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Philip Chan_1
Respected Contributor

Re: How to read coredumps?

Hi Stefan,

The Perl bug cannot be reproduced easily. It occurs perhaps once in every 30 runs. And frankly, even the people who wrote the program don't have a clue what's wrong with it, so even if I spent half a day stepping through the code, there is no guarantee I would hit the problematic statements.

Since the core dump was caused by SIGSEGV, I have just installed a signal handler in the program to catch that signal and print the source line number that triggered the problem. I hope this helps, but the bug seems fairly intelligent: the program has not aborted since I installed the handler (Carp::confess), and that was 5 hours ago (60 runs of the cron job)!
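
For comparison, here is the same "print where it died" idea sketched in Python rather than Perl. It is only an analogy to the Carp::confess handler above, not the handler itself: Python's standard faulthandler module dumps the Python traceback when the process receives SIGSEGV. The child program and the run_crashing_child helper are hypothetical, and the crash is provoked on purpose with ctypes.string_at(0).

```python
import subprocess
import sys

# Child program (hypothetical): enable faulthandler, then deliberately
# read from address 0 so the process receives SIGSEGV.
CHILD = (
    "import ctypes, faulthandler\n"
    "faulthandler.enable()\n"
    "def crash():\n"
    "    ctypes.string_at(0)\n"
    "crash()\n"
)


def run_crashing_child():
    """Run the child and return its return code and captured stderr."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    rc, err = run_crashing_child()
    print("return code:", rc)
    # The traceback written by faulthandler names the function and line.
    print(err)
```

As with the Perl handler, this only reports something on the runs where the fault actually occurs, so an intermittent bug still has to be waited for.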
Stefan Farrelly
Honored Contributor

Re: How to read coredumps?


If the problem occurs only once every so many runs, then it's more than likely not a problem with the program itself but something external to it. SIGSEGV is caused by one of a few things:
insufficient memory or swap space, stack size exceeding maxssiz, or maxdsiz being too small.
So while this program is running you should monitor these parameters: do a swapinfo -mt before and after it runs, monitor the process as it grows using ps, monitor system memory usage, or use glance/gpm to keep an eye on it. I suspect one of these limits is periodically being exceeded, resulting in your occasional core dump.
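
A rough sketch of this kind of monitoring, in Python: report the stack and data-segment limits the job will inherit, run it, and record the peak resident size of the child. It assumes Python's resource module is available on the system, and that the per-process limits shown are the ones maxssiz and maxdsiz ultimately cap on HP-UX; the helper names are made up, and the units of ru_maxrss vary by platform.

```python
import resource
import subprocess


def fmt_limit(value):
    """Render a resource limit, naming the 'no limit' sentinel."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def show_limits():
    """Return the soft/hard stack and data-segment limits of this process."""
    limits = {}
    for name in ("RLIMIT_STACK", "RLIMIT_DATA"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (fmt_limit(soft), fmt_limit(hard))
    return limits


def run_and_measure(argv):
    """Run a command and return its return code plus peak child RSS.

    RUSAGE_CHILDREN covers every child waited for so far, so the value
    is the largest peak among them, not only the last command's.
    """
    rc = subprocess.run(argv).returncode
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return rc, peak


if __name__ == "__main__":
    for name, (soft, hard) in show_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
    # Hypothetical stand-in for the real job.
    rc, peak = run_and_measure(["sh", "-c", "exit 0"])
    print("return code:", rc, "peak RSS:", peak)
```

Comparing the recorded peak across good and bad runs would show whether the failures line up with the process approaching one of the limits.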
I'm from Palmerston North, New Zealand, but somehow ended up in London...
Tor Even Furvann
New Member

Re: How to read coredumps?

Hi,
we are having a related problem on our system.
We get a core dump (received SIGSEGV) when an fcomp-compiled PRO*C program executes.
The core dump only occurs with this PRO*C program (we run several others).
When we restart the program it always executes normally.
It therefore seems probable, as you state in this thread, that something external and not the program itself causes the core dump.
Checking swapinfo before and after execution seems like a good idea. We will do so on the next run.

My question concerns the kernel parameters (e.g. maxdsiz and maxssiz).
We have a K370 with 1024 MB RAM.
What values should these parameters have on this machine?

Regards
/Lars