How can I analyse a core file in /var/adm/crash

 
Steve Yeats
Occasional Advisor

How can I tell what caused my machine to crash by looking at the core file created by savecore in /var/adm/crash? The machine is a very old 735/99 running HP-UX 10.20.
As happy as a constipated bear in a small copse
Paula J Frazer-Campbell
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

Hi Steve

strings
what

Will give information on the file.
Also search the forum for previous questions on core files.

HTH

Paula
If you can spell SysAdmin then you is one - anon
Andy Monks
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

The best way is to log a support call with HP and get them to look at it (assuming you have a software contract).

As a very quick check, you can do:

cd /var/adm/crash/core.*
q4 .
trace event 0

and then search the on-line info for the stack trace and/or post it back here. It might be something really obvious.
Andy Monks
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

Paula,

This is a system crash, not a command core dump. Running strings and what on it is pointless.
Steve Yeats
Occasional Advisor

Re: How can I analyse a core file in /var/adm/crash

Thanks for the responses,
Andy,
When I do cd /var/adm/crash/core.2; q4 .
I get a message saying "the kernel doesn't look like it has been prepared for debugging...." q4 advises me to run pxdb, which is not installed on my system. Where can I find this tool? Do I have to purchase it?
-Thx
Andy Monks
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

Steve,

There is a version of pxdb in /usr/contrib/bin called q4pxdb.

I'd recommend taking a copy of the vmunix in the core directory and then running 'q4pxdb vmunix'.
Steve Yeats
Occasional Advisor

Re: How can I analyse a core file in /var/adm/crash

Andy, thanks once again for your help.

I copied vmunix, ran q4pxdb, then ran q4 ./vmunix and, at the q4> prompt, trace event 0. The result was "q4: (error) Can't read symbol crash_event_ptr+0x0 from core file".

Would I be correct in saying that this pointer might not be found because there was not enough space for savecore to save a full core dump? (When I view /var/adm/crash/core.2/INDEX, I see "error savecore: Insufficient space to save full core dump only 72876032 of 301989888 bytes saved".)

I am interested in learning more about analysing these system crashes. Is there any online documentation, or are there any books, that you could recommend?

-Steve
Andy Monks
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

Steve,

That's pretty short, that's for sure.
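
For scale, the INDEX figures quoted above work out to roughly a quarter of the dump. A minimal sketch of the arithmetic in shell (the function name is made up for illustration):

```shell
# Percentage of a crash dump that savecore managed to save.
# Dividing total by 100 first avoids a large intermediate product.
dump_saved_pct() {
    saved=$1
    total=$2
    echo $(( saved / (total / 100) ))
}

# Figures from the INDEX file quoted in this thread; prints 24
dump_saved_pct 72876032 301989888
```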

Try the following in the core directory:

echo "msgbuf+8/s" | adb -m vmunix . > /tmp/msgbuf

Then look for the 'panic' message. Under that (or above it, if the message buffer wrapped) will be a block of hex numbers. What you then need to do is:

adb vmunix
?a

entering each hex number in turn followed by ?a. That will get the stack trace out at least, which is a start.

The problem with learning to read a dump is that you really need the HP-UX source code and lots of experience. Lots of online documents about known problems are good to have too. From the stack trace you can search the external database of problems to see if you can spot anything.

Btw, if in the message buffer you find it's a 'data page fault', ignore the top 4 stack entries when trying to match anything up. All 10.20 machines have the same top 4 entries.
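
The lookup step can be scripted rather than typed by hand. The sketch below is only an illustration: it assumes the stack entries appear as 0x-prefixed, 8-digit hex words and that just that block has been saved to a file. The function name and the /tmp/stack path are hypothetical.

```shell
# Turn a block of hex stack entries into adb lookup commands.
# Assumption: each entry is a 0x-prefixed, 8-digit hex word separated
# by spaces, tabs or newlines; anything else on the lines is dropped.
addrs_to_adb() {
    tr -s ' \t' '\n\n' | grep -E '^0x[0-9a-fA-F]{8}$' | sed 's/$/?a/'
}

# Possible use (not run here; /tmp/stack is a hypothetical file holding
# only the hex block from under the panic message):
#   addrs_to_adb < /tmp/stack | adb vmunix
```

Feed it only the stack block; any other 8-digit hex word in the message buffer would be picked up as well.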
Steve Yeats
Occasional Advisor

Re: How can I analyse a core file in /var/adm/crash

Andy

Well it was a Data page fault.

I've attached the output from the "echo msg..." command, and here's the "adb.." output (inc. first four stack entries..)

# adb vmunix
0x0023f984?a
panic+3C: panic+3C:
0x00218708?a
report_trap_or_int_and_panic+0xE8: report_trap_or_int_and_panic+0xE8:
0x000dae04?a
trap+1054: trap+1054:
0x00223b40?a
$RDB_trap_patch+20: $RDB_trap_patch+20:
0x00048ae8?a
strlen+0xC: strlen+0xC:
0x001cc580?a
hpauto_readdir2+2C0: hpauto_readdir2+2C0:
0x00138d1c?a
getdents+164: getdents+164:
0x000d87c4?a
syscall+1A4: syscall+1A4:
0x00049108?a
$syscallrtn:
$syscallrtn: $syscallrtn:
#

What does it all mean?!? I'm no programmer, perhaps I should learn a bit!! What's worrying me is that this machine has been running for ages without problems and has crashed three times in the last week. I know nothing has changed Unix-wise, so I assume a user process is causing this problem.

Is there a way I can find out processes/owners at the time of the crash? I guess that's what I'm really after.
melvyn burnard
Honored Contributor
Solution

Re: How can I analyse a core file in /var/adm/crash

You appear to have a panic in the autofs routines, which is a known problem.
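
A rough way to see this in the adb output posted above: skip the four trap-handling entries that, as Andy noted, all 10.20 machines share, and what is left starts with strlen called from hpauto_readdir2, presumably the automounter routine in question. The sketch below does that filtering on a pasted session. It assumes the same layout as the paste in this thread (typed ADDR?a lines, replies ending in a colon), and the function name is hypothetical.

```shell
# List the stack frames left after skipping the top four entries.
# Steps: keep only reply lines (they end in ":"), take the first symbol
# without its trailing colon, collapse consecutive duplicates such as
# the doubled $syscallrtn reply, then drop the first four frames.
frames_below_trap() {
    grep ':$' |
    awk '{ sub(/:$/, "", $1); print $1 }' |
    uniq |
    tail -n +5
}
```

Run against the session above, this leaves strlen+0xC, hpauto_readdir2+2C0, getdents+164, syscall+1A4 and $syscallrtn, which are the frames worth searching on.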

I would suggest you look at loading the latest patch bundle, or at least the NFS patches PHNE_22117 and PHNE_21375 plus dependencies.
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Andy Monks
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

Steve, as Melvyn said, that patch should fix it.

Btw, Melv, that's a beer you owe me on the 18th of March for stealing my points :-)
Steve Yeats
Occasional Advisor

Re: How can I analyse a core file in /var/adm/crash

Melvyn/Andy,

Thanks for all your help. I will have to schedule downtime for this machine to apply the latest patch bundle. (Reboot is required).

Thanks,
Steve
Paula J Frazer-Campbell
Honored Contributor

Re: How can I analyse a core file in /var/adm/crash

Hi Andy

I should have RTFQ correctly - Ooops

Paula ;-)
