OZBEKBRC00000611 - How do I use q4 to pre-process a dump that HP needs to read
PROBLEM
Prior to opening a call with HP, I'd like to know the best way to
pre-process my crash dump so HP can troubleshoot it.
RESOLUTION
USING Q4 TO ANALYZE SYSTEM DUMP FILES
(For 10.10-11.20)
==============================================================
When HP-UX crashes, it saves a snapshot of RAM in disk-based swap space or
dedicated dump space, reboots the system, and copies the resulting "dump"
into /var/adm/crash.

A utility called q4, normally loaded on the system, is available to
make text files for fast analysis. A patched version of q4 must be loaded
to interpret dumps resulting from a "hanging" operating system.

To preprocess the dump, follow these steps and email the resulting files
to the HP Response Center for analysis. Steps vary depending on the
version of the O/S and the version of q4.

Please note, all email generated from this procedure should be sent to the
dump team email address hpcu@atl.hp.com using the CALL ID as the SUBJECT.
DO NOT send this information to the engineer's personal address.

After emailing the data, please log a callback against the call to let
the engineer know that you have emailed your data.
==============================================================
STEP 1
===== WHERE IS THE DUMP? ===========================
==============================================================
1.1 Verify a current dump exists in the dump directory:

      # ll /var/adm/crash/c*

    A recent core.N (10.X) or crash.N (11.X) directory should be listed.
    (NOTE: N is the next available dump index, which increments with each
    successive dump.)

    The INDEX file in /var/adm/crash/c* and /etc/shutdownlog both contain
    the "panic" statement.

1.2   # touch /etc/shutdownlog      (if it does not exist)

1.3 If a current dump is not in /var/adm/crash, do

      # grep _DIR /etc/rc.config.d/save*

    The value pointed to by SAVECORE_DIR= (10.X) or SAVECRASH_DIR= (11.X)
    is where the system places dump files.

1.4 If the system dump is not in the expected location, try to re-save the
    dump with:

      10.X : # savecore -vr
      11.00: # savecrash -vr

    A return message "invalid dump header" means the dump is non-existent.
NOTE: If the current dump directory fills up while a dump is being saved,
set the directory variable to a directory with more space, and create
that directory so it can capture future dumps.
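
As a rough illustration of step 1.1, the sketch below picks the
highest-numbered core.N or crash.N directory under a dump directory, on
the understanding that N increments with each successive dump. The
function name newest_dump_dir is hypothetical and is not part of HP-UX;
treat this as a convenience sketch rather than a supported tool.

```shell
# Print the highest-numbered core.N (10.X) or crash.N (11.X) directory
# under the given dump directory; print nothing if none exists.
# newest_dump_dir is a hypothetical helper, not an HP-UX command.
newest_dump_dir() {
    base=$1
    best=
    best_n=-1
    for d in "$base"/core.* "$base"/crash.*; do
        [ -d "$d" ] || continue          # skips unmatched globs and plain files
        n=${d##*.}                       # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;     # the index must be all digits
        esac
        if [ "$n" -gt "$best_n" ]; then
            best_n=$n
            best=$d
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
    return 0
}
```

For example, `newest_dump_dir /var/adm/crash` would print the directory
to change into in STEP 3, provided the dump was saved in the default
location.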
==============================================================
STEP 2
===== IS A VERSION OF Q4 LOADED? ===================
==============================================================
2.1 Determine if and which version of q4 is loaded:

      # swlist -l fileset | grep -i Q4

    The following are unpatched versions supplied with the OS:

      OS-Core.Q4 B.10.20 HP-UX Crash Dump Debugger for PA-RISC systems
    or
      OS-Core.Q4 B.11.00 HP-UX Crash Dump Debugger for PA-RISC systems
    or
      OS-Core.Q4 B.11.11 HP-UX Crash Dump Debugger for PA-RISC systems
2.2 If one of the following patched versions is listed, proceed to STEP 3:

      10.20   PHCO_20261
      11.00   PHCO_20262
      11.11   PHCO_25723
2.3 If the system does not have q4, or the dump was the result of a hang,
    load the patched version. Loading the patched version will not cause
    a system reboot. Installation instructions accompany the patch.

    Download the appropriate version from this site:

    For the 10.10 or 10.20 version:
      ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/10.X/PHCO_20261
    For the 11.0 version:
      ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_20262
    For the 11.11 version:
      ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_25723

    NOTE: the patch number may be superseded over time.
2.4 If web access is unavailable and no version of q4 is on the system and
    the install CD is available, proceed to load the standard version of
    q4:

    Mount the INSTALL media and verify a matching version of Q4 is
    available:

      # swlist -l fileset -s / | grep Q4
      OS-Core.Q4 B.10.10 HP-UX Crash Dump Debugger for PA-RISC systems
                   ^^^^^ -matches the O/S

    Use swinstall to install it:

      # swinstall -vs / OS-Core.Q4
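
The decision in steps 2.1 through 2.3 can be sketched as a small filter
over the swlist output. The function name q4_status is hypothetical, and
the sketch assumes that a patched q4 shows up in the listing under one of
the patch IDs named in step 2.2. Those IDs may have been superseded, so a
"patched" system carrying a newer patch would be reported as "unpatched"
here.

```shell
# Classify `swlist -l fileset` output read from standard input.
# Prints "patched", "unpatched", or "missing".
# q4_status is a hypothetical helper; the patch IDs are the ones listed
# in step 2.2 and may have been superseded by newer patches.
q4_status() {
    listing=$(cat)
    if printf '%s\n' "$listing" | grep -E 'PHCO_(20261|20262|25723)' >/dev/null; then
        echo patched
    elif printf '%s\n' "$listing" | grep -i 'OS-Core\.Q4' >/dev/null; then
        echo unpatched
    else
        echo missing
    fi
}
```

Typical use would be `swlist -l fileset | q4_status`: "patched" leads to
STEP 5, "unpatched" to STEP 4 (or to step 2.3 if the dump came from a
hang), and "missing" to step 2.3 or 2.4.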
==============================================================
STEP 3
===== CD TO THE DUMPS DIRECTORY ====================
==============================================================
NOTE: csh (c-shell) will cause errors with q4. Use sh-posix.

      # cd <dump directory>      (IMPORTANT!)
        e.g.: cd /var/adm/crash/core.0 OR cd /var/adm/crash/crash.0
==============================================================
STEP 4
===== IF USING UNPATCHED Q4 ========================
==============================================================
4.1 Perform this command:

      # /usr/contrib/bin/gunzip vmunix.gz
        (uncompresses the kernel file)

    For 10.20 through 11.11, type this command and then skip to 4.2:

      # /usr/contrib/bin/q4prep -p

    For 11.20 and beyond, type this command and then skip to 4.2:

      # /usr/contrib/Q4/bin/q4prep -p

    If at 10.10, type the following commands:

      # uncompress /usr/contrib/lib/Q4Lib.tar.Z
        (ignore the error if this was done previously)
      # tar -xf /usr/contrib/lib/Q4Lib.tar
        (output goes into the current directory)
      # cp q4lib/sample.q4rc.pl ~/.q4rc.pl
        (note the use of a tilde and letter "l", not digit 1)
      # /usr/contrib/bin/q4pxdb vmunix
        (this may complain if vmunix is already preprocessed)
4.2 If the next command causes "/var: file system full", move the
    core. directory to a file system with adequate space
    (approximately 2x the sum of the core.x.y.gz files) and continue at
    this point.

    Type:

      # q4 -p .
        (note the "dot" at the end of the command)

    Then:

      q4> trace event 0 > trace.out
      q4> include analyze.pl
          (NOTE: letter "l", not digit 1)
      q4> run Analyze AU >> ana.out
          (NOTE: ctrl-c will interrupt q4)
      q4> exit

    Skip to STEP 6
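
The "approximately 2x the sum of the core.x.y.gz files" rule of thumb
can be estimated before running q4. The sketch below is a hypothetical
helper (dump_space_kb is not an HP-UX command) and assumes the compressed
image files match the pattern core.*.gz inside the dump directory. Sizes
are accumulated in kilobytes to stay clear of integer limits in older
shells.

```shell
# Print the scratch space, in kilobytes, suggested by the 2x rule of
# thumb: twice the combined size of the compressed core.x.y.gz files
# in the given dump directory.
# dump_space_kb is a hypothetical helper; the file-name pattern is an
# assumption based on the core.x.y.gz names mentioned in the text.
dump_space_kb() {
    dir=$1
    total_kb=0
    for f in "$dir"/core.*.gz; do
        [ -f "$f" ] || continue                  # skips an unmatched glob
        bytes=$(wc -c < "$f" | tr -d ' ')        # file size in bytes
        # Round each file up to whole kilobytes before adding it.
        total_kb=$(( total_kb + (bytes + 1023) / 1024 ))
    done
    echo $(( 2 * total_kb ))
}
```

Comparing the result of `dump_space_kb .` with the free kilobytes that
`df -k` reports for the target file system gives an early warning of the
"file system full" condition.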
==============================================================
STEP 5
===== IF USING THE PATCHED VERSION OF Q4 ===========
==============================================================
5.1 Type:

      # . /usr/contrib/Q4/bin/set_env
        (note the 'dot' at the beginning of the command)

5.2 If the next steps cause "/var: file system full", move the core.
    or crash. directory to a file system with adequate space
    (approximately 2x the sum of the core.x.y.gz files) and continue at
    this point.

    Type:

      # /usr/contrib/Q4/bin/q4pxdb vmunix    (Disregard "unnecessary" message)
      # /usr/contrib/Q4/bin/q4 -p .
        (note the "dot" at the end of the command)

5.3 At the q4> prompt, type:

      q4> run Analyze AU > ana.out
      q4> run WhatHappened -HANG > what.out

    NOTE: ctrl-c can interrupt these two commands, which may take
    several minutes to process.

5.4 Type:

      q4> exit
==============================================================
STEP 6
====== REVIEW AND SEND DATA ========================
==============================================================
6.1 Determine if a hardware problem induced the crash. If the ana.out or
    trace.out contains references to an HPMC occurring, the cause of the
    crash was very likely a hardware fault.

    Type:

      # grep HPMC ana.out trace.out

    Check for:

      "crash event was an HPMC"
    or
      "Crash Event 0 (HPMC, struct crash_event_table_struct..."

    If either of these lines appears, open a hardware repair request with
    the hardware support organization for this system.

    Also, send the /var/tombstones/ts* file (if that directory exists)
    matching the "dumptime" listed in the INDEX file. It may well have the
    hardware fault codes that can aid in isolating the hardware cause.

    If an HPMC did not occur, proceed to 6.2.
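
The check in step 6.1 can be expressed as a yes/no test. The sketch
below is a hypothetical helper (hpmc_detected is not an HP-UX or q4
command) that looks for the two quoted messages and tolerates a missing
trace.out, which is not produced by the patched-q4 procedure in STEP 5.

```shell
# Return success (0) if any of the given q4 output files contains one
# of the two HPMC messages quoted in step 6.1, and 1 otherwise.
# hpmc_detected is a hypothetical helper name.
hpmc_detected() {
    for f in "$@"; do
        [ -f "$f" ] || continue          # ignore files that were not created
        if grep -E 'crash event was an HPMC|Crash Event 0 \(HPMC' "$f" >/dev/null; then
            return 0
        fi
    done
    return 1
}
```

For example: `hpmc_detected ana.out trace.out && echo "open a hardware case"`.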
6.2 Check ana.out to see if MC/ServiceGuard (if it is installed) triggered
    the reboot. Look for this message:

      "MC/ServiceGuard: Unable to maintain contact with cmcld daemon.
       Performing TOC to ensure data integrity."

    If so, type:

      # cmgetconf |grep E_T /etc/cmcluster/*
        (Check the cluster for a NODE_TIMEOUT of 2000000, which is
        2 seconds expressed in microseconds)

    If NODE_TIMEOUT is set to 2 seconds, the crash is probably due to this
    extremely low setting.

    To correct the problem:

    Increase the value to 5-8 seconds in the cluster configuration file and
    perform a "cmapplyconf" with the cluster down. Also, read article
    UXSGLVKBAN00000010 in the <http://ITRC.HP.COM/> technical database for
    more details on dealing with ServiceGuard-induced crashes.

    If NODE_TIMEOUT was set to 2 seconds and the value was corrected, stop
    here.
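
The NODE_TIMEOUT check can also be done mechanically. The sketch below
is a hypothetical helper (node_timeout_too_low is not a ServiceGuard
command) and assumes the cluster configuration text has a line of the
form "NODE_TIMEOUT <value>", with the value in microseconds as implied by
the 2000000 figure above.

```shell
# Read cluster configuration text on standard input, print the
# NODE_TIMEOUT value (microseconds), and return:
#   0 if the value is at or below 2000000 (the 2 second setting the
#     text warns about),
#   1 if it is higher,
#   2 if no NODE_TIMEOUT line was found.
# node_timeout_too_low is a hypothetical helper name.
node_timeout_too_low() {
    # Commented-out lines start with "#", so their first field never matches.
    value=$(awk '$1 == "NODE_TIMEOUT" { print $2; exit }')
    [ -n "$value" ] || return 2
    echo "$value"
    [ "$value" -le 2000000 ]
}
```

It could be fed a saved copy of the cluster configuration, for example
`node_timeout_too_low < cluster.ascii`, where cluster.ascii is a
placeholder file name.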
6.3 Generate a patch list:

      # /usr/sbin/swlist -l product | grep PH > patches.out

6.4 Send the following files to hpcu@atl.hp.com using the SOFTWARE CASE ID
    as the subject:

      ana.out
      patches.out
      trace.out
      what.out (if created)
      /etc/shutdownlog
      /var/tombstones/ts* (if HPMC was detected)
NOTES:
- The hpcu E-Mail box has a 3MB maximum mail size!
- Keep this document and use it on future dumps to determine whether to
  open a hardware or software case.
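
Because of the 3MB mailbox limit, it can help to total the files before
sending them. The sketch below is a hypothetical helper (fits_mail_limit
is not an HP-supplied tool); it takes "3MB" to mean 3 * 1024 * 1024
bytes, which is an assumption, and it does not account for any overhead
added by the mail program.

```shell
# Print the combined size in bytes of the files that exist among the
# arguments, and return success only if that total is within the
# mailbox limit (assumed here to be 3 * 1024 * 1024 bytes; the text
# only says "3MB").
# fits_mail_limit is a hypothetical helper name.
fits_mail_limit() {
    limit=$((3 * 1024 * 1024))
    total=0
    for f in "$@"; do
        [ -f "$f" ] || continue                  # e.g. what.out may not exist
        bytes=$(wc -c < "$f" | tr -d ' ')
        total=$((total + bytes))
    done
    echo "$total"
    [ "$total" -le "$limit" ]
}
```

For example:
`fits_mail_limit ana.out patches.out trace.out what.out /etc/shutdownlog`
prints the total and returns a non-zero status when the files would need
to be sent in more than one message.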
*** END ***