OZBEKBRC00000611 - How do I use q4 to pre-process a dump that HP needs to read

PROBLEM

Before opening a call with HP, I'd like to know the best way to pre-process

my crash dump so HP can troubleshoot it.

RESOLUTION

 

           USING Q4 TO ANALYZE SYSTEM DUMP FILES

                    (For 10.10-11.20)

==============================================================

When HP-UX crashes, it saves a snapshot of RAM in disk-based swap space or

dedicated dump space, reboots the system, and copies the resulting "dump"

into /var/adm/crash.

 

A utility called q4, normally loaded on the system, produces text files

from the dump for fast analysis.  A patched version of q4 must be loaded

to interpret dumps resulting from a "hanging" operating system.

 

To preprocess the dump, follow these steps and email the resulting files

to the HP Response Center for analysis.  Steps vary depending on the

version of the O/S and the version of q4.

 

Please note, all email generated from this procedure should be sent to the

dump team email address hpcu@atl.hp.com using the CALL ID as

the SUBJECT. DO NOT send this information to the engineer's personal

address.

 

After emailing the data, please log a callback against the call to let

the engineer know that you have emailed your data.  
 

 

==============================================================

STEP 1  =====  WHERE IS THE DUMP?  ===========================

==============================================================

1.1 Verify a current dump exists in the dump directory:

 

   # ll /var/adm/crash/c* 

 

  A recent core.N (10.X) or crash.N (11.X) directory should be

  listed.

  (NOTE: N is the dump index, which increments with each

  successive dump.)

 

  Both the INDEX file in /var/adm/crash/c* and /etc/shutdownlog

  contain the "panic" statement.
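
  The check in 1.1 can be scripted.  The following is a minimal sketch,
  assuming the highest index N marks the most recent dump; the function
  name latest_dump_dir is hypothetical and not part of HP-UX.

```sh
# Print the core.N or crash.N directory with the highest index N.
# $1 is the crash directory (defaults to /var/adm/crash).
# latest_dump_dir is a hypothetical helper name, not an HP-UX command.
latest_dump_dir() {
    crash_root=${1:-/var/adm/crash}
    best_dir=
    best_idx=-1
    for d in "$crash_root"/core.* "$crash_root"/crash.*; do
        [ -d "$d" ] || continue          # skip unmatched globs and plain files
        idx=${d##*.}                     # text after the last dot
        case $idx in
            ''|*[!0-9]*) continue ;;     # ignore non-numeric suffixes
        esac
        if [ "$idx" -gt "$best_idx" ]; then
            best_idx=$idx
            best_dir=$d
        fi
    done
    # Succeed and print only when a dump directory was found.
    [ -n "$best_dir" ] && printf '%s\n' "$best_dir"
}
```

  For example, dir=$(latest_dump_dir) && grep -i panic "$dir/INDEX"
  would show the panic statement recorded for that dump.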

 

1.2  # touch /etc/shutdownlog       (if it does not exist)

 

1.3  If a current dump is not in /var/adm/crash, do

 

   # grep _DIR /etc/rc.config.d/save*

 

  The value pointed to by SAVECORE_DIR=(10.X) or

  SAVECRASH_DIR=(11.X) is where the system places dump files.
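
  As a sketch of step 1.3, the function below pulls the directory out
  of an rc.config.d file.  It assumes an active (uncommented) line of
  the form SAVECORE_DIR=... or SAVECRASH_DIR=..., and assumes
  /var/adm/crash applies when no such line is set.  The name
  configured_dump_dir is hypothetical.

```sh
# Print the dump directory configured in an rc.config.d file.
# $1 is the file to read, e.g. /etc/rc.config.d/savecrash.
# Assumption: when the variable is unset or commented out, the dump
# goes to /var/adm/crash.
configured_dump_dir() {
    dir=$(sed -n -e 's/^[[:space:]]*SAVECORE_DIR=//p' \
                 -e 's/^[[:space:]]*SAVECRASH_DIR=//p' "$1" |
          tr -d '"' | tail -n 1)       # strip quotes, keep the last setting
    printf '%s\n' "${dir:-/var/adm/crash}"
}
```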

 

1.4  If the system dump is not in the expected location try to

  re-save the dump with:

 

  10.X :   # savecore -vr

  11.00:   # savecrash -vr

 

  A return message of "invalid dump header" means that no dump

  exists.
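
  The choice between the two re-save commands can be sketched as a
  lookup on the release string.  This assumes "uname -r" prints a
  string such as B.10.20 or B.11.00, and assumes savecrash applies to
  every 11.X release; resave_command is a hypothetical name.

```sh
# Print the re-save command for an HP-UX release string.
# $1 is a release string in the form printed by uname -r (e.g. B.11.00).
resave_command() {
    case $1 in
        B.10.*) echo "savecore -vr" ;;
        B.11.*) echo "savecrash -vr" ;;   # assumption: all 11.X releases
        *)      echo "unknown release: $1" >&2
                return 1 ;;
    esac
}
```

  Usage might look like: resave_command "$(uname -r)"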

 

  NOTE:  If saving a dump fills the current dump directory, set the

  directory variable to a directory on a file system with more space, and

  create that directory so it can capture future dumps.

 

 

==============================================================

STEP 2  =====  IS A VERSION OF Q4 LOADED?  ===================

==============================================================

2.1  Determine if and which version of q4 is loaded:

 

   # swlist -l fileset | grep -i Q4

 

  The following are unpatched versions supplied with the OS:

  OS-Core.Q4    B.10.20   HP-UX Crash Dump Debugger for PA-RISC systems

      or

  OS-Core.Q4    B.11.00   HP-UX Crash Dump Debugger for PA-RISC systems

      or

  OS-Core.Q4    B.11.11   HP-UX Crash Dump Debugger for PA-RISC systems

 

 

2.2  If one of the following patched versions is listed, proceed to STEP 3:

        10.20  [PHCO_20261/PACHRDME/English]

        11.00  [PHCO_20262/PACHRDME/English]

        11.11  [PHCO_25723/PACHRDME/English]
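
  The decision in 2.1 and 2.2 can be sketched as a filter over the
  swlist output.  This is only an illustration: it assumes the patch
  IDs named above appear literally in the listing, so a superseding
  patch would not be recognized.  q4_status is a hypothetical name.

```sh
# Classify a software listing read from standard input as
# "patched", "unpatched", or "missing" with respect to q4.
# Assumption: patched installs show one of the patch IDs from step 2.2.
q4_status() {
    listing=$(cat)
    case $listing in
        *PHCO_20261*|*PHCO_20262*|*PHCO_25723*) echo patched ;;
        *OS-Core.Q4*)                           echo unpatched ;;
        *)                                      echo missing ;;
    esac
}
```

  Usage might look like: swlist -l fileset | q4_status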

 

2.3  If the system does not have q4, or the dump was the result of a

     hang, load the patched version.  Loading the patched version

     will not cause a system reboot.  Installation instructions

     accompany the patch.

 

  Download the appropriate version from this site:

For the 10.10 or 10.20  version:

  ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/10.X/PHCO_20261

For the 11.00 version:

  ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_20262

For the 11.11 version:

  ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_25723

 

  NOTE: the patch number may be superseded over time.

 

2.4  If web access is unavailable, no version of q4 is on the system, and

  the install CD is available, load the standard version of q4 from

  the CD:

 

  Mount the INSTALL media and verify a matching version of Q4 is

  available:

 

   # swlist -l fileset -s / | grep Q4

 

  OS-Core.Q4    B.10.10     HP-UX Crash Dump Debugger for PA-RISC systems

                  ^^^^^ -matches the O/S

 

  Use swinstall to install it:

 

   # swinstall -vs / OS-Core.Q4

 

 

 

==============================================================

STEP 3  =====  CD TO THE DUMPS DIRECTORY  ====================

==============================================================

NOTE: csh (C shell) will cause errors with q4. Use sh-posix.

 

   # cd <dump_directory>               (IMPORTANT!)

   e.g.:  cd /var/adm/crash/core.0 OR cd /var/adm/crash/crash.0

 

==============================================================

STEP 4  =====  IF USING UNPATCHED Q4  ========================

==============================================================

4.1  Perform this command:

 

   # /usr/contrib/bin/gunzip vmunix.gz

    (uncompresses the kernel file)

 

  For 10.20 through 11.11, type this command and then skip to 4.2:

 

   # /usr/contrib/bin/q4prep -p

 

  For 11.20 and beyond, type this command and then skip to 4.2:

 

   # /usr/contrib/Q4/bin/q4prep -p

 

  If at 10.10, type the following commands:

 

   # uncompress /usr/contrib/lib/Q4Lib.tar.Z

    (ignore the error if this was done previously)

 

   # tar -xf /usr/contrib/lib/Q4Lib.tar

    (output goes into the current directory)

 

   # cp q4lib/sample.q4rc.pl ~/.q4rc.pl

  Note the use of a tilde (~) and, in ".pl", the letter "l" (not digit 1)

 

   # /usr/contrib/bin/q4pxdb vmunix

    This may complain if vmunix is already preprocessed.
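
  The release-dependent choice in 4.1 can be summarized as a lookup.
  A minimal sketch, assuming release strings in "uname -r" form such
  as B.11.11; q4prep_for_release is a hypothetical name, and releases
  not named in this document return an error.

```sh
# Print how to prepare an unpatched q4 for a given release.
# $1 is a release string in the form printed by uname -r.
q4prep_for_release() {
    case $1 in
        B.10.10)
            echo "manual" ;;                        # follow the 10.10 commands
        B.10.20|B.11.00|B.11.11)
            echo "/usr/contrib/bin/q4prep -p" ;;
        B.11.2[0-9]*|B.11.[3-9][0-9]*)
            echo "/usr/contrib/Q4/bin/q4prep -p" ;; # 11.20 and beyond
        *)
            return 1 ;;                             # not covered here
    esac
}
```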

 

4.2  If the next command causes "/var:  file system full",

  move the core.N directory to a file system with adequate

  space (approximately 2x the sum of the core.x.y.gz files) and

  continue at this point.
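
  The "2x the sum" estimate can be computed before running q4.  A
  minimal sketch, assuming the compressed dump files match
  core.*.gz inside the dump directory; dump_space_needed is a
  hypothetical name.  Sizes are read from "ls -l" so the (possibly
  very large) files are not read.

```sh
# Print twice the combined size, in bytes, of the core.*.gz files in
# directory $1.  Prints 0 when no such files exist.
dump_space_needed() {
    ls -l "$1"/core.*.gz 2>/dev/null |
        awk '{ sum += $5 } END { printf "%.0f\n", sum * 2 }'
}
```

  Compare the result with the free space on the target file system
  before moving the directory.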

 

  Type:

   # q4 -p  .

       (note the "dot" at the end of the command)

  Then:

   q4> trace event 0 > trace.out

 

   q4> include analyze.pl

     NOTE: ".pl" ends in the letter "l" (not digit 1)

 

   q4> run Analyze AU >> ana.out

     NOTE: ctrl-c will interrupt q4

 

   q4>  exit

 

  Skip to STEP 6

 

 

==============================================================

STEP 5  =====  IF USING THE PATCHED VERSION OF Q4  ===========

==============================================================

5.1  Type:

   # . /usr/contrib/Q4/bin/set_env

   Note the 'dot' at the beginning of the command.

 

5.2  If the next steps cause "/var:  file system full", move

  the core.N or crash.N directory to a file system with

  adequate space (approximately 2x the sum of the core.x.y.gz files) and

  continue at this point.

 

  Type:

   # /usr/contrib/Q4/bin/q4pxdb vmunix  (Disregard "unnecessary" message)

 

   # /usr/contrib/Q4/bin/q4 -p .

         (note the "dot" at the end of the command)

 

5.3  At the q4> prompt, type:

 

   q4> run Analyze AU > ana.out

 

   q4> run WhatHappened -HANG > what.out

      NOTE:  ctrl-c can interrupt these two commands, which may take

         several minutes to process.

 

5.4  Type:

 

   q4> exit

 

 

==============================================================

STEP 6  ======  REVIEW AND SEND DATA  ========================

==============================================================

6.1 Determine if a hardware problem induced the crash.  If the ana.out or

  trace.out contains references to an HPMC occurring, the cause of the

  crash was very likely a hardware fault.

 

  Type:

   # grep HPMC ana.out trace.out

 

  Check for:

    "crash event was an HPMC"

           or

    "Crash Event 0 (HPMC, struct crash_event_table_struct..."

 

  If either of these lines appears, open a hardware repair request with the

  hardware support organization for this system.

 

  Also, send the /var/tombstones/ts* file (if that directory exists)

  matching the "dumptime" listed in the INDEX file.  It may well have the

  hardware fault codes that can aid in isolating the hardware cause.

 

  If an HPMC did not occur, proceed to 6.2.
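
  The HPMC check above can be wrapped in a small test.  A minimal
  sketch that searches for the two message fragments quoted in 6.1;
  hpmc_detected is a hypothetical name.

```sh
# Succeed (exit 0) if any of the given files contains one of the
# HPMC messages quoted in step 6.1.
hpmc_detected() {
    grep -q -e 'crash event was an HPMC' \
            -e 'Crash Event 0 (HPMC' "$@" 2>/dev/null
}
```

  Usage might look like:
  if hpmc_detected ana.out trace.out; then echo "open a hardware case"; fi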

 

6.2 Check ana.out to see if MC/ServiceGuard (if it is installed) triggered

     the reboot.  Look for this message:

 

  "MC/ServiceGuard: Unable to maintain contact with cmcld daemon.

  Performing TOC to ensure data integrity."

 

  If so, type:

   # cmgetconf |grep E_T /etc/cmcluster/*

  (Check the cluster for a NODE_TIMEOUT of 2000000 microseconds, i.e. 2 seconds)

 

  If NODE_TIMEOUT is set to 2 seconds, the crash is probably due to this

  extremely low setting.

 

  To correct the problem:

  Increase the value to 5-8 seconds (5000000-8000000) in the cluster

  configuration file and perform a "cmapplyconf" with the cluster down.

  Also, read article UXSGLVKBAN00000010 in the technical database at

  http://ITRC.HP.COM/ for more details on dealing with

  ServiceGuard-induced crashes.

 

  If NODE_TIMEOUT was set to 2 seconds and the value was corrected, stop

  here.
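
  The NODE_TIMEOUT check can be sketched as a filter that converts
  the configured value to seconds.  It assumes the configuration
  text holds a line of the form "NODE_TIMEOUT <value>" with the value
  in microseconds, as the 2000000 = 2 seconds figure above implies;
  node_timeout_seconds is a hypothetical name.

```sh
# Read cluster configuration text on standard input and print
# NODE_TIMEOUT in seconds.  Fails (exit 1) if no active
# NODE_TIMEOUT line is present; commented lines are ignored.
node_timeout_seconds() {
    awk '$1 == "NODE_TIMEOUT" { printf "%g\n", $2 / 1000000; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

  Usage might look like: cmgetconf | node_timeout_seconds
  A result of 2 would point to the low setting described above.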

 

6.3  Generate a patch list:

 

   # /usr/sbin/swlist -l product | grep PH > patches.out

 

6.4  Send the following files to hpcu@atl.hp.com using the

     SOFTWARE CASE ID as the subject:

 

  ana.out

  patches.out

  trace.out

  what.out   (if created)

  /etc/shutdownlog

  /var/tombstones/ts* (if HPMC was detected)

 

NOTES:

- The hpcu E-Mail box has a 3MB maximum mail size!

 

- Keep this document and use it on future dumps to determine whether to

  open a hardware or software case.
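
  Given the 3MB mailbox limit, the combined size of the files can be
  checked before sending.  A minimal sketch, assuming "3MB" means
  3 * 1024 * 1024 bytes and ignoring any growth from mail encoding;
  total_bytes and fits_mail_limit are hypothetical names.

```sh
# Print the combined size in bytes of the files given as arguments.
# Files that do not exist (e.g. what.out when not created) are skipped.
total_bytes() {
    total=0
    for f in "$@"; do
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}

# Succeed if the files together fit within the assumed 3MB limit.
fits_mail_limit() {
    [ "$(total_bytes "$@")" -le $((3 * 1024 * 1024)) ]
}
```

  Usage might look like:
  fits_mail_limit ana.out patches.out trace.out what.out /etc/shutdownlog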

  

                                *** END ***