OZBEKBRC00000611 - How do I use q4 to pre-process a dump that HP needs to read

PROBLEM

Prior to opening a call with HP, I'd like to know the best way to pre-process my crash dump so HP can troubleshoot it.

RESOLUTION

USING Q4 TO ANALYZE SYSTEM DUMP FILES (For 10.10-11.20)
==============================================================

When HP-UX crashes, it saves a snapshot of RAM to disk-based swap space or dedicated dump space, reboots the system, and copies the resulting "dump" into /var/adm/crash. A utility called q4, normally loaded on the system, can turn the dump into text files for fast analysis. A patched version of q4 must be loaded to interpret dumps that result from a "hanging" operating system.

To pre-process the dump, follow these steps and email the resulting files to the HP Response Center for analysis. The steps vary with the version of the OS and the version of q4.

Please note: all email generated by this procedure should be sent to the dump team address hpcu@atl.hp.com, with the CALL ID as the SUBJECT. DO NOT send this information to the engineer's personal address. After emailing the data, please log a callback against the call to let the engineer know that you have emailed your data.

==============================================================
STEP 1 ===== WHERE IS THE DUMP? ===========================
==============================================================

1.1 Verify that a current dump exists in the dump directory:

    # ll /var/adm/crash/c*

    A recent core.N (10.X) or crash.N (11.X) directory should be listed. (NOTE: N is the dump index assigned when the dump was saved; it increments with each successive dump.) Both the INDEX file in the /var/adm/crash/c* directory and /etc/shutdownlog contain the "panic" statement.
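When several dumps are present, the check in 1.1 can be scripted. The following is a minimal POSIX sh sketch, assuming that the directory with the highest index N is the most recent dump (this may not hold if the index has been reset). The function name latest_dump_dir is hypothetical; it is not part of HP-UX or q4.

```shell
# Hypothetical helper (not an HP-UX or q4 command). Prints the core.N or
# crash.N directory with the highest index N under a crash directory
# (default /var/adm/crash). Assumption: the highest N is the newest dump.
# POSIX sh has no "local", so root, best, bestn, d and n are global.
latest_dump_dir() {
    root=${1:-/var/adm/crash}
    best=""
    bestn=-1
    for d in "$root"/core.* "$root"/crash.*; do
        [ -d "$d" ] || continue              # skip unmatched glob patterns
        n=${d##*.}                           # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;         # ignore non-numeric suffixes
        esac
        if [ "$n" -gt "$bestn" ]; then
            best=$d
            bestn=$n
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Example usage (as root on the crashed system):
#   dir=$(latest_dump_dir) && grep -i panic "$dir/INDEX"
```

The function returns a non-zero status when no dump directory is found, so it can be chained with && as in the usage comment.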
1.2 If /etc/shutdownlog does not exist, create it:

    # touch /etc/shutdownlog

1.3 If a current dump is not in /var/adm/crash, find the configured dump directory:

    # grep _DIR /etc/rc.config.d/save*

    The value of SAVECORE_DIR= (10.X) or SAVECRASH_DIR= (11.X) is where the system places dump files.

1.4 If the system dump is not in the expected location, try to re-save it with:

    10.X : # savecore -vr
    11.X : # savecrash -vr

    A return message of "invalid dump header" means that no dump exists to be saved.

NOTE: If the dump directory fills up while a dump is being saved, point the directory variable (SAVECORE_DIR or SAVECRASH_DIR) at a directory on a file system with more space, and create that directory so that future dumps are captured there.

==============================================================
STEP 2 ===== IS A VERSION OF Q4 LOADED? ===================
==============================================================

2.1 Determine whether q4 is loaded, and which version:

    # swlist -l fileset | grep -i Q4

    The following are the unpatched versions supplied with the OS:

    OS-Core.Q4   B.10.20   HP-UX Crash Dump Debugger for PA-RISC systems
    OS-Core.Q4   B.11.00   HP-UX Crash Dump Debugger for PA-RISC systems
    OS-Core.Q4   B.11.11   HP-UX Crash Dump Debugger for PA-RISC systems

2.2 If one of the following patched versions is listed, proceed to STEP 3:

    OS       Patch
    10.20    PHCO_20261
    11.00    PHCO_20262
    11.11    PHCO_25723

2.3 If the system does not have q4, or the dump was the result of a hang, load the patched version. Installing the patch does not reboot the system. Installation instructions accompany the patch.
    Download the appropriate version from this site:

    For 10.10 or 10.20: ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/10.X/PHCO_20261
    For 11.00:          ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_20262
    For 11.11:          ftp://us-ffs.external.hp.com/hp-ux_patches/s700_800/11.X/PHCO_25723

    NOTE: These patch numbers may be superseded over time.

2.4 If web access is unavailable, no version of q4 is on the system, and the install CD is available, load the standard version of q4. Mount the INSTALL media and verify that a matching version of Q4 is available:

    # swlist -l fileset -s / | grep Q4
    OS-Core.Q4   B.10.10   HP-UX Crash Dump Debugger for PA-RISC systems
                 ^^^^^^^ must match the OS release

    Use swinstall to install it:

    # swinstall -vs / OS-Core.Q4

==============================================================
STEP 3 ===== CD TO THE DUMP DIRECTORY =====================
==============================================================

NOTE: csh (C shell) causes errors with q4. Use the POSIX shell (sh-posix).

Change to the dump directory (IMPORTANT!), for example:

    # cd /var/adm/crash/core.0     (10.X)
    # cd /var/adm/crash/crash.0    (11.X)

==============================================================
STEP 4 ===== IF USING UNPATCHED Q4 ========================
==============================================================

4.1 Uncompress the kernel file:

    # /usr/contrib/bin/gunzip vmunix.gz

    For 10.20 through 11.11, run this command and then skip to 4.2:

    # /usr/contrib/bin/q4prep -p

    For 11.20 and later, run this command and then skip to 4.2:

    # /usr/contrib/Q4/bin/q4prep -p

    For 10.10, run the following commands:

    # uncompress /usr/contrib/lib/Q4Lib.tar.Z
      (ignore the error if this was done previously)
    # tar -xf /usr/contrib/lib/Q4Lib.tar
      (output goes into the current directory)
    # cp q4lib/sample.q4rc.pl ~/.q4rc.pl
      (note the tilde, and the letter "l", not the digit 1)
    # /usr/contrib/bin/q4pxdb vmunix
      (this may complain if vmunix has already been pre-processed)
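Before running q4 in 4.2 or 5.2, the "approximately 2x the sum of the core.x.y.gz files" rule of thumb can be computed instead of estimated by eye. This is a minimal POSIX sh sketch; the function name dump_space_needed is hypothetical, and its default file pattern core.*.gz is taken from the wording of 4.2, so pass a different pattern if your dump files are named differently. The result is in bytes, whereas bdf reports kilobytes.

```shell
# Hypothetical helper (not an HP-UX or q4 command). Prints, in bytes,
# twice the total size of the compressed dump files in a dump directory,
# following the "approximately 2x" rule of thumb in 4.2 and 5.2.
dump_space_needed() {
    dir=${1:-.}
    pattern=${2:-'core.*.gz'}
    # $pattern is left unquoted on purpose so the shell expands the glob.
    # If nothing matches, ls fails quietly and the total printed is 0.
    (cd "$dir" && ls -l $pattern 2>/dev/null) |
        awk '{ total += $5 } END { printf "%.0f\n", 2 * total }'
}

# Example usage, from inside the dump directory:
#   dump_space_needed .
#   bdf /var
```

Field 5 of "ls -l" output is the file size; printf "%.0f" keeps large totals from being printed in exponent notation.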
4.2 If the next command causes "/var: file system full", move the core.N directory to a file system with adequate space (approximately 2x the sum of the core.x.y.gz files) and continue from this point. Type:

    # q4 -p .
      (note the "dot" at the end of the command)

    Then, at the q4> prompt:

    q4> trace event 0 > trace.out
    q4> include analyze.pl
        (NOTE: letter "l", not digit 1)
    q4> run Analyze AU >> ana.out
        (NOTE: ctrl-c interrupts q4)
    q4> exit

    Skip to STEP 6.

==============================================================
STEP 5 ===== IF USING THE PATCHED VERSION OF Q4 ===========
==============================================================

5.1 Type:

    # . /usr/contrib/Q4/bin/set_env

    Note the "dot" at the beginning of the command; it sources the script into the current shell.

5.2 If the next steps cause "/var: file system full", move the core.N or crash.N directory to a file system with adequate space (approximately 2x the sum of the core.x.y.gz files) and continue from this point. Type:

    # /usr/contrib/Q4/bin/q4pxdb vmunix
      (disregard the "unnecessary" message)
    # /usr/contrib/Q4/bin/q4 -p .
      (note the "dot" at the end of the command)

5.3 At the q4> prompt, type:

    q4> run Analyze AU > ana.out
    q4> run WhatHappened -HANG > what.out

    NOTE: These two commands may take several minutes to complete; ctrl-c interrupts them.

5.4 Type:

    q4> exit

==============================================================
STEP 6 ====== REVIEW AND SEND DATA ========================
==============================================================

6.1 Determine whether a hardware problem induced the crash. If ana.out or trace.out contains references to an HPMC (High Priority Machine Check), the crash was very likely caused by a hardware fault. Type:

    # grep HPMC ana.out trace.out

    (trace.out exists only if you followed STEP 4.)

    Check for:

    "crash event was an HPMC"
    or
    "Crash Event 0 (HPMC, struct crash_event_table_struct..."

    If either of these lines appears, open a hardware repair request with the hardware support organization for this system.
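The grep in 6.1 prints an error for trace.out when the patched-q4 procedure (STEP 5) was used, because only STEP 4 creates that file. A minimal POSIX sh sketch that searches only the files that exist and returns a status a script can test might look like this; the function name hpmc_detected and its exit-status convention are hypothetical, not part of q4.

```shell
# Hypothetical helper (not an HP-UX or q4 command). Run it in the dump
# directory. Exit status: 0 = HPMC mentioned, 1 = not mentioned,
# 2 = no file to search (or a grep error, which grep also reports as 2).
hpmc_detected() {
    files=""
    for f in ana.out trace.out; do
        if [ -f "$f" ]; then
            files="$files $f"
        fi
    done
    [ -n "$files" ] || return 2
    # $files is unquoted on purpose so it splits into separate file names.
    grep HPMC $files >/dev/null
}

# Example usage:
#   if hpmc_detected; then
#       echo "HPMC found: open a hardware repair request (6.1)"
#   fi
```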
    Also send the /var/tombstones/ts* file (if that directory exists) whose time matches the "dumptime" listed in the INDEX file. It may contain hardware fault codes that help isolate the hardware cause.

    If an HPMC did not occur, proceed to 6.2.

6.2 If MC/ServiceGuard is installed, check ana.out to see whether it triggered the reboot. Look for this message:

    "MC/ServiceGuard: Unable to maintain contact with cmcld daemon. Performing TOC to ensure data integrity."

    If it is present, check the cluster configuration for a NODE_TIMEOUT of 2000000 (the value is in microseconds, so this is 2 seconds):

    # grep NODE_TIMEOUT /etc/cmcluster/*

    If NODE_TIMEOUT is set to 2 seconds, the crash is probably due to this extremely low setting. To correct the problem, increase the value to 5-8 seconds (5000000-8000000) in the cluster configuration file and run "cmapplyconf" with the cluster down. Also, read article UXSGLVKBAN00000010 in the technical database for more details on dealing with ServiceGuard-induced crashes.

    If NODE_TIMEOUT was set to 2 seconds and the value has been corrected, stop here.

6.3 Generate a patch list:

    # /usr/sbin/swlist -l product | grep PH > patches.out

6.4 Send the following files to hpcu@atl.hp.com, using the SOFTWARE CASE ID as the subject:

    ana.out
    patches.out
    trace.out              (if created)
    what.out               (if created)
    /etc/shutdownlog
    /var/tombstones/ts*    (if an HPMC was detected)

NOTES:
- The hpcu email box has a 3MB maximum message size!
- Keep this document and use it on future dumps to decide whether to open a hardware or software case.

*** END ***