- What To Check When Sys Crashes
01-08-2007 10:43 AM
I have a little D370 running HP-UX v11.
For some reason the beast decided to crash yesterday. The only thing I could do was Ctrl-B from the terminal and then RS.
My question is, when something like this happens what are things you wonderful people check?
I checked syslogs, rc.logs, messages, crash logs. Unfortunately I have not been able to find an answer to why the system crashed and I have management asking why.
Thanking anyone for any info...
01-08-2007 10:52 AM
If a crash dump was saved under /var/adm/crash, you can decode it with q4.
# cd /var/adm/crash/crash.0
# /usr/contrib/Q4/bin/q4 -p .
(note the "dot" at the end of the command)
At the q4> prompt, type:
q4> run Analyze AU > ana.out
q4> run WhatHappened -HANG > what.out
NOTE: ctrl-c can interrupt these two commands, which may take several minutes to process.
To exit q4:
q4> exit
Rgds...Geoff
01-08-2007 10:55 AM
Re: What To Check When Sys Crashes
Other than the logs you mentioned already, you could try looking at /var/opt/resmon/log/event.log. It's hard to say if you'll find anything.
-denver
01-08-2007 10:57 AM
/var/tombstones
/etc/shutdownlog
/var/adm/syslog/syslog.log
q4 will provide most of the information, but sometimes you can get crash details from these other files. If you have no support contract, they give you other options.
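The three locations above can be looked at in one pass. This is only a minimal sketch for a POSIX shell: the function name `check_crash_logs` and its optional root-directory argument are made up for illustration (the argument exists so the function can be tried against a copy of the files), and it is not an HP-supplied tool.

```shell
#!/bin/sh
# check_crash_logs [ROOT]
# Report which of the post-crash files named in this thread exist under
# ROOT (default: the real root) and show the most recent content of each.
# Hypothetical helper for illustration only.
check_crash_logs() {
    root="${1:-}"
    for f in /var/tombstones /etc/shutdownlog /var/adm/syslog/syslog.log; do
        path="$root$f"
        if [ -d "$path" ]; then
            echo "DIR      $f"
            ls -lt "$path" | head -5      # newest entries first
        elif [ -f "$path" ]; then
            echo "FILE     $f"
            tail -5 "$path"               # last few log lines
        else
            echo "MISSING  $f"
        fi
    done
}
```

Calling `check_crash_logs` with no argument inspects the live system; anything reported as MISSING is one less place to look.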
01-08-2007 11:29 AM
It sounds like this did not happen so the problem was more likely a hang where the system appeared to be dead and unresponsive. Hangs cannot be diagnosed from logs because the OS stopped running or ended up in an endless loop. The only way to diagnose this condition is to use CTRL-B and then TC rather than RS. That will create a memory dump which can be analyzed as to the reason for the hang.
Understanding the memory dump and diagnosing a fix is almost impossible without a lot of OS internals training, so you'll need to hand the dump over to HP for analysis. The alternative is to bring the system up to date on patches.
Bill Hassell, sysadmin
01-09-2007 08:32 AM
Thanks heaps for your responses. I will keep the TC in mind the next time one of my babies misbehaves...
Thumbs up to all who replied.
Alex
01-09-2007 08:56 AM
You also need to check:
1. The /var/tombstones/ts99 file, for valid timestamps.
2. /etc/shutdownlog, for detailed errors such as "panic: ....."
3. If you have it, you can run the HP collector script (collector.sh) with the crashinfo command and send the output file to HP, so they can debug exactly what caused the crash.
There are three causes of a crash: PANIC (a crash raised by the OS), TOC (a transfer of control), and HPMC (hardware related). Q4 analysis can also be run to check the cause of the problem, but it is quite lengthy; the preferred method is to send the dump to HP.
Hth,
Raj.
01-09-2007 09:00 AM
Could you please explain a little more about the collector script, i.e. where can I get it, as we don't have it? Also, what command is crashinfo? I had a look at the man pages but did not find anything on crashinfo.
Or are you saying to send the crash information over to HP to get it investigated?
Alex
01-09-2007 09:29 AM
"Are you saying to send the crash information over to HP to get it investigated?"
Well, not really. The dump files are big, in the GB range, so HP usually sends you a script to run, and you send the small output back to them. That's it.
I don't have the script with me right now. It is the script HP gave me when I logged a call after one system crashed and left its dumps in /var/adm/crash/crash.0/...
They will send you two files: 1. COLLECTION.sh (a big shell script) and 2. crashinfo.zip (a zipped binary file, 876 KB), attached herewith.
Procedure: copy the crashinfo binary into /var/adm/crash and run the collection.sh script. It generates a report in .tz format containing all the system information and other details, including ts99 and ts99.tracefile.txt (the important one). Once you send this report, they will tell you the cause of the crash. Many times it turns out to be a hardware failure. If I remember correctly, for a few of my crashes HP replaced a CPU once and a SCSI card once.
Cheers,
Raj.
01-09-2007 09:32 AM
Alex
01-09-2007 09:45 AM
Again, if you run the crashinfo command, it generates a result file that contains the cause; see below. (You need as much free space on /var as the size of the dump file, or it will give an error.)
# cd /var/adm/crash/crash.0
# ./crashinfo -c > crash.0_crashinfo.txt 2>&1
The result file is crash.0_crashinfo.txt.
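The free-space requirement can be checked before crashinfo is started. A rough sketch, assuming `du -sk` and `df -kP` behave on your system as POSIX describes them (check the local man pages); the function name `has_room_for` is hypothetical.

```shell
#!/bin/sh
# has_room_for DIR TARGET
# Succeed if the filesystem holding TARGET has at least as many free KB
# as DIR currently occupies. Hypothetical helper for illustration only.
has_room_for() {
    # Size of the dump directory in KB.
    need_kb=$(du -sk "$1" | awk '{print $1}')
    # "Available" column of the POSIX-format df output (second line).
    free_kb=$(df -kP "$2" | awk 'NR == 2 {print $4}')
    echo "need ${need_kb} KB, free ${free_kb} KB"
    [ "$free_kb" -ge "$need_kb" ]
}
```

For example, `has_room_for /var/adm/crash/crash.0 /var || echo "not enough space on /var"` would warn before the run fails partway through.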
Here are a few lines pasted from the crash.0_crashinfo.txt file to make it easy:
=======================================
crashinfo (3.9)
libp4 (8.47): Opening ./vmunix ./INDEX
Loading symbols from ./vmunix
Kernel TEXT pages not requested in crashconf
Will use an artificial mapping from a.out TEXT pages
crashinfo (3.9) output
=====================
= Table Of Contents =
=====================
* General Information
* Crash Events
* Message Buffer
* Memory Globals
* Buffer Cache Globals
* Swap Information
* Global Error Counters / kmem_writes
* Network Interfaces
* IOVA Usage Check
* Crash Event / Processor Information
* Processor Clock Info
* Syswait Array
* Load Averages
* Thread Information
* Kernel Patches
=======================
= General Information =
=======================
Dump time Thu Jul 28 13:13:31 2005 UTC4
System has been up 184 days, 23 hours, 50 minutes.
System Name : HP-UX
Node Name : server20
Model : 9000/800/SD32000
HP-UX version : B.11.11 (64-bit Kernel)
Number of CPU's : 8
Disabled CPU's : 0
CPU type : PCXW+ (875 Mhz)
CPU Architecture : PA-RISC 2.0
Load average : 0.34 0.27 0.26
================
= Crash Events =
================
Note: Crash event 0 was a PANIC !
Panic string :
Note: In the case of a PANIC, normally crash event 0 is the crash
event you should concentrate on. There may well be other secondary
panics (for example spinlock panics) that have happened as a
consequence of the original panic.
Stack Trace for Crash event 0
=============================
============== EVENT ============================
= Event #0 is PANIC on CPU #6
= p crash_event_t 0xabf000
= p rpb_t 0xab8100
= Using pc from pim.wide.rp_rp_hi = 0x233724
============== EVENT ============================
SR5=0x01cdb400
SP RP Return Name
0x400003ffffff1338 0x00233724 panic+0x6c
0x400003ffffff1298 0x00038400 fdc_target_miss_PCXU
0x400003ffffff1248 0x000378b4 fdcache_conditionally+0x90
0x400003ffffff11e8 0x001079c8 checkaccess+0x6d0
0x400003ffffff1098 0x00107be0 hdl_pfault+0x158
0x400003ffffff0f48 0x001808d8 pfault+0x120
0x400003ffffff0e68 0x001678cc trap+0x68c
0x400003ffffff0c78 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 7 in USER mode at 0xd973c00.0x800003ffbfe9b8f3 (???)
| p struct save_state 0x1cdb400.0x400003ffffff07a8
+------------- TRAP ----------------------------
Stack Trace for Crash event 0 with all args
===========================================
============== EVENT ============================
= Event #0 is PANIC on CPU #6
= p crash_event_t 0xabf000
= p rpb_t 0xab8100
= Using pc from pim.wide.rp_rp_hi = 0x233724
============== EVENT ============================
SR5=0x01cdb400
SP RP Return Name
0x400003ffffff1338 0x00233724 panic+0x6c
arg0: 0x0000000000b2e1c0
0x400003ffffff1298 0x00038400 fdc_target_miss_PCXU
0x400003ffffff1248 0x000378b4 fdcache_conditionally+0x90
0x400003ffffff11e8 0x001079c8 checkaccess+0x6d0
arg0: 0x0000000087638200
arg1: 0x0000000000000000
arg2: 0x800003ffbfe9b000
arg3: 0x000000000022e5a7
arg4: 0x400003ffffff0fb8
0x400003ffffff1098 0x00107be0 hdl_pfault+0x158
arg0: 0x0000000087638200
arg1: 0x0000000000000000
arg2: 0x000000000d973c00
arg3: 0x800003ffbfe9b000
arg4: 0x400003ffffff0ec0
0x400003ffffff0f48 0x001808d8 pfault+0x120
arg0: 0x0000000000000000
arg1: 0x0000000000000000
arg2: 0x000000000d973c00
arg3: 0x800003ffbfe9b8f3
0x400003ffffff0e68 0x001678cc trap+0x68c
.... --------n/a-------
arg1: 0x400003ffffff07a8
0x400003ffffff0c78 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 7 in USER mode at 0xd973c00.0x800003ffbfe9b8f3 (???)
| p struct save_state 0x1cdb400.0x400003ffffff07a8
+------------- TRAP ----------------------------
Stack Traces for all other Crash events
=======================================
============== EVENT ============================
= Event #1 is TOC on CPU #7
= p crash_event_t 0xabf030
= p rpb_t 0x1fcacb0
= Using pc from pim.wide.rp_pcoq_head_hi = 0x288ca0
============== EVENT ============================
SR5=0x0cb75c00
SP RP Return Name
0x400003ffffff0e68 0x00288ca0 check_panic_loop+0x20
0x400003ffffff0e68 0x00167da4 trap+0xb64
0x400003ffffff0c78 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 31 in USER mode at 0x7e1ec00.0xc004e20b (???)
| p struct save_state 0xcb75c00.0x400003ffffff07a8
+------------- TRAP ----------------------------
============== EVENT ============================
= Event #2 is TOC on CPU #4
= p crash_event_t 0xabf060
= p rpb_t 0x1fc9810
= Using pc from pim.wide.rp_pcoq_head_hi = 0x288ca0
============== EVENT ============================
SR5=0x0c2f8c00
SP RP Return Name
0x400003ffffff0e68 0x00288ca0 check_panic_loop+0x20
0x400003ffffff0e68 0x00167da4 trap+0xb64
0x400003ffffff0c78 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 31 in USER mode at 0x29a400.0xe3941543 (???)
| p struct save_state 0xc2f8c00.0x400003ffffff07a8
+------------- TRAP ----------------------------
============== EVENT ============================
= Event #3 is TOC on CPU #5
= p crash_event_t 0xabf090
= p rpb_t 0x1fc9ef0
= Using pc from pim.wide.rp_pcoq_head_hi = 0x288ca0
============== EVENT ============================
SR5=0x0e307400
SP RP Return Name
0x400003ffffff0e68 0x00288ca0 check_panic_loop+0x20
0x400003ffffff0e68 0x00167da4 trap+0xb64
0x400003ffffff0c78 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 31 in USER mode at 0xf623c00.0xeea6fa13 (???)
| p struct save_state 0xe307400.0x400003ffffff07a8
+------------- TRAP ----------------------------
============== EVENT ============================
= Event #4 is TOC on CPU #0
= p crash_event_t 0xabf0c0
= p rpb_t 0xab8e50
= Using pc from pim.wide.rp_pcoq_head_hi = 0x288cb0
============== EVENT ============================
SR5=0x04d97800
SP RP Return Name
0x400003ffffff1588 0x00288cb0 check_panic_loop+0x30
0x400003ffffff1588 0x00167da4 trap+0xb64
0x400003ffffff1398 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 31 in KERNEL mode at 0x14ea84 (spluser+0x14)
| p struct save_state 0x4d97800.0x400003ffffff0ec8
+------------- TRAP ----------------------------
SR5=0x04d97800
SP RP Return Name
0x400003ffffff0ec8 0x0014ea84 spluser+0x14
0x400003ffffff0e58 0x0014e25c syscall+0x48c
0x400003ffffff0c78 0x00033f64 syscallinit+0x55c
============== EVENT ============================
= Event #5 is TOC on CPU #1
= p crash_event_t 0xabf0f0
= p rpb_t 0x1fc8370
= Using pc from pim.wide.rp_pcoq_head_hi = 0x288cb0
============== EVENT ============================
SR5=0x033b8000
SP RP Return Name
0x400003ffffff1588 0x00288cb0 check_panic_loop+0x30
0x400003ffffff1588 0x00167da4 trap+0xb64
0x400003ffffff1398 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 31 in KERNEL mode at 0x14ea84 (spluser+0x14)
| p struct save_state 0x33b8000.0x400003ffffff0ec8
+------------- TRAP ----------------------------
SR5=0x033b8000
SP RP Return Name
0x400003ffffff0ec8 0x0014ea84 spluser+0x14
0x400003ffffff0e58 0x0014e25c syscall+0x48c
0x400003ffffff0c78 0x00033f64 syscallinit+0x55c
============== EVENT ============================
= Event #6 is TOC on CPU #2
= p crash_event_t 0xabf120
= p rpb_t 0x1fc8a50
= Using pc from pim.wide.rp_pcoq_head_hi = 0x288ca8
============== EVENT ============================
SR5=0x01b60c00
SP RP Return Name
0x400003ffffff0e68 0x00288ca8 check_panic_loop+0x28
0x400003ffffff0e68 0x00167da4 trap+0xb64
0x400003ffffff0c78 0x0016a444 thandler+0xd20
+------------- TRAP ----------------------------
| Trap type 31 in USER mode at 0xab69800.0x800003ffbfe99493 (???)
| p struct save_state 0x1b60c00.0x400003ffffff07a8
+------------- TRAP ----------------------------
......
=========================================
Cheers,
Raj.
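A long report like the one above is easier to scan once the crash events are pulled out. The sketch below relies only on the header format visible in the pasted output ("= Event #0 is PANIC on CPU #6"); the function name `summarize_crash_events` is made up, and other crashinfo versions may format these lines differently.

```shell
#!/bin/sh
# summarize_crash_events FILE
# Print one line per distinct crash event found in saved crashinfo output.
# Hypothetical helper for illustration only.
summarize_crash_events() {
    awk '/= Event #[0-9]+ is / {
        id = $3; type = $5; cpu = $8
        if (!(id in seen)) {     # the same event header can appear more than once
            seen[id] = 1
            printf "event %s type %s cpu %s\n", id, type, cpu
        }
    }' "$1"
}
```

Run as `summarize_crash_events crash.0_crashinfo.txt`, this lists each event with its type, which makes the single PANIC stand out from the TOC events recorded for the other CPUs.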
07-18-2008 02:32 PM
thread yesterday (hoping one of the
original posters would be notified :)
Today, the ITRC software sent me a note saying
that a reply was posted ... so I restart
MSIE (and FireFox) to display the thread.
Not only do I not see the reply, I don't
see my post either!
(I waited about 4 more hours and viewed the
thread again ... still don't see my post or
the reply!)
So...if the replier doesn't mind, I can be
reached at sieler@allegro.com
thanks!
Stan