HP-UX 10.20 server crashed ... but why?
04-23-2008 11:48 AM
Weird problem here. Saturday night, one of our production servers (K460 running 10.20 - yeah, I know..) crashed. I was unable to log in remotely, even via the console. Just a blank screen.
Came in to the office after a 90 minute drive, and the server display showed:
INIT CBF7
TRAPS CPU0123
All I could find in the book on that error was "Entering PDC IO".
The system was hung, so I manually turned the key to 'standby', then back to 'service'. The system booted normally the first time, and has been running flawlessly since. (over 3.5 days now)
Thus far, I've been unable to determine the root cause. Syslog shows no entries for the preceding 17 hours. Dmesg shows nothing of interest, and nothing was written to /var/adm/crash. There is a ts99 file (attached), but unless I'm missing something, I don't see a reason for the crash, or failure to automatically reboot.
Any suggestions or ideas would be appreciated. Thanks.
-Rich
04-23-2008 04:02 PM
Re: HP-UX 10.20 server crashed ... but why?
You almost always need a crash dump to see what is happening. For 10.20, make sure you have the crash file configured in /etc/rc.config.d/savecrash:
SAVECRASH=1
SAVECRASH_DIR=/var/adm/crash
Note that as long as the dump area (often shared with swap) has not been overwritten, the crash dump may still be recoverable. Just run the command:
savecrash -z
and if it can find a clean crash dump, it will save it. Then you can run q4 to get a better idea of what happened.
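Before relying on savecrash at the next panic, it may be worth confirming that the rc config really contains the two settings above. Here is a minimal sketch of such a check. The function name check_savecrash_config is made up for illustration (it is not an HP-UX tool), and it assumes plain VAR=value lines in the file and a sed that understands POSIX character classes:

```shell
# Hypothetical helper: report whether a savecrash rc config enables dump saving.
# Argument: path to the config file (defaults to /etc/rc.config.d/savecrash).
check_savecrash_config() {
    cfg="${1:-/etc/rc.config.d/savecrash}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg"
        return 2
    fi
    # Take the last uncommented assignment of each variable, dropping quotes.
    enabled=$(sed -n 's/^[[:space:]]*SAVECRASH=\([^[:space:]#]*\).*/\1/p' "$cfg" | tr -d '"' | tail -n 1)
    dir=$(sed -n 's/^[[:space:]]*SAVECRASH_DIR=\([^[:space:]#]*\).*/\1/p' "$cfg" | tr -d '"' | tail -n 1)
    if [ "$enabled" != "1" ]; then
        echo "SAVECRASH is not set to 1 in $cfg"
        return 1
    fi
    echo "savecrash enabled, dump directory: ${dir:-not set}"
    return 0
}
```

Called with no argument it reads /etc/rc.config.d/savecrash; a non-zero return means the file is unreadable or SAVECRASH is not 1.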
For the K-class machines, this is an invaluable manual:
http://ftp.parisc-linux.org/docs/platforms/A2375-90004.pdf
In the back are all the chassis codes.
From all your ts99 HPMC chassis codes:
0x20b1 HPMC data cache parity fault in tag
0x5008 Processor Memory bus broad fault
0x5108 "
0x5208 "
0x5308 "
0x5408 "
0x5508 "
0x7d09 Single bit memory fault (HPMC)
0x7f14 1 = memory carrier number, 4 = SIMM pair slot number
0xcbf0 High Priority Machine Check occurred
0xcbfb Branching to the OS HPMC handler
So it looks like a fatal memory failure, carrier 1, slot 4 (both 4a and 4b)
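If you have to decode these by hand more than once, the list above can be turned into a small lookup. This is only a sketch covering the codes that appear in this ts99 file; decode_chassis_code is a made-up name, and treating every 0x7fXY code as "carrier X, slot Y" is an assumption generalized from the single 0x7f14 example. The manual remains the authority for anything else.

```shell
# Hypothetical helper: describe one HPMC chassis code from the list above.
decode_chassis_code() {
    # Normalize hex digits to lower case so 0x20B1 and 0x20b1 both match.
    code=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$code" in
        0x20b1)     echo "HPMC data cache parity fault in tag" ;;
        0x5[0-5]08) echo "Processor Memory bus broad fault" ;;
        0x7d09)     echo "Single bit memory fault (HPMC)" ;;
        0x7f[0-9a-f][0-9a-f])
            # Assumption: third hex digit = memory carrier, fourth = SIMM pair slot.
            digits=${code#0x7f}
            carrier=${digits%?}
            slot=${digits#?}
            echo "memory carrier $carrier, SIMM pair slot $slot" ;;
        0xcbf0)     echo "High Priority Machine Check occurred" ;;
        0xcbfb)     echo "Branching to the OS HPMC handler" ;;
        *)          echo "unknown (see the chassis code tables in the manual)" ;;
    esac
}

# Example: decode a few codes in one pass.
for c in 0xcbf0 0x20b1 0x7f14; do
    printf '%s  %s\n' "$c" "$(decode_chassis_code "$c")"
done
```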
Note that some memory problems can be logged, but if this occurred in part of the kernel space, a panic is the only choice. syslog can never tell you anything about a crash because the OS stops running. Only the panic code can do anything (like writing out the crash dump).
Since it is running now, use this command to look at the memory:
echo "selclass qualifier memory;info;wait;infolog" | cstm
Bill Hassell, sysadmin
04-24-2008 01:16 AM
Solution
Indeed, this K460 had one or more single-bit errors on Memory Carrier 1, SIMM Pair 4A/4B, but:
The K460 has ECC protected memory and single bit errors are corrected in realtime and do not cause a system crash.
The cause of the crash was a data cache error of Processor 1 (counted 0,1,2,3).
The CPU caches of these old HP9000 servers are (in contrast to newer systems) NOT ECC protected, so single-bit cache errors cannot be corrected and always lead to an HPMC and a system crash.
Something like this can happen in a CPU cache by chance. I would do nothing unless the same cache error recurs frequently.
You can recognize the cache error by
1.) a valid and current timestamp in the ts99 file
2.) a chassis code beginning with 0x2...
In your case:
----------------- Processor 1 HPMC Information ------------------
Timestamp = Sun Apr 20 00:09:39 GMT 2008 (20:08:04:20:00:09:39)
HPMC Chassis Codes = 0xcbf0 0x20b1 <=== (the rest of the chassis codes can be ignored)
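The two checks above can be scripted roughly like this. It is only a sketch: find_cache_hpmc is a made-up name, and the awk patterns assume that every processor block in the ts99 file looks like the excerpt shown here (a "Processor N HPMC Information" header, then a Timestamp line, then an HPMC Chassis Codes line). Whether the timestamp is current still has to be judged by eye.

```shell
# Hypothetical helper: list processors whose HPMC chassis codes include a 0x2... code.
# Arguments: one or more ts99-style files (reads stdin if none are given).
find_cache_hpmc() {
    awk '
        BEGIN { cpu = "?"; ts = "unknown time" }
        # Remember which processor block we are in.
        /Processor [0-9]+ HPMC Information/ {
            for (i = 1; i <= NF; i++) if ($i == "Processor") cpu = $(i + 1)
        }
        # Remember the timestamp of the current block.
        /^ *Timestamp *=/ { ts = $0; sub(/^ *Timestamp *= */, "", ts) }
        # Report every chassis code that starts with 0x2.
        /^ *HPMC Chassis Codes *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^0x2/) print "processor " cpu ": " $i " at " ts
        }
    ' "$@"
}
```

Run against a file containing the excerpt above, this would print one line naming processor 1, code 0x20b1 and the Apr 20 timestamp. No output means no chassis code beginning with 0x2 was found.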
best regards
Stefan
04-25-2008 02:45 AM
Re: HP-UX 10.20 server crashed ... but why?
04-28-2008 05:41 AM
Re: HP-UX 10.20 server crashed ... but why?
Sorry for the delay in replying - I was out of town for a few days.
Thanks for the decoding help and suggestions. We're still up and running (8 1/2 days), thankfully.
Since it only happened once, and is still running, I plan on leaving it alone until our next scheduled maintenance. Then I'll bring it down, swap out cpu1, and go from there. (we have plenty of spares) Of course, if she panics again before that, I'll swap out the cpu before the reboot.
Thanks again for the help, and the pointer to the manual online! Points to be assigned shortly.
-Rich