Rp2470 Keeps rebooting
06-24-2008 03:32 AM
We have an rp2470 server that keeps rebooting. There are no messages in syslog.log, rc.log or dmesg that indicate what causes the reboots, but I have the GSP output, which may help. To me it looks like the processor is at fault; can you gurus confirm whether that is the case? The output is:
Log Entry # 1 :
SYSTEM NAME: dvdb01-web
DATE: 06/24/2008 TIME: 10:01:17
ALERT LEVEL: 2 = Non-Urgent operator attention required
SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEM DETAIL: 0 = no problem detail
CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 51 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 00
0x0000002000FF6512 00000000 00000000 type 0 = Data Field Unused
0x5800082000FF6512 00006C05 180A0111 type 11 = Timestamp 06/24/2008 10:01:17
Type CR for next entry, - CR for previous entry, Q CR to quit.
Log Entry # 2 :
SYSTEM NAME: dvdb01-web
DATE: 06/24/2008 TIME: 08:14:50
ALERT LEVEL: 13 = System hang detected via timer popping
SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEM DETAIL: 4 = timeout
CALLER ACTIVITY: F = display_activity() update STATUS: 0
CALLER SUBACTIVITY: 00 = implementation dependent
REPORTING ENTITY TYPE: E = HP-UX REPORTING ENTITY ID: 00
0x78E000D41100F000 00000003 0000000A type 15 = Activity Level/Timeout
0x58E008D41100F000 00006C05 18080E32 type 11 = Timestamp 06/24/2008 08:14:50
Log Entry # 7 :
SYSTEM NAME: dvdb01-web
DATE: 06/23/2008 TIME: 11:21:51
ALERT LEVEL: 12 = Software failure
SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEM DETAIL: 0 = no problem detail
CALLER ACTIVITY: B = system panic STATUS: 0
CALLER SUBACTIVITY: 00 = implementation dependent
REPORTING ENTITY TYPE: E = HP-UX REPORTING ENTITY ID: 01
0xA0E010C01100B000 00000000 000005E9 type 20 = major change in system state
0x58E018C01100B000 00006C05 170B1533 type 11 = Timestamp 06/23/2008 11:21:51
Type CR for next entry, - CR for previous entry, Q CR to qui
Thanks
William
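The raw data words on the "type 11 = Timestamp" lines appear to carry the same date and time that the GSP prints next to them. The sketch below is inferred only from the three samples in this post, not from HP documentation: it assumes one byte each for the year minus 1900, a zero-based month, day, hour, minute and second. The function name is made up for illustration.

```python
from datetime import datetime


def decode_gsp_timestamp(word_hi: str, word_lo: str) -> datetime:
    """Decode the two 32-bit data words of a 'type 11 = Timestamp' line.

    Assumed layout (inferred from the samples above, not documented):
    bytes 0-1 unused, then year-1900, month-1, day, hour, minute, second.
    """
    raw = bytes.fromhex(word_hi + word_lo)  # 8 bytes in total
    year = 1900 + raw[2]   # 0x6C -> 2008
    month = raw[3] + 1     # 0x05 -> June (stored zero-based)
    day, hour, minute, second = raw[4], raw[5], raw[6], raw[7]
    return datetime(year, month, day, hour, minute, second)


# The three timestamp words quoted in log entries 1, 2 and 7.
samples = [
    ("00006C05", "180A0111"),
    ("00006C05", "18080E32"),
    ("00006C05", "170B1533"),
]
for hi, lo in samples:
    print(hi, lo, "->", decode_gsp_timestamp(hi, lo))
```

Each decoded value matches the date and time the GSP shows on the same line, which is the only evidence for the assumed layout.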
06-24-2008 04:07 AM
Re: Rp2470 Keeps rebooting
Log Entry #7 means "Software failure"; a dump should have been written to /var/adm/crash. If not, I would strongly suggest examining the console log (on the GSP, use the "cl" command) and looking for the panic string, which helps to determine the cause.
Log Entry #2 is a "hang", which is logged if the OS does not send a heartbeat to the GSP for some time. It usually indicates that the system has hung or crashed.
Log Entry #1 means that a TOC (transfer of control) has happened. Either someone pressed the TOC button or "tc" was issued on the MP (some software, such as Serviceguard, also issues a TOC in a hang situation).
=> All of this looks more like a software system panic or hang (which can of course be caused by a hardware problem such as a bad root disk).
Check /var/adm/crash, /etc/shutdownlog and the console log with the GSP "cl" command to find out what happened here.
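GSP log entry #1 is the most recent, so the sequence of events is easier to follow when the entries are sorted by time. The following is a minimal sketch that pulls the alert level and caller activity out of pasted GSP text and lists them oldest first. The field patterns are taken from the excerpt in this thread only, and `parse_gsp_log` is a hypothetical helper, not an HP tool.

```python
import re
from datetime import datetime

# Trimmed copy of the three entries posted above.
SAMPLE = """\
Log Entry # 1 :
DATE: 06/24/2008 TIME: 10:01:17
ALERT LEVEL: 2 = Non-Urgent operator attention required
CALLER ACTIVITY: 6 = machine check STATUS: 2
Log Entry # 2 :
DATE: 06/24/2008 TIME: 08:14:50
ALERT LEVEL: 13 = System hang detected via timer popping
CALLER ACTIVITY: F = display_activity() update STATUS: 0
Log Entry # 7 :
DATE: 06/23/2008 TIME: 11:21:51
ALERT LEVEL: 12 = Software failure
CALLER ACTIVITY: B = system panic STATUS: 0
"""

ENTRY_RE = re.compile(r"Log Entry #\s*(\d+)\s*:")
DATE_RE = re.compile(r"DATE:\s*(\d{2}/\d{2}/\d{4})\s+TIME:\s*(\d{2}:\d{2}:\d{2})")
ALERT_RE = re.compile(r"ALERT LEVEL:\s*(\d+)\s*=\s*(.+)")
ACTIVITY_RE = re.compile(r"CALLER ACTIVITY:\s*(\w+)\s*=\s*(.+?)\s+STATUS:")


def parse_gsp_log(text):
    """Return the log entries as dicts, sorted oldest first."""
    entries = []
    # With one capture group, split() yields [preamble, num, body, num, body, ...].
    parts = ENTRY_RE.split(text)
    for num, body in zip(parts[1::2], parts[2::2]):
        date = DATE_RE.search(body)
        alert = ALERT_RE.search(body)
        activity = ACTIVITY_RE.search(body)
        if not (date and alert and activity):
            continue  # skip entries that do not match the assumed layout
        when = datetime.strptime(
            f"{date.group(1)} {date.group(2)}", "%m/%d/%Y %H:%M:%S"
        )
        entries.append({
            "entry": int(num),
            "when": when,
            "alert_level": int(alert.group(1)),
            "alert": alert.group(2).strip(),
            "activity": activity.group(2),
        })
    return sorted(entries, key=lambda e: e["when"])


for e in parse_gsp_log(SAMPLE):
    print(e["when"], f"#{e['entry']}", e["alert"], "/", e["activity"])
```

Read oldest first, the excerpt shows a system panic on 06/23, then a hang detected the next morning, then the machine check entry that the reply attributes to a TOC.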
06-24-2008 05:58 AM
Re: Rp2470 Keeps rebooting
William
06-24-2008 06:02 AM
Re: Rp2470 Keeps rebooting
Thanks
William
06-24-2008 06:19 AM
Re: Rp2470 Keeps rebooting
It's also worth checking the installed GSP version, as there were issues with timer popping and systems rebooting on the earlier versions.
The updates seemed to fix this, and the .txt file for the patches refers to it.
Post your version and whether it is an A, B or C revision GSP.
Log in to the GSP and type "he"; the version is listed at the top of the page.
Andy
06-24-2008 06:39 AM
Re: Rp2470 Keeps rebooting
Hardware Revision A0 Firmware Revision C.02.14
William
06-24-2008 11:00 PM
Solution
"No valid timestamp" in the STM information for a CPU means that no HPMC (high priority machine check) has happened on that CPU since it was installed. An HPMC could be a direct hint of a hardware problem, but not necessarily on that CPU. The logs would have to be analyzed by HP support.
The /etc/shutdownlog shows that you had several system panics ("Software Failure"):
A "panic" means that the hardware (firmware) did not see any error, but the operating system (HP-UX) found something unexpected and uncorrectable. The panic is normally followed by a memory dump ("man savecrash") and finally a reset.
10:00 Tue May 27 2008. Reboot after panic: Data page fault
15:05 Tue May 27 2008. Reboot after panic: Break instruction trap
16:08 Fri Jun 20 2008. Reboot after panic: Illegal instruction trap
12:34 Mon Jun 23 2008. Reboot after panic: Break instruction trap
The panic strings are not very specific, and someone would have to analyze the memory dumps under /var/adm/crash to find the cause of the system panics.
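To see at a glance which panic strings recur, the shutdownlog lines can be tallied. This is a rough sketch that assumes every relevant line has the same shape as the four entries quoted above (time, weekday, month, day, year, then "Reboot after panic:" and the panic string); `panic_strings` is a made-up helper name, and a real /etc/shutdownlog may contain other line types that this pattern simply skips.

```python
import re
from collections import Counter

# The four entries quoted above.
SHUTDOWNLOG = """\
10:00 Tue May 27 2008. Reboot after panic: Data page fault
15:05 Tue May 27 2008. Reboot after panic: Break instruction trap
16:08 Fri Jun 20 2008. Reboot after panic: Illegal instruction trap
12:34 Mon Jun 23 2008. Reboot after panic: Break instruction trap
"""

PANIC_RE = re.compile(
    r"^(\d{2}:\d{2})\s+(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{4})\.\s+"
    r"Reboot after panic:\s*(.+)$"
)


def panic_strings(text):
    """Return (time, date, panic string) for each 'Reboot after panic' line."""
    found = []
    for line in text.splitlines():
        match = PANIC_RE.match(line.strip())
        if match:
            time_of_day, date, reason = match.groups()
            found.append((time_of_day, date, reason.strip()))
    return found


counts = Counter(reason for _, _, reason in panic_strings(SHUTDOWNLOG))
for reason, n in counts.most_common():
    print(n, reason)
```

Three different panic strings in four reboots fits the point that the strings alone do not identify a cause.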
If you really can reproduce this problem with the STM exerciser, then a miscalculating CPU may be responsible for the panics (this is only one of many possible root causes).
To verify this, you could disable one of the CPUs (reboot the server and disable the CPU in the BCH configuration menu; it will then be disabled after a second reset). If the exerciser then runs fine, you know that this CPU was causing the problem. But this is only possible if
a) you have two CPUs
b) the problem is fully reproducible with STM
I would strongly suggest opening a case with HP support for a dump analysis.
best regards
Stefan
06-25-2008 06:48 AM
Re: Rp2470 Keeps rebooting
OK. I have a tombstone from the server mentioned, and it seems that there is no timestamp for the CPU. I was told that a missing timestamp on the CPU would indicate that the CPU is at fault. Can anyone confirm this?
Thanks
William
06-25-2008 07:06 AM
Re: Rp2470 Keeps rebooting
"No valid timestamp" is the normal status you see on a CPU that has never had an HPMC.
I think whoever told you this is strange meant that it hints at a defective CPU if, after an HPMC, ONLY ONE CPU has "no valid timestamp" while all the others have actual timestamps.
This is the only case where "no valid timestamp" could hint that the CPU is faulty (for example, because it was halted before the HPMC happened, causing a timeout or CPU bus error in a transaction).
But even in this case the hint has to be verified (for example: was this CPU replaced after the HPMC, which would cause the same output? Was it really a timeout or CPU bus error?).
In your example files, both CPUs have no valid timestamp, which proves that no HPMC has happened.
=> The tombstone file can be ignored. Concentrate on the memory dump.
Unfortunately, the missing HPMC does not mean that the CPUs are OK. Some errors (register errors, for example) are not detected by firmware (except in self-tests or CPU diagnostic tools) and may cause system panics like yours, but this happens only in rare cases. Without a dump analysis we cannot say what really happened.
best regards
Stefan
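The rule of thumb in the reply above can be written down as a small decision function. This is only an illustrative sketch: the function name and the input shape (CPU number mapped to an HPMC timestamp string, or None for "no valid timestamp") are assumptions for the example, not STM output or an HP API.

```python
from typing import Mapping, Optional


def interpret_hpmc_timestamps(cpu_timestamps: Mapping[int, Optional[str]]) -> str:
    """Apply the 'no valid timestamp' rule of thumb from the reply above.

    cpu_timestamps maps a CPU number to its HPMC timestamp, or to None
    when the tombstone shows "no valid timestamp" for that CPU.
    """
    missing = [cpu for cpu, ts in cpu_timestamps.items() if ts is None]
    present = [cpu for cpu, ts in cpu_timestamps.items() if ts is not None]

    if not present:
        # No CPU has a timestamp: no HPMC has happened at all.
        return "no HPMC logged: ignore the tombstone, analyze the memory dump"
    if len(missing) == 1:
        # Exactly one CPU lacks a timestamp while the others have one.
        return f"CPU {missing[0]} may be faulty: a hint that still needs verifying"
    # An HPMC happened, but the timestamps single out no one CPU.
    return "HPMC logged: have the logs analyzed by HP support"


# The case in this thread: two CPUs, both with "no valid timestamp".
print(interpret_hpmc_timestamps({0: None, 1: None}))
```

Even the "no HPMC" result says nothing about CPU health, since some errors are not detected by firmware.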
07-07-2008 05:42 AM