Superdome SD16A CPU problem
06-14-2009 12:04 AM
I have a Superdome SD16A that is reporting CPU errors. Do I need to replace the CPU, and how can I determine its physical location (Cab 0 Cell 0 CPU 4)?
Thank you.
CURRENT MONITOR DATA:
Event Time..........: Sun Apr 26 09:45:23 2009
Severity............: CRITICAL
Monitor.............: fpl_em
Event #.............: 1698
System..............: zhkf1
Summary:
Machine check type could not be determined.
Description of Error:
The Reporting Entity CPU experienced a trap that has caused an asynchronous
branch to the machine check handler, but CPU logs do not indicate that an HPMC,
LPMC or TOC has occurred. The data field will contain the CPU Check Summary.
This Check Summary is described in the return value description for
CpuProcessMachineCheck in PA-8800 CPU Library Application
Probable Cause / Recommended Action:
Contact HP Support. Save event list and Processor HPMC PIM for analysis by lab.
-
Additional Event Data:
System IP Address...: 133.224.202.13
Event Id............: 0x49f3bcb300000000
Monitor Version.....: A.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_fpl_em.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 1
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/SD16A
EMS Version.....................: A.04.20
STM Version.....................: A.45.00
System Serial Number............: SGH443838R
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/fpl_em.htm#1698
v-v-v-v-v-v-v-v-v-v-v-v-v D E T A I L S v-v-v-v-v-v-v-v-v-v-v-v-v
IPMI event hex: 0xf7800c9604e00000 000000000000000000
Time Stamp: Sun Apr 26 01:11:05 2009
Event keyword: ERR_CHECK_FALL_THROUGH
Alert level name: Fatal
Reporting vers: 1
Data field type: Status return from function call
Decoded data field:
Reporting entity ID: 4 ( Cab 0 Cell 0 CPU 4 )
Reporting entity Full Name: System Firmware
IPMI Event ID : 3222 (0xc96)
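The location asked about comes from the "Reporting entity ID" line of the decoded event. As a minimal sketch, assuming that line always has the layout shown above, the cabinet, cell and CPU numbers could be pulled out of a saved event log like this. The function name `parse_reporting_entity` and the regular expression are illustrative assumptions, not part of EMS or STM.

```python
import re

# Pattern for the "Reporting entity ID" line as it appears in the event above.
# The exact spacing is an assumption based on this single sample.
ENTITY_RE = re.compile(
    r"Reporting entity ID:\s*(\d+)\s*"
    r"\(\s*Cab\s+(\d+)\s+Cell\s+(\d+)\s+CPU\s+(\d+)\s*\)"
)


def parse_reporting_entity(text):
    """Return the entity ID, cabinet, cell and CPU numbers, or None if absent."""
    match = ENTITY_RE.search(text)
    if match is None:
        return None
    entity_id, cab, cell, cpu = (int(g) for g in match.groups())
    return {"entity_id": entity_id, "cabinet": cab, "cell": cell, "cpu": cpu}


sample = "Reporting entity ID: 4 ( Cab 0 Cell 0 CPU 4 )"
print(parse_reporting_entity(sample))
# {'entity_id': 4, 'cabinet': 0, 'cell': 0, 'cpu': 4}
```

This only extracts the logical location reported by firmware; it says nothing about which physical socket on the cell board that CPU number corresponds to.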
#[/var/opt/resmon/log]parstatus
Warning: No action specified. Default behaviour is display all.
[Complex]
Complex Name : Complex 2
Complex Capacity
Compute Cabinet (4 cell capable) : 1
Active GSP Location : cabinet 0
Model : 9000/800/SD16A
Serial Number : SGH443838R
Current Product Number : A6113A
Original Product Number : A6113A
Complex Profile Revision : 1.0
The total number of Partitions Present : 1
[Cabinet]
Cabinet I/O Bulk Power Backplane
Blowers Fans Supplies Power Boards
OK/ OK/ OK/ OK/
Cab Failed/ Failed/ Failed/ Failed/
Num Cabinet Type N Status N Status N Status N Status GSP
=== ============ ========= ========= ========== ============ ======
0 SD16A 4/ 0/ N+ 5/ 0/ N+ 4/ 0/ N+ 2/ 0/ N active
Notes: N+ = There are one or more spare items (fans/power supplies).
N = The number of items meets but does not exceed the need.
N- = There are insufficient items to meet the need.
? = The adequacy of the cooling system/power supplies is unknown.
[Cell]
CPU Memory Use
OK/ (GB) Core On
Hardware Actual Deconf/ OK/ Cell Next Par
Location Usage Max Deconf Connected To Capable Boot Num
========== ============ ======= ========= =================== ======= ==== ===
cab0,cell0 active core 8/0/8 32.0/ 0.0 cab0,bay1,chassis3 yes yes 0
cab0,cell1 active base 8/0/8 32.0/ 0.0 - no yes 0
cab0,cell2 active base 8/0/8 32.0/ 0.0 - no yes 0
cab0,cell3 active base 8/0/8 32.0/ 0.0 cab0,bay0,chassis3 yes yes 0
[Chassis]
Core Connected Par
Hardware Location Usage IO To Num
=================== ============ ==== ========== ===
cab0,bay0,chassis0 absent - - -
cab0,bay0,chassis1 absent - - -
cab0,bay0,chassis2 absent - - -
cab0,bay0,chassis3 active yes cab0,cell3 0
cab0,bay1,chassis0 absent - - -
cab0,bay1,chassis1 absent - - -
cab0,bay1,chassis2 absent - - -
cab0,bay1,chassis3 active yes cab0,cell0 0
[Partition]
Par # of # of I/O
Num Status Cells Chassis Core cell Partition Name (first 30 chars)
=== ============ ===== ======== ========== ===============================
0 active 4 2 cab0,cell0 Partition 0
zhkf1#[/var/opt/resmon/log]parstatus -C
[Cell]
CPU Memory Use
OK/ (GB) Core On
Hardware Actual Deconf/ OK/ Cell Next Par
Location Usage Max Deconf Connected To Capable Boot Num
========== ============ ======= ========= =================== ======= ==== ===
cab0,cell0 active core 8/0/8 32.0/ 0.0 cab0,bay1,chassis3 yes yes 0
cab0,cell1 active base 8/0/8 32.0/ 0.0 - no yes 0
cab0,cell2 active base 8/0/8 32.0/ 0.0 - no yes 0
cab0,cell3 active base 8/0/8 32.0/ 0.0 cab0,bay0,chassis3 yes yes 0
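In the `[Cell]` table, the CPU column reads OK/Deconf/Max and the Memory column reads OK/Deconf in GB, so `8/0/8` means eight CPUs OK, none deconfigured, out of a maximum of eight. A rough sketch for checking a captured `parstatus -C` listing for deconfigured CPUs or memory follows. The helper names and the regular expression are assumptions made for illustration, and the row layout is taken only from the output pasted above.

```python
import re

# Matches cell rows such as:
#   cab0,cell0 active core 8/0/8 32.0/ 0.0 cab0,bay1,chassis3 yes yes 0
CELL_ROW = re.compile(
    r"^(cab\d+,cell\d+)\s+(.+?)\s+(\d+)/(\d+)/(\d+)\s+([\d.]+)/\s*([\d.]+)"
)


def parse_cells(output):
    """Parse the cell rows of a parstatus listing into dictionaries."""
    cells = []
    for line in output.splitlines():
        m = CELL_ROW.match(line.strip())
        if m is None:
            continue  # header, separator, or unrelated line
        cells.append({
            "location": m.group(1),
            "usage": m.group(2),
            "cpu_ok": int(m.group(3)),
            "cpu_deconf": int(m.group(4)),
            "cpu_max": int(m.group(5)),
            "mem_ok_gb": float(m.group(6)),
            "mem_deconf_gb": float(m.group(7)),
        })
    return cells


def cells_with_deconfigured_hardware(output):
    """Return the locations of cells reporting deconfigured CPUs or memory."""
    return [c["location"] for c in parse_cells(output)
            if c["cpu_deconf"] > 0 or c["mem_deconf_gb"] > 0]


SAMPLE = """\
cab0,cell0 active core 8/0/8 32.0/ 0.0 cab0,bay1,chassis3 yes yes 0
cab0,cell1 active base 8/0/8 32.0/ 0.0 - no yes 0
cab0,cell2 active base 8/0/8 32.0/ 0.0 - no yes 0
cab0,cell3 active base 8/0/8 32.0/ 0.0 cab0,bay0,chassis3 yes yes 0
"""
print(cells_with_deconfigured_hardware(SAMPLE))
# []
```

For the listing above the result is an empty list: every cell shows all eight CPUs and all 32 GB of memory as OK, with nothing deconfigured.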
06-15-2009 07:08 AM
Solution
Take out the cell board and put it on a mat.
On the right of the cell board you will find all the memory DIMM slots.
The CPUs are on the left side, placed as follows:
+---------------------------------+
|cpu-0 cpu-1 memory-slots .......|
|cpu-2 cpu-3 memory-slots........|
+---------------------------------+
From what I can see, this cell board contains four dual-core processors, so I'm guessing the fourth processor is in the cpu-3 slot.
Regards
PS: I would really appreciate it if you could assign points.
06-17-2009 09:42 AM
Re: superdome sd16a cpu problem
If you read the event description...
Event 1698
Severity: CRITICAL
Event Summary: Machine check type could not be determined.
Event Class: System
Problem Description:
The Reporting Entity CPU experienced a trap that has caused an asynchronous branch to the machine check handler, but CPU logs do not indicate that an HPMC, LPMC or TOC has occurred. The data field will contain the CPU Check Summary. This Check Summary is described in the return value description for CpuProcessMachineCheck in PA-8800 CPU Library Application
Cause / Action:
Cause: Contact HP Support. Save event list and Processor HPMC PIM for analysis by lab.
Action: -
Automated Recovery: None
Event Generation Threshold: 1 occurrence
...you'll note that it does not report a hardware failure.
Also, having been through a few failed Superdome cell boards, I wouldn't attempt the replacement yourself unless you know what you're doing. Right now you have one vPar affected; replacing the cell board incorrectly (for example, separating it into two unbolted units for an easier fit) will bring down the whole nPar.
Are you in a production environment?
06-18-2009 12:41 AM
Re: superdome sd16a cpu problem
07-23-2009 05:35 PM
Re: superdome sd16a cpu problem
Thanks, all.