Superdome EMS reports I/O link error
03-23-2008 09:38 PM
FRU Physical Location: 0x00ffff01ffffff93
FRU Source = 9 (cell)
Source Detail = 3 (coherency controller)
Cabinet Location = 0
Cell Location = 1
RIN_ERR_PRI_MODE..........: 0x0000000000000008
REO input single wire error
CECC_DATA_MSB_0...........: 0x0000000000000383
CECC_DATA_LSB_0...........: 0xc064d030322e0c4e
CECC_DATA_MSB_1...........: 0x0000000000000383
CECC_DATA_LSB_1...........: 0xc064d030322e0c4e
>---------- End Event Monitoring Service Event Notification ----------<
>------------ Event Monitoring Service Event Notification ------------<
Notification Time: Sun Dec 16 11:42:00 2007
dds2 sent Event Monitor notification information:
/system/events/core_hw/core_hw is >= 1.
Its current value is SERIOUS(4).
Event data from monitor:
Event Time..........: Sun Dec 16 11:41:59 2007
Severity............: SERIOUS
Monitor.............: dm_core_hw
Event #.............: 85
System..............: dds2
Summary:
I/O link interface to cell controller recovered errors
Description of Error:
The cell controller (CC) chip has detected and corrected multiple errors
in data transferred to it from the I/O bus adapter (REO) chip to which it
is connected.
Probable Cause / Recommended Action:
The inbound I/O link cable is unreliable.
Contact your HP support representative to check the inbound I/O link
cable.
There may be a problem with the CC chip or cell board.
Contact your HP support representative to check the cell board.
There may be a problem with the I/O backplane.
Contact your HP support representative to check the I/O backplane.
Additional Event Data:
System IP Address...: 10.93.4.12
Event Id............: 0x47649e8800000000
Monitor Version.....: B.01.00
Event Class.........: System
Client Configuration File...........:
/var/stm/config/tools/monitor/default_dm_core_hw.clcfg
Client Configuration File Version...: A.01.00
Qualification criteria met.
Number of events..: 3
Associated OS error log entry id(s):
None
Additional System Data:
System Model Number.............: 9000/800/SD32000
OS Version......................: B.11.11
STM Version.....................: A.29.00
EMS Version.....................: A.03.20
Latest information on this event:
http://docs.hp.com/hpux/content/hardware/ems/dm_core_hw.htm#85
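The notification is a series of dotted `Key....: value` lines, so its fields are easy to pull out when you are correlating several events. The Python sketch below does that. It also tries one unconfirmed decode: read as a Unix timestamp, the upper 32 bits of this Event Id (0x47649e88) give 2007-12-16 03:42:00 UTC. That matches the 11:42:00 notification time only if dds2's clock is set to UTC+8. Neither the Event Id layout nor the time zone is stated in the thread, so treat that part as an assumption.

```python
import re
from datetime import datetime, timezone

# A few fields copied from the EMS notification above.
NOTIFICATION = """\
Event Time..........: Sun Dec 16 11:41:59 2007
Severity............: SERIOUS
Monitor.............: dm_core_hw
Event #.............: 85
System..............: dds2
Event Id............: 0x47649e8800000000
Monitor Version.....: B.01.00
Number of events..: 3
"""

# A key without dots or colons, a run of two or more dots, a colon, then the value.
FIELD_RE = re.compile(r"^(?P<key>[^.:]+?)\.{2,}\s*:\s*(?P<value>.*)$")


def parse_ems_fields(text):
    """Return a dict of the dotted 'Key....: value' fields in an EMS notification."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group("key").strip()] = match.group("value").strip()
    return fields


def event_id_time_utc(event_id):
    """ASSUMPTION (not documented in the thread): the upper 32 bits of the
    Event Id are a Unix timestamp. Returns that value as a UTC datetime."""
    seconds = int(event_id, 16) >> 32
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


fields = parse_ems_fields(NOTIFICATION)
print(fields["System"], fields["Event #"], fields["Severity"])
print(event_id_time_utc(fields["Event Id"]).isoformat())
```

With the sample fields it prints `dds2 85 SERIOUS` followed by `2007-12-16T03:42:00+00:00`.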
There is also an error in the HPMC trace file:
10215: ------- Analyzing CC1, RIN_ERR_PRI_MODE CSR:
10216: RIN_ERR_PRI_MODE_CSR = 0x0000000000000008
10217: RIN_ERR_ENABLE_MASK CSR = 0x000000001fffffff
10218: RIN_FE_UPGRADE_CONFIG CSR = 0x0000000000000fc0
10219: RIN_DR_UPGRADE_CONFIG CSR = 0x0000000000000000
10220: Problem: (3)CC1, RIN Corr Err: Link one-bit failure in same position for
10221: 1 or more cycles. Corrected by HW.
10222: Possible Cause 1: RIO link cable connected to cell 1 has a poor
10223: connection or is defective. [12270208]
10224: Possible Fix 1: Reseat or replace RIO link cable.
10225: Possible Cause 2: RIO chip is defective.
10226: Possible Fix 2: Replace HIOB connected to cell 1.
10227: Possible Cause 3: CC chip on cell 1 or cell board is defective.
10228: Possible Fix 3: Replace Cell board 1.
10229:
10230:
10231: ------- Analyzing CC1, RIN_ERR_SEC_MODE CSR:
10232: RIN_ERR_SEC_MODE_CSR = 0x0000000000000008
10233: Problem: (3)CC1, RIN Corr Err: Link one-bit failure in same position for
10234: 1 or more cycles. Corrected by HW.
10235: Possible Cause 1: RIO link cable connected to cell 1 has a poor
10236: connection or is defective. [12270208]
10237: Possible Fix 1: Reseat or replace RIO link cable.
10238: Possible Cause 2: RIO chip is defective.
10239: Possible Fix 2: Replace HIOB connected to cell 1.
10240: Possible Cause 3: CC chip on cell 1 or cell board is defective.
10241: Possible Fix 3: Replace Cell board 1.
10242:
10243:
10244: Note: CC1, RIN GSM Hdr Log is NOT valid.
10245: Note: CC1, RIN Uncor. Hdr Log is NOT valid.
10246: Note: CC1, RIN Uncor. ECC Data Log is NOT valid.
10247: Note: CC1, RIN Uncor. ECC Cyc Log is NOT valid.
10248: Note: CC1, RIN FE Hdr Log is NOT valid.
10249: Note: CC1, RIN No Pres. Log is NOT valid.
10250:
10251: Note: CC1, RIN Cor. ECC Data MSB Log0 is valid.
10252: Note: CC1, RIN Single ECC Wire Log is valid.
10253:
10254: ------- Analyzing CC1 RIN_SGL_ECC_WIRE_LOG:
10255: RIN_SGL_ECC_WIRE_LOG CSR = 0x0000000004002000
10256: CC1 RIN block corrected single wire error in RIO link wire number 13.
10257: CC1 RIN block corrected single bit error in RIO link data row 2
------- Analyzing cell 1 RIO logs:
10260:
10261: Warning: RIO 0 Link PRIMARY_ERROR_LOG CSR connected to cell 1 not
10262: stored. - Analysis skipped.
10263:
10264: Note: CC 1, RIO 0, Rope Unit 0 RU_PRI_ERR_LOG CSR not stored. -
10265: Analysis skipped.
10266: Note: CC 1, RIO 0, Rope Unit 1 RU_PRI_ERR_LOG CSR not stored. -
10267: Analysis skipped.
What do you think? Thanks.
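Each CSR in the trace is a bit mask, and listing its set bits makes the decoder's output easier to follow. In this trace RIN_ERR_PRI_MODE = 0x8 has only bit 3 set, which lines up with the "(3)" in "Problem: (3)CC1". RIN_SGL_ECC_WIRE_LOG = 0x4002000 has bits 13 and 26 set, and bit 13 lines up with "wire number 13". These correspondences are an observation from this one trace, not a documented register layout. The thread does not say what bit 26 means or how "data row 2" is encoded.

```python
# CSR values copied from the HPMC trace above.
CSRS = {
    "RIN_ERR_PRI_MODE": 0x0000000000000008,
    "RIN_ERR_ENABLE_MASK": 0x000000001FFFFFFF,
    "RIN_SGL_ECC_WIRE_LOG": 0x0000000004002000,
}


def set_bits(value):
    """Return the positions (bit 0 = least significant) of the 1 bits in value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


def describe(bits):
    """Show short lists in full and long contiguous runs as 'first..last'."""
    if len(bits) > 4 and bits == list(range(bits[0], bits[-1] + 1)):
        return f"bits {bits[0]}..{bits[-1]}"
    return f"bits {bits}"


for name, value in CSRS.items():
    print(f"{name:<22} 0x{value:016x} -> {describe(set_bits(value))}")
```

For these three registers it reports bits [3], bits 0..28 and bits [13, 26] respectively.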
03-24-2008 11:32 PM
Call HP and have them send a CE (customer engineer) to site.
I would recommend that the CE run Scan-on-the-fly (SOTF) from the Superdome Management Station (SMS); this may give more clues as to where the problem actually is, i.e. the cell board, REO, backplane or I/O backplane.
If SOTF finds errors, it may be necessary to arrange a complete outage on the machine, depending on what the problem turns out to be.
(Don't try to reseat the REO cable while the Superdome is powered on, as you may damage the backplane or bend its pins.)
Note that it is very unusual for REO cables to fail.
Regards,
Phil
03-25-2008 08:32 AM
03-28-2008 06:35 AM
Urgent call. Thanks.
03-28-2008 06:48 AM
It is a diagnostic test (called JUST) that you run from the SMS. How you run the tests also depends on what sort of SMS you have (UNIX SMS or PC-based SMS).
It is quite detailed and should be run by HP CEs.
Also, you then need to decode the output of the diagnostics to work out what the problem may be.
If you make a mistake in the sequence of steps for setting up the JUST tests (starting the correct daemons, etc.), you can cause all nPARs/vPARs to crash.
I would strongly suggest you get HP onsite to do this.
Regards,
Phil
03-28-2008 07:04 AM
03-28-2008 07:17 AM
If you know JUST, and you CAN shut the machine down, then run the offline version; it is much better.
An A500 SMS, so it must be a legacy 'Dome, I guess?
Log on to the SMS with the hduser account
(the password is HP proprietary, so if you know JUST then I guess you know the password).
ONLY do this if the partitions are DOWN!!
Run
# just -s
Once at the JUST prompt on the SMS,
...select the tests you wish to run.
Don't forget to power off the whole machine (plus the IOX chassis) at the breakers for 1 minute after you've run the test, then power it back on again.
Cheers,
Phil
03-28-2008 07:24 AM
...not to mention that you need to decode all of that output if it picks up any errors 8-(
I still recommend you get HP to do it.
03-28-2008 07:37 AM
reo_link_ac_test -dt
Right?
03-28-2008 07:51 AM
Is it Aclts?
>> This is not the password for the hduser account
reo_link_ac_test -dt
>> I'm not familiar with this...
What does it do?
...and where are you running this command from?