System Administration

Error rx2660 reset Alert Level 7

SOLVED
transardo
Occasional Advisor


Our server is an Integrity rx2660 (HP-UX 11.31 ia64), and it restarted unexpectedly.
The only visible sign was the System Health LED on the front panel, which turned red.
I entered the system menu and looked at the SL log. There are many entries like these:

Log Entry 593805: 24 May 2016 15:03:29
Alert Level 7: Fatal
Keyword: MC_INITIATED
Machine Check initiated
Logged by: System Firmware 0
Data: Major Change in system state - HPMC or MCA
0xF40009800E03E90 000000000000000B

Log Entry 593801: 24 May 2016 15:03:28
AlertLevel 7: Fatal
Keyword: MACHINE_CHECK_INITIATED
Machine Check initiated
Logged by: Redundant w/ an E0 code;
Sensor: Critical Interrupt
Data2: Oem Code2 0x00
0xC157446D40023E80 003FA17000130300

After the restart I checked /var/adm/syslog/syslog.log and OLDsyslog.log, but neither has any entries for the roughly 40 minutes before the restart (the last entry is in OLDsyslog.log).

Does anyone have an idea what this could be?


Re: Error rx2660 reset Alert Level 7

Hi, 

An MCA suggests a hardware problem, but you need more information. Do you have a saved crash dump? Have you checked the server for unconfigured memory or a reduced processor count? Was anything logged on the console during the crash? Do you have ServiceGuard on the OS? Are there any records in evweb, e.g. from evweb eventviewer -L, if it is configured?

transardo
Occasional Advisor

Re: Error rx2660 reset Alert Level 7

Hi!

The server is an rx2660 with 2 Itanium 9120N processors, 12 GB RAM and four 146 GB disks (configured as 2 hardware RAID mirrors).

We don't have ServiceGuard; it is a single standalone machine. I did not save the crash dump or dump log (I don't know how to do this).

I don't think the server has a problem with unconfigured memory or processors; it still has the initial configuration made by HP Support when the server was installed. (But I don't know how to check this.)

Today I don't see any relevant information in the SL log: when the system crashed, SL Forward had 4000 entries, but now it has only 536, and all of them are level 1.

How can I dump the SL log to a file?

 

Re: Error rx2660 reset Alert Level 7

Hi,

To save the crash dump, you can try the "savecrash -r /BACKUPDIRECTORY" command, pointing it at a directory big enough to hold the dump. You can check the current crash dump configuration with the "crashconf" command.
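Before running savecrash, it helps to confirm that the target directory really has enough room. A minimal sketch in POSIX shell, assuming df accepts the POSIX -P and -k options (on HP-UX, bdf shows the same figures): the function names free_kb, has_room and save_dump are hypothetical helpers, and only "savecrash -r DIR" comes from the reply above.

```sh
# Sketch: check free space before saving a crash dump with savecrash.
# Assumes POSIX "df -P -k" (1024-byte blocks); helper names are hypothetical.

# free_kb DIR: print the free space, in KB, of the file system holding DIR
free_kb() {
    df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR NEED_MB: exit status 0 if DIR has at least NEED_MB megabytes free
has_room() {
    avail=$(free_kb "$1")
    [ -n "$avail" ] || return 1
    # compare in awk (floating point) so large sizes cannot overflow shell arithmetic
    awk -v a="$avail" -v n="$2" 'BEGIN { exit !(a + 0 >= n * 1024) }'
}

# save_dump DIR NEED_MB: run "savecrash -r DIR" only if the space check passes
save_dump() {
    if ! has_room "$1" "$2"; then
        echo "not enough free space in $1 for a $2 MB dump" >&2
        return 1
    fi
    if ! command -v savecrash >/dev/null 2>&1; then
        echo "savecrash not found (it is an HP-UX command)" >&2
        return 1
    fi
    savecrash -r "$1"
}
```

For example, save_dump /BACKUPDIRECTORY 2048 only calls savecrash if at least 2048 MB are free; the size is an illustrative figure and should be adjusted to what crashconf reports for your system.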

To run some hardware checks:

cprop -list  # list the available components

cprop -summary -c "Memory"  # can also be any other component shown by the list above

cprop -summary -c "Processors"

evweb eventviewer -L  # list events; very useful, many events are logged here

sautil /dev/cissX  # check for errors in the hardware RAID configuration; get the device name with ioscan -fnk (see man sautil)

sfmconfig -a -L  # get information about the main components from SFM
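Not every tool on that list is installed on every system, so a small wrapper can run each check, keep its output in a file for later review or posting, and skip whatever is missing. A minimal sketch, assuming a shell with the POSIX "command -v" builtin; run_check, collect_hw_info, the default directory /var/tmp/hwcheck and the output file names are all hypothetical. sautil is left out because its /dev/cissX device name has to be looked up first with ioscan -fnk.

```sh
# Sketch: run the hardware checks listed above, saving each output to a file
# and skipping tools that are not installed. Helper and file names are hypothetical.

# run_check OUTDIR NAME CMD [ARGS...]: write CMD's output to OUTDIR/NAME.txt
run_check() {
    rc_dir=$1; rc_name=$2; shift 2
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$rc_dir/$rc_name.txt" 2>&1
        rc=$?
        echo "$rc_name: saved to $rc_dir/$rc_name.txt (exit status $rc)"
    else
        echo "$rc_name: $1 not found, skipped"
    fi
}

# collect_hw_info [OUTDIR]: run every check from the list into OUTDIR
collect_hw_info() {
    out=${1:-/var/tmp/hwcheck}
    mkdir -p "$out" || return 1
    run_check "$out" cprop_list       cprop -list
    run_check "$out" cprop_memory     cprop -summary -c "Memory"
    run_check "$out" cprop_processors cprop -summary -c "Processors"
    run_check "$out" evweb_events     evweb eventviewer -L
    run_check "$out" sfm_components   sfmconfig -a -L
    run_check "$out" ioscan           ioscan -fnk
    run_check "$out" crashconf        crashconf
}
```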

The SL log may already have been cleared. Anyway, you can capture the SL to a file, for example with the SSH terminal client PuTTY.

Simply connect to the MP with PuTTY, with PuTTY session logging enabled to save printable output to a file.

Then in the MP choose SL, then E for system events, then D to dump the entire log. Close the connection and check the file on your workstation.
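The captured file can be long. A minimal sketch for pulling out only the serious entries, assuming the dump uses the same layout as the entries quoted in the first post (a "Log Entry N: date" line followed by an "Alert Level N: text" line). The function name sl_filter is hypothetical, and the default threshold of 3 is an illustrative choice (level 3 is labelled Warning in the logs quoted in this thread).

```sh
# Sketch: print the header of every SL entry at or above an alert level.
# Assumes the layout quoted in this thread: "Log Entry N: <date>" followed by
# "Alert Level N: <text>" (also matches "AlertLevel", as one pasted entry shows).

# sl_filter FILE [MIN_LEVEL]: MIN_LEVEL defaults to 3
sl_filter() {
    awk -v min="${2:-3}" '
        { sub(/\r$/, "") }                        # drop CR left by terminal logging
        /^Log Entry [0-9]+:/ { entry = $0; next }
        /^Alert ?Level [0-9]+/ {
            line = $0
            sub(/^Alert ?Level /, "", line)       # e.g. "7: Fatal"
            if (line + 0 >= min + 0) print entry " | " $0
        }
    ' "$1"
}
```

For example, sl_filter putty.log 7 would list only the fatal entries, such as the two MC_INITIATED / MACHINE_CHECK_INITIATED records from the first post.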

A nice tool for collecting as many logs as possible is the HP support script getsysinfo.sh, but it is mainly intended for sending the results to your support partner.

transardo
Occasional Advisor

Re: Error rx2660 reset Alert Level 7

Hi!

Thank you for the help!
I saved the crash dump (about 1.2 GB), but it isn't useful to me.

cprop is not available.
sautil and ioscan -fnk found only one device, which worries me, because two were configured at installation.

The most useful information comes from evweb, but it doesn't look good: apparently it's a hardware failure (logs attached).

We do not have an HP support contract.

If anyone has an idea...

Robert_Jewell
Honored Contributor

Re: Error rx2660 reset Alert Level 7

Check the directory /var/adm/tombstones for recent files. The file names will be in a format similar to:

mca201606011200.001

The digits encode the year, month, day, hour and minute. Look for any recent ones, package them in a zip or gzip file and post them.
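A minimal sketch for decoding those names and packing the recent ones, assuming the mcaYYYYMMDDhhmm[ss].NNN layout of the two example names in this thread (one includes seconds, one does not). The tombstone directory is passed as an argument because the thread mentions both /var/adm/tombstones and /var/tombstone; the function names are hypothetical, and the archive is a gzip-compressed tar file.

```sh
# Sketch: decode MCA tombstone names and pack recent ones for posting.
# Assumes names like mcaYYYYMMDDhhmm[ss].NNN; helper names are hypothetical.

# ts_decode NAME: print "YYYY-MM-DD hh:mm[:ss]", or nothing if NAME does not match
ts_decode() {
    grp='\([0-9]\{2\}\)'    # one two-digit group, as a sed BRE
    echo "$1" | sed -n \
        -e "s/^mca\([0-9]\{4\}\)$grp$grp$grp$grp$grp\.[0-9]*\$/\1-\2-\3 \4:\5:\6/p" \
        -e "s/^mca\([0-9]\{4\}\)$grp$grp$grp$grp\.[0-9]*\$/\1-\2-\3 \4:\5/p"
}

# list_tombstones DIR: print each mca* file name with its decoded timestamp
list_tombstones() {
    for f in "$1"/mca*; do
        [ -f "$f" ] || continue
        n=$(basename "$f")
        echo "$n  $(ts_decode "$n")"
    done
}

# pack_recent DIR DAYS ARCHIVE: tar + gzip the mca* files modified in the last DAYS days
pack_recent() {
    files=$(cd "$1" && find . -type f -name 'mca*' -mtime -"$2" -print)
    if [ -z "$files" ]; then
        echo "no recent tombstones in $1" >&2
        return 1
    fi
    # $files is left unquoted on purpose: tombstone names contain no spaces
    (cd "$1" && tar cf - $files) | gzip -c > "$3"
}
```

For example, ts_decode mca20160524150122.000 prints 2016-05-24 15:01:22, and pack_recent /var/tombstones 30 /tmp/tombstones.tar.gz packs anything from the last 30 days (adjust the directory to wherever your tombstones actually are).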

 

-Bob

 

transardo
Occasional Advisor

Re: Error rx2660 reset Alert Level 7

Hi!

I found /var/tombstone/mca20160524150122.000 (attached).

In the last few days I found the command sasmgr get_info -D /dev/sasd0 -q raid on the forums. Our system has only the device sasd1; the result is below:

sasmgr get_info -D /dev/sasd1 -q raid
Wed Jun 8 13:52:25 2016
---------- LOGICAL DRIVE 5 ----------
Raid Level : RAID 1
Volume sas address : 0xb61a09c049aad5
Device Special File : /dev/rdsk/c3t3d0
Raid State : OPTIMAL
Raid Status Flag : ENABLED
Raid Size : 139136
Rebuild Rate : 20.00 %
Rebuild Progress : 100.00 %
Participating Physical Drive(s) :
SAS Address Enc Bay Size(MB) Type State
0x5000c5000c0dc36d 1 6 140014 SECONDARY ONLINE
0x5000c5000c0df6d9 1 5 140014 PRIMARY ONLINE
---------- LOGICAL DRIVE 7 ----------
Raid Level : RAID 1
Volume sas address : 0xa07bce2191857fc
Device Special File : /dev/rdsk/c3t2d0
Raid State : OPTIMAL
Raid Status Flag : ENABLED
Raid Size : 139898
Rebuild Rate : 0.00 %
Rebuild Progress : 100.00 %
Participating Physical Drive(s) :
SAS Address Enc Bay Size(MB) Type State
0x5000c5000c05e815 1 7 140014 PRIMARY ONLINE
0x5000c5000c05dd19 1 8 140014 SECONDARY ONLINE
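With several logical drives, a short filter makes the important fields easier to spot. A minimal sketch, assuming the sasmgr get_info output has been saved to a file and follows the layout shown above; raid_summary is a hypothetical name. It prints one line per logical drive plus any physical drive whose State column is not ONLINE.

```sh
# Sketch: condense saved "sasmgr get_info -D /dev/sasdN -q raid" output.
# Assumes the layout shown in this thread; raid_summary is a hypothetical name.

# raid_summary FILE
raid_summary() {
    awk '
        /^---------- LOGICAL DRIVE/ { ld = $4 }      # e.g. "5"
        /^Device Special File/      { dsf = $NF }    # e.g. "/dev/rdsk/c3t3d0"
        /^Raid State/               { print "LD " ld " " dsf " " $NF }
        # physical drive rows: SAS address, Enc, Bay, Size(MB), Type, State
        $1 ~ /^0x[0-9a-fA-F]+$/ && $NF != "ONLINE" {
            print "  drive " $1 " in bay " $3 " is " $NF
        }
    ' "$1"
}
```

For example, after sasmgr get_info -D /dev/sasd1 -q raid > raid.txt, running raid_summary raid.txt on the output above would print "LD 5 /dev/rdsk/c3t3d0 OPTIMAL" and "LD 7 /dev/rdsk/c3t2d0 OPTIMAL", with no drive lines because all four disks are ONLINE.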

And in SMH under Storage/Disks:

 

Status    Legacy Hardware Path  Agile Hardware Path  Product Id  Vendor Id  Serial Number  Capacity (GB)  Disk FW Revision
Degraded  0/1/1/0.0.0.2.0       64000/0xfa00/0x1     IR Volume   HP         N/A            136.0          HP01
Degraded  0/1/1/0.0.0.3.0       64000/0xfa00/0xc     IR Volume   HP         N/A            135.0          HP01

Disk 0/1/1/0.0.0.2.0:
ClassName                HP_DiskDrive
NameSpace                root/cimv2
Caption                  SCSI Disk
Description              This is a Disk with following details: Vendor ID: HP SCSI Disk Product ID: IR Volume@0/1/1/0.0.0.2.0
ElementName              Hard Disk
Name                     Hard Disk
StatusDescriptions       The disk is in degraded state
EnabledState             Not Applicable
RequestedState           No Change
EnabledDefault           Enabled
SystemCreationClassName  CIM_ComputerSystem
SystemName               ardo1
CreationClassName        HP_DiskDrive
DeviceID                 0/1/1/0.0.0.2.0
MaxMediaSize             143255552 KBytes
DefaultBlockSize         512 bytes

Disk 0/1/1/0.0.0.3.0:
ClassName                HP_DiskDrive
NameSpace                root/cimv2
Caption                  SCSI Disk
Description              This is a Disk with following details: Vendor ID: HP SCSI Disk Product ID: IR Volume@0/1/1/0.0.0.3.0
ElementName              Hard Disk
Name                     Hard Disk
StatusDescriptions       The disk is in degraded state
EnabledState             Not Applicable
RequestedState           No Change
EnabledDefault           Enabled
SystemCreationClassName  CIM_ComputerSystem
SystemName               ardo1
CreationClassName        HP_DiskDrive
DeviceID                 0/1/1/0.0.0.3.0
MaxMediaSize             142475264 KBytes
DefaultBlockSize         512 bytes

 

 

Robert_Jewell
Honored Contributor

Re: Error rx2660 reset Alert Level 7

The crash looks like it was caused by an error on a PCI device attached to PCIe slot 2. Do you have an I/O card in this slot that could have failed?

Either way, if the crash has not recurred since the one in May, I would let things run as they are. Perhaps it was a temporary glitch?

 

Your output above shows both volumes as mirrored. I can't say why SMH reports the volumes as degraded while sasmgr shows them as fine; here I would trust sasmgr over SMH.

 

-Bob

transardo
Occasional Advisor

Re: Error rx2660 reset Alert Level 7

Hi !

A Smart Array P800/512 is installed in PCIe slot 2.

Since the failure the server has been working fine. It is 6.5 years old; it's a good machine, but it's aging. Maybe it's time to replace it.

 

 

transardo
Occasional Advisor

Re: Error rx2660 reset Alert Level 7

Hi!
Today I noticed that the Smart Array P800 heartbeat LED is steady amber.
I checked the Event Log / System Log, and entries 21 and 14 caught my attention:

 #  Location  Alert  Encoded Field       Data Field        Keyword / Timestamp
-------------------------------------------------------------------------------------------------------
23  HPUX 0      2    0x54801C2F00E00210  0000000000001001  HP-UX_BOOT_COMPLETE        13 Jun 2016 16:08:16
22  SFW  0      2    0x40801CBB00E001F0  0000000000000000  BOOT_SWITCH_INSECURE_MODE  13 Jun 2016 16:06:15
21  SFW  0     *3    0x6480007B00E001D0  FFFFFFFFFF07FF83  IO_CHECK_LBA_MISSING_ERR   13 Jun 2016 16:05:58
20  SFW  0      2    0x5480006300E001B0  0000000000000000  BOOT_START                 13 Jun 2016 16:05:51
19  SFW         2    0xC1575ED9DF0201A0  FFFF000A001D0300  CPU_START_BOOT             13 Jun 2016 16:05:51
18  BMC         2    0x20575ED9D5020190  FFFF027000120300  SOFT_RESET                 13 Jun 2016 16:05:41
17  HPUX 0      2    0x54801C3000E00170  00000000001A100C  HP-UX_OS_NORMAL_SHUTDOWN   13 Jun 2016 16:05:33
16  HPUX 0      2    0x54801C2F00E00150  0000000000001001  HP-UX_BOOT_COMPLETE        13 Jun 2016 15:53:59
15  SFW  0      2    0x40801CBB00E00130  0000000000000000  BOOT_SWITCH_INSECURE_MODE  13 Jun 2016 15:52:04
14  SFW  0     *3    0x6480007B00E00110  FFFFFFFFFF07FF83  IO_CHECK_LBA_MISSING_ERR   13 Jun 2016 15:51:48
13  SFW  0      2    0x5480006300E000F0  0000000000000000  BOOT_START                 13 Jun 2016 15:51:41
12  SFW         2    0xC1575ED68D0200E0  FFFF000A001D0300  CPU_START_BOOT             13 Jun 2016 15:51:41
11  BMC         2    0x20575ED67E0200D0  FFFF027000120300  SOFT_RESET                 13 Jun 2016 15:51:26
10  BMC         2    0x20575ED6780200C0  0401A37004120300  CHASSIS_CONTROL_REQUEST    13 Jun 2016 15:51:20
 9  BMC         2    0x20575ED6770200B0  FFFF027000120300  SOFT_RESET                 13 Jun 2016 15:51:19
 8  BMC         2    0x20575ED6770200A0  FFFF010943090300  POWER_UNIT_ENABLED         13 Jun 2016 15:51:19
 7  BMC         2    0x20575ED677020090  FFFF027000120300  SOFT_RESET                 13 Jun 2016 15:51:19
 6  BMC         2    0x20575ED677020080  FFFF006F04140300  POWER_BUTTON_PRESSED       13 Jun 2016 15:51:19
 5  BMC         2    0x20575ED62B020070  FFFF000943090300  POWER_UNIT_DISABLED        13 Jun 2016 15:50:03
 4  BMC         2    0x20575ED629020060  FA00A370FA120300  CHASSIS_CONTROL_REQUEST    13 Jun 2016 15:50:01
 3  BMC         2    0x20575ED629020050  FFFF056FFA220300  ACPI_SOFT_OFF              13 Jun 2016 15:50:01
 2  HPUX 0      2    0x54801C3000E00030  00000000001A100C  HP-UX_OS_NORMAL_SHUTDOWN   13 Jun 2016 15:49:54
 1  BMC         2    0x2057503897020020  FFFF0103FCC00300  TIME_SET                   02 Jun 2016 13:45:59
 0  BMC         2    0x2057503605020010  FFFF0103FCC00300  TIME_SET                   02 Jun 2016 13:35:01

Log Entry 21: 13 Jun 2016 16:05:58
Alert Level 3: Warning
Keyword: IO_CHECK_LBA_MISSING_ERR
Expected I/O host bridge is missing
Logged by: System Firmware 0
Data: Location - I/O Device (Local Bus Adapter): LBA 7
0x6480007B00E001D0 FFFFFFFFFF07FF83

Log Entry 14: 13 Jun 2016 15:51:48
Alert Level 3: Warning
Keyword: IO_CHECK_LBA_MISSING_ERR
Expected I/O host bridge is missing
Logged by: System Firmware 0
Data: Location - I/O Device (Local Bus Adapter): LBA 7
0x6480007B00E00110 FFFFFFFFFF07FF83
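In the table view, the two entries of concern are the rows marked *3 in the Alert column. A minimal sketch for pulling such rows out of a saved copy of the table, assuming the column layout shown above. Because the Location column sometimes has a second field (SFW 0) and sometimes not (BMC), the sketch treats the field just before the 0x-prefixed encoded field as the alert value. sl_table_filter is a hypothetical name; it strips a leading * and keeps rows at or above a threshold (default 3).

```sh
# Sketch: list SL table rows at or above an alert level.
# Assumes the table layout shown in this thread; sl_table_filter is hypothetical.

# sl_table_filter FILE [MIN_LEVEL]: MIN_LEVEL defaults to 3
sl_table_filter() {
    awk -v min="${2:-3}" '
        $1 ~ /^[0-9]+$/ {                         # event rows start with the entry number
            for (i = 2; i <= NF; i++) if ($i ~ /^0x/) break
            if (i > NF) next                      # no encoded field: not an event row
            alert = $(i - 1); sub(/^\*/, "", alert)
            if (alert + 0 < min + 0) next
            ts = ""                               # timestamp = everything after the keyword
            for (j = i + 3; j <= NF; j++) ts = ts (ts == "" ? "" : " ") $j
            print "#" $1 " " $(i + 2) " " ts " (alert " alert ")"
        }
    ' "$1"
}
```

On the table above it would print two lines, for entries #21 and #14, both IO_CHECK_LBA_MISSING_ERR.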

The server was restarted today.

Could this be the cause?

Robert_Jewell
Honored Contributor
Solution

Re: Error rx2660 reset Alert Level 7

So it looks like the HPMC/MCA was caused by the loss of that I/O bridge adapter. The card still functions because there are multiple bridge adapters (ropes) to that slot.

This points to a problem with the slot, which on this model means a bad system board. It could still be a card issue, though, so if you have a spare P800 it is much easier to try replacing that than the system board.

 

-Bob

transardo
Occasional Advisor

Re: Error rx2660 reset Alert Level 7

If I understood correctly, the best option would be to replace the P800.
Unfortunately we have no spare for testing.

I will try to keep this one running as long as possible.

Thanks for the help.