HPE 9000 and HPE e3000 Servers

Re: L2000 crashes and system error code

 
Wayne Yu_2
New Member

L2000 crashes and system error code

We have an L2000 box that crashes every night. Could someone tell us what is wrong from the following error messages?

DATE: 03/24/2006 TIME: 17:10:33
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 1 = processor
SOURCE DETAIL: 1 = processor general SOURCE ID: 0
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 32 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 00

0x1800002011006322 CB818000 00000000 type 3 = Actual Data
0x5800082011006322 00006A02 18110A21 type 11 = Timestamp 03/24/2006 17:10:33
Type CR for next entry, - CR for previous entry, Q CR to quit.
cr


DATE: 03/24/2006 TIME: 17:10:36
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 7 = memory
SOURCE DETAIL: 4 = SIMM or DIMM SOURCE ID: FF
PROBLEM DETAIL: 3 = double bit error

CALLER ACTIVITY: 6 = machine check STATUS: 3
CALLER SUBACTIVITY: B8 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 02

0x2000202374FF6B83 0000FF00 006BFF74 type 4 = Physical Location
0x5800282374FF6B83 00006A02 18110A24 type 11 = Timestamp 03/24/2006 17:10:36
Type CR for next entry, - CR for previous entry, Q CR to quit.
cr

DATE: 03/24/2006 TIME: 17:10:36
ALERT LEVEL: 2 = Non-Urgent operator attention required

SOURCE: 0 = unknown, no source stated
SOURCE DETAIL: 0 = unknown, no source stated SOURCE ID: FF
PROBLEM DETAIL: 0 = no problem detail

CALLER ACTIVITY: 6 = machine check STATUS: 2
CALLER SUBACTIVITY: 46 = implementation dependent
REPORTING ENTITY TYPE: 0 = system firmware REPORTING ENTITY ID: 02

0x0000202000FF6462 00000000 00000000 type 0 = Data Field Unused
0x5800282000FF6462 00006A02 18110A24 type 11 = Timestamp 03/24/2006 17:10:36
Type CR for next entry, - CR for previous entry, Q CR to quit.
cr
Patrick Wallek
Honored Contributor

Re: L2000 crashes and system error code

The key is the second message. It appears that you may have an issue with one of the memory DIMMs on this machine.

It's time to place a hardware call to HP and have them take a look at your logs so they can pinpoint the DIMM/SIMM and fix the problem.
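For anyone who wants to pick the relevant entry out of a longer log dump, here is a minimal Python sketch. It assumes the log has been saved as plain text in the layout posted above; the names `split_entries` and `memory_faults` and the abbreviated `SAMPLE_LOG` are made up for illustration and are not part of any HP tool.

```python
import re

# Abbreviated copy of the three entries posted above (only the fields we use).
SAMPLE_LOG = """\
DATE: 03/24/2006 TIME: 17:10:33
SOURCE: 1 = processor
PROBLEM DETAIL: 0 = no problem detail

DATE: 03/24/2006 TIME: 17:10:36
SOURCE: 7 = memory
SOURCE DETAIL: 4 = SIMM or DIMM SOURCE ID: FF
PROBLEM DETAIL: 3 = double bit error

DATE: 03/24/2006 TIME: 17:10:36
SOURCE: 0 = unknown, no source stated
PROBLEM DETAIL: 0 = no problem detail
"""


def split_entries(log_text):
    """Split the log into entries; each entry starts at a 'DATE:' line."""
    entries, current = [], None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("DATE:"):
            current = {"when": line}
            entries.append(current)
        elif current is not None:
            # Keep only the 'SOURCE:' and 'PROBLEM DETAIL:' fields.
            m = re.match(r"(SOURCE|PROBLEM DETAIL): \w+ = (.+)", line)
            if m:
                current[m.group(1)] = m.group(2)
    return entries


def memory_faults(entries):
    """Return the entries that report a double bit error from memory."""
    return [
        e for e in entries
        if e.get("SOURCE") == "memory"
        and "double bit error" in e.get("PROBLEM DETAIL", "")
    ]


entries = split_entries(SAMPLE_LOG)
faults = memory_faults(entries)
for e in faults:
    print(e["when"], "->", e["PROBLEM DETAIL"])
```

On the sample above this flags only the second entry, which is the one worth quoting when you open the hardware call.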
Wayne Yu_2
New Member

Re: L2000 crashes and system error code

I thought someone who knows what the messages mean could identify the slot the memory is in by reading the error. The system is located on a small island in the South Pacific Ocean, and HP does not send anyone there. If anyone knows which pair is the problem pair, that would be a great help!
Sameer_Nirmal
Honored Contributor
Solution

Re: L2000 crashes and system error code

Hi,

According to the event log entry, the memory module showing the "double bit error" is in slot 6B. The log shows that a machine check has occurred, which constitutes an HPMC (High Priority Machine Check). Looking at the sequence of the three entries, the HPMC seems to have been caused by the memory module.
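To check the ordering of entries when only the raw hex lines are available, the two data words on a "type 11 = Timestamp" line can be decoded. The byte layout below is an assumption inferred from the three entries posted in this thread (it matches all of them), not something taken from HP documentation, and `decode_timestamp` is a hypothetical helper name.

```python
from datetime import datetime


def decode_timestamp(word_hi, word_lo):
    """Decode the two 32-bit data words of a 'type 11 = Timestamp' line.

    Assumed layout (inferred from the sample entries, not from HP docs):
      word_hi: 00 00 YY MM  -> YY = years since 1900, MM = zero-based month
      word_lo: DD hh mm ss  -> day, hour, minute, second
    """
    hi = int(word_hi, 16)
    lo = int(word_lo, 16)
    year = 1900 + ((hi >> 8) & 0xFF)
    month = (hi & 0xFF) + 1
    day = (lo >> 24) & 0xFF
    hour = (lo >> 16) & 0xFF
    minute = (lo >> 8) & 0xFF
    second = lo & 0xFF
    return datetime(year, month, day, hour, minute, second)


# Data words copied from the first and second entries in the original post.
first = decode_timestamp("00006A02", "18110A21")
second = decode_timestamp("00006A02", "18110A24")
print(first)   # 2006-03-24 17:10:33
print(second)  # 2006-03-24 17:10:36
```

The processor machine check is logged three seconds before the memory entry, which fits the reading that the two belong to the same event.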

However, you might want to look at all the other logged entries to see whether something else is also being reported. You can collect those logs from the MP, along with the /var/tombstones/ts99 and ts98 files, and send them to HP for analysis.
YoungHwan, Ko
Valued Contributor

Re: L2000 crashes and system error code

It seems to be a memory fault.
Check your ts99 file (/var/tombstones/ts99).
With a double bit error, the memory must be replaced.

Regards..
Mirko Schmidt
Advisor

Re: L2000 crashes and system error code

Hello Wayne Yu,

if your system crashes every night with these error messages in the GSP log, that indicates a memory issue. Does the system crash every night at the same time? Is there a high workload at that time (backup)?
You can check the memory with cstm:
# cstm
# scl type mem
# info
# il
-> you can see
- the configuration
- the error list (single bit / double bit)
- PDT entries (note: if the PDT is 100% full, the system will not restart after the next crash)

-> for a deeper check you must call HP to get the password that unlocks the whole logtool

If one or more memory addresses show a high error count, or several addresses belong to the same DIMM, that DIMM should be replaced.

HP can also check the files in /var/tombstones/.

If there are double bit errors, the DIMM must be replaced. If there are single bit errors at one address or on one DIMM, it looks like a defective address that the system (ECC) cannot deallocate because a process is using it. A power cycle helps in this case.
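The rule of thumb above can be written down as a small Python sketch. This is only my reading of it: the function name `triage_dimm`, its inputs, and the `high_count` threshold of 10 are hypothetical, and you would have to fill in the counts by hand from the cstm error list.

```python
def triage_dimm(double_bit_errors, single_bit_by_address, high_count=10):
    """Suggest an action for one DIMM from its error counts.

    double_bit_errors: number of double bit errors logged for the DIMM.
    single_bit_by_address: dict mapping address -> single bit error count.
    high_count: hypothetical threshold for a "high" error count (assumption).
    """
    # Double bit errors cannot be corrected by ECC: replace the DIMM.
    if double_bit_errors > 0:
        return "replace"
    if not single_bit_by_address:
        return "ok"
    # Several bad addresses on one DIMM, or one address with a high count.
    if len(single_bit_by_address) > 1:
        return "replace"
    if max(single_bit_by_address.values()) >= high_count:
        return "replace"
    # A single address with a few errors: likely a page that could not be
    # deallocated while in use, so try a power cycle and watch the counts.
    return "power-cycle and monitor"


print(triage_dimm(1, {}))                        # replace
print(triage_dimm(0, {0x1000: 2}))               # power-cycle and monitor
print(triage_dimm(0, {0x1000: 1, 0x2000: 1}))    # replace
```

Treat the output as a hint for what to tell HP, not as a diagnosis.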

Regards
Mirko
Lester Dias
Advisor

Re: L2000 crashes and system error code

The locations of the DIMM slots are shown on a label stuck to the lid. Once you replace the DIMM at 6B, or perhaps the pair at 6A and 6B, clear the page deallocation table (PDT) from the boot console handler (BCH) menus. How to get to the BCH? Interrupt the boot within the 10-second window after a reset or power-on.