Operating System - Tru64 Unix


 
Sachin_34
Occasional Advisor

Error : panic (cpu 0): Processor Machine Check


Hello,

We have an OSF1 V5.1 Tru64 server. It went down and halted at the P00>>> console prompt. We rebooted the system and it is up now.

I tried to read the binary error log file and noticed two events logged just before the system went down.

Please suggest whether this is a hardware error.

Thanks in advance.

Here is the extract:

****************** ENTRY 9. ***************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 26212.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sat Feb 14 01:33:20 2009
OCCURRED ON SYSTEM test123
SYSTEM ID x000B0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
MESSAGE panic (cpu 0): Processor Machine Check

********** ENTRY 10. ********************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 26211.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Sat Feb 14 01:33:19 2009
OCCURRED ON SYSTEM test123
SYSTEM ID x000B0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

Martin Moore
HPE Pro

Re: Error : panic (cpu 0): Processor Machine Check

A machine check is almost invariably a hardware error. (I've seen a couple of instances of software-induced machine checks, but they were *extremely* rare and a long time ago.) You need to examine the binary error log entries for the machine checks in detail to identify the specific problem. If you have a hardware support contract on the system, the support vendor should be able to perform that analysis.
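As a first step before the detailed analysis, the translated entries can be put in time order. The sketch below is only an illustration: it assumes the translated log text looks like the extract in the original post (an "ENTRY n." banner followed by label/value lines), and the function names `parse_entries` and `in_sequence_order` are hypothetical, not part of any Tru64 tool.

```python
import re

# Banner line such as "****************** ENTRY 9. ***************"
ENTRY_RE = re.compile(r"^\*+ ENTRY (\d+)\. \*+$")

# Labels taken from the extract in the original post.
FIELDS = {
    "OS EVENT TYPE": "event_type",
    "SEQUENCE NUMBER": "sequence",
    "OCCURRED/LOGGED ON": "logged_on",
    "MESSAGE": "message",
}

def parse_entries(text):
    """Split translated error-log text into one dict per ENTRY banner."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            current = {"entry": int(match.group(1))}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first banner
        for label, key in FIELDS.items():
            if line.startswith(label + " "):
                current[key] = line[len(label):].strip()
    return entries

def in_sequence_order(entries):
    """Order entries by SEQUENCE NUMBER, i.e. the order they were logged."""
    return sorted(entries, key=lambda e: int(e.get("sequence", "0").rstrip(".")))

SAMPLE = """\
****************** ENTRY 9. ***************
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 26212.
OCCURRED/LOGGED ON Sat Feb 14 01:33:20 2009
MESSAGE panic (cpu 0): Processor Machine Check
********** ENTRY 10. ********************
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 26211.
OCCURRED/LOGGED ON Sat Feb 14 01:33:19 2009
"""

if __name__ == "__main__":
    for e in in_sequence_order(parse_entries(SAMPLE)):
        print(e["sequence"], e["logged_on"], e["event_type"], e.get("message", ""))
```

With the two entries from the original post, this puts the CPU EXCEPTION (sequence 26211) ahead of the PANIC (sequence 26212), so the CPU exception entry is the one to examine in detail.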

Martin
I work for HPE

Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check

Thanks for the reply. Yes, we also suspect a hardware-related issue, but we don't know how to pinpoint it.

I tried to analyse the binary error log, but it is difficult to go through.

Unfortunately we don't have a support contract.

Please suggest.
Pieter 't Hart
Honored Contributor

Re: Error : panic (cpu 0): Processor Machine Check

If crash dumps are enabled on the system, there may be a dump in /var/adm/crash.
At boot time the dump is analyzed to produce a text file in the same directory, with hints about what caused the crash.

Maybe this helps.
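To see quickly whether anything was written around the time of the panic, the directory can be listed newest first. This is a minimal sketch, not a Tru64 utility: it assumes Python is available on the machine where the files live (or that the directory has been copied elsewhere), the helper name `list_crash_files` is hypothetical, and it makes no assumption about the file names inside the directory.

```python
import os
import time

def list_crash_files(crash_dir="/var/adm/crash"):
    """Return (name, size, mtime) for regular files in crash_dir, newest first."""
    if not os.path.isdir(crash_dir):
        return []
    rows = []
    for name in os.listdir(crash_dir):
        path = os.path.join(crash_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            rows.append((name, st.st_size, st.st_mtime))
    # Newest first, so files written at the last crash appear at the top.
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

if __name__ == "__main__":
    for name, size, mtime in list_crash_files():
        print(f"{time.ctime(mtime)}  {size:>12}  {name}")
```

Files whose modification time matches the reboot after the panic are the ones to read or post.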

Pieter
Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check


Thanks Pieter for the reply. Yes, it did generate a crash dump file.
I have tried to analyse the extract around the time of the system panic, i.e. Feb 14 at 01:38.

Please find the relevant crash dump file and suggest whether it points to a hardware issue.

John Manger
Valued Contributor

Re: Error : panic (cpu 0): Processor Machine Check

That snippet from the crash_data doesn't really tell us much.

A more interesting output would be the console log at the time of the crash - that would have the exception frame and associated output. An Alpha H/W person could then suggest what might be faulty. In fact the console output may even suggest the nature of the fault in human readable text.

I see the system had been up for 6 months. If it doesn't suffer another machine check in the near future, I guess you could always 'ignore' it. Or was the previous reboot last year due to a similar fault?
Nobody can serve both God and Money
Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check


I managed to note down some of the console messages at the time of the crash. The system was showing messages like:

ata1 at pci0 slot 205(slot 5, function 2)
ata1: CYPRESS 82C693
scsi2 at ata1 slot 0 rad 0
usb0 at pci0 slot 305 (slot 5, function 3)
aha_chim0 at pci0 slot 6
Adaptec AIC-7895 Adapter : H/W Rev 4, Driver Rev 2.274 CHIM V364A5
scsi2 at aha_chim0 slot 0 rad 0
aha_chim2 at pci0 slot 106
Adaptec AIC-7895 Adapter : H/W Rev 4, Driver Rev 2.274 CHIM V364A5
scsi4 at aha_chim2 slot 0 rad 0
ee2 at pci0 slot 7
ee2: COMPAQ Intel 82559 (10/100 MBPS)
ee2: Driver Rev =
isp1 at pci0 slot 8
ee2: Autonegotiated
isp1 QLogic
isp1 Firmware
isp1 Fast RAM

halted CPU0

halt code = 5
HALT instruction executed
PC= fffffc000067d230
P00>>>



Does this help?
Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check

I apologize, I did not answer the question.
Yes, a similar reboot happened last year, but for a different reason: the battery charge was the issue.
Neubeck
New Member

Re: Error : panic (cpu 0): Processor Machine Check

What kind of CPUs are used?

EV5: download DECevent and use it to translate the binary.errlog.

http://h18023.www1.hp.com/support/svctools/decevent/index.html

EV6 or higher: download SEA and use it for translation. SEA is part of the WEBES toolkit.

http://h18023.www1.hp.com/support/svctools/webes/index.html
Uwe_9
Advisor

Re: Error : panic (cpu 0): Processor Machine Check

Hello Sachin,

After a little browsing through your data I extracted
some information that should, in the end, help us to help you :)

From the snippet of the crash-data that you provided I read this:
--------------
Hostname : ---snip---
:
COMPAQ AlphaServer DS20E 666 MHz avail: 2
:
2 kn600_machcheck(0x100000000, 0x0, 0x4, 0xfffffc0000006120, 0xfffffc0000006080)
3 default_mach_error(0x4, 0xfffffc0000006120, 0xfffffc0000006080, 0x0, 0xfffffc000067c060)
:
--------------

That answers the question of which tool to use next: the DS20E is an EV6-class (Alpha 21264) system.

The snippet above shows the beginning of the stack trace.
By itself it is not really helpful. We have to look at the machine check frame,
which is normally found in detail in the binary.errlog.

You need to use WEBES (here "wsea", formerly known as "ca") to analyse /var/adm/binary.errlog.

Regards,
--Uwe.