Error : panic (cpu 0): Processor Machine Check

Sachin_34
Occasional Advisor

Error : panic (cpu 0): Processor Machine Check


Hello,

We have a Tru64 UNIX (OSF1 V5.1) server. It went down and halted at the P00>>> prompt. We booted the system and it is up again now.

I read the binary error log file and noticed 2 events logged just before the system went down.

Please suggest whether this is a hardware error.

Thanks in advance.

Here is the extract:

****************** ENTRY 9. ***************

----- EVENT INFORMATION -----

EVENT CLASS             ERROR EVENT
OS EVENT TYPE           302. PANIC
SEQUENCE NUMBER         26212.
OPERATING SYSTEM        DEC OSF/1
OCCURRED/LOGGED ON      Sat Feb 14 01:33:20 2009
OCCURRED ON SYSTEM      test123
SYSTEM ID               x000B0022
SYSTYPE                 x00000000
PROCESSOR COUNT         2.
PROCESSOR WHO LOGGED    x00000000
MESSAGE                 panic (cpu 0): Processor Machine Check

********** ENTRY 10. ********************

----- EVENT INFORMATION -----

EVENT CLASS             ERROR EVENT
OS EVENT TYPE           100. CPU EXCEPTION
SEQUENCE NUMBER         26211.
OPERATING SYSTEM        DEC OSF/1
OCCURRED/LOGGED ON      Sat Feb 14 01:33:19 2009
OCCURRED ON SYSTEM      test123
SYSTEM ID               x000B0022
SYSTYPE                 x00000000
PROCESSOR COUNT         2.
PROCESSOR WHO LOGGED    x00000000

----- UNIT INFORMATION -----

UNIT CLASS              CPU

9 REPLIES
Martin Moore
HPE Pro

Re: Error : panic (cpu 0): Processor Machine Check

A machine check is almost invariably a hardware error. (I've seen a couple of instances of software-induced machine checks, but they were *extremely* rare and a long time ago.) You need to examine the binary error log entries for the machine checks in detail to identify the specific problem. If you have a hardware support contract on the system, the support vendor should be able to perform that analysis.

Martin
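
As a first step before a full analysis, it can help to put the formatted entries in the order they were logged. The sketch below is not part of Martin's reply; it is a minimal Python example that assumes the extract has been saved as text in the layout posted above (an `ENTRY n.` banner followed by `FIELD value` lines). The function names `parse_entries` and `in_logged_order` are made up for illustration, and the sample is trimmed to the fields used.

```python
import re

# Trimmed copy of the two entries from the first post (assumed layout).
SAMPLE = """\
****************** ENTRY 9. ***************
EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 26212.
OCCURRED/LOGGED ON Sat Feb 14 01:33:20 2009
MESSAGE panic (cpu 0): Processor Machine Check
********** ENTRY 10. ********************
EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 26211.
OCCURRED/LOGGED ON Sat Feb 14 01:33:19 2009
"""

ENTRY_RE = re.compile(r"^\*+\s*ENTRY\s+(\d+)\.\s*\*+$")


def parse_entries(text):
    """Split a formatted error-log extract into one dict per entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = ENTRY_RE.match(line)
        if banner:
            current = {"entry": int(banner.group(1))}
            entries.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first banner
        m = re.match(r"OS EVENT TYPE\s+(\d+)\.\s+(.+)", line)
        if m:
            current["type_code"] = int(m.group(1))
            current["type_name"] = m.group(2)
        m = re.match(r"SEQUENCE NUMBER\s+(\d+)\.", line)
        if m:
            current["sequence"] = int(m.group(1))
        m = re.match(r"OCCURRED/LOGGED ON\s+(.+)", line)
        if m:
            current["logged"] = m.group(1)
        m = re.match(r"MESSAGE\s+(.+)", line)
        if m:
            current["message"] = m.group(1)
    return entries


def in_logged_order(entries):
    """Sort entries by sequence number, oldest first."""
    return sorted(entries, key=lambda e: e.get("sequence", 0))


if __name__ == "__main__":
    for e in in_logged_order(parse_entries(SAMPLE)):
        print(e["sequence"], e["logged"], e["type_name"])
```

For the extract in this thread, this puts the CPU EXCEPTION (sequence 26211) one second ahead of the PANIC (sequence 26212), so the exception entry is the one to examine in detail. It does not decode the machine check itself.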
I work for HP

Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check

Thanks for the reply. Yes, we too suspect this is a hardware-related issue, but we don't know how to pinpoint it.

I tried to analyse the binary error log, but it is difficult to go through.

Unfortunately we don't have a support contract.

Please suggest.
Pieter 't Hart
Honored Contributor

Re: Error : panic (cpu 0): Processor Machine Check

If crash dumps are enabled on the system, there may be a dump in /var/adm/crash.
At boot time the dump is analyzed to produce a text file in the same directory with hints about what caused the crash.

Maybe this helps.

Pieter
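
To find the most recent of those text files, something like the following could be used wherever Python 3 is available (for example after copying the directory to another machine). This is only a sketch: the `crash-data.*` file name pattern is an assumption about how the summaries are named, and `newest_crash_reports` is a made-up helper name.

```python
from pathlib import Path


def newest_crash_reports(crash_dir="/var/adm/crash", pattern="crash-data.*"):
    """Return crash summary files in crash_dir, newest first.

    The default pattern is an assumption; adjust it to whatever the
    text files in the crash directory are actually called.
    """
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []  # crash dumps not enabled, or a different location
    reports = [p for p in directory.glob(pattern) if p.is_file()]
    return sorted(reports, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for report in newest_crash_reports():
        print(report)
```

The first path printed is the summary from the latest crash, which is the one to compare against the time of the panic.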
Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check


Thanks Pieter for the reply. Yes, it did generate a crash dump file.
I have tried to analyse the extract for the time when the system panicked, i.e. on Feb 14 at 01:38.

Please find the relevant crash dump file and suggest whether there is any hardware issue.

John Manger
Valued Contributor

Re: Error : panic (cpu 0): Processor Machine Check

That snippet from the crash_data doesn't really tell us much.

A more interesting output would be the console log at the time of the crash - that would have the exception frame and associated output. An Alpha H/W person could then suggest what might be faulty. In fact the console output may even suggest the nature of the fault in human readable text.

I see the system had been up for 6 months. If it doesn't suffer another machine check in the near future, I guess you could always 'ignore' it... Or was the previous reboot last year due to a similar fault?
Nobody can serve both God and Money
Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check


I managed to note down some of the console messages at the time of the crash. The system was showing messages like:

ata1 at pci0 slot 205(slot 5, function 2)
ata1: CYPRESS 82C693
scsi2 at ata1 slot 0 rad 0
usb0 at pci0 slot 305 (slot 5, function 3)
aha_chim0 at pci0 slot 6
Adaptec AIC-7895 Adapter : H/W Rev 4, Driver Rev 2.274 CHIM V364A5
scsi2 at aha_chim0 slot 0 rad 0
aha_chim2 at pci0 slot 106
Adaptec AIC-7895 Adapter : H/W Rev 4, Driver Rev 2.274 CHIM V364A5
scsi4 at aha_chim2 slot 0 rad 0
ee2 at pci0 slot 7
ee2: COMPAQ Intel 82559 (10/100 MBPS)
ee2: Driver Rev =
isp1 at pci0 slot 8
ee2: Autonegotiated
isp1 QLogic
isp1 Firmware
isp1 Fast RAM

halted CPU0

halt code = 5
HALT instruction executed
PC= fffffc000067d230
P00>>>



Is this of any help?
Sachin_34
Occasional Advisor

Re: Error : panic (cpu 0): Processor Machine Check

I apologize, I did not answer the question.
Yes, a similar reboot happened last year, but for a different reason: battery charge was the issue.
Neubeck
Occasional Visitor

Re: Error : panic (cpu 0): Processor Machine Check

What kind of CPUs are used?

EV5: download DECevent and use it to translate the binary.errlog.

http://h18023.www1.hp.com/support/svctools/decevent/index.html

EV6 or higher: download SEA and use it for the translation. SEA is part of the WEBES toolkit.

http://h18023.www1.hp.com/support/svctools/webes/index.html
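
The rule above can be written down as a small lookup. This is just an illustrative Python sketch of the advice in this post, not an official compatibility table: `suggest_tool` is a made-up name, the generation is read from the first digit after "EV", and CPUs older than EV5 return `None` because the post does not cover them.

```python
import re


def suggest_tool(cpu_name):
    """Map an Alpha CPU name such as 'EV5' or 'EV67' to the translation
    tool suggested in this thread, or None if the post does not cover it."""
    m = re.fullmatch(r"EV(\d)\d*[A-Z]*", cpu_name.strip().upper())
    if not m:
        return None  # not an EVn-style name
    generation = int(m.group(1))
    if generation == 5:
        return "DECevent"
    if generation >= 6:
        return "SEA (WEBES)"
    return None  # older than EV5: not covered by the advice above


if __name__ == "__main__":
    for cpu in ("EV5", "EV56", "EV6", "EV67"):
        print(cpu, "->", suggest_tool(cpu))
```

Once the CPU type is known, the result tells you which of the two downloads applies to the binary.errlog on that system.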
Uwe_9
Advisor

Re: Error : panic (cpu 0): Processor Machine Check

Hello Sachin,

With a little browsing through your data I finally extracted
a little info that should, in the end, help us to help you :)

From the snippet of the crash-data that you provided I read this:
--------------
Hostname : ---snip---
:
COMPAQ AlphaServer DS20E 666 MHz avail: 2
:
2 kn600_machcheck(0x100000000, 0x0, 0x4, 0xfffffc0000006120, 0xfffffc0000006080)
3 default_mach_error(0x4, 0xfffffc0000006120, 0xfffffc0000006080, 0x0, 0xfffffc000067c060)
:
--------------

That answers the question of which tool to use next.

The snippet above shows the beginning of the stack trace.
It is not really helpful on its own. We have to look at the machine check frame,
whose details are normally found in the binary.errlog.

You need to use WEBES (here "wsea", formerly known as "ca") to analyse /var/adm/binary.errlog.

Regards,
--Uwe.