Tru64 version 4.0F - DS20E 833MHZ system panic

Alice Daniel
Frequent Advisor

I have encountered the following problem on a system running firmware version 6.4-11. Can anyone provide direction as to what the issue may be (CPU, memory...)?

FROM MY UERF FILE:

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 807.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Tue Sep 1 10:32:11 2009
OCCURRED ON SYSTEM das4
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

********************************* ENTRY 4908. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 808.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Tue Sep 1 10:32:12 2009
OCCURRED ON SYSTEM mcdas4
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
MESSAGE panic (cpu 0): System Uncorrectable
_Machine Check


FROM THE MESSAGES FILE:


Sep 1 10:36:20 mcdas4 vmunix: Machine Check SYSTEM Fatal Abort
Sep 1 10:36:20 mcdas4 vmunix: Machine check code = 0x100000202
Sep 1 10:36:20 mcdas4 vmunix: Ibox Status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Dcache Status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Cbox Address = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Fill Syndrome 1 = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Fill Syndrome 0 = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Cbox Status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 captured status of Bcache mode = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 Exception Address = fffffc00002d8f30
Sep 1 10:36:20 mcdas4 vmunix: EV6 Interrupt Enablement and Current Processor mode = 0000003ee0000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 Interrupt Summary Register = 0000000200000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 TBmiss or Fault status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 PAL Base Address = 0000000000018000
Sep 1 10:36:20 mcdas4 vmunix: EV6 Ibox control = fffffffc1e304396
Sep 1 10:36:20 mcdas4 vmunix: EV6 Ibox Process_context = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: O/S Summary flag = 0000000000000006
Sep 1 10:36:20 mcdas4 vmunix: Cchip Base Address (phys) = 00000801a0000000
Sep 1 10:36:20 mcdas4 vmunix: Cchip Device Raw Interrupt Request = 2000000000000000
Sep 1 10:36:20 mcdas4 vmunix: DRIR Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: Bit 61: Error from Pchip 1
Sep 1 10:36:20 mcdas4 vmunix: PCI Device Interrupt Mask = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Cchip Miscellaneous Register = 0000000100000000
Sep 1 10:36:20 mcdas4 vmunix: Misc Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: Bit 32: CChip Rev (Bit<32>)
Sep 1 10:36:20 mcdas4 vmunix: Cchip Revision: 01
Sep 1 10:36:20 mcdas4 vmunix: ID of CPU performing read: 00
Sep 1 10:36:20 mcdas4 vmunix: Pchip 0 Base Address (phys) = 0000080180000000
Sep 1 10:36:20 mcdas4 vmunix: Pchip 0 Error Register = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Pchip Error Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: PCI Xaction Start Address = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: PCI Command: Interrupt Acknowledge
Sep 1 10:36:20 mcdas4 vmunix: Pchip 1 Base Address (phys) = 0000080380000000
Sep 1 10:36:20 mcdas4 vmunix: Pchip 1 Error Register = f100410088100801
Sep 1 10:36:20 mcdas4 vmunix: Pchip Error Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: Bit 0: Lost Error
Sep 1 10:36:20 mcdas4 vmunix: Bit 11: Correctable ECC Error
Sep 1 10:36:20 mcdas4 vmunix: System Address = 0000000041008810
Sep 1 10:36:20 mcdas4 vmunix: Command: DMA Read
Sep 1 10:36:20 mcdas4 vmunix: ECC Syndrome: f1
Sep 1 10:36:20 mcdas4 vmunix: panic (cpu 0): System Uncorrectable Machine Check
Sep 1 10:36:20 mcdas4 vmunix: syncing disks... device string for dump = SCSI 1 7 0 0 0 0 0.
Sep 1 10:36:20 mcdas4 vmunix: DUMP.prom: dev SCSI 1 7 0 0 0 0 0, block 20541131
Sep 1 10:36:20 mcdas4 vmunix: device string for dump = SCSI 1 7 0 0 0 0 0.
Sep 1 10:36:20 mcdas4 vmunix: DUMP.prom: dev SCSI 1 7 0 0 0 0 0, block 20541131

2 REPLIES
Vladimir Fabecic
Honored Contributor

Re: Tru64 version 4.0F - DS20E 833MHZ system panic

"Sep 1 10:36:20 mcdas4 vmunix: ECC Syndrome: f1
Sep 1 10:36:20 mcdas4 vmunix: panic (cpu 0): System Uncorrectable Machine Check"

First, I would run some memory tests:

>>> memexer 3
In vino veritas, in VMS cluster
Hein van den Heuvel
Honored Contributor

Re: Tru64 version 4.0F - DS20E 833MHZ system panic

The line above the ECC error reads

>> Sep 1 10:36:20 mcdas4 vmunix: Command: DMA Read

To me, that suggests a device is involved.
Does the 'mc' in the hostname refer to Memory Channel, perchance?
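
For anyone who wants to cross-check the kernel's decode by hand, here is a minimal Python sketch that pulls the same fields out of the raw Pchip 1 Error Register and Cchip DRIR values from the messages file. The field positions (syndrome in the top byte, address field in bits 47:16, error flags in the low 16 bits) are assumptions inferred from the decode lines in this log, not taken from the chipset manual, and the function names are made up for illustration. The command field is not decoded here.

```python
# Raw values copied from the messages file in the original post.
PCHIP1_PERROR = 0xF100410088100801
CCHIP_DRIR = 0x2000000000000000


def set_bits(value, width=64):
    """Return the positions of all set bits in value, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def decode_perror(reg):
    """Split a Pchip error register into fields.

    Field positions are inferred from this log's own decode output
    (assumption), not from hardware documentation.
    """
    return {
        "syndrome": (reg >> 56) & 0xFF,             # log: ECC Syndrome
        "address_field": (reg >> 16) & 0xFFFFFFFF,  # log: System Address
        "error_bits": set_bits(reg & 0xFFFF, 16),   # log: Bit 0, Bit 11
    }


if __name__ == "__main__":
    fields = decode_perror(PCHIP1_PERROR)
    print("ECC syndrome  : %02x" % fields["syndrome"])
    print("Address field : %016x" % fields["address_field"])
    print("Error bits    :", fields["error_bits"])
    print("DRIR bits set :", set_bits(CCHIP_DRIR))
```

The results line up with what the kernel printed: syndrome f1, address 41008810, error bits 0 and 11 on Pchip 1, and DRIR bit 61 (error from Pchip 1). That is consistent with the fault being reported on the PCI side handled by Pchip 1 during a DMA read.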

If you can find exercisers for your PCI devices, run them.
Also consider re-seating and dusting the PCI devices.

What changed?

Was anything special happening at the time?

Was there something possibly physically wrong with the system at that time... someone re-arranging cables, higher or lower temperatures than normal (but not in alarm range)?
Any special load at that crash time?

fwiw,
Hein.