Tru64 version 4.0F - DS20E 833MHZ system panic

Alice Daniel
Frequent Advisor

I have encountered the following problem on a system running firmware version 6.4-11. Can anyone provide direction as to what the issue may be (CPU, memory, ...)?

FROM MY UERF FILE:

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 100. CPU EXCEPTION
SEQUENCE NUMBER 807.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Tue Sep 1 10:32:11 2009
OCCURRED ON SYSTEM das4
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

UNIT CLASS CPU

********************************* ENTRY 4908. *********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 302. PANIC
SEQUENCE NUMBER 808.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Tue Sep 1 10:32:12 2009
OCCURRED ON SYSTEM mcdas4
SYSTEM ID x000D0022
SYSTYPE x00000000
PROCESSOR COUNT 2.
PROCESSOR WHO LOGGED x00000000
MESSAGE panic (cpu 0): System Uncorrectable
_Machine Check


FROM THE MESSAGES FILE:


Sep 1 10:36:20 mcdas4 vmunix: Machine Check SYSTEM Fatal Abort
Sep 1 10:36:20 mcdas4 vmunix: Machine check code = 0x100000202
Sep 1 10:36:20 mcdas4 vmunix: Ibox Status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Dcache Status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Cbox Address = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Fill Syndrome 1 = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Fill Syndrome 0 = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Cbox Status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 captured status of Bcache mode = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 Exception Address = fffffc00002d8f30
Sep 1 10:36:20 mcdas4 vmunix: EV6 Interrupt Enablement and Current Processor mode = 0000003ee0000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 Interrupt Summary Register = 0000000200000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 TBmiss or Fault status = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: EV6 PAL Base Address = 0000000000018000
Sep 1 10:36:20 mcdas4 vmunix: EV6 Ibox control = fffffffc1e304396
Sep 1 10:36:20 mcdas4 vmunix: EV6 Ibox Process_context = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: O/S Summary flag = 0000000000000006
Sep 1 10:36:20 mcdas4 vmunix: Cchip Base Address (phys) = 00000801a0000000
Sep 1 10:36:20 mcdas4 vmunix: Cchip Device Raw Interrupt Request = 2000000000000000
Sep 1 10:36:20 mcdas4 vmunix: DRIR Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: Bit 61: Error from Pchip 1
Sep 1 10:36:20 mcdas4 vmunix: PCI Device Interrupt Mask = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Cchip Miscellaneous Register = 0000000100000000
Sep 1 10:36:20 mcdas4 vmunix: Misc Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: Bit 32: CChip Rev (Bit<32>)
Sep 1 10:36:20 mcdas4 vmunix: Cchip Revision: 01
Sep 1 10:36:20 mcdas4 vmunix: ID of CPU performing read: 00
Sep 1 10:36:20 mcdas4 vmunix: Pchip 0 Base Address (phys) = 0000080180000000
Sep 1 10:36:20 mcdas4 vmunix: Pchip 0 Error Register = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: Pchip Error Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: PCI Xaction Start Address = 0000000000000000
Sep 1 10:36:20 mcdas4 vmunix: PCI Command: Interrupt Acknowledge
Sep 1 10:36:20 mcdas4 vmunix: Pchip 1 Base Address (phys) = 0000080380000000
Sep 1 10:36:20 mcdas4 vmunix: Pchip 1 Error Register = f100410088100801
Sep 1 10:36:20 mcdas4 vmunix: Pchip Error Register Decode:
Sep 1 10:36:20 mcdas4 vmunix: Bit 0: Lost Error
Sep 1 10:36:20 mcdas4 vmunix: Bit 11: Correctable ECC Error
Sep 1 10:36:20 mcdas4 vmunix: System Address = 0000000041008810
Sep 1 10:36:20 mcdas4 vmunix: Command: DMA Read
Sep 1 10:36:20 mcdas4 vmunix: ECC Syndrome: f1
Sep 1 10:36:20 mcdas4 vmunix: panic (cpu 0): System Uncorrectable Machine Check
Sep 1 10:36:20 mcdas4 vmunix: syncing disks... device string for dump = SCSI 1 7 0 0 0 0 0.
Sep 1 10:36:20 mcdas4 vmunix: DUMP.prom: dev SCSI 1 7 0 0 0 0 0, block 20541131
Sep 1 10:36:20 mcdas4 vmunix: device string for dump = SCSI 1 7 0 0 0 0 0.
Sep 1 10:36:20 mcdas4 vmunix: DUMP.prom: dev SCSI 1 7 0 0 0 0 0, block 20541131
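
For anyone cross-checking the decode by hand, here is a minimal Python sketch that unpacks the Pchip 1 Error Register and the Cchip DRIR values from the log above. The bit positions are an assumption: they are inferred from the kernel's own decode lines (bit 0, bit 11, the system address, the command, and the syndrome), not taken from chipset documentation, and the function names are hypothetical.

```python
def bits(value, hi, lo):
    """Return the bit field value[hi:lo] (inclusive) as an integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_pchip_error(value):
    """Split a 64-bit Pchip Error Register value into the fields the log reports.

    Field positions are assumed from the log's decode, not from a manual.
    """
    return {
        "lost_error": bool(bits(value, 0, 0)),        # "Bit 0: Lost Error"
        "correctable_ecc": bool(bits(value, 11, 11)),  # "Bit 11: Correctable ECC Error"
        "system_address": bits(value, 50, 16),         # "System Address"
        "command": bits(value, 55, 52),                # 0 is reported as "DMA Read"
        "ecc_syndrome": bits(value, 63, 56),           # "ECC Syndrome"
    }


def set_bits(value):
    """List the positions of all set bits in a 64-bit register value."""
    return [n for n in range(64) if (value >> n) & 1]


if __name__ == "__main__":
    perror = 0xF100410088100801  # Pchip 1 Error Register from the log
    drir = 0x2000000000000000    # Cchip Device Raw Interrupt Request from the log

    for name, field in decode_pchip_error(perror).items():
        shown = field if isinstance(field, bool) else hex(field)
        print(f"{name:16s} {shown}")
    print("DRIR bits set:  ", set_bits(drir))
```

Run against the two values from the log, the fields agree with what the kernel printed: syndrome f1, system address 41008810, command 0, bits 0 and 11 set in the Pchip 1 register, and bit 61 set in the DRIR.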

2 REPLIES
Vladimir Fabecic
Honored Contributor

Re: Tru64 version 4.0F - DS20E 833MHZ system panic

"Sep 1 10:36:20 mcdas4 vmunix: ECC Syndrome: f1
Sep 1 10:36:20 mcdas4 vmunix: panic (cpu 0): System Uncorrectable Machine Check"

First, I would run some memory tests from the SRM console:

>>> memexer 3
In vino veritas, in VMS cluster
Hein van den Heuvel
Honored Contributor

Re: Tru64 version 4.0F - DS20E 833MHZ system panic

The line above the ECC error reads

>> Sep 1 10:36:20 mcdas4 vmunix: Command: DMA Read

To me, that suggests a device is involved.
Does the 'mc' in the name refer to Memory Channel, perchance?

If you can find exercisers for your PCI devices, run them.
And consider re-seating and dusting off the PCI devices.

What changed?

Was anything special happening at the time?

Was anything possibly physically wrong with the system at that time: someone re-arranging cables, or temperatures higher or lower than normal (but not in the alarm range)?
Any special load at that crash time?

fwiw,
Hein.