Help.. Server crashed with Fatal BUG CHECK, version = V5.5-2 UNXINTEXC

 
Victor Mendham
Regular Advisor

Does anyone have any hints for troubleshooting this server crash?

It looks like a tape init was running when the system crashed. The console showed:

%MOUNT-I-MOUNTED, Billy mounted on _$255$MIA0: (TAPE0)
Initiallizing NEW Tape MUA0
%DCL-I-ALLOC, _$2$MUA650: allocated
%MOUNT-I-MOUNTED, Bobbyd mounted on _$2$MUA650: (HSJ01)

Interrupt through SCB vector 60 @ XMI node E1880000 - CANNOT CONTINUE

Valid mask: FFFFFFFF
000000FF

XDEV = 00058087
XBE = 90001040
XFADR0 = 61880008
XGPR = 00000036
NSCSR0 = 00000020
XCR0 = 00000000
XFAER0 = 1000000F
XBEER0 = 00000001
WFADR0 = 1D7F8140
WFADR1 = 1D7F8120
NCSR = 00000800
ICSR = 00000001
VMAR = 000007E0
VTAG = 00000600
VDATA = 9F165022
ECR = 0000008A
PAMODE = 00000000
MMEADR = 83B4CA74
MMEPTE = 1D894994
MMESTS = 1C00C004
TBADR = 00000000
TBSTS = 800001D0
PCADR = 1CB33C28
PCSTS = FFFFF830
PCCTL = FFFFFC13
CCTL = 00000037
BCETSTS = 00000140
BCETIDX = 01400020
BCETAG = 8040F600
BCEDSTS = 00000400
BCEDIDX = 00000020
BCEDECC = 00000000
CEFADR = E1E00108
CEFSTS = 00019210
NESTS = 00000000
NEOADR = 1CB33C6C
NEOCMD = 00000F15
NEDATHI = 00018000
NEDATLO = 00018000
NEICMD = 0000018C

**** Fatal BUG CHECK, version = V5.5-2 UNXINTEXC, Unexpected interrupt or exception

Crash CPU: 01 Primary CPU: 01

Active/available CPU masks: 0000001E/0000001E

Current process = NULL

Register dump

R0 = 041F0000
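
For reading the "Active/available CPU masks" line in the console output above, here is a minimal sketch. It assumes the usual convention that bit n of the mask stands for CPU ID n, which is consistent with "Crash CPU: 01" and with the four KA66A modules the error log lists in XMI nodes 1 to 4. The helper name `cpus_in_mask` is hypothetical.

```python
from typing import List


def cpus_in_mask(mask_hex: str) -> List[int]:
    """Return the bit positions (assumed to be CPU IDs) set in a hex CPU mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Value copied from the console line "Active/available CPU masks".
active, available = "0000001E/0000001E".split("/")
print("active:   ", cpus_in_mask(active))
print("available:", cpus_in_mask(available))
```

Under that assumption both masks describe the same four CPUs, so no processor had been dropped from the active set before the bugcheck.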

A Google search for the UNXINTEXC error brings me to:

http://vms.cc.wmich.edu/disk$openvms0731/000000/731final/6023/6023pro_018.html

The bottom of that page shows:

Facility: BUGCHECK, System Bugcheck
Explanation: The OpenVMS software detected an irrecoverable, inconsistent condition. After all physical memory is written to a system dump file, the system automatically reboots if the BUGREBOOT system parameter is set to 1.
User Action: Use the BACKUP command with the /IGNORE=NOBACKUP qualifier to make a backup save set of the system dump file and the error log file active at the time of the error. Contact a Compaq support representative and describe the conditions leading to the error.

UNXINTEXC, unexpected interrupt or exception

The error log is shown below. The last crash of exactly this kind was 5 days ago. Any thoughts? Could this be a memory, tape drive, or HSJ controller issue?

******************************* ENTRY 22015. *******************************
ERROR SEQUENCE 56172. LOGGED ON: SID 13000202
DATE/TIME 19-MAR-2004 10:36:57.88 SYS_TYPE 02100101
SYSTEM UPTIME: 5 DAYS 13:00:20
SCS NODE:
BILLYBOB
VAX/VMS V5.5-2

DISMOUNT VOLUME KA66 CPU FW REV# 2. CONSOLE FW REV# 1.0
XMI NODE # 3.

UNIT _HSJ01$MUA650:, VOLUME LABEL "BOBBYD"

857184. QIO OPERATIONS THIS UNIT, 0. ERRORS THIS UNIT
2. QIO OPERATIONS THIS VOLUME, 0. ERRORS THIS VOLUME
******************************* ENTRY 22016. *******************************
ERROR SEQUENCE 56179. LOGGED ON: SID 13000202
DATE/TIME 19-MAR-2004 11:37:58.91 SYS_TYPE 02100101
SYSTEM UPTIME: 5 DAYS 14:01:21
SCS NODE:
BILLYBOB
VAX/VMS V5.5-2

TIME STAMP KA66 CPU FW REV# 2. CONSOLE FW REV# 1.0
XMI NODE # 2.
******************************* ENTRY 22017. *******************************
ERROR SEQUENCE 56180. LOGGED ON: SID 13000202
DATE/TIME 19-MAR-2004 11:41:41.79 SYS_TYPE 02100101
SYSTEM UPTIME: 5 DAYS 14:05:04
SCS NODE:
BILLYBOB
VAX/VMS V5.5-2

MACHINE CHECK KA66 CPU FW REV# 2. CONSOLE FW REV# 1.0
XMI NODE # 1.

SW FLAGS 04000000
00040000
00000000
00014005
machine check code 6 bit<26>
cfe rde bit<50>
xmi present bit<96>
adapter present bit<98>
all enabled bit<110>
noxma error bit<112>
LOGGING OFF 00000000
00000000
00000000
00000000
ACTIVE CPUS 0000001E
HW REVISION 02000000
37304520
SYS SERIAL NUM 34384741
33323034
3833
SYS SERIAL NUM = ##########3
SERIAL NUMBER 30334147
38343332
3536
SERIAL NUMBER = GA30234865
RESRC DISABLE 0000

PHYS ADDRESS E1880000
XDEV 00058087
KA66A
DEVICE REV = 5.
XBE 90042040
TRANSACTION TIMEOUT
NO READ RESPSONSE
XMI BAD
ERROR DETECTED
XFADR 61E00108
FAILING ADDR = 8001E00108(X)
FAILING LENGTH = 1.
XFAER 1000000F
TRANSACTION BYTE MASK = 000F(X)
READ CMD
XGPR 00000036
NSCSR0 00000020
boot processor
NVAX rev = 00
XCR0 00000000
XBEER0 00000000
WFADR0 1D7F7860
WFADR1 1D7F8100
NCSR 00000800
set cntrol-p enable
ICSR 00000001
enable VIC
VMAR 000007E0
longword select = 00
sub_block select = 00
row index = 1F
Bank Select = 01
Tag = 00
VTAG 81B33E40
VDATA 9F165022
ECR 000000CA
fbox enable
fbox st4 bypass enable
timeout clock
iccs ext
pmf pmux = 00
pmf emux = 00
PAMODE 00000000
30 bit physical address mode
MMEADR 00197200
MMEPTE 0000018C
MMESTS 18008000
lock 0
tnv fault
TBADR 00000000
TBSTS 800001D0
s5 cmd corresp to tb perr = 1D
source of ref causing tb perr = 04
PCADR 1D601F68
PCSTS FFFFF800
no operation
PCCTL FFFFFC13
d-stream enabled
i-stream enabled
parity checking enabled
tb hit rate, p0/p1 sp i-stream reads
CCTL 00000037
bcache enabled
tag speed = 01
data speed = 01
size = 03
bcache coherency access
bcache hit
BCETSTS 00000140
tag store cmd being processed = 0A
BCETIDX 01400020
BCETAG 8040F600
valid
owned
ecc = 1E
tag = 0402
BCEDSTS 00000400
data rams cmd at time of err = 04
BCEDIDX 00000020
BCEDECC 00000000
ecclo = 00
ecchi = 00
CEFADR E1E00108
CEFSTS 0001920A
REGISTER LOCKED
READ DATA ERROR, FILL FAILED
data returned to mbox
do not fill
count = 03
NESTS 00000000
NEOADR 1D601FAC
NEOCMD 00000F15
ndal command = 05
commander id = 01
byte enable = 0F
length of ndal transaction = 00
NEDATHI 00018000
byte enable = 0180
length = 00
NEDATLO 00018000
NEICMD 0000018C
ndal command = 0C
commander id = 00
parity = 03

STACK FRAME

BYTE COUNT 00000018
MCHK TYPE 80060001
AST LEVEL = 00(x)
06 sync hardware error occurred
INT SYS REG 00000001
SAVEPC REG 81AF82E5
VA REG 8000848E
Q REG 17800000
OPCODE F0E10080
vax restart bit
OPCODE = E1(x)
PC REG 8320E874
ERROR PSL 04150009
C-BIT
N-BIT
INTERRUPT PRIORITY LEVEL = 21.
PREVIOUS MODE = KERNEL
CURRENT MODE = KERNEL
INTERRUPT STACK
FIRST PART DONE CLEAR
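
The ERROR PSL decode in the stack frame above can be reproduced by hand. This is a minimal sketch, assuming the standard VAX processor status longword layout: condition codes in bits 0 to 3, IPL in bits 16 to 20, previous mode in bits 22 to 23, current mode in bits 24 to 25, the interrupt stack flag in bit 26, and first part done in bit 27. The function name `decode_psl` is hypothetical.

```python
ACCESS_MODES = ["KERNEL", "EXECUTIVE", "SUPERVISOR", "USER"]


def decode_psl(psl: int) -> dict:
    """Split a VAX PSL value into the fields the error log prints."""
    return {
        "C": bool(psl & 1),                 # carry condition code
        "V": bool((psl >> 1) & 1),          # overflow condition code
        "Z": bool((psl >> 2) & 1),          # zero condition code
        "N": bool((psl >> 3) & 1),          # negative condition code
        "IPL": (psl >> 16) & 0x1F,          # interrupt priority level
        "previous_mode": ACCESS_MODES[(psl >> 22) & 3],
        "current_mode": ACCESS_MODES[(psl >> 24) & 3],
        "interrupt_stack": bool((psl >> 26) & 1),
        "first_part_done": bool((psl >> 27) & 1),
    }


# Value copied from the "ERROR PSL" line of the stack frame.
for field, value in decode_psl(0x04150009).items():
    print(f"{field:16} {value}")
```

For 04150009 this gives C and N set, IPL 21, kernel mode for both previous and current mode, interrupt stack in use, and first part done clear, which agrees with the error log's own decode.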

ERROR COUNTERS

code_6 01
cfe_rde 01

XMI NODE DATA
PHYS ADDR E1880000
XDEV VALID 6FFE
XBE VALID 6FFE
XFADR VALID 0002
XFAER VALID 0002
NODE PRESENT 7FFE

XMI NODE #1.
XDEV 00058087
KA66A
DEVICE REV = 5.
XBE 90042040
XFADR 61E00108
FAILING ADDR = 8001E00108(X)
FAILING LENGTH = 1.
XFAER 1000000F
TRANSACTION BYTE MASK = 000F(X)
READ CMD

XMI NODE #2.
XDEV 00058087
KA66A
DEVICE REV = 5.
XBE 10000080

XMI NODE #3.
XDEV 00058087
KA66A
DEVICE REV = 5.
XBE 100000C0

XMI NODE #4.
XDEV 00058087
KA66A
DEVICE REV = 5.
XBE 10000100

XMI NODE #5.
XDEV 00844001
MS65A
DEVICE REV = 132.
XBE 00000000

XMI NODE #6.
XDEV 00844001
MS65A
DEVICE REV = 132.
XBE 00000000

XMI NODE #7.
XDEV 00844001
MS65A
DEVICE REV = 132.
XBE 00000000

XMI NODE #8.
XDEV 00844001
MS65A
DEVICE REV = 132.
XBE 00000000

XMI NODE #9.
XDEV 00844001
MS65A
DEVICE REV = 132.
XBE 00000000

XMI NODE #10
XDEV 00844001
MS65A
DEVICE REV = 132.
XBE 00000000

XMI NODE #11
XDEV D26E0810
KFMSA
HW REV = D2
FW REV = V3.14
XBE 000002C0

XMI NODE #12
XDEV 00000823
IO DEVICE = 35.
DEVICE REV = 0.


ADAPTER registers not readable


XMI NODE #13
XDEV 47530C05
CIXCD
HW REV = E3
FW REV = V2.7
XBE 00000340

XMI NODE #14
XDEV 06060C03
DEMNA
HW REV = F
EEPROM FW REV = 6.
XBE 00000384

LOG ADAPTER DATA

XDEV 00000823
IO DEVICE = 35.
DEVICE REV = 0.
XBER0 04000000
XFADR0 00000823
XFAER0 00000000
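
One pattern in the XMI NODE DATA above is easier to see with a few lines of code. This is only a sketch built from the values in this log, not from the XMI register documentation: for every node that reports a non-zero XBE, bits 6 to 9 happen to equal the node number, so the interesting part is whichever bits are set outside that field. The helper names `node_field` and `other_bits` are hypothetical.

```python
# XBE values copied from the XMI NODE DATA section (non-zero entries only).
xbe_by_node = {
    1: 0x90042040,
    2: 0x10000080,
    3: 0x100000C0,
    4: 0x10000100,
    11: 0x000002C0,
    13: 0x00000340,
    14: 0x00000384,
}


def node_field(xbe: int) -> int:
    """Bits 6 to 9, which match the node number in this log (an observation)."""
    return (xbe >> 6) & 0xF


def other_bits(xbe: int) -> list:
    """Bit positions set outside bits 6 to 9."""
    return [b for b in range(32) if not 6 <= b <= 9 and (xbe >> b) & 1]


for node, xbe in xbe_by_node.items():
    print(f"node {node:2}: field={node_field(xbe):2} other bits={other_bits(xbe)}")
```

In this log only node 1, the CPU that took the machine check, has bits 13, 18, and 31 set. Nodes 2 to 4 share only bit 28 with it, and the I/O adapters in nodes 11 and 13 show nothing outside the node field.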
******************************* ENTRY 22018. *******************************
ERROR SEQUENCE 56181. LOGGED ON: SID 13000202
DATE/TIME 19-MAR-2004 11:41:42.35 SYS_TYPE 02100101
SYSTEM UPTIME: 5 DAYS 14:05:04
SCS NODE: BILLYB ...