
Re: VMS crash dump DecNet I64 8.3-1H1

 
Toine_1
Regular Advisor

VMS crash dump DecNet I64 8.3-1H1

Hi,

I am running DECnet V8.3-1H1 ECO01, and my system crashed.
Has anyone else seen the same problem?


Crashdump Summary Information:
------------------------------
Crash Time: 17-NOV-2009 12:58:13.50
Bugcheck Type: INVEXCEPTN, Exception while above ASTDEL
Node: NVJ (Cluster)
CPU Type: HP rx6600 (1.59GHz/12.0MB)
VMS Version: V8.3-1H1
Current Process: TAX_BVR900
Current Image: DSA108:[VMS$COMMON.JAVA$150.BIN]JAVA$JAVA.EXE
Failing PC: FFFFFFFF.8194D221 NET$TRANSPORT_NSP+37A21
Failing PS: 00000000.00000808
Module: NET$TRANSPORT_NSP (Link Date/Time: 25-NOV-2008 14:31:38.89)
Offset: 00037A21

Boot Time: 31-OCT-2009 21:24:36.00
System Uptime: 16 15:33:37.50
Crash/Primary CPU: 12./0.
System/CPU Type: 4020
Saved Processes: 165
Pagesize: 8 KByte (8192 bytes)
Physical Memory: 32768 MByte (134742016 PFNs, discontiguous memory)
Dumpfile Pagelets: 4916146 blocks
Dump Flags: olddump,writecomp,errlogcomp
Dump Type: compressed,selective,shared_mem
EXE$GL_FLAGS: poolpging,init,bugdump,tbchk
Paging Files: 1 Pagefile and 1 Swapfile installed

Stack Pointers:
KSP = 00000000.7FF43060 ESP = 00000000.7FF68000 SSP = 00000000.7FFAC000
USP = 00000000.0246FA00

General Registers:
R0 = 00000000.00000000 GP = FFFFFFFF.E1457A00 R2 = 0009804C.0270033F
R3 = 00000000.7FF43320 R4 = 00000000.7FF430E0 R5 = 00000000.7FF43934
R6 = 00000000.7FF43990 R7 = 00000000.00000000 R8 = 00000000.0000043C
R9 = 00000000.000005D4 R10 = 00000000.00000001 R11 = 00000000.7FF43AF8
SP = 00000000.00000000 TP = 00000000.02471200 R14 = 00000000.00000008
R15 = FFFFFFFF.B0073D90 R16 = 00000000.00000020 R17 = 00000000.000001CC
R18 = 00000000.7FF43990 R19 = 00000000.00000004 R20 = 00000000.00000001
R21 = 00000000.00000004 R22 = 00000000.7FF43A38 R23 = 00000000.00040004
R24 = 00000000.7FF43A38 AI = 00000000.00000004 RA = 00000000.00000000
PV = 00000000.000005D4 R28 = 00000000.00000000 FP = 00100000.00000996
R30 = 00000000.7FF43970 R31 = FFFFFFFF.E11E6570
Exception Frame:
Exception taken at IP FFFFFFFF.8194D220, slot 01 from Kernel mode
Trap Type 00000086 (NaT Consumption Fault)
IVT Offset 00005600 (NaT Consumption Fault)

Control Registers:
CR0 Default Control Register (DCR) 00000000.00007F00
CR16 Processor Status Register (IPSR) 00001210.08026030
CR17 Interrupt Status Register (ISR) 00000202.00000010
CR19 Instruction Pointer (IIP) FFFFFFFF.8194D220
CR20 Faulting Address (IFA) FFFFF804.09E633FA
CR21 TLB Insertion Register (ITIR) 00000000.00000334
CR22 Instruction Previous Address (IIPA) FFFFFFFF.8194D220
CR23 Function State (IFS) 80000000.00000C18
CR24 Instruction immediate (IIM) 00000000.00000060
CR25 VHPT Hash Address (IHA) FFFFFFFF.7FDFE640

Application Registers:
AR16 Register Stack Config Reg (RSC) 00000000.00000003
AR17 Backing Store Pointer (BSP) 00000000.7FF2E8E8
AR18 Backing Store for Mem Store (BSPSTORE) 00000000.7FF2E738
AR19 RSE NaT Collection Register (RNAT) 00000000.00000020
AR32 Compare/Exchange Comp Value Reg (CCV) 00000000.B1835400
AR36 User NaT Collection Register (UNAT) 00000000.00000000
AR64 Previous Function State (PFS) 00100000.00000C18
AR65 Loop Count Register (LC) 00000000.00000000
AR66 Epilog Count Register (EC) 00000000.00000001

Processor Status Register (IPSR):
AC = 0 MFL= 1 MFH= 1 IC = 1 I = 1 DT = 1
DFL= 0 DFH= 0 RT = 1 CPL= 0 IT = 1 MC = 0 RI = 1
Interrupt Status Register (ISR):
Code 00000010 X = 0 W = 1 R = 0 NA = 0 SP = 0
RS = 0 IR = 0 NI = 0 SO = 0 EI = 1 ED = 0

Branch Registers:
B0 FFFFFFFF.8194BEF0
B1 00000000.00000000
B2 00000000.00000000
B3 00000000.00000000
B4 00000000.00000000
B5 00000000.010C0C00
B6 FFFFFFFF.80137350
B7 FFFFFFFF.8021E6B0

Floating Point Registers: FPSR 0009804C.0270033F
F6 00000000.0001003E.00000000.00000010
F7 00000000.0001003E.00000000.00000008
F8 00000000.0001003E.00000000.0F7D9000
F9 00000000.00010005.D2000000.00000000
F10 00000000.00010011.BC8B94B9.4B94B94C
F11 00000000.0000FFF8.9C09C09C.09C09C0A

Miscellaneous Registers:
Interrupt Priority Level (IPL) 00000008
Stack Align 000002D0
NaT Mask 0032
PPrev Mode 00
Previous Stack 00
Interrupt Depth 01
Preds 40000000.0000A985
Nats 00000000.80000000
Context 40000000.0001AA03

General Registers:
R0 00000000.00000000 GP FFFFFFFF.E1828800 R2 FFFFFFFF.E1039828
R3 00000000.00000000 R4 00000000.00000000 R5 00000000.00000014
R6 FFFFFFFF.B47BB2C0 R7 00000000.00000000 R8 FFFFFFFF.B451C5F8
R9 00000000.00000014 R10 00000000.00000000 R11 00000000.00000000
SP 00000000.7FF43C60 TP 00000000.02471200 R14 00000000.00000000
R15 FFFFFFFF.E1017B78 R16 FFFFFFFF.B451C630 R17 00000000.000002B9
R18 00000000.000001B8 R19 FFFFFFFF.B47BB583 R20 00000000.00000000
R21 FFFFFFFF.E100A200 R22 00000000.0000001A R23 FFFFFFFF.B47BB582
R24 00000000.0000001A R25 00000000.00000002 R26 00000000.00000000
R27 FFFFFFFF.E16280CE R28 00000000.00000000 R29 00000000.7FF43D10
R30 FFFFF804.18AC8000 R31 FFFFFFFF.E11E6570


Signal Array: 64-bit Signal Array:
Arg Count = 00000004 Arg Count = 00000004
Condition = 000005D4 Condition = 00000000.000005D4
Argument #2 = 00000004 Argument #2 = 00000000.00000004
Argument #3 = 8194D221 Argument #3 = FFFFFFFF.8194D221
Argument #4 = 00000800 Argument #4 = 00000000.00000800

Mechanism Array:
Arguments = 00000109 Establisher FP = 00000000.7FF43C60
Flags = 00000001 Exception FP = 00000000.7FF43990
Depth = FFFFFFFD Signal Array = 00000000.7FF43934
Handler Data = 00000000.0001001B Signal64 Array = 00000000.7FF43948
R0 = FFFFFFFF.B451C5F8 R1 = 00000000.000005D4
F2 = 00000000.00000000 F3 = 00000000.00000000 F4 = 00000000.00000000
F5 = 00000000.00000000 F12 = 80000000.00000000 F13 = FFFFFFD8.00000000
F14 = 00000000.00000004 F15 = 80000000.00000000 F16 = 00000000.00000000
F17 = 00000000.00000000 F18 = 00000000.00000000 F19 = 00000000.00000000
F20 = 00000000.00000000 F21 = 00000000.00000000 F22 = 00000000.00000000
F23 = 00000000.00000000 F24 = 00000000.00000000 F25 = 00000000.00000000
F26 = 00000000.00000000 F27 = 00000000.00000000 F28 = 00000000.00000000
F29 = 00000000.00000000 F30 = 00000000.00000000 F31 = 00000000.00000000

System Registers:
Page Table Base Register (PTBR) 00000000.0029DBBA
Processor Base Register (PRBR) FFFFFFFF.B09B2180
Privileged Context Block Base (PCBB) FFFFFFFF.E255C080
System Control Block Base (SCBB) FFFFFFFF.B0F2D0CE
Software Interrupt Summary Register (SISR) 00000000.00000100
Address Space Number (ASN) 00000000.0078B4DB
AST Summary / AST Enable (ASTSR_ASTEN) 00000000.0000000F
Floating-Point Enable (FEN) 00000000.00000001
Interrupt Priority Level (IPL) 00000000.00000008
Machine Check Error Summary (MCES) 00000000.00000000
Virtual Page Table Base Register (VPTB) 00000000.00000000
Failing Instruction:
NET$TRANSPORT_NSP+37A21: st2 [r41] = r9

Instruction Stream (last 20 instructions):
NET$TRANSPORT_NSP+379D0: nop.m 000000
NET$TRANSPORT_NSP+379D1: nop.f 000000
NET$TRANSPORT_NSP+379D2: br.call.sptk.many b0 = 00328F0 ;;
NET$TRANSPORT_NSP+379E0: add r44 = 0020, r44
NET$TRANSPORT_NSP+379E1: mov r3 = 000029
NET$TRANSPORT_NSP+379E2: sxt4 r5 = r40
NET$TRANSPORT_NSP+379F0: mov r1 = r51 ;;
NET$TRANSPORT_NSP+379F1: ld2 r8 = [r44], 1E0
NET$TRANSPORT_NSP+379F2: nop.i 000000 ;;
NET$TRANSPORT_NSP+37A00: nop.m 000000
NET$TRANSPORT_NSP+37A01: sxt4 r28 = r8
NET$TRANSPORT_NSP+37A02: br.call.sptk.many b0 = 00328C0 ;;
NET$TRANSPORT_NSP+37A10: mov r1 = r51
NET$TRANSPORT_NSP+37A11: nop.f 000000
NET$TRANSPORT_NSP+37A12: nop.i 000000
NET$TRANSPORT_NSP+37A20: add r41 = 00A6, r44 ;;
NET$TRANSPORT_NSP+37A21: st2 [r41] = r9
NET$TRANSPORT_NSP+37A22: nop.i 000000 ;;
NET$TRANSPORT_NSP+37A30: nop.m 000000
NET$TRANSPORT_NSP+37A31: extr.u r4 = r46, 00, 01 ;;
NET$TRANSPORT_NSP+37A32: cmp4.eq p0, p6 = 01, r4 ;;
NET$TRANSPORT_NSP+37A40: (p6) cmp.eq.unc p2, p0 = r0, r0
NET$TRANSPORT_NSP+37A41: nop.f 000000
NET$TRANSPORT_NSP+37A42: (p2) br.cond.dpnt.few 0000040 ;;
NET$TRANSPORT_NSP+37A50: add r47 = 00B8, r42


/Toine
Volker Halle
Honored Contributor

Re: VMS crash dump DecNet I64 8.3-1H1

Toine,

when posting crashdump footprints in the future, please consider putting just the first 11 lines of the CLUE file into the text of your topic and adding the full CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS file as a .TXT attachment.

A crash with this footprint has apparently never been publicly reported before; at least, I have nothing in my crash footprint database.

Consider logging a call with HP. This is most likely a problem in DECnet-Plus.
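
For anyone who wants to keep a footprint list of their own, here is a minimal Python sketch of how the summary lines at the top of a CLUE file could be turned into a lookup key. The function names `parse_summary` and `footprint`, and the choice of bugcheck name, VMS version and module+offset as the key, are assumptions for illustration only; this is not how CLUE or any existing footprint database works.

```python
# Sample lines copied from the crash summary posted in this thread.
SUMMARY = """\
Bugcheck Type: INVEXCEPTN, Exception while above ASTDEL
VMS Version: V8.3-1H1
Current Image: DSA108:[VMS$COMMON.JAVA$150.BIN]JAVA$JAVA.EXE
Failing PC: FFFFFFFF.8194D221 NET$TRANSPORT_NSP+37A21
Module: NET$TRANSPORT_NSP (Link Date/Time: 25-NOV-2008 14:31:38.89)
Offset: 00037A21
"""


def parse_summary(text):
    """Split 'Label: value' lines into a dict, splitting on the first ': ' only."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(": ")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def footprint(fields):
    """Build a hypothetical lookup key from three of the summary fields."""
    bugcheck = fields["Bugcheck Type"].split(",")[0]  # name before the comma
    location = fields["Failing PC"].split()[-1]       # module+offset part
    return (bugcheck, fields["VMS Version"], location)


fields = parse_summary(SUMMARY)
print(footprint(fields))
```

Two crashes with the same key would be candidates for the same underlying problem, which is why the first few lines of the CLUE file are usually enough for a first comparison.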

Volker.
Hoff
Honored Contributor

Re: VMS crash dump DecNet I64 8.3-1H1

Patch the box to current (both OpenVMS and DECnet ECO kits), and then (if the problem persists) escalate to HP support. HP may (will?) ask for the CLUE CRASH data as an attachment to the report.

As for the purpose and use of the CLUE CRASH data, Volker has posted a good presentation (in German) on crashes and crashdump analysis and related tools here:

http://www.decus.de/slides/sy2004/20_04/1l02.pdf
Toine_1
Regular Advisor

Re: VMS crash dump DecNet I64 8.3-1H1

Hello,

The latest patches are installed on 8.3-1H1; only these have not been applied yet:
VMS831H1I_ICAP-V0300 06-NOV-2009
VMS831H1I_XFC-V0200 06-NOV-2009
VMS831H1I_FIBRE_SCSI-V0500 29-OCT-2009

I have also logged a call with HP.

I have also added the clue$node information.
Is this crash caused by DecNet or something else?

Regards,

/Toine
Hoff
Honored Contributor

Re: VMS crash dump DecNet I64 8.3-1H1

Quoting HP documentation for this error: "If no user-written I64 assembly language is involved, contact HP Customer Support."

Specifically, the code at address 8194D221 hit a NATFAULT (NaT: Not a Thing); it tried to use a bogus value.
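
To relate that address to the rest of the dump, the arithmetic can be sketched in a few lines of Python. The helper name `parse_vms_hex` is made up for this example, and the reading of the low 4 bits of the PC as the slot number within a 16-byte instruction bundle is an assumption taken from how the dump itself pairs "Failing PC ...D221" with "IP ...D220, slot 01".

```python
def parse_vms_hex(text):
    """Convert a dump value such as 'FFFFFFFF.8194D221' to an integer."""
    return int(text.replace(".", ""), 16)


failing_pc = parse_vms_hex("FFFFFFFF.8194D221")  # "Failing PC" in the summary
offset = int("00037A21", 16)                      # "Offset" in the summary

# Assumption read off the dump: low 4 bits = slot within a 16-byte bundle.
bundle_address = failing_pc & ~0xF
slot = failing_pc & 0xF

# Subtracting the offset gives the base address that the offset is relative to.
module_base = failing_pc - offset

print(f"bundle {bundle_address:016X}, slot {slot}")
print(f"base   {module_base:016X}")
```

The bundle address and slot agree with the exception frame in the dump, and the offset 37A21 is the `st2 [r41] = r9` instruction shown under "Failing Instruction".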

DECnet was clearly involved here, but whether its involvement is as the culprit or as a bystander requires digging around in the crash and in the source code.

The trigger could be within DECnet, or it could well be something else going on within the context, or potentially there's a bug within a compiler somewhere. There are other potential triggers.
Toine_1
Regular Advisor

Re: VMS crash dump DecNet I64 8.3-1H1

Hoff,

You identified the correct problem.

The problem we encountered has already been escalated to DECnet Engineering.

The system crashed because of a NaT (Not a Thing) fault.
This typically indicates a programming problem in which code tries to read from an uninitialized location; it is seen mostly with multithreaded applications.

Fix-images for NET$TRANSPORT_NSP have been developed to resolve this crash for OpenVMS Integrity 8.3 and 8.3-1H1.

For 8.3-1H1, the official fix will be included in the DECnet-Plus release that follows V8.3-1H1 ECO 1.

/Toine
Volker Halle
Honored Contributor

Re: VMS crash dump DecNet I64 8.3-1H1

Toine,

thanks for providing the full CLUE file and the solution information. This allows this crash footprint to be easily identified in the future and even provides information about a solution.

Volker.