Operating System - OpenVMS
Node rebooted with CWLNMERR bugcheck. Can anybody advise/help?

 
AnkurSaxena
Occasional Contributor

SDA> clue crash

Crashdump Summary Information:

------------------------------

Crash Time: 19-FEB-2013 11:05:57.14

Bugcheck Type: CWLNMERR, Fatal error in clusterwide logical name support

Node: BFAAQA (Cluster)

CPU Type: AlphaServer ES45 Model 2B

VMS Version: V8.3

Current Process: NULL

Current Image: <not available>

Failing PC: FFFFFFFF.804B3444 LNM$SEND_ACTIVE_C+00734

Failing PS: 34000000.00000804

Module: SYS$CLUSTER (Link Date/Time: 4-AUG-2010 15:09:17.78)

Offset: 0003B444

Boot Time: 19-FEB-2013 11:05:10.00

System Uptime: 0 00:00:47.14

Crash/Primary CPU: 1./0.

System/CPU Type: 2608

Pagesize: 8 KByte (8192 bytes)

Physical Memory: 8192 MByte (4194304 PFNs, discontiguous memory)

Dumpfile Pagelets: 271588 blocks

Dump Flags: writecomp,errlogcomp

Dump Type: raw,selective,dosd,shared_mem

EXE$GL_FLAGS: poolpging,init,bugdump

Paging Files: 0 Pagefiles and 0 Swapfiles installed

Stack Pointers:

KSP = FFFFFFFF.875BDCB4 ESP = FFFFFFFF.875BF000 SSP = FFFFFFFF.875B9000

USP = FFFFFFFF.875B9000

General Registers:

R0 = 00000000.00000001 R1 = 00000000.00000001 R2 = FFFFFFFF.8224A7E0

R3 = FFFFFFFF.81C86400 R4 = FFFFFFFF.81C86400 R5 = 00000000.00000000

R6 = 00000000.00000001 R7 = 00000000.00000000 R8 = 00000001.00000000

R9 = 00000000.00000001 R10 = 00000001.00000007 R11 = FFFFFFFF.823532C0

R12 = 00000000.00000001 R13 = 00000000.00000000 R14 = FFFFFFFF.8198CF18

R15 = 00000000.00000005 R16 = 00000000.00000A4C R17 = 00000000.00000000

R18 = FFFFFFFF.818100F0 R19 = 00000000.00000000 R20 = 00000000.00000008

R21 = 00000000.00000010 R22 = 00000000.00000040 R23 = 00000000.00000000

R24 = 00000000.00000002 AI = FFFFFFFF.818100F0 RA = 00000000.00000000

PV = 00000000.01000001 R28 = 00000000.00000002 FP = FFFFFFFF.875BDE60

PC = FFFFFFFF.804B3448 PS = 34000000.00000804

System Registers:

Page Table Base Register (PTBR) 00000000.000FFFF8

Processor Base Register (PRBR) FFFFFFFF.81D31B00

Privileged Context Block Base (PCBB) 00000000.01531B80

System Control Block Base (SCBB) 00000000.000003DC

Software Interrupt Summary Register (SISR) 00000000.00000100

Address Space Number (ASN) 00000000.00000000

AST Summary / AST Enable (ASTSR_ASTEN) 00000000.00000000

Floating-Point Enable (FEN) 00000000.00000000

Interrupt Priority Level (IPL) 00000000.00000008

Machine Check Error Summary (MCES) 00000000.00000000

Virtual Page Table Base Register (VPTB) FFFFFEFC.00000000

Failing Instruction:

LNM$SEND_ACTIVE_C+00734: BUGCHK

Instruction Stream (last 20 instructions):

LNM$SEND_ACTIVE_C+006E4: LDA R1,#XFFFF(R0)

LNM$SEND_ACTIVE_C+006E8: BNE R1,#X000005

LNM$SEND_ACTIVE_C+006EC: LDQ_U R31,(SP)

LNM$SEND_ACTIVE_C+006F0: ADDL R13,#X01,R13

LNM$SEND_ACTIVE_C+006F4: BR R31,#XFFFFAA

LNM$SEND_ACTIVE_C+006F8: LDQ_U R31,(SP)

LNM$SEND_ACTIVE_C+006FC: LDQ_U R31,(SP)

LNM$SEND_ACTIVE_C+00700: LDA R1,#X0001(R0)

LNM$SEND_ACTIVE_C+00704: BEQ R1,#XFFFFFA

LNM$SEND_ACTIVE_C+00708: BR R31,#XFFFFA5

LNM$SEND_ACTIVE_C+0070C: LDQ_U R31,(SP)

LNM$SEND_ACTIVE_C+00710: EXTLL R16,#X04,R23

LNM$SEND_ACTIVE_C+00714: LDA SP,#XFFF0(SP)

LNM$SEND_ACTIVE_C+00718: STL R16,#X0008(SP)

LNM$SEND_ACTIVE_C+0071C: STL R23,#X000C(SP)

LNM$SEND_ACTIVE_C+00720: STQ R17,(SP)

LNM$SEND_ACTIVE_C+00724: LDQ R17,#XFA00(R14)

LNM$SEND_ACTIVE_C+00728: BIS R17,#X04,R16

LNM$SEND_ACTIVE_C+0072C: LDQ R17,(SP)

LNM$SEND_ACTIVE_C+00730: LDA SP,#X0008(SP)

LNM$SEND_ACTIVE_C+00734: BUGCHK

LNM$SEND_ACTIVE_C+00738: HALT

LNM$SEND_ACTIVE_C+0073C: LDQ_U R31,(SP)

LNM$SEND_ACTIVE_C+00740: LDA SP,#XFFD0(SP)

LNM$SEND_ACTIVE_C+00744: STQ R26,(SP)

SDA>
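The summary above reports a Failing PC, an Offset within SYS$CLUSTER, and a slightly different PC in the register dump. A minimal Python sketch of how these three values relate, assuming the Offset is measured from the start of the loaded SYS$CLUSTER code section (the helper names and `section_base` are illustrative, not SDA terminology):

```python
def parse_sda_hex(text):
    """Convert SDA's dotted 64-bit notation ('FFFFFFFF.804B3444') to an int."""
    return int(text.replace(".", ""), 16)


def format_sda_hex(value):
    """Format an int back into SDA's dotted 64-bit notation."""
    digits = f"{value & 0xFFFFFFFFFFFFFFFF:016X}"
    return f"{digits[:8]}.{digits[8:]}"


# Values copied from the CLUE CRASH output above.
failing_pc = parse_sda_hex("FFFFFFFF.804B3444")   # Failing PC
reported_pc = parse_sda_hex("FFFFFFFF.804B3448")  # PC in General Registers
offset = int("0003B444", 16)                      # Offset within SYS$CLUSTER

# Assumption: Offset is relative to the base of the loaded section.
section_base = failing_pc - offset
print(format_sda_hex(section_base))  # FFFFFFFF.80478000

# The saved PC points one 4-byte Alpha instruction past the BUGCHK.
print(reported_pc - failing_pc)      # 4
```

The offset (0003B444), rather than the absolute address, is the value to quote when comparing against a listing or when escalating, because the load address can differ between boots and systems.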

Volker Halle
Honored Contributor

Ankur,

 

could you please post the full CLUE file from CLUE$COLLECT:CLUE$node_ddmmyy_hh.LIS as an attachment? This makes life easier for me...

 

Thanks,

 

Volker.

AnkurSaxena
Occasional Contributor

Hi Volker,

 

The system is currently up and running.

 

CLUE did not automatically capture and archive the summary dump file information in a CLUE listing file, so there is no CLUE file.

 

When I try to generate one, I get the response below:

 

Ankur_BFAAQA>ana/crash $1$DKA0:[SYS1.SYSEXE]SYSDUMP.DMP;1

OpenVMS system dump analyzer

...analyzing an Alpha selective memory dump...

Dump taken on 19-FEB-2013 11:05:57.14 using version V8.3

CWLNMERR, Fatal error in clusterwide logical name support

SDA> clue history

%CLUE-I-ALRDYANA, dumpfile has already been analyzed

SDA>

Volker Halle
Honored Contributor

Ankur,

 

Please use SDA> CLUE HIST/OVERRIDE

 

Regards,

 

Volker.

AnkurSaxena
Occasional Contributor

Hi Volker,

 

Thanks for the reply. We have successfully generated the file.

 

Volker Halle
Honored Contributor

Ankur,

 

This CWLNMERR crash happened shortly after boot (uptime of only 47 seconds). Did any other node in the cluster crash or boot at that time (especially node BFABQA)?
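The 47-second figure follows directly from the Boot Time and Crash Time fields in the dump summary. A small Python sketch of that arithmetic, assuming the timestamps always use the `DD-MMM-YYYY HH:MM:SS.cc` layout seen in this dump (the function name `parse_vms_time` is made up for illustration):

```python
from datetime import datetime

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_vms_time(text):
    """Parse an absolute time such as '19-FEB-2013 11:05:57.14'."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hms, hundredths = time_part.split(".")
    hour, minute, second = (int(field) for field in hms.split(":"))
    # Assumption: the fraction is always two digits (hundredths of a second).
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    hour, minute, second, int(hundredths) * 10_000)


# Values copied from the crash dump summary.
boot = parse_vms_time("19-FEB-2013 11:05:10.00")
crash = parse_vms_time("19-FEB-2013 11:05:57.14")

uptime_seconds = (crash - boot).total_seconds()
print(f"{uptime_seconds:.2f}")  # 47.14
```

The same helper can be pointed at reboot or crash times from other cluster members to see whether they fall in the same window.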

 

There are lots of consistency checks inside the cluster-wide logical name table processing; one of those checks failed and forced a crash. You need the current OpenVMS sources for V8.3 to have a chance of finding out more about the context of this specific crash.

 

Escalate this crash to HP, if you can.

 

Volker.

AnkurSaxena
Occasional Contributor

Hi Volker,

Thanks for your reply. Yes, we are planning to escalate the issue to HP.

The other node, BFABQA, was also rebooted, 5-6 minutes before BFAAQA, and that went well. Only BFAAQA failed when rebooted, but it came up fine on the next reboot.

Directory SYS$SYSROOT:[SYSMGR]

STARTUP_BFABQA.LOG;229 26/32 19-FEB-2013 11:06:42.25