Operating System - OpenVMS

My first system crash in years: OpenVMS Alpha V8.4, clustered, all current patches installed

Eberhard Heuser
Frequent Advisor

I cannot remember ever having a system crash whose cause I could not see.

As far as I understand, the cluster driver forces the crash.

Has anyone seen the same problem?

regards

Eberhard

Volker Halle
Honored Contributor

Eberhard,

nice to see that my AutoCLUE procedure is still being used ;-)

Bugcheck Type:    CWLNMERR, Fatal error in clusterwide logical name support
VMS Version:        V8.4   
Current Process:  CLUSTER_SERVER
Current Image:     DSA12:[SYS70.SYSCOMMON.][SYSEXE]CSP.EXE;1
Failing PC:           00000000.00039184    CSP+39184

This looks like a known crash footprint, which is most likely caused by a paged pool shortage:

Paged Pool:
Total Failures                               3
Failed Pages Accumulator         22
Total Alloc Requests          105012
Failed Alloc Requests              899

This crash is declared in CSP as a safeguard against filling up all of pool when more than 1000 LNM work requests are pending. To check:

SDA> EVAL @LNM$GL_CW_WORKQ_COUNT

and see if it's greater than 1000.
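If you want to script that check against saved SDA output, here is a minimal Python sketch (Python is my choice; the thread itself only uses SDA). It assumes the EVAL result line has the layout shown in Eberhard's reply ("Hex = ...   Decimal = N") and simply compares N with the 1000-request limit quoted above. The function names are made up for this example and are not part of SDA.

```python
import re

# Limit quoted in this thread: CSP bugchecks with CWLNMERR when more than
# 1000 cluster-wide LNM work requests are pending.
CWLNM_WORKQ_LIMIT = 1000

# Assumed layout of an SDA EVAL result line, e.g.
#   "Hex = 00000000.00000553   Decimal = 1363"
_DECIMAL_RE = re.compile(r"Decimal\s*=\s*(-?\d+)")


def parse_eval_decimal(line: str) -> int:
    """Return the decimal value from one line of SDA EVAL output."""
    match = _DECIMAL_RE.search(line)
    if match is None:
        raise ValueError(f"no 'Decimal =' field in: {line!r}")
    return int(match.group(1))


def workq_over_limit(eval_line: str, limit: int = CWLNM_WORKQ_LIMIT) -> bool:
    """True if the pending LNM work request count exceeds the limit."""
    return parse_eval_decimal(eval_line) > limit


if __name__ == "__main__":
    line = "Hex = 00000000.00000553   Decimal = 1363"
    print(parse_eval_decimal(line), workq_over_limit(line))  # 1363 True
```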

Volker.

Eberhard Heuser
Frequent Advisor

SDA> eval @LNM$GL_CW_WORKQ_COUNT
Hex = 00000000.00000553   Decimal = 1363                 BUG$_DISKCLASS+00003

Eberhard

Volker Halle
Honored Contributor

Eberhard,

further checks:

SDA> SHOW MEM/POOL/FULL - what does the Paged Pool Free Space look like?

SDA> SHOW POOL/SUMM/PAGED - what's in there?

Try to find out why the first request in the LNM work queue (R3 holds the pointer to the queue header) cannot be completed and is thus blocking all other requests in the queue.
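To illustrate the head-of-line blocking described here, a toy Python model (hypothetical; this is not how CSP is implemented): requests are served strictly from the head of a FIFO queue, a head request that can never complete stops all progress, and the backlog then grows until the safeguard limit trips. The WorkQueue class and the Bugcheck exception are stand-ins invented for this sketch.

```python
from collections import deque
from typing import Callable, Deque


class Bugcheck(Exception):
    """Stand-in for a CWLNMERR bugcheck (hypothetical model, not CSP code)."""


class WorkQueue:
    """Toy FIFO of cluster-wide LNM work requests, served head-first."""

    def __init__(self, limit: int = 1000) -> None:
        self.limit = limit
        self.pending: Deque[str] = deque()

    def submit(self, request: str) -> None:
        self.pending.append(request)
        # Safeguard: do not let the backlog grow without bound.
        if len(self.pending) > self.limit:
            raise Bugcheck(f"{len(self.pending)} LNM work requests pending")

    def service(self, can_complete: Callable[[str], bool]) -> int:
        """Complete requests from the head until one cannot be finished.

        Returns the number completed. A stuck head request blocks every
        request queued behind it, even those that could complete.
        """
        done = 0
        while self.pending and can_complete(self.pending[0]):
            self.pending.popleft()
            done += 1
        return done


if __name__ == "__main__":
    q = WorkQueue(limit=1000)
    stuck = "req-0"  # hypothetical request that never completes
    try:
        for i in range(2000):
            q.submit(f"req-{i}")
            q.service(lambda r: r != stuck)
    except Bugcheck as exc:
        print("bugcheck:", exc)  # bugcheck: 1001 LNM work requests pending
```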

Consider increasing paged pool (SYSGEN parameter PAGEDYN).

Did you change anything regarding cluster-wide logical names in your cluster?

Volker.

Eberhard Heuser
Frequent Advisor

SDA> SHOW MEM/POOL/FULL
System Memory Resources from Crashdump on  7-OCT-2012 04:31:12.34
-----------------------------------------------------------------

Nonpaged Dynamic Memory      (Lists + Variable)
  Current Size (MB)                 9.46   Current Size (Pagelets)     19376
  Initial Size (MB)                 6.96   Initial Size (Pagelets)     14256
  Maximum Size (MB)                41.76   Maximum Size (Pagelets)     85536
  Free Space (MB)                   0.76   Space in Use (MB)            8.69
  Largest Var Block (KB)          125.00   Smallest Var Block (KB)    125.00
  Number of Free Blocks              720   Free Blocks LEQU 64 bytes     279
  Free Blocks on Lookasides          719   Lookaside Space (KB)       655.68

Bus Addressable Memory        (Lists + Variable)
  Current Size (KB)               128.00   Current Size (Pagelets)       256
  Initial Size (KB)               128.00   Initial Size (Pagelets)       256
  Free Space (KB)                 110.87   Space in Use (KB)           17.12
  Largest Var Block (KB)          104.00   Smallest Var Block (KB)      6.87
  Number of Free Blocks                2   Free Blocks LEQU 64 bytes       0
  Free Blocks on Lookasides            0   Lookaside Space (bytes)         0

Paged Dynamic Memory         (Lists + Variable)
-----------------------------------------------------------------
  Current Size (MB)                 4.17   Current Size (Pagelets)      8560
  Free Space (MB)                   0.00   Space in Use (MB)            4.17
  Largest Var Block (bytes)          176   Smallest Var Block (bytes)     16
  Number of Free Blocks               57   Free Blocks LEQU 64 bytes      48
  Free Blocks on Lookasides            0   Lookaside Space (bytes)         0

Lock Manager Dynamic Memory
  Current Size (MB)                 3.99   Current Size (Pages)          511
  Free Space (MB)                   0.12   Hits                        44219
  Space in Use (MB)                 3.87   Misses                        453
  Number of Empty Pages                0   Expansions                    818
  Number of Free Packets             455

SDA> SHOW POOL/SUMM/PAGED
Paged Dynamic Storage Pool
--------------------------

        NPOOL address:                                       (None)
        Pool map address:                                    (None)
        Maximum number of lookaside lists:                     160.
        Current number of lookaside lists:                       0.
        Highest lookaside list used:                             0.
        Granularity size:                                       16.

LSTHDS(s)
---------

             LSTHDS              Variable             Lookaside
             address             listhead             listheads
        -----------------    -----------------    -----------------
             (None)          FFFFFFFF.81808960    FFFFFFFF.818537A8

Segment(s)
----------

         Start        End        Length
        --------    --------    --------
        8494E000    84D7BFFF    0042E000

Summary of Paged Pool contents
------------------------------

    Packet type/subtype        Packet count      Packet bytes    Percent
---------------------------  ----------------  ----------------  --------
Unknown                      0000004D          00009F20            (0.9%)
ADP                          00000001          00000200            (0.0%)
PQB                          00000001          000008F0            (0.1%)
GSD                          000002D2          0000E1C0            (1.3%)
KFE                          000001BA          0000F920            (1.5%)
KFRH                         0000013F          000638C0            (9.3%)
RSHT                         00000001          00000810            (0.0%)
CIMSG                        00000001          00000010            (0.0%)
ACL                          0000000C          00000280            (0.0%)
LNM                          000032C6          00255640           (55.9%)
KFD                          0000000B          00000290            (0.0%)
KFPB                         00000001          00000010            (0.0%)
ORB                          000000A7          00005A60            (0.5%)
OCB                          00000009          00000750            (0.0%)

PGD                          0000002E          0011C620           (26.6%)
  PGD_F11BC                          00000001          0011A060   (26.4%)
  KFERES                             0000002D          000025C0    (0.2%)

VCC                          00000001          00001B90            (0.2%)
  VCC                                00000001          00001B90    (0.2%)

OVRS                         00000001          0000E200            (1.3%)
  OVRS_CSB                           00000001          0000E200    (1.3%)

RC                           00000003          000092D0            (0.9%)
  RC                                 00000003          000092D0    (0.9%)

SECURITY                     00000004          00009380            (0.9%)
  SECURITY                           00000001          000092F0    (0.9%)
  SECURITY_ACMESDB                   00000001          00000030    (0.0%)
  SECURITY_ACMEADB                   00000001          00000020    (0.0%)
  SECURITY_ACMECH                    00000001          00000040    (0.0%)

CTD                          00000001          00002110            (0.2%)
  CTD                                00000001          00002110    (0.2%)

MMG                          00000001          00004810            (0.4%)
  MMG                                00000001          00004810    (0.4%)

Total space used: 0042D600 (4380160.) bytes out of 0042E000 (4382720.) bytes
  in 000039E3 (14819.) packets

Total space utilization: 99.9%

I'm not aware of any changes regarding cluster-wide logical names.

eberhard

Volker Halle
Honored Contributor

Eberhard,

Paged Dynamic Memory: Free Space (MB) 0.00 !

LNM 000032C6 00255640 (55.9%): more than half of paged pool is filled by logical name blocks.
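The numbers can be reproduced from the hex columns of Eberhard's SHOW POOL/SUMM/PAGED output with a few lines of Python. One observation from doing so: SDA's Percent column appears to be relative to the total space used (0x42D600), since that yields 55.9%, while dividing by the full segment length (0x42E000) gives 55.8%. The constant names below are just labels for the values copied from that output.

```python
# Values copied (in hex) from the SHOW POOL/SUMM/PAGED output in this thread.
SEGMENT_BYTES = 0x42E000   # paged pool segment length
USED_BYTES = 0x42D600      # "Total space used"
LNM_PACKETS = 0x32C6       # LNM packet count
LNM_BYTES = 0x255640       # LNM packet bytes


def percent(part: int, whole: int) -> float:
    """Share of `whole` taken by `part`, in percent."""
    return 100.0 * part / whole


lnm_share = percent(LNM_BYTES, USED_BYTES)
avg_lnm = LNM_BYTES / LNM_PACKETS
free_bytes = SEGMENT_BYTES - USED_BYTES

print(f"LNM: {LNM_PACKETS} packets, {LNM_BYTES} bytes, {lnm_share:.1f}% of used pool")
print(f"average LNM packet: {avg_lnm:.0f} bytes")
print(f"free: {free_bytes} bytes of {SEGMENT_BYTES}")
```

This works out to 12998 LNM packets averaging about 188 bytes each, and only 2560 bytes of paged pool left, which matches the "Free Space (MB) 0.00" above.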

Note that nonpaged pool is also pretty tight: Free Space (MB) 0.76

The question to answer is: what's the first packet in the LNM work queue, and why doesn't that request get finished?

SDA> VALIDATE QUEUE LNM$GQ_CW_WORKQ   ! show first couple of elements in queue

SDA> READ SYSDEF

SDA> FORMAT @LNM$GQ_CW_WORKQ              ! try to format the FIRST CWLNM packet in the queue
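For readers unfamiliar with VALIDATE QUEUE: LNM$GQ_CW_WORKQ is a doubly linked queue whose elements begin with a forward link (FLINK) and a backward link (BLINK). Below is a rough Python sketch of the kind of consistency walk such a validation performs, over a toy address-to-links map. The map and the function are assumptions for illustration only; this is not SDA's code and it does not read a dump.

```python
from typing import Dict, Tuple

# Toy memory: address -> (flink, blink). Hypothetical model of an absolute
# doubly linked queue with a listhead, not SDA's implementation.
Memory = Dict[int, Tuple[int, int]]


def validate_queue(mem: Memory, header: int, max_elements: int = 100_000) -> int:
    """Walk forward links from `header`, checking every backward link.

    Returns the number of elements (header excluded). Raises ValueError on
    a broken link or a chain that never returns to the header.
    """
    count = 0
    prev = header
    entry = mem[header][0]  # first element = header's flink
    while entry != header:
        if entry not in mem:
            raise ValueError(f"flink of {prev:#x} points to unmapped {entry:#x}")
        flink, blink = mem[entry]
        if blink != prev:
            raise ValueError(f"blink of {entry:#x} is {blink:#x}, expected {prev:#x}")
        count += 1
        if count > max_elements:
            raise ValueError("queue does not return to its header")
        prev, entry = entry, flink
    if mem[header][1] != prev:
        raise ValueError("header blink does not point to the last element")
    return count
```

In Eberhard's FORMAT output further down, the first packet's CWLNM$L_BLINK (8180ADA8) symbolizes as LNM$GQ_CW_WORKQ, which is what this walk expects of the first element: its backward link points at the listhead.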

Did anything happen on any of the other nodes in your cluster at the time of the crash, or a few seconds before it? For example, another node booting or shutting down?

Volker.

John Gillings
Honored Contributor

55.9% of paged pool is a LOT of logical names...

I'd be looking for something that's leaking logical names or logical name tables. The big-hammer approach would be to check SHOW LOGICAL/TABLE=* at intervals (say, every hour) on your running system, looking for accumulation. Given the association with cluster-wide logical names, that's where I'd concentrate: filter the lists for entries that persist over time.
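A sketch of that snapshot comparison in Python. It assumes you have saved SHOW LOGICAL/TABLE=* output to text files at intervals, that each table header appears on its own line as (TABLE_NAME), and that each logical name starts a line with its quoted name; translation and continuation lines are ignored. The function names are made up for this example.

```python
import re
from collections import Counter
from typing import Set, Tuple

# Assumed layout of SHOW LOGICAL/TABLE=* output:
#   (LNM$SYSTEM_TABLE)
#
#     "SYS$DISK" = "DKA0:"
TABLE_RE = re.compile(r"^\((?P<table>[^)]+)\)\s*$")
NAME_RE = re.compile(r'^\s*"(?P<name>[^"]+)"')

Entry = Tuple[str, str]  # (logical name table, logical name)


def parse_show_logical(text: str) -> Set[Entry]:
    """Collect (table, name) pairs from one captured listing."""
    entries: Set[Entry] = set()
    table = "?"  # used for names seen before any table header
    for line in text.splitlines():
        header = TABLE_RE.match(line)
        if header:
            table = header.group("table")
            continue
        name = NAME_RE.match(line)
        if name:
            entries.add((table, name.group("name")))
    return entries


def growth_by_table(before: Set[Entry], after: Set[Entry]) -> Counter:
    """Net change in the number of names per table between two snapshots."""
    growth = Counter(table for table, _ in after)
    growth.subtract(Counter(table for table, _ in before))
    return growth


def persisting(*snapshots: Set[Entry]) -> Set[Entry]:
    """Entries present in every snapshot, i.e. names that never went away."""
    if not snapshots:
        return set()
    return set.intersection(*snapshots)
```

Parse two captures, then growth_by_table(before, after).most_common(5) points at the tables that keep growing, and persisting(...) over several captures lists the names that never disappear.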

Many years ago there was an issue with Oracle not cleaning up logical name tables (/TABLE=*TNS*) which would eventually consume all of paged pool and take the system down. It's long since been fixed, so I doubt that's the issue here. Even before the fix, it was fairly simple to write a procedure that would detect and delete dead tables, thus protecting the system.

A crucible of informative mistakes
Eberhard Heuser
Frequent Advisor

This is the output:

SDA> VALIDATE QUEUE LNM$GQ_CW_WORKQ
Queue is complete, total of 1363 elements in the queue
SDA> READ SYSDEF
SDA> FORMAT @LNM$GQ_CW_WORKQ
FFFFFFFF.853E2000   CWLNM$L_FLINK                            8238DB00
FFFFFFFF.853E2004   CWLNM$L_BLINK                   8180ADA8             LNM$GQ_CW_WORKQ
FFFFFFFF.853E2008   CWLNM$W_FILL_1                               0000
FFFFFFFF.853E200A   CWLNM$B_TYPE                               65
FFFFFFFF.853E200B   CWLNM$B_SUBTYPE                          0C
FFFFFFFF.853E200C   CWLNM$L_SIZE                    00002000
FFFFFFFF.853E2010   CWLNM$L_FLAGS                            00000004
FFFFFFFF.853E2014   CWLNM$L_DATASTART               00000018

All other nodes (three I64 machines) stayed up; none of them rebooted.

Eberhard

Richard Brodie_1
Honored Contributor

...and maybe also CLUE SYSTEM/LOGICALS on the crash dump.

I wouldn't want to post mine without heavy redaction, but you might get an idea if something stands out. You might also learn something by comparing against the running system, without having to wait around for a recurrence.

Eberhard Heuser
Frequent Advisor

CLUE SYSTEM/LOGICALS:

many, many strange entries like this:

84C362F0   84C36354   "POP3::DE7C9852:C3B800000000" = "......"

I'm using Multinet-TCPIP. If a hacker tries to use POP3 to break in, this might be the reason for the crash.
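To see whether these POP3 entries really account for the LNM packets that filled paged pool, one could tally the CLUE SYSTEM/LOGICALS output by name prefix. A minimal Python sketch, assuming the line layout shown above (two addresses, then the quoted name, then "=" and the value) and grouping by the text before the first "::"; it makes no attempt to interpret the rest of the name. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed CLUE SYSTEM/LOGICALS line layout, as posted above:
#   84C362F0   84C36354   "POP3::DE7C9852:C3B800000000" = "......"
# The logical name is taken to be the first quoted string followed by "=".
NAME_RE = re.compile(r'"([^"]*)"\s*=')


def name_prefix(name: str) -> str:
    """Group key: the part before '::', or the whole name if there is none."""
    return name.split("::", 1)[0]


def count_prefixes(clue_text: str) -> Counter:
    """Count logical names per prefix in saved CLUE output."""
    counts: Counter = Counter()
    for line in clue_text.splitlines():
        match = NAME_RE.search(line)
        if match:
            counts[name_prefix(match.group(1))] += 1
    return counts
```

Running count_prefixes on the saved CLUE output and looking at most_common(10) gives a rough figure to set against the 0x32C6 (12998) LNM packets in the paged pool summary.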

Eberhard