10-07-2012 12:05 AM
I cannot remember a system crash where I could not see the cause.
As far as I understand, the cluster driver forced the crash.
Has anyone seen the same problem?
regards
Eberhard
10-07-2012 12:57 AM
Re: My first system crash since years OpenVMS Alpha V8.4; clustered; all actual patches installed
Eberhard,
nice to see that my AutoCLUE procedure is still being used ;-)
Bugcheck Type: CWLNMERR, Fatal error in clusterwide logical name support
VMS Version: V8.4
Current Process: CLUSTER_SERVER
Current Image: DSA12:[SYS70.SYSCOMMON.][SYSEXE]CSP.EXE;1
Failing PC: 00000000.00039184 CSP+39184
This looks like a known crash footprint, which is most likely caused by Paged Pool shortage:
Paged Pool:
Total Failures 3
Failed Pages Accumulator 22
Total Alloc Requests 105012
Failed Alloc Requests 899
This crash is declared in CSP as a safeguard against filling up all of
pool if more than 1000 (decimal) LNM work requests are pending.
SDA> eval @LNM$GL_CW_WORKQ_COUNT and see if it's greater than 1000.
Volker.
10-07-2012 04:24 AM
SDA> eval @LNM$GL_CW_WORKQ_COUNT
Hex = 00000000.00000553 Decimal = 1363 BUG$_DISKCLASS+00003
Eberhard
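As a quick cross-check of the SDA output above, the hex value can be converted and compared against the limit of 1000 pending LNM work requests mentioned earlier in the thread. This is only an illustrative Python sketch; `CW_WORKQ_LIMIT` and `workq_over_limit` are names made up for the example and are not part of OpenVMS or SDA.

```python
# Limit of pending LNM work requests quoted earlier in this thread.
# The constant name is hypothetical; only the value comes from the thread.
CW_WORKQ_LIMIT = 1000


def sda_hex_to_int(hex_value: str) -> int:
    """Convert an SDA-style hex string such as '00000000.00000553' to an int."""
    return int(hex_value.replace(".", ""), 16)


def workq_over_limit(hex_value: str, limit: int = CW_WORKQ_LIMIT) -> bool:
    """Return True if the decoded queue count is strictly greater than limit."""
    return sda_hex_to_int(hex_value) > limit


count = sda_hex_to_int("00000000.00000553")
print(count, workq_over_limit("00000000.00000553"))
```

The decoded count matches the decimal value SDA printed, and it is above the limit, which is consistent with the CWLNMERR bugcheck being declared.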
10-07-2012 04:54 AM - edited 10-07-2012 04:54 AM
Eberhard,
further checks:
SDA> SHOW MEM/POOL/FULL - what does Paged Pool Free Space look like ?
SDA> SHOW POOL/SUMM/PAGED - what's in there ?
Try to find out why the first request in the LNM work queue (the pointer to the queue header is in R3) cannot be completed and is thus blocking all other requests in the queue.
Consider increasing paged pool.
Did you change something regarding cluster-wide logical names in your cluster ?
Volker.
10-07-2012 07:42 AM
SDA> SHOW MEM/POOL/FULL
System Memory Resources from Crashdump on 7-OCT-2012 04:31:12.34
-----------------------------------------------------------------
Nonpaged Dynamic Memory (Lists + Variable)
Current Size (MB) 9.46 Current Size (Pagelets) 19376
Initial Size (MB) 6.96 Initial Size (Pagelets) 14256
Maximum Size (MB) 41.76 Maximum Size (Pagelets) 85536
Free Space (MB) 0.76 Space in Use (MB) 8.69
Largest Var Block (KB) 125.00 Smallest Var Block (KB) 125.00
Number of Free Blocks 720 Free Blocks LEQU 64 bytes 279
Free Blocks on Lookasides 719 Lookaside Space (KB) 655.68
Bus Addressable Memory (Lists + Variable)
Current Size (KB) 128.00 Current Size (Pagelets) 256
Initial Size (KB) 128.00 Initial Size (Pagelets) 256
Free Space (KB) 110.87 Space in Use (KB) 17.12
Largest Var Block (KB) 104.00 Smallest Var Block (KB) 6.87
Number of Free Blocks 2 Free Blocks LEQU 64 bytes 0
Free Blocks on Lookasides 0 Lookaside Space (bytes) 0
Paged Dynamic Memory (Lists + Variable)
-----------------------------------------------------------------
Current Size (MB) 4.17 Current Size (Pagelets) 8560
Free Space (MB) 0.00 Space in Use (MB) 4.17
Largest Var Block (bytes) 176 Smallest Var Block (bytes) 16
Number of Free Blocks 57 Free Blocks LEQU 64 bytes 48
Free Blocks on Lookasides 0 Lookaside Space (bytes) 0
Lock Manager Dynamic Memory
Current Size (MB) 3.99 Current Size (Pages) 511
Free Space (MB) 0.12 Hits 44219
Space in Use (MB) 3.87 Misses 453
Number of Empty Pages 0 Expansions 818
Number of Free Packets 455
SDA> SHOW POOL/SUMM/PAGED
Paged Dynamic Storage Pool
--------------------------
NPOOL address: (None)
Pool map address: (None)
Maximum number of lookaside lists: 160.
Current number of lookaside lists: 0.
Highest lookaside list used: 0.
Granularity size: 16.
LSTHDS(s)
---------
LSTHDS Variable Lookaside
address listhead listheads
----------------- ----------------- -----------------
(None) FFFFFFFF.81808960 FFFFFFFF.818537A8
Segment(s)
----------
Start End Length
-------- -------- --------
8494E000 84D7BFFF 0042E000
Summary of Paged Pool contents
------------------------------
Packet type/subtype Packet count Packet bytes Percent
--------------------------- ---------------- ---------------- --------
Unknown 0000004D 00009F20 (0.9%)
ADP 00000001 00000200 (0.0%)
PQB 00000001 000008F0 (0.1%)
GSD 000002D2 0000E1C0 (1.3%)
KFE 000001BA 0000F920 (1.5%)
KFRH 0000013F 000638C0 (9.3%)
RSHT 00000001 00000810 (0.0%)
CIMSG 00000001 00000010 (0.0%)
ACL 0000000C 00000280 (0.0%)
LNM 000032C6 00255640 (55.9%)
KFD 0000000B 00000290 (0.0%)
KFPB 00000001 00000010 (0.0%)
ORB 000000A7 00005A60 (0.5%)
OCB 00000009 00000750 (0.0%)
PGD 0000002E 0011C620 (26.6%)
PGD_F11BC 00000001 0011A060 (26.4%)
KFERES 0000002D 000025C0 (0.2%)
VCC 00000001 00001B90 (0.2%)
VCC 00000001 00001B90 (0.2%)
OVRS 00000001 0000E200 (1.3%)
OVRS_CSB 00000001 0000E200 (1.3%)
RC 00000003 000092D0 (0.9%)
RC 00000003 000092D0 (0.9%)
SECURITY 00000004 00009380 (0.9%)
SECURITY 00000001 000092F0 (0.9%)
SECURITY_ACMESDB 00000001 00000030 (0.0%)
SECURITY_ACMEADB 00000001 00000020 (0.0%)
SECURITY_ACMECH 00000001 00000040 (0.0%)
CTD 00000001 00002110 (0.2%)
CTD 00000001 00002110 (0.2%)
MMG 00000001 00004810 (0.4%)
MMG 00000001 00004810 (0.4%)
Total space used: 0042D600 (4380160.) bytes out of 0042E000 (4382720.) bytes
in 000039E3 (14819.) packets
Total space utilization: 99.9%
I am not aware of any changes regarding cluster-wide logical names.
eberhard
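To see at a glance which packet types dominate paged pool, the hex columns of the summary above can be decoded and ranked by size. The following Python sketch uses only three rows copied from the output; the helper names are invented for the example, and it assumes the percentage column is relative to "Total space used" (that assumption reproduces the LNM figure shown by SDA).

```python
import re

# Three rows copied from the SHOW POOL/SUMM/PAGED output above (hex values).
SUMMARY = """\
KFRH 0000013F 000638C0 (9.3%)
LNM 000032C6 00255640 (55.9%)
PGD 0000002E 0011C620 (26.6%)
"""
TOTAL_USED = 0x0042D600  # "Total space used" from the same output
POOL_SIZE = 0x0042E000   # total paged pool size from the same output

# type, packet count (hex), packet bytes (hex), then the percent column
ROW = re.compile(r"^(\S+)\s+([0-9A-F]{8})\s+([0-9A-F]{8})\s+\(")


def parse_rows(text):
    """Yield (packet_type, packet_count, packet_bytes) with hex fields decoded."""
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            yield m.group(1), int(m.group(2), 16), int(m.group(3), 16)


# Largest consumers first.
rows = sorted(parse_rows(SUMMARY), key=lambda r: r[2], reverse=True)
for name, count, nbytes in rows:
    share = 100.0 * nbytes / TOTAL_USED
    print(f"{name:<6} {count:>6} packets {nbytes:>9} bytes {share:5.1f}% of used pool")
print(f"pool utilization: {100.0 * TOTAL_USED / POOL_SIZE:.1f}%")
```

Sorting by bytes rather than packet count puts the actual pool consumers at the top; here that is the LNM entry.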
10-07-2012 08:29 AM - edited 10-07-2012 08:37 AM
Eberhard,
Paged Dynamic Memory: Free Space (MB) 0.00 !
LNM 000032C6 00255640 (55.9%) - more than half of paged pool is filled by Logical Name Blocks
Note that nonpaged pool is also pretty tight: Free Space (MB) 0.76
The question to answer is: what's the first packet in the LNM work queue and why doesn't that request get finished ?
SDA> VALIDATE QUEUE LNM$GQ_CW_WORKQ ! show first couple of elements in queue
SDA> READ SYSDEF
SDA> FORMAT @LNM$GQ_CW_WORKQ ! try to format the FIRST CWLNM packet in the queue
Did anything happen on any of the other nodes in your cluster at the time of the crash or some seconds prior to the crash ? Like another node booting or shutting down ?
Volker.
10-07-2012 02:08 PM
55.9% of paged pool is a LOT of logical names...
I'd be looking for something that's leaking logical names, or logical name tables. The big hammer approach would be to check out SHOW LOGICAL/TABLE=* at (say) 1 hour intervals on your running system, looking for accumulation. Given the association with cluster-wide logical names, that's where I'd be concentrating. Filter the lists looking for entries that persist over time.
Many years ago there was an issue with Oracle not cleaning up logical name tables (/TABLE=*TNS*) which would eventually consume all of paged pool and take the system down. It's long since been fixed, so I doubt that's the issue here. Even before the fix, it was fairly simple to write a procedure that would detect and delete dead tables, thus protecting the system.
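The snapshot comparison described above could be sketched roughly as follows in Python. It assumes each snapshot has already been reduced to a set of logical names (parsing the SHOW LOGICAL/TABLE=* output is not shown), and the function names and the sample names are hypothetical.

```python
def lingering_names(snapshots, min_age=2):
    """Return names that are absent from the first (baseline) snapshot but
    present in each of the last `min_age` consecutive snapshots."""
    if len(snapshots) < min_age + 1:
        return set()
    baseline = set(snapshots[0])
    recent = set(snapshots[-1])
    for snap in snapshots[-min_age:]:
        recent &= set(snap)
    return recent - baseline


def growth(snapshots):
    """Return the number of distinct names per snapshot, to spot accumulation."""
    return [len(set(s)) for s in snapshots]


# Made-up snapshots for illustration only; real ones would come from the
# running system at regular intervals.
snapshots = [
    {"SYS$DISK", "A"},
    {"SYS$DISK", "A", "TMP1", "X1"},
    {"SYS$DISK", "A", "X1", "X2"},
    {"SYS$DISK", "A", "X1", "X2", "X3"},
]
print(sorted(lingering_names(snapshots)))
print(growth(snapshots))
```

Subtracting the baseline keeps the long-lived system logicals out of the result, so only names that appeared later and then stayed are reported. Short-lived entries such as `TMP1` drop out on their own.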
10-07-2012 10:45 PM
This is the output:
SDA> VALIDATE QUEUE LNM$GQ_CW_WORKQ
Queue is complete, total of 1363 elements in the queue
SDA> READ SYSDEF
SDA> FORMAT @LNM$GQ_CW_WORKQ
FFFFFFFF.853E2000 CWLNM$L_FLINK 8238DB00
FFFFFFFF.853E2004 CWLNM$L_BLINK 8180ADA8 LNM$GQ_CW_WORKQ
FFFFFFFF.853E2008 CWLNM$W_FILL_1 0000
FFFFFFFF.853E200A CWLNM$B_TYPE 65
FFFFFFFF.853E200B CWLNM$B_SUBTYPE 0C
FFFFFFFF.853E200C CWLNM$L_SIZE 00002000
FFFFFFFF.853E2010 CWLNM$L_FLAGS 00000004
FFFFFFFF.853E2014 CWLNM$L_DATASTART 00000018
All other nodes (three I64 machines) were up without a reboot.
Eberhard
10-08-2012 06:23 AM
..and maybe CLUE SYSTEM/LOGICALS on the crash dump also.
I wouldn't want to post mine without heavy redaction, but you might get an idea if something stands out. You might also learn something by comparing against the running system, without having to wait around for a recurrence.
10-08-2012 06:50 AM
CLUE SYSTEM/LOGICALS:
Many, many strange entries like this:
84C362F0 84C36354 "POP3::DE7C9852:C3B800000000" = "......"
I'm using Multinet-TCPIP. If a hacker is trying to break in via POP, this might
be the reason for the crash.
Eberhard
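To get a feel for how many of the logical names belong to one family, the CLUE SYSTEM/LOGICALS listing could be grouped by the part of the name before the first "::". This is only a Python sketch under the assumption that every entry looks like the line quoted above; the function name is made up, and only the first sample line is from the actual dump.

```python
import re
from collections import Counter

# Matches the quoted logical name that precedes the '=' sign.
ENTRY = re.compile(r'"([^"]*)"\s*=\s*"')


def count_by_prefix(lines, sep="::"):
    """Count logical names by the text before the first `sep`.
    Names without `sep` are counted under their full name."""
    counts = Counter()
    for line in lines:
        m = ENTRY.search(line)
        if not m:
            continue  # skip headers or lines in another format
        name = m.group(1)
        counts[name.split(sep, 1)[0]] += 1
    return counts


sample = [
    # Taken from the crash dump listing in this thread:
    '84C362F0 84C36354 "POP3::DE7C9852:C3B800000000" = "......"',
    # Invented lines, for illustration only:
    '00000000 00000000 "POP3::EXAMPLE1" = "x"',
    '00000000 00000000 "SOME_OTHER_NAME" = "x"',
]
counts = count_by_prefix(sample)
for prefix, n in counts.most_common():
    print(prefix, n)
```

If one prefix accounts for most of the entries in the real listing, that points at the component creating the names.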