04-07-2008 02:49 AM
LOCKMGRERR bugcheck (V7.3-2)
The GRQ and CVTQ on one of the RSBs were corrupted, with the two queues being linked together such that the CVTQ merges into the GRQ. The MS PowerPoint diagram attached to this entry shows the state of the two queues.
Is anyone aware of known synchronization problems in the V7.3-2 lock manager that might cause this type of corruption?
Thanks,
Jerry
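The kind of corruption described above, where one queue's forward chain runs into another queue, can be illustrated with a small consistency check. This is a minimal sketch in Python, not OpenVMS code: `Entry`, `insert_tail`, and `check_queue` are hypothetical names, and the model assumes plain circular doubly linked queues with one forward and one backward link rather than the real RSB layout.

```python
class Entry:
    """One element of a circular doubly linked queue (a header is an Entry too)."""
    def __init__(self, name):
        self.name = name
        self.flink = self   # forward link; an empty queue points at itself
        self.blink = self   # backward link


def insert_tail(header, entry):
    """Insert entry at the tail of the queue owned by header."""
    tail = header.blink
    entry.flink = header
    entry.blink = tail
    tail.flink = entry
    header.blink = entry


def check_queue(header, max_entries=1000):
    """Walk forward from header and return a list of problems (empty if consistent)."""
    problems = []
    seen = {id(header)}
    node = header
    for _ in range(max_entries):
        nxt = node.flink
        # In a healthy queue every forward hop can be undone by one backward hop.
        if nxt.blink is not node:
            problems.append(f"{nxt.name}.blink does not point back to {node.name}")
        if nxt is header:
            return problems      # the walk closed the circle normally
        if id(nxt) in seen:
            problems.append(f"loop that bypasses {header.name} at {nxt.name}")
            return problems
        seen.add(id(nxt))
        node = nxt
    problems.append("walk did not terminate within max_entries")
    return problems


# Hypothetical stand-ins for the two queues of one resource.
grq, cvtq = Entry("GRQ"), Entry("CVTQ")
g1, g2, c1 = Entry("G1"), Entry("G2"), Entry("C1")
insert_tail(grq, g1)
insert_tail(grq, g2)
insert_tail(cvtq, c1)

# Simulate the reported state: the CVTQ forward chain merges into the GRQ.
c1.flink = g1

problems = check_queue(cvtq)
```

In this toy model the walk from the GRQ header still closes normally, while the walk from the CVTQ header reports a back-link mismatch and never returns to its own header.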
04-07-2008 08:54 AM
Re: LOCKMGRERR bugcheck (V7.3-2)
only HP can tell - if at all ;-)
Could you provide the CLUE file from the crash (CLUE$COLLECT:CLUE$node_ddmmyy_hhmm.LIS)? Either post it as a .TXT attachment or send it to me via mail (look at my ITRC profile).
This looks like some queue manipulation and/or synchronization problem. None of the patches for V7.3-2 seem to contain an obvious description of a similar problem.
Volker.
04-07-2008 09:59 AM
Re: LOCKMGRERR bugcheck (V7.3-2)
The CLUE output is attached.
Jerry
04-07-2008 09:28 PM
Re: LOCKMGRERR bugcheck (V7.3-2)
Thanks for providing the CLUE file. Did this system see other unusual crashes in the past? Could you also provide the crash history (the file pointed to by the CLUE$HISTORY logical) of this system?
The crash seems to have happened on the first deadlock search performed during the uptime (only 7 hours) of this system. The basic crash footprint is:
Bugcheck Type: LOCKMGRERR, Error detected by Lock Manager
Failing PC: FFFFFFFF.801E44DC LCK$SEARCHDLCK_C+0027C
The deadlock search code within the lock manager is not at fault, but seems to be a victim of an earlier corruption of the RSB queues.
Consider referencing this ITRC topic in case 3601524270 and asking the specialist working this call to send the CLUE file to the CCAT tool...
There is a tool inside HP called CCAT (previously: CANASTA), to which all specialists working on crash dumps were (are?) supposed to send all CLUE files. This tool would extract the most important parameters of a crash and compare them to a knowledge base of known crash problems and also to the crash footprints of all other crashes ever reported in this tool. This would immediately and automatically point out other system crashes with the same or similar footprints. I maintained this tool and the knowledge base within Digital/Compaq/HP for about 10 years.
If the HP specialist working this call does not know about CCAT (CANASTA), let me know and I'll ask around whether this tool still exists and is still being used within HP.
Volker.
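To make the idea of comparing crash footprints concrete, here is a minimal sketch in Python. It is not a description of how CCAT works internally; the regular expression, `parse_footprint`, and `same_footprint` are hypothetical, and the only input format assumed is the two footprint lines quoted in this thread.

```python
import re

# Matches the two footprint lines as quoted in this thread (format assumed).
FOOTPRINT_RE = re.compile(
    r"Bugcheck Type:\s*(?P<bugcheck>\w+).*?"
    r"Failing PC:\s*(?P<pc>[0-9A-F.]+)\s+(?P<symbol>[\w$]+)\+(?P<offset>[0-9A-F]+)",
    re.DOTALL,
)


def parse_footprint(text):
    """Return (bugcheck, symbol, offset) or None if the lines are not found."""
    m = FOOTPRINT_RE.search(text)
    if m is None:
        return None
    return (m.group("bugcheck"), m.group("symbol"), int(m.group("offset"), 16))


def same_footprint(a, b, exact=True):
    """Compare two parsed footprints; with exact=False the offset is ignored."""
    if a is None or b is None:
        return False
    return a == b if exact else a[:2] == b[:2]


sample = """Bugcheck Type: LOCKMGRERR, Error detected by Lock Manager
Failing PC: FFFFFFFF.801E44DC LCK$SEARCHDLCK_C+0027C"""

footprint = parse_footprint(sample)
```

The absolute PC is deliberately left out of the comparison key in this sketch, on the assumption that the symbol plus offset is the more portable part of the footprint across systems.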
04-07-2008 09:38 PM
Re: LOCKMGRERR bugcheck (V7.3-2)
04-07-2008 11:03 PM
Re: LOCKMGRERR bugcheck (V7.3-2)
Jur.
04-08-2008 02:27 AM
Re: LOCKMGRERR bugcheck (V7.3-2)
This is the only crash recorded for the system.
The applications are completely user mode and there are no foreign drivers. The last change was on 16 Jan 08 to install OpenVMS updates (UPDATE V13, CLIUTL V1, DCL V9, PTHREAD V6, TCPIP V5.4-156, and MOTIF V1.3-1). We have this same set of updates running on 21 other similarly configured DS20s running the same application and about 30 other systems and have not seen any similar problems. None of the applications have changed since that time.
As Volker noted, the system had been rebooted just under 7 hours before the crash. It was shut down to replace a power supply wiring harness.
The same system showed one deadlock scan in CLUE MEM/STAT when I first checked about two hours after it rebooted from this crash. This morning, 3.5 days later, the count is still one. One of the comparable systems which has been up for 33 days also shows one deadlock scan.
Non-paged pool was 44% at the crash. One thing I found interesting is that SHOW POOL/STAT shows a negative number for "Packets (approx)" for the 576 byte and 2176 byte lookaside lists; the actual count is positive. I see the same on several different running systems, so I don't believe this to be significant, or at least not a fatal condition.
04-08-2008 02:44 AM
Re: LOCKMGRERR bugcheck (V7.3-2)
So you're saying that this is the first crash that has ever occurred on this system, or the first crash from which a dump and a CLUE file have been captured?
I've checked some dumps and I also see some negative numbers in the Packets (approx) columns, so apparently there is no need to worry.
If this is a one-off problem, it is most likely impossible to diagnose it any further from just one dump. You may still want to ask for the call to be escalated to OpenVMS engineering, and ask the HP specialist to find out whether crashes with similar footprints have been reported in CCAT (CANASTA) recently.
Volker.
04-08-2008 02:57 AM
Re: LOCKMGRERR bugcheck (V7.3-2)
04-09-2008 03:55 AM
Re: LOCKMGRERR bugcheck (V7.3-2)
The patch details document for VMS732_SYS-V1400 shows that a new version of LOCKING.EXE is included in the kit, but none of the problem descriptions list LOCKING.EXE as an affected image, which is why neither Volker nor I found the fix.
I am told the problem involves a very short timing window during which the lock manager does not properly synchronize the queue manipulations. The only known occurrences have been on multiprocessor systems.
Thanks again to Volker and Jur for their assistance.
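For readers wondering how a short timing window can damage a queue, the following is a generic illustration in Python, not the actual V7.3-2 defect and not OpenVMS code. All names are hypothetical. It splits a tail insert into a "read the tail" step and a "link in" step, then interleaves two inserts by hand, which is what two CPUs could do if nothing serialized the queue manipulation.

```python
class Entry:
    """Element of a circular doubly linked queue (the header is an Entry too)."""
    def __init__(self, name):
        self.name = name
        self.flink = self
        self.blink = self


def insert_tail_steps(header, entry):
    """Tail insert as a generator so two inserts can be interleaved by hand."""
    tail = header.blink        # step 1: read the current tail
    yield
    entry.flink = header       # step 2: link the new entry in behind that tail
    entry.blink = tail
    tail.flink = entry
    header.blink = entry
    yield


def insert_serialized(header, entry):
    """Run both steps back to back, as a lock around the insert would enforce."""
    for _ in insert_tail_steps(header, entry):
        pass


def insert_interleaved(header, a, b):
    """Both inserters read the same tail before either one links in."""
    ga, gb = insert_tail_steps(header, a), insert_tail_steps(header, b)
    next(ga)
    next(gb)
    next(ga)
    next(gb)


def reachable(header, limit=100):
    """Names of the entries found by walking forward from the header."""
    names, node = [], header.flink
    while node is not header and len(names) < limit:
        names.append(node.name)
        node = node.flink
    return names


# Serialized inserts: every entry ends up on the queue.
good = Entry("H")
for name in ("X", "A", "B"):
    insert_serialized(good, Entry(name))

# Interleaved inserts: A is overwritten by B but still points into the queue.
bad = Entry("H")
insert_serialized(bad, Entry("X"))
a, b = Entry("A"), Entry("B")
insert_interleaved(bad, a, b)
```

In the interleaved case the entry `A` is no longer reachable from the header, yet its forward link still leads into the queue, so a stray chain merges into an otherwise well-formed queue. Holding a lock across both steps, as `insert_serialized` models, closes the window.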