Operating System - OpenVMS

OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

 
SOLVED
Mark_Corcoran
Frequent Advisor


On 25-JAN-2024, I made some changes to a command procedure used to automate testing of functional changes I'm making to our primary application running on OpenVMS, and was surprised when my Telnet connection terminated.

Looking at the session log (we log in using a captive account that then does a SET HOST 0 /LOG), it transpired that the last thing output before the session terminated was:

%DCL-E-INVCALL, invalid CALL nesting structure or data inconsistency detected

The accounting record shows a final exit status of %X000002A4 ("%SYSTEM-F-BUGCHECK, internal consistency failure"), and ANALYZE /ERROR recorded the following when I ran the procedure again yesterday (process name and node name changed/redacted; apologies for the formatting, as multiple spaces get compressed to a single one when posting here):

ERROR SEQUENCE 12049. LOGGED ON: SID 13001003
DATE/TIME 26-JAN-2024 06:39:44.69 SYS_TYPE 03260B01
SYSTEM UPTIME: 42 DAYS 19:26:23
SCS NODE: MYNODE VAX/VMS V6.2

USER BUGCHECK KA53 CPU Microcode Rev # 3. CONSOLE FW REV# 2.6
Standard Microcode Patch Patch Rev # 8.

DOUBLDEALO, Double deallocation of memory block

PROCESS NAME MYPROCESS

PROCESS ID 000300F2

ERROR PC 86F12ACE
ERROR PSL 02800004
Z-BIT
INTERRUPT PRIORITY LEVEL = 00.
PREVIOUS MODE = SUPERVISOR
CURRENT MODE = SUPERVISOR
FIRST PART DONE CLEAR

STACK POINTERS

KSP 7FFE77B4 ESP 7FFE9800 SSP 7FFEC9F0 USP 7FECE7C4 ISP 88AE9600

GENERAL REGISTERS

R0 7FED80D0 R1 00000000 R2 7FED3C70 R3 7FED80D0 R4 00000078
R5 7FFECC5C R6 7FFE2CC0 R7 0000001C R8 10038942 R9 7FFECC50
R10 7FFED7D4 R11 7FFE2BDC AP 00000010 FP 7FFED7D4 SP 7FFE77F8

KA53 REGISTER SUBPACKET

TODR 00000000
BPCR 00000000
PAMODE 00000000
30 bit physical address mode
MMEPTE 00000000
MMESTS 00000000
PCSCR 00000000
Patchable control store disabled
no microcode patch
CPU microcode Patch Rev # = 0.
ICSR 00000000
virtual instruction cache disabled
ECR 00000000
FBOX DISABLED
SUBSET INTERVAL TIMER ENABLED
FBOX STAGE 4 BYPASS DISABLED
TBSTS 00000000
PCCTL 00000000
pcache disabled for D-stream reference
pcache disabled for I-stream reference
PCACHE PARITY ERROR DETECTION DISABLED
MUST BE ONE FIELD NON-ONE
PCSTS 00000000
CCTL 00000000
backup cache disabled
BCACHE TAG RAM SPEED:
read = 3 cycles, write = 3 cycles
BCACHE DATA RAM SPEED:
read = 2 cycles, write = 3 cycles
128 kilobyte backup cache
BCEDSTS 00000000
BCETSTS 00000000
MESR 00000000
MMCDSR 00000000
2600 cycles before disown write tmeout
disable logging soft errors
CQBIC on CP_I02
CESR 00000000
CMCDSR 00000000
IO2 ID NOT ENABLED
DMA PREFETCHING DISABLED
3200 Cycles Before NDAL Timeout
144 cycles before cp1 mt timeout
144 CYCLES BEFORE CP2 MT TIMEOUT
cp1 interrupts pending:
none
cp2 interrupts pending:
none
CEFSTS 00000000
NESTS 00000000
NEOCMD 00000000
NEICMD 00000000
DSER 00000000
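
For anyone wanting to check the two hex values above by hand, here is a minimal Python sketch (not part of the original post). It assumes the usual VMS condition-value layout (severity in bits 0-2, message number in bits 3-15, facility number in bits 16-27) and the VAX PSL layout (Z in bit 2, IPL in bits 16-20, previous mode in bits 22-23, current mode in bits 24-25, FPD in bit 27). The function names `decode_status` and `decode_psl` are made up for illustration.

```python
# Severity codes carried in the low three bits of a VMS condition value.
SEVERITY = {0: "WARNING", 1: "SUCCESS", 2: "ERROR", 3: "INFO", 4: "FATAL"}

# VAX access modes, indexed by their 2-bit encoding.
MODES = ["KERNEL", "EXECUTIVE", "SUPERVISOR", "USER"]


def decode_status(status):
    """Split a 32-bit condition value into its main fields (assumed layout)."""
    return {
        "severity": SEVERITY.get(status & 0x7, "RESERVED"),
        "message_number": (status >> 3) & 0x1FFF,
        "facility_number": (status >> 16) & 0xFFF,
    }


def decode_psl(psl):
    """Extract the PSL fields that ANALYZE /ERROR prints (assumed layout)."""
    return {
        "z_bit": bool(psl & 0x4),
        "ipl": (psl >> 16) & 0x1F,
        "previous_mode": MODES[(psl >> 22) & 0x3],
        "current_mode": MODES[(psl >> 24) & 0x3],
        "first_part_done": bool(psl & (1 << 27)),
    }


if __name__ == "__main__":
    print(decode_status(0x000002A4))  # final exit status from the accounting record
    print(decode_psl(0x02800004))     # ERROR PSL from the error log entry
```

Under those assumptions, the status decodes to severity FATAL with facility number 0, which is consistent with the "%SYSTEM-F-" prefix, and the PSL decodes to Z-bit set, IPL 0, and supervisor for both previous and current mode, matching the log.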


On revisiting the code the following day to determine the cause of the %DCL-E-INVCALL, I found that it was a typo/omission on my part (but one which the CLI clearly doesn't handle as well as one would like it to).

A question springs to mind:

  1. Is it actually the case that an attempt to double-deallocate memory has been detected but prevented, or has a double-deallocate actually occurred which might then lead to corruption of O/S linked lists/structures/control blocks?

Whilst I've since managed to create a small reproducer (and can provide here if necessary - particularly to see if the problem persists in more recent versions of OpenVMS), I'm disinclined to do so if a double-deallocate has actually occurred, which someone might then be able to use maliciously.

Mark

[Formerly appearing as woeisme]
6 REPLIES
Volker Halle
Honored Contributor

Re: OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

Mark,

the code (DCL) was about to double-deallocate a piece of memory. The routine to deallocate memory detected this and declared a DOUBLDEALO bugcheck. Because the code was running in supervisor mode, it just deleted the current process and did not harm the system.
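
As a rough illustration of "detected before it happened", here is a toy Python sketch. It is not DCL's actual deallocation routine, and `ToyPool` and `DoubleDeallocation` are hypothetical names; the only point is that the check runs before any bookkeeping is modified, so the second release is refused rather than carried out.

```python
class DoubleDeallocation(Exception):
    """Raised when a block that is not currently allocated is released."""


class ToyPool:
    """Hands out integer block IDs and tracks which ones are in use."""

    def __init__(self):
        self._next_id = 0
        self._allocated = set()

    def allocate(self):
        block = self._next_id
        self._next_id += 1
        self._allocated.add(block)
        return block

    def deallocate(self, block):
        # Check first: if the block is already free, refuse and leave
        # the pool's bookkeeping untouched.
        if block not in self._allocated:
            raise DoubleDeallocation(f"block {block} is not currently allocated")
        self._allocated.remove(block)


pool = ToyPool()
blk = pool.allocate()
pool.deallocate(blk)          # first release succeeds
try:
    pool.deallocate(blk)      # second release is detected and refused
except DoubleDeallocation as exc:
    print("detected:", exc)
```

In this sketch the raised exception plays the role of the bugcheck: the caller is stopped, and the pool's state is left as it was.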

If you're interested in seeing whether this problem still exists in more recent versions of OpenVMS, you could easily download the VSI Student kit, which contains a ready-to-run Alpha emulator (FreeAXP) and an OpenVMS Alpha V8.4-2L2 system:

Student License (vmssoftware.com)

You could also send me the reproducer (hint: I still work at Invenate) and I might try it even on VSI OpenVMS x86-64 V9.2-1

Volker.

Mark_Corcoran
Frequent Advisor

Re: OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

Hi Volker, thanks for your reply.

>Because the code was running in supervisor mode, it just deleted the current process and did not harm the system.
From my reading of the OpenVMS System Messages and Recovery Procedures Manual A-L for OpenVMS/VAX v6.0 (AA-PVXKA-TE, MAY-1993):

DOUBLDEALO, double deallocation of memory block
Facility: BUGCHECK, System Bugcheck
Explanation: The OpenVMS software detected an irrecoverable, inconsistent
condition. After all physical memory is written to a system dump file, the
system automatically reboots if the BUGREBOOT system parameter is set
to 1.

...it's not really clear whether I should expect a DOUBLDEALO BUGCHECK to trigger a system reboot if the SYSGEN BUGREBOOT parameter is set to 1. Testing it on an evaluation CharonVAX install running on my laptop with BUGREBOOT set to 1 does not trigger a reboot, so I presume the manual's text is correct only if the mode at which the BUGCHECK occurs is kernel mode (if not executive as well).

I'm not sure if there are any circumstances under which an ordinary user might be able to trigger such a reboot, so I don't propose posting the reproducer here for now.


>You could also send me the reproducer (hint: I still work at Invenate) and I might try it even on VSI OpenVMS x86-64 V9.2-1
I have tried to P.M. you within the confines of HPE Community, but I get "You have reached the limit for number of private messages that you can send for now. Please try again later. ", where clearly that limit is zero (!), and trying again "later" without an HPE forum admin manually intervening would be pointless.

Based on historic posts elsewhere, I've divined email addresses for you at Invenate and also encompasserve.org, so will send an email shortly with the reproducer.


Mark

[Formerly appearing as woeisme]
Volker Halle
Honored Contributor

Re: OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

Mark,

setting BUGCHECKFATAL won't create system bugchecks from supervisor or user mode, just from executive and kernel mode.
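
The rule as stated in this thread can be written down as a tiny Python sketch. This is only a restatement of the replies here, not a model of the real bugcheck code, and `bugcheck_scope` is a hypothetical name.

```python
def bugcheck_scope(mode):
    """Return how far a bugcheck can reach, per the replies in this thread.

    "process": only the current process is affected (user/supervisor mode).
    "system":  a system bugcheck is possible (executive/kernel mode).
    """
    mode = mode.upper()
    if mode in ("USER", "SUPERVISOR"):
        return "process"
    if mode in ("EXECUTIVE", "KERNEL"):
        return "system"
    raise ValueError(f"unknown access mode: {mode}")


# DCL runs in supervisor mode, so the DOUBLDEALO case here stays process-wide.
print(bugcheck_scope("supervisor"))
```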

Thanks for sending along the reproducer.  I could easily reproduce the problem on OpenVMS VAX V6.2, and also on OpenVMS VAX V7.3 !

Also on OpenVMS Alpha V8.2 and even on VSI OpenVMS Alpha V8.2-L2

Volker.

PS: regarding the limitation of P.M. messages, maybe you can delete some messages from your 'Sent' folder ?

Volker Halle
Honored Contributor

Re: OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

This problem has been officially reported to VSI as SPS-1428 and SPS-1429.

Volker Halle
Honored Contributor
Solution

Re: OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

Mark,

on behalf of VSI OpenVMS engineering, I would like to thank you for reporting this 40-year old bug.

The problem has been isolated and fixed and DCL patch kits will be issued for the VSI OpenVMS versions by VSI.

For the HP OpenVMS versions and - of course - VAX/VMS V6.2,  there will be no patches anymore.

Volker.

Mark_Corcoran
Frequent Advisor

Re: OpenVMS/VAX V6.2 DCL CLI bug terminates process with %SYSTEM-F-BUGCHECK

>this 40-year old bug.
Ah, younger than me, but (just) older than some of the bugs I've found and fixed here.


>on behalf of VSI OpenVMS engineering, I would like to thank you
They are very welcome

If it has gone undiscovered for 40 years, it's clearly obscure enough that few others have encountered it (or nobody else makes typos in DCL code).

Sadly, finding obscure bugs has been a regular occurrence for me over the last 10 years - mostly in the application code here (it was developed for multiple sites, but differences in site-specific end devices & application configuration meant that few (if any) of the other sites encountered them, or if they did, they typically weren't reproducible, and processes would just be restarted, just to bring service back more quickly).


>For the HP OpenVMS versions and - of course - VAX/VMS V6.2, there will be no patches anymore.
Plus ça change...


Mark

[Formerly appearing as woeisme]