
OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

 
Frequent Advisor


On our development node pair last week, I was refreshing certain groups of files on "NODE1" from the production system (config files, command procedures, etc., all of which change at regular intervals); this necessitated shutting down our two applications.

The files are extracted from a backup saveset within a .ZIP file using a command procedure (for ease of use & consistency), and due to the large number of files extracted, I needed to scroll back quite far in my terminal emulator, just to make sure that no errors were encountered (I've yet to find time to add checking).
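The missing check could be done off-line against a captured session log, on any machine the log can be copied to. The following is a minimal sketch, assuming the log is plain text and that messages use the usual `%FACILITY-S-IDENT, text` layout (with linked messages starting with `-`, as in the RMS-E-FLK line quoted further down). The function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# VMS messages look like "%FACILITY-L-IDENT, text"; linked secondary
# messages start with "-" instead of "%". L is the severity letter
# (S, I, W, E or F).
MESSAGE_RE = re.compile(
    r"^\s*[%-]([A-Z0-9_$]+)-([SIWEF])-([A-Z0-9_$]+),\s*(.*)$"
)


def find_problems(
    lines: Iterable[str], severities: str = "WEF"
) -> List[Tuple[int, str, str, str]]:
    """Return (line_number, facility, severity, ident) for each message
    whose severity letter is in `severities`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = MESSAGE_RE.match(line.rstrip())
        if match and match.group(2) in severities:
            hits.append(
                (number, match.group(1), match.group(2), match.group(3))
            )
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py session.log
    with open(sys.argv[1], errors="replace") as log:
        for number, facility, severity, ident in find_problems(log):
            print(f"line {number}: {facility}-{severity}-{ident}")
```

Only warnings, errors and fatals are reported by default, so success and informational messages from the extraction do not drown out the ones that matter.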

I scrolled back a little further than intended, and noticed that the command procedure to stop one of the applications had encountered an error (there are several pages of output when you shut down the applications, so it's easy to miss, particularly on test systems where you're perhaps a bit more relaxed about errors).

The problem it encountered was trying to delete a shareable image from SYS$LIBRARY, because of "-RMS-E-FLK, file currently locked by another user".

The shutdown procedure sends signals to the application's detached processes requesting that they shut themselves down, and if after a certain period of time they are still running, then they are STOP /IDed.

The shutdown procedure then does an MC INSTALL /DELETE of the shareable image, attempts to delete the image file from SYS$LIBRARY, then copies a "fresh" version of it into SYS$LIBRARY and re-installs it with MC INSTALL ADD /PROTECTED /SHARED /WRITABLE /HEADER_RESIDENT.

[The shareable image is a collection of library routines written in Macro-32 and an HLL/3GL (whose compiler is itself mostly written in that same language), but it also provides shared memory between the application processes. When it is INSTALL /DELETEd, the on-disk file in SYS$LIBRARY therefore contains whatever the application wrote to the shared memory during its uptime, which is why a "fresh" copy needs to be reinstalled.]

After noticing the RMS-E-FLK error, a DIRECTORY of the executable image in SYS$LIBRARY revealed two versions - the one that was originally created when the application was last started 5 months previously, plus the "fresh" copy (the shutdown procedure doesn't check for errors in the INSTALL /DELETE, DELETE, COPY or INSTALL, so it would have gone ahead and COPYed the fresh copy to SYS$LIBRARY).

An INSTALL LIST revealed the freshly copied version as being installed (unsurprising - the INSTALL /DELETE would have deleted the KFE from the known file list for the original file, the on-disk file delete failed with RMS-E-FLK, then the COPY and INSTALL were performed).

A manual attempt to delete the older version (;1) of the shareable image encountered the same RMS-E-FLK error, and a SHOW DEVICE SYS$LIBRARY /FILES showed both the old (;1) and new (;2) shareable images (PID=00000000) - no surprises for any of this.

An INSTALL REMOVE of the shareable image (intended to remove the new/fresh copy) returned to DCL without reporting any errors.

Attempting to delete the new/fresh copy of the file (;2) also reported the RMS-E-FLK error (despite the fact that the image had only been INSTALL ADDed and INSTALL REMOVEd, with no application startup in the interim).

An INSTALL LIST /GLOBAL then revealed the following (well, a lot of other stuff, but this is the only pertinent bit):

Delete Pending Global Sections
NODNAM$DKA100:<SYS0.SYSLIB>.EXE
INS$8754DD90_001(01000001) WRT TMP SYS Pagcnt/Refcnt=1377/0
INS$8754DD90_001(01000001) WRT TMP SYS Pagcnt/Refcnt=1377/0
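To spot this condition without reading the whole listing by eye, the captured output can be filtered off-line. The sketch below is a rough illustration only: the line layout is inferred solely from the two sample lines above, and it assumes that everything after the "Delete Pending Global Sections" heading belongs to that block. The type and function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class GlobalSection(NamedTuple):
    name: str
    ident: str
    flags: List[str]
    pagcnt: int
    refcnt: int


# Layout inferred from the sample lines, e.g.
# "INS$8754DD90_001(01000001) WRT TMP SYS Pagcnt/Refcnt=1377/0"
SECTION_RE = re.compile(
    r"^\s*(?P<name>\S+?)\((?P<ident>[0-9A-Fa-f]+)\)\s+"
    r"(?P<flags>.*?)\s*Pagcnt/Refcnt=(?P<pagcnt>\d+)/(?P<refcnt>\d+)\s*$"
)


def parse_sections(text: str) -> List[GlobalSection]:
    """Parse every line that looks like a global section entry."""
    sections = []
    for line in text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            sections.append(
                GlobalSection(
                    m["name"],
                    m["ident"],
                    m["flags"].split(),
                    int(m["pagcnt"]),
                    int(m["refcnt"]),
                )
            )
    return sections


def delete_pending_with_zero_refcnt(listing: str) -> List[GlobalSection]:
    """Return delete-pending entries whose Refcnt is 0.

    Assumption: all entries after the heading are delete-pending.
    """
    _, found, tail = listing.partition("Delete Pending Global Sections")
    if not found:
        return []
    return [s for s in parse_sections(tail) if s.refcnt == 0]
```

Any entry returned is a section that is still marked for deletion even though nothing appears to reference it.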

On "NODE2", shutting down the application (that uses the shareable image) also reported the RMS-E-FLK error, and an INSTALL LIST /GLOBAL revealed two Global Sections with the same name - one that was active (as expected - the shutdown procedure reinstalls the shareable image), and one that was delete-pending (also with a Refcnt of 0).

I don't think it's relevant, but for what it's worth:

  • INSTALLing the shareable image creates 6 Global Sections (INS$*_001 through INS$*_006).
  • An ANALYZE /IMAGE of the file reveals 8 image sections, of which the page counts of 6 (2x ISD$K_SHRFXD, 3x ISD$K_PRVFXD and 1x ISD$K_SHRPIC) match the Pagcnt reported by INSTALL LIST /GLOBAL
  • It was the INS$*_001 Global Section (one of the ISD$K_SHRFXD ones) that was delete-pending - the other sections were deleted.

    If you attempt to $CRMPSC a section that already exists (and which is active, not delete-pending), then $CRMPSC effectively does a $MGBLSC.

    If you attempt to $CRMPSC a section that only exists as delete-pending, then a new one is created.

    So, the fact that the active INS$*_001 Global Section has the same _NNN value as the delete-pending one doesn't seem odd to me, i.e. when installing a shareable image, INSTALL obviously doesn't look to see if there are delete-pending Global Sections for the same KFE address, then use an _NNN base value of one plus whatever the highest-numbered delete-pending Global Section is.
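The create-or-map behaviour described above can be pictured with a toy model. This is a sketch of the reasoning only, not of how VMS implements global sections: the class and method names are hypothetical, and the reference count here counts mappers rather than pages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # compare sections by identity, not by field values
class Section:
    name: str
    refcnt: int = 0
    delete_pending: bool = False


class SectionTable:
    """Toy table of global sections, keyed by name."""

    def __init__(self) -> None:
        self.sections: List[Section] = []

    def _find_active(self, name: str) -> Optional[Section]:
        for section in self.sections:
            if section.name == name and not section.delete_pending:
                return section
        return None

    def create_or_map(self, name: str) -> Section:
        """Map the active section if one exists, else create a new one.

        A delete-pending section of the same name is ignored, so a
        second entry with an identical name can appear.
        """
        section = self._find_active(name)
        if section is None:
            section = Section(name)
            self.sections.append(section)
        section.refcnt += 1
        return section

    def mark_for_delete(self, name: str) -> None:
        section = self._find_active(name)
        if section is not None:
            section.delete_pending = True
            self._reap()

    def unmap(self, section: Section) -> None:
        section.refcnt -= 1
        self._reap()

    def _reap(self) -> None:
        # Expected behaviour: a delete-pending section goes away once
        # its last reference is dropped.
        self.sections = [
            s for s in self.sections
            if not (s.delete_pending and s.refcnt == 0)
        ]
```

In this model a delete-pending entry can never linger with a reference count of zero, which is what makes the observed state look wrong.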

    What does seem odd however, is having delete-pending Global Sections with a Refcnt of zero.

    Whilst trying to look at old session log files (SET HOST /LOG) from when the application was last started (to see if there were some errors reported, or unusual commands issued), the system hung whilst attempting to EDIT /TPU a session log file of 5141 blocks (not sure if the EDIT /TPU was the trigger).

    It didn't occur to me at the time to force a crash dump (it's been so long since I've had to do this, I've forgotten anyway - I'll need to try out the commands for VAX 4000 series at the >>> prompt in CharonVAX) - I simply got the >>> prompt with ^P then issued a boot command (and a subsequent EDIT /TPU of the session log file when the system rebooted did not then result in the system hanging).

    [Nothing in ERRLOG.SYS or OPERATOR.LOG immediately prior to the hang]

    I've been unable to reproduce a state where the shareable image is INSTALL REMOVEd leading to the Global Sections being delete-pending (as expected), but where this one Global Section resolutely remains delete-pending but with a Refcnt of 0 (not expected).

    Terminating processes that map to the Global Sections, using $DELPRC (via STOP /ID) or $FORCEX, has the expected result - the Refcnt drops by the Pagcnt number each time.
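As a worked example of that arithmetic, the sketch below assumes that every mapping process maps the whole section, so that Refcnt is always a whole multiple of Pagcnt. That assumption is drawn only from the observation above, and the function names are hypothetical.

```python
from typing import List


def mappers_from_refcnt(pagcnt: int, refcnt: int) -> int:
    """Infer how many processes map a section from its counters.

    Assumes each mapper contributes exactly `pagcnt` references.
    """
    if pagcnt <= 0:
        raise ValueError("Pagcnt must be positive")
    mappers, remainder = divmod(refcnt, pagcnt)
    if remainder:
        raise ValueError("Refcnt is not a whole multiple of Pagcnt")
    return mappers


def refcnt_after_exits(pagcnt: int, mappers: int, exits: int) -> List[int]:
    """Refcnt before any exit and after each of `exits` process exits."""
    if not 0 <= exits <= mappers:
        raise ValueError("exits must be between 0 and mappers")
    return [pagcnt * (mappers - n) for n in range(exits + 1)]


# With the Pagcnt of 1377 seen in the listing and three mapping
# processes, the count steps down by 1377 per exit.
print(refcnt_after_exits(1377, 3, 3))  # [4131, 2754, 1377, 0]
```

A Refcnt that is not a multiple of Pagcnt, or a delete-pending section that survives at zero, would both fall outside this simple picture.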

    I can't help but think the system getting into a hung state is related, though (more likely correlated than causal;  thinking about it, EDIT /TPU maps to TPU Global Sections, so if the list of Global Sections is trashed, then attempting to walk it could - I suppose - cause the hang).

    Has anyone ever encountered a delete-pending Global Section with a Refcnt of zero, and if so, did you ever determine what the cause (and more importantly, the fix) was?

    [6.2 is old, but one of the applications is written in a combination of Macro-32 with VAX-specific stuff, and an old HLL/3GL (the company which wrote the compiler for VMS appears to have gone out of business in 1985, so pre-dates AXP much less IA64).

    The application's original authors work for the original parent company (who spun off this company), so the only application-level code changes are (mostly) maintenance fixes by myself, and I'm not in the business of re-writing Macro-32 for AXP/IA64 much less rewriting compilers to generate instructions for more modern CPUs.

    i.e. it's not going to be upgraded to a newer architecture and is unlikely to be upgraded to the latest OpenVMS for VAX because of dependency issues;  it's due to be replaced "in a few years" using an altogether different architecture anyway]

    Mark
[Formerly appearing as woeisme]
6 REPLIES
Respected Contributor

Re: OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

A crash dump is likely to be the most useful here, but it's too late to get one.  I would believe that the system hanging is likely related as well.  I suspect a resource shortage here that was not handled correctly or not seen.  If this situation shows up again, perhaps looking at the global sections using SDA would be helpful rather than relying on Install.

If you have them, check the release notes in the various hardware releases (6.2-*h*) for any related entries.  While the hardware releases were for "new" hardware support, IIRC there were some fixes in there as well.

I will look up a few ideas that I have as well over the next few days and update this if I find anything relevant.

Dan

Frequent Advisor

Re: OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

>If you have them, check the release notes in the various hardware releases (6.2-*h*) for any related entries.

Somewhat belatedly (!), Dan, thanks for your response.

 

Unfortunately, I don't have any release notes.

I think even at my first employer (a DEC VAR), if we ever got release notes, I never saw them.

My second employer was a FTSE-100 company that was a comparatively big customer for Compaq/HP, but I was working in O/S, layered product, application & service support;  it was the platforms team that would come up with the cluster designs and would typically deal with CPQ/HP, and we'd eventually be given CDs with firmware, O/S and layered product patches or new releases to install (along with a step-by-step document that they'd created after going through the pain of doing this in the lab environment).

I do have a condist set that I acquired whilst at my second employer, but it's at my current workplace, not to hand - and I suspect it's not the kind of thing that would be on it.

If you ever did come up with some other ideas, I'd be happy to investigate further.

 

Mark

[Formerly appearing as woeisme]
Respected Contributor

Re: OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

VMS bug, it should be fixed in V7.3.

Jur.

Frequent Advisor

Re: OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

>VMS bug, it should be fixed in V7.3

Thanks for the reply Jur - is that based on knowledge from release notes?

The prospect of the systems being upgraded to v7.3 is as near to zero as makes no difference (Macro-32, non-DEC compiler by a company no longer in existence), so if there is any information on the conditions that cause any (reportedly fixed) bug to occur, that would be helpful - to at least try and reproduce it, but also to see if our problem is the same (and if so, then I can perhaps work out a way to avoid it happening).

 

Mark

[Formerly appearing as woeisme]
Respected Contributor

Re: OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

There is nothing in the release notes for V7.0, V7.1, V7.2 or V7.3 that would indicate that this problem was known/fixed.  The only reference is to a change in the "replace" option for Install.

"In the past, REPLACE was equivalent to DELETE followed by ADD.   Consequently, there was a short time during which neither the new nor the old image was in the known file database.  When activating protected or privileged images, this could result in failed image activations.  Also, if the new image could not be installed, it was possible for neither the old nor the new image to be installed after the failure.   ..."

Frequent Advisor

Re: OpenVMS/VAX v6.2 (VAX 4000-106A under CharonVAX) Delete-Pending Global Section with Refcnt=0?

Dan, belated thanks for your reply.

[Formerly appearing as woeisme]