09-06-2021 07:51 AM
LATCP image activation failure (process quota exceeded) - how/when is BYTLM returned?
In another thread on the OpenVMS forum, I mentioned issues with a Printronix P8000 printer connected to a DECserver: the P8000's factory default setting for when to deassert the RTS signal, in conjunction with the serial cable being wired a certain way, caused lots of LAT traffic.
I initially encountered the problem in our test lab, but following issues with production printers, I found that one of those also had a cable whose wiring caused the same issue.
When the right combination of conditions occurs, the Framing Errors count on the DECserver port increases at a rate of ~470/second.
In LATCP, the Framing Errors count for an LTAnnnn: port also increases, but at a significantly smaller rate.
Checking the counts of all the DECserver ports on a (say) 5-minute interval isn't practical (a single .COM processing all the DECservers in sequential fashion takes longer than this), but checking the LAT port counts would suffice.
I have a command procedure that is run at the end of every week, and which collates server, IP, gateway and port settings from each DECserver, as well as getting server and port counts, then resets the counters.
Until a couple of weeks ago, I hadn't reset the LAT port counters as part of this procedure, because it was more useful to have total counts since the node rebooted (notwithstanding a LAT bug I've mentioned in another thread, where it doesn't increment bytes TXed correctly under a particular circumstance).
However, in order to make a useful comparison of the Framing Errors count between a LAT port and a DECserver port, they need to be reset at as close to the same time as possible.
I made changes to zero the LAT port counts, and when the procedure was run, it produced the following error:
%DCL-W-ACTIMAGE, error activating image LATCP
-CLI-E-IMGNAME, image file node$device:[SYS0.SYSCOMMON.][SYSEXE]LATCP.EXE;
-RMS-E-ACC, ACP file access failed
-SYSTEM-F-EXQUOTA, process quota exceeded
After some investigation and testing, I've managed to reproduce the problem.
When using the LATCP command ZERO COUNTERS for a port, you have to specify /PORT=LTAnnnn: - /PORT=ALL, /PORT=* and /PORT /ALL aren't permitted (and likely, for good reason).
Consequently, the procedure loops around issuing the ZERO COUNTERS command for each LTA port (of which we have 400, although only a fraction of these are used).
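For illustration, a per-port loop of this kind might look something like the minimal DCL sketch below. It is not the actual procedure: the symbol names (lcp, unit, last_unit), the assumption that ports are numbered from LTA1: upward, and the upper bound of 400 are all hypothetical.

```dcl
$! Hypothetical sketch: zero LAT port counters one port at a time.
$! Each LCP call activates LATCP.EXE afresh, as in the loop described above.
$ lcp := $LATCP                  ! foreign command symbol for SYS$SYSTEM:LATCP.EXE
$ unit = 1                       ! assumed first LTA unit number
$ last_unit = 400                ! assumed last LTA unit number
$next_port:
$ if unit .gt. last_unit then goto done
$ port = "LTA" + f$string(unit) + ":"
$! Skip unit numbers for which no LTA device exists.
$ if f$getdvi(port, "EXISTS") then lcp ZERO COUNTERS /PORT='port'
$ unit = unit + 1
$ goto next_port
$done:
$ exit
```

Because every iteration is a separate image activation, whatever LATCP charges against the process on each run is charged up to 400 times in quick succession.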
When the problem occurs*, issuing a SHOW PROCESS /QUOTA reveals that "Buffered I/O byte count quota:" is severely depleted (the account has a UAF BYTLM of 100000, a "resting" Buffered I/O byte count quota of ~99870, and a "depleted" quota of <1500).
*After a varying number of ZERO COUNTERS commands (167 when it occurred the first time I observed the error, and since then anything up to 306).
If I add a diagnostic command after each ZERO COUNTERS command (a SHOW PROCESS /QUOTA, a SHOW PROCESS /MEMORY, or even WRITE SYS$OUTPUT F$GETJPI("", "BYTLM")), the problem doesn't occur: the quota still depletes, but the additional delay from the diagnostics results in it being returned more quickly than it can be run down to an unusable amount.
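A one-line diagnostic along these lines could be sketched as follows. It relies on the F$GETJPI item BYTCNT, which (to the best of my understanding) returns the remaining buffered I/O byte count, whereas BYTLM returns the limit; the message wording is illustrative only.

```dcl
$! Hypothetical diagnostic: show the limit and the remaining quota together.
$! BYTLM = buffered I/O byte limit, BYTCNT = buffered I/O bytes still available.
$ write sys$output "BYTLM=", f$getjpi("", "BYTLM"), "  BYTCNT=", f$getjpi("", "BYTCNT")
```

As noted above, adding any such line changes the timing, so it is better suited to observing the depletion than to reproducing the failure.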
I'd never had occasion/need to look that closely at internals of memory management within OpenVMS, and sort of assumed that memory allocated by an image would be available on image rundown - but that's obviously not the case (at least with LATCP.EXE - other .EXEs, YMMV)...
The problem would appear to be related to the speed at which memory allocated by LATCP is returned to the process (and that may well depend on the allocation/deallocation routines it calls).
If you wait long enough (a matter of milliseconds), the Buffered I/O byte count quota returns to its "resting" state.
Interestingly, after the error occurs, re-running the procedure does not reproduce it (or at least, not for the amount of time I have waited).
Can anyone explain the likely memory allocation/deallocation mechanism by which BYTLM appears to be returned after image rundown of LATCP?
[The problem initially seemed to be reproducible every time a newly logged-in session was started and the procedure was executed, but a recent such test did not produce the error.
I wouldn't describe the system as being loaded - resting CPU usage is <15%, ~33% NPAGEDYN free, ~64% PAGEDYN free; there is bursty CPU usage, so the problem may also be dependent on system load]
I'm reluctant to increase the BYTLM for the account running the procedure, so I may have to add a WAIT after N iterations of the LAT port processing loop, to allow memory to be returned.
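One possible shape for such a throttle, as a minimal DCL sketch: rather than a fixed WAIT, it polls the remaining quota and pauses only while it is low. The threshold of 50000 bytes, the 0.1 second pause, the retry cap, and the symbol and label names are all assumptions chosen for illustration, not tested values.

```dcl
$! Hypothetical throttle, intended to run every N iterations of the port loop.
$ min_free = 50000               ! assumed threshold, about half of a BYTLM of 100000
$ tries = 0
$wait_quota:
$! BYTCNT is the remaining buffered I/O byte count for this process.
$ if f$getjpi("", "BYTCNT") .ge. min_free then goto quota_ok
$ tries = tries + 1
$ if tries .gt. 100 then goto quota_ok    ! stop waiting after roughly 10 seconds
$ wait 00:00:00.10               ! delta time: one tenth of a second
$ goto wait_quota
$quota_ok:
```

Polling F$GETJPI itself introduces a small delay, which fits the observation that even a lightweight diagnostic is enough to let the quota recover.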
For what it's worth (in case it's a LATCP specific issue, and corrected in a more recent version), OpenVMS/VAX v6.2 under CharonVAX emulation.