Operating System - OpenVMS

%QMAN-W-LOWMEMORY

 
Shriniketan Bhagwat
Trusted Contributor

Re: %QMAN-W-LOWMEMORY

Hi,

OK, the fix is not available for V7.2-2. Did you check the queue manager's page file quota?

SDA> SET PROCESS/INDEX=
SDA> READ SYSDEF
SDA> FORMAT JIB
...
FFFFFFFF.81DC0C80 JIB$L_PGFLQUOTA 00000A00
FFFFFFFF.81DC0C84 JIB$L_PGFLCNT 00000000
...

Regards,
Ketan
Shriniketan Bhagwat
Trusted Contributor

Re: %QMAN-W-LOWMEMORY

Hi,

>> I'm waiting for a quiet time to restart the queue manager with a larger value for page file quota.

This can be the workaround for the problem.

Regards,
Ketan


Volker Halle
Honored Contributor

Re: %QMAN-W-LOWMEMORY

Art,

you might want to have a look at the problem analysis section of the QMAN-W-LOWMEMORY problem/solution in the patches referenced before:


5.2.5.3 Problem Analysis:

With a large PAGEFILE.SYS, the check for available memory in the Queue Manager may overflow a longword. The results of this overflow are the unnecessary LOWMEMORY warnings and the possible system crash.
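
To make the longword overflow concrete, here is a minimal Python sketch. The patch text does not show the queue manager's actual expression, so the block-to-byte multiplication below is purely an assumed stand-in, and `to_signed_longword` and `free_bytes_as_longword` are hypothetical names. The only point is how a value that no longer fits in 32 signed bits wraps to a negative number, which a "memory is low" test could then misread.

```python
LONGWORD_BITS = 32
BYTES_PER_BLOCK = 512  # size of a VMS disk block


def to_signed_longword(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as a signed longword."""
    value &= (1 << LONGWORD_BITS) - 1
    if value >= 1 << (LONGWORD_BITS - 1):
        value -= 1 << LONGWORD_BITS
    return value


def free_bytes_as_longword(free_blocks: int) -> int:
    """Hypothetical check (assumption): free blocks converted to bytes in a longword."""
    return to_signed_longword(free_blocks * BYTES_PER_BLOCK)


for blocks in (1_000_000, 5_000_000):
    print(blocks, free_bytes_as_longword(blocks))
```

Under this assumed formula, 1,000,000 blocks still fits, while 5,000,000 blocks comes out negative, so a comparison against a "minimum free" threshold would fail even though plenty of space is available.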


As this patch does NOT seem to be available for V7.2-2, your only real 'workaround' seems to be a restart of the queue manager.

Volker.
Volker Halle
Honored Contributor

Re: %QMAN-W-LOWMEMORY

Art,

I've seen this problem in 2004 and have the original IPMT text (from the problem escalation). It's some simple math problem (overflow) in the code.

Note that you can easily force a QUEUE_MANAGER process dump and have it restart automatically (queue commands will hang for as long as it takes to write the process dump, i.e. less than a minute):

$ MCR JBC$COMMAND
JBC$COMMAND> diag 4
%JBC-I-DIAGNOSTIC,
Log for playback = 0
Save old Journal files = 0
Log all requests = 0
Dump on error = 0
Checkpoint: State = 0, In-memory blocks = 100
PersAlpha CHAALP-E8.4 $
%%%%%%%%%%% OPCOM 4-MAY-2010 08:39:33.24 %%%%%%%%%%%
Message from user SYSTEM on CHAALP
%QMAN-F-DIAGNOSTIC, A request was made to dump the queue manager.

Note that increasing the amount of pagefile space is counterproductive in this case!

Look at PAGFILCNT of the QUEUE_MANAGER process with F$GETJPI("pid-of-queue_manager","PAGFILCNT"). If it's getting near 214 million, you may see that problem.

Volker.

P Muralidhar Kini
Honored Contributor

Re: %QMAN-W-LOWMEMORY

Hi Art,

>> It's some simple math problem (overflow) in the code.
Volker's right.
Systems with large physical memory and large pagefile.sys would
face this problem. The problem was with the check for available memory
in the queue manager which could overflow the longword boundary.
The "QMAN-W-LOWMEMORY" messages logged were as a result of this overflow.

>> Note that increasing the amount of pagefile space is counterproductive
>> in this case !
Yes. Even after increasing the pagefile space, you could see the same
error messages again, and maybe even more often.

Installing the QMAN patch would be the way forward. But for this you
would have to upgrade OpenVMS on the system to V7.3-2 or
later.

Regards,
Murali
Let There Be Rock - AC/DC
Art Wiens
Respected Contributor

Re: %QMAN-W-LOWMEMORY

"Murali: What is the JOB_LIMIT of the queue. JOB_LIMIT would indicate number of jobs
in the queue that execute in parallel.
Also, how many jobs in the queue do actually execute in parallel?"

Well, as in most VMS systems/clusters, we have more than one queue! There are about 370 ... ~110 batch queues and the rest print queues. Impossible to say how many jobs execute in parallel at any given time.

"Ketan: Looks like fix is available for this problem."

Great, except these systems are v7.2-2 and aren't going to be upgraded.

"Murali: You have to upgrade VMS to V7.3-2 or onwards ..."

Not going to happen.

"Ketan: pagefile quota"

FFFFFFFF.812EF840 JIB$L_PGFLQUOTA 00009EB0
FFFFFFFF.812EF844 JIB$L_PGFLCNT 00000950

Which "matches" what I see at DCL:

$ write sys$output f$getjpi(20307833,"PGFLQUOTA")
649984 (%x9EB00)

$ write sys$output f$getjpi(20307833,"PAGFILCNT")
38144 (%x9500)
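
The two sets of numbers differ by exactly a factor of 16, which is consistent with SDA showing the JIB fields in pages and F$GETJPI reporting pagelets. A quick check in Python, assuming 8 KB Alpha pages and 512-byte pagelets (the page size is an assumption; the thread does not state it):

```python
PAGE_BYTES = 8192      # assumed Alpha page size
PAGELET_BYTES = 512    # one pagelet
PAGELETS_PER_PAGE = PAGE_BYTES // PAGELET_BYTES  # 16

# Values copied from the SDA FORMAT JIB output above (in pages)
jib_pgflquota_pages = 0x9EB0
jib_pgflcnt_pages = 0x950

quota_pagelets = jib_pgflquota_pages * PAGELETS_PER_PAGE
count_pagelets = jib_pgflcnt_pages * PAGELETS_PER_PAGE

print(quota_pagelets, hex(quota_pagelets))  # 649984 0x9eb00
print(count_pagelets, hex(count_pagelets))  # 38144 0x9500
```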

"Volker: With a large PAGEFILE.SYS ..."

The original single pagefile is 1,300,000 blocks and I added another 1,300,000 block one. That's not "large" is it?

"Volker: ...you can easily force a QUEUE_MANAGER process dump and have it restart automatically (queue commands will hang for as long as it takes to write the process dump)."

What happens to currently executing / printing jobs? Evaporate, or also just hang until the mgr comes back? The points for that suggestion depend on the answer. ;-)

Cheers,
Art
Art Wiens
Respected Contributor

Re: %QMAN-W-LOWMEMORY

"Volker: Note that you can easily force a QUEUE_MANAGER process dump and have it restart automatically (queue commands will hang for as long as it takes to write the process dump, i.e. less than a minute):"

Well the time was right, I did the DIAG 4. What followed was one of the longer 3 or 4 minutes of my life ... the queue manager restarted and went solid computable and the cluster "hung" ... not quite, as I was seeing OPCOM messages about users trying to log in and timing out. But every command I entered stalled. It did finish up whatever it was doing and things are back to "normal".

One exception, I can't use the f$getjpi lexical to get the pagfilcnt and pgflquota:

$ pipe show sys | search sys$pipe queue
2030CD04 QUEUE_MANAGER HIB 9 1893 0 00:08:31.90 4434 3267
$ write sys$output f$getjpi(2030CD04,"PAGFILCNT")
%DCL-W-IVCHAR, invalid numeric value - check for invalid digits
\2030CD04\
$ write sys$output f$getjpi(2030CD04,"PGFLQUOTA")
%DCL-W-IVCHAR, invalid numeric value - check for invalid digits
\2030CD04\

I can't do this for any process. WTF?

Art
Volker Halle
Honored Contributor
Solution

Re: %QMAN-W-LOWMEMORY

Art,

please include the double-quotes around the process-id:

AXPVMS $ write sys$output f$getjpi(26600E13,"PAGFILCNT")
%DCL-W-IVCHAR, invalid numeric value - check for invalid digits
\26600E13\
AXPVMS $ write sys$output f$getjpi("26600E13","PAGFILCNT")
495136

Volker.
Art Wiens
Respected Contributor

Re: %QMAN-W-LOWMEMORY

Never mind that last WTF ... too early in the day:

$ pipe show sys | search sys$pipe queue
2030CD04 QUEUE_MANAGER HIB 9 2805 0 00:08:32.74 4434 3267
$ write sys$output f$getjpi("2030CD04","PGFLQUOTA")
649984
$ write sys$output f$getjpi("2030CD04","PAGFILCNT")
592080
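
For a sense of scale, the values posted before and after the restart can be compared as a fraction of the quota. This sketch assumes PAGFILCNT is the remaining page file quota, which is how the F$GETJPI item is documented; `remaining_percent` is a hypothetical helper, not anything from VMS.

```python
def remaining_percent(pagfilcnt: int, pgflquota: int) -> float:
    """Share of the page file quota still available, in percent."""
    return 100.0 * pagfilcnt / pgflquota


PGFLQUOTA = 649984  # pagelets, unchanged across the restart

before_restart = remaining_percent(38144, PGFLQUOTA)   # value posted before DIAG 4
after_restart = remaining_percent(592080, PGFLQUOTA)   # value posted after the restart

print(f"before: {before_restart:.1f}%  after: {after_restart:.1f}%")
```

On that reading, roughly 6% of the quota was left before the restart and roughly 91% after it.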

All's well again. ;-)

Cheers,
Art
Robert Gezelter
Honored Contributor

Re: %QMAN-W-LOWMEMORY

Art,

With all due respect, you need to put the hexadecimal Process ID in quotes, to wit:


$ WRITE SYS$OUTPUT F$GETJPI("05CE","BIOCNT")

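Otherwise, DCL does not treat the first parameter as a string literal. An unquoted token that begins with a digit is parsed as a decimal number, which is why the hex digits trigger IVCHAR, and one that begins with a letter is taken as the name of a DCL symbol (hint: a process ID of aced would otherwise be ambiguous).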
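
The three cases can be sketched with a toy model in Python. This is a deliberate simplification and an assumption about the behaviour seen in this thread, not DCL's real expression parser; `classify_dcl_argument` and the `symbols` table are hypothetical.

```python
def classify_dcl_argument(token: str, symbols: dict) -> str:
    """Toy model (assumption) of how an unquoted or quoted argument is read."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]                 # quoted: string literal, used as-is
    if token[:1].isdigit():
        if not token.isdigit():            # hex digits such as C or D are rejected
            raise ValueError(f"IVCHAR: invalid numeric value \\{token}\\")
        return str(int(token))             # decimal integer, converted to a string
    return symbols[token.upper()]          # otherwise: looked up as a symbol name


print(classify_dcl_argument('"2030CD04"', {}))             # quoted PID is passed through
print(classify_dcl_argument('20307833', {}))               # all-decimal PID happens to work
print(classify_dcl_argument('aced', {"ACED": "hello"}))    # read as a symbol, not a PID
try:
    classify_dcl_argument('2030CD04', {})
except ValueError as err:
    print(err)
```

This also fits the earlier F$GETJPI(20307833,...) calls working without quotes: that PID contains only decimal digits.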

- Bob Gezelter, http://www.rlgsc.com