Operating System - OpenVMS

Did some testing of pagefile usage under 7.3

 
Jan van den Ende
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Wim,

@ 1),

I recall seeing them on the hardcopy console once. Must have been V4.x.
I am not really looking forward to a repeat experience though..

@ 2),

my guess would be that the system is still so exceedingly slow because you were already at the point where the Modified Page Writer is still desperately seeking locations to move pages into the pagefile, while a number of processes are trying to fault in pages that are already reserved (by malloc()) in the working set, so such a process is entirely entitled to fault in such a page from DZRO, but is Resource Waiting for DZRO pages, which can only become available from the (empty) free page list. (How is that for long sentences eh?)
Killing such a process moves its in-memory pages to the free list, on to the DZRO list, to be used immediately by (some of) the waiting processes. Thus MPW has to get going again, and now has --some-- pagefile space, most probably very fragmented.
The waiting-for-pages processes use them up as fast as they are freed, and as long as there is still any process trying to get its not-yet-mapped alloc-ed pages mapped in, you are still not able to do much.

Only after you kill enough to deplete that things-to-do list will there be room to breathe again.

... just my way of trying to explain your measurements though!

Proost.

Have one on me. (No Duvel this week, but Samuel Adams. Also "quite" bearable!)

jpe
Don't rust yours pelled jacker to fine doll missed aches.
John Gillings
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Wim,

>1) why didn't I get the opa messages in >6.2 (may be also in 7.3) ?

We can't guarantee that the messages will be written. That's true of many resource depletion conditions. Reporting the condition costs resources. If the resources aren't there, you may not be able to report it. However, EXE$GL_FLAGS will definitely have had the PAGEFRAG and PAGECRIT bits set.

>2) if there is free pagefile space before
>starting a process (and the system is
>functioning normally), why isn't normal
>functioning restored if I kill that
>process ?

Page file usage is not synchronous, nor is it related to a particular process. Dirty pages are put on the Modified Page List, which will be flushed to the page file when memory usage thresholds are triggered.

While one process may kick the Modified Page Writer into action, the pages it's trying to write don't necessarily belong to that process (indeed, it's more likely they belong to other processes). So, killing the initiating process may well recover some memory, but not necessarily any page file space. It would require the other processes to expand into the freed memory AND fault in the pages written to the page file to alleviate the RWMPB condition. OR you could kill one of the processes that's using lots of page file.

Expressed another way, page file usage is NOT Last In First Out. It's Random In, Demand Out. (well, not really "random", but close enough).
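
A toy model may make the "Random In, Demand Out" point concrete. The sketch below is not how OpenVMS tracks page file blocks; the process names, block counts, and helper functions are all made up for illustration. It only shows that the free page file space you get back depends on who owns the written pages, not on who triggered the writing.

```python
# Toy model only: names and numbers are hypothetical, not OpenVMS internals.

def free_blocks(pagefile_size, blocks_by_process):
    """Page file blocks not currently holding any process's pages."""
    return pagefile_size - sum(blocks_by_process.values())


def kill(blocks_by_process, name):
    """Return a new usage map with one process (and its blocks) removed."""
    return {p: n for p, n in blocks_by_process.items() if p != name}


PAGEFILE_SIZE = 1000
usage = {
    "initiator": 10,   # the process whose activity woke the page writer
    "big_user": 700,   # owns most of the pages already written out
    "other": 290,
}

print(free_blocks(PAGEFILE_SIZE, usage))                     # 0
print(free_blocks(PAGEFILE_SIZE, kill(usage, "initiator")))  # 10
print(free_blocks(PAGEFILE_SIZE, kill(usage, "big_user")))   # 700
```

Killing the initiator hands back only its own 10 blocks, while killing the heavy page file user frees most of the file.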
A crucible of informative mistakes
John Gillings
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Oh, almost forgot...

If you want to prevent your pagefiles from ever overrunning, make sure all your processes have reasonable PGFLQUOTA settings. You can do this either as a "demand side economics" system manager or a "supply side economics" system manager.

Demand side says, grant your processes the appropriate PGFLQUOTAs for their requirements, add up all the processes' values, add GBLPAGFIL, then add (say) 20% headroom. Allocate that much page file space. By definition, you cannot encroach on the 20% headroom. Any process that attempts to go beyond its limit will get an EXQUOTA_PGFLQUOTA error.
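
The demand-side arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name and the sample quotas are hypothetical, and it assumes every value is already expressed in the same unit (for example blocks).

```python
def demand_side_pagefile_size(pgflquotas, gblpagfil, headroom_percent=20):
    """Sum of all process PGFLQUOTAs plus GBLPAGFIL, plus headroom on top.

    Assumption: all inputs use the same unit. Integer arithmetic is used
    so the headroom is rounded up without floating-point surprises.
    """
    committed = sum(pgflquotas) + gblpagfil
    headroom = (committed * headroom_percent + 99) // 100  # ceiling division
    return committed + headroom


# Hypothetical example: three processes and a GBLPAGFIL of 50000.
quotas = [100000, 200000, 150000]
print(demand_side_pagefile_size(quotas, 50000))  # 600000
```

With these made-up numbers the committed total is 500000, so the 20% headroom adds another 100000.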

Supply side says something like "I can afford X GB of disk space for my page files. I have Y processes on the system and Z GBLPAGFIL". Therefore the PGFLQUOTAs for my processes should be:

(X-GBLPAGFIL-HEADROOM)/Y

you will always have HEADROOM free.
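
The supply-side formula above can be written out the same way. Again a sketch under stated assumptions: the function name and the sample numbers are hypothetical, all values are taken to be in the same unit, and the result is rounded down so the headroom is never eaten into.

```python
def supply_side_pgflquota(total_pagefile, gblpagfil, headroom, process_count):
    """Per-process PGFLQUOTA from (X - GBLPAGFIL - HEADROOM) / Y.

    Assumption: all inputs use the same unit. Integer division rounds
    down, so Y * quota + GBLPAGFIL never exceeds X - HEADROOM.
    """
    if process_count <= 0:
        raise ValueError("process_count must be positive")
    available = total_pagefile - gblpagfil - headroom
    if available < 0:
        raise ValueError("GBLPAGFIL plus HEADROOM exceeds the page file size")
    return available // process_count


# Hypothetical example: X = 1000000, GBLPAGFIL = 100000,
# HEADROOM = 200000, Y = 70 processes.
print(supply_side_pgflquota(1000000, 100000, 200000, 70))  # 10000
```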

So, again, your PROCESSES fail before your SYSTEM fails.

(familiar tune...) I stress... look at the REAL costs of downtime from system failure. Look at the REAL costs of disk space. This is a no-brainer. "Headroom" in the above scenarios can be very very large for very very small real cost. This is the cheapest business continuity insurance you will ever buy.
A crucible of informative mistakes
Wim Van den Wyngaert
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Ian,

EXE$GL_FLAGS is at 2316874 (hex), which is 100011 0 0010110100001110100 binary. The pagefile-full bit was NOT set (if I separated the right bit). But I'm sure the pagefile was full. Did anyone ever see the message in 6.2++ ?

The process state MWAIT was indeed RWMPB.

I'm still disappointed that killing the bad process doesn't solve the problem.

Wim
Wim
Willem Grooters
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Just for information:

I still have to investigate, but I think I got something similar on a 7.3-2 system (no OPA0 since it's a workstation). I got a "stalled" system after TCPIP$POP ran out of quota while processing a >25 MB message (including attachment). I increased its pagefile quota and found that all the multiple addresses on my single NIC were lost; I re-ran the ipconfig script to add them, and the system was "dead" as seen from the DECwindows screen (console) or from outside (TCPIP). So I halted it using the reset switch (equivalent to ^P, I think) and crashed the system from the console. In the dump I found quite a number of processes in RWMPB, and a pagefile with 0 free blocks.

And no message in operator.log....
Willem Grooters
OpenVMS Developer & System Manager
Wim Van den Wyngaert
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Willem,

They probably used "unused memory" to give the opcom messages. And then went into a hang, just like our programs ...

Wim
Wim
Volker Halle
Honored Contributor

Re: Did some testing of pagefile usage under 7.3

Wim,

EXE$GL_FLAGS = 02316874 (hex) has the EXE$V_PGFLCRIT bit set, meaning that a Page file FULL message has been issued. OPA0: is broadcast-enabled, right (check with $ SHOW BROADCAST) ?!

$ sea sys$library:lib.req PGFLCRIT
macro EXE$V_PGFLCRIT = 0,21,1,0 %; ! SET IF PAGE FILE FULL MSG ISSUED

SDA> eva (1@^d21)&2316874
Hex = 00000000.00200000 ...
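
The same check can be reproduced outside SDA. A minimal Python sketch, assuming bit position 21 from the lib.req line above and the flags value quoted in this thread; the constant and helper names are illustrative only.

```python
EXE_GL_FLAGS = 0x02316874   # value reported earlier in the thread
PGFLCRIT_BIT = 21           # from the EXE$V_PGFLCRIT macro definition


def bit_is_set(value, bit):
    """True if the given bit (0 = least significant) is set in value."""
    return (value >> bit) & 1 == 1


def set_bits(value):
    """List the positions of all set bits, lowest first."""
    return [b for b in range(value.bit_length()) if (value >> b) & 1]


# Same mask-and-test as the SDA EVALUATE expression (1@^d21)&2316874.
print(hex((1 << PGFLCRIT_BIT) & EXE_GL_FLAGS))   # 0x200000
print(bit_is_set(EXE_GL_FLAGS, PGFLCRIT_BIT))    # True
print(bit_is_set(EXE_GL_FLAGS, 19))              # False
print(set_bits(EXE_GL_FLAGS))
# [2, 4, 5, 6, 11, 13, 14, 16, 20, 21, 25]
```

Bit 19 is clear, which would explain a "not set" reading if the binary string was split at that position instead of at bit 21.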

Volker.