Operating System - OpenVMS

SWAPPER issue

SOLVED
John A. Beard
Regular Advisor

SWAPPER issue

We have just had it reported that users on one of our older VMS servers running 7.1 were experiencing a major downturn in performance. This is not happening at the moment, but the administrator for that box sent me the output from MON PROC/TOPC taken when the issue occurred.

It showed SWAPPER at 70%, followed by a handful of user-related processes at much lower percentages.

Can someone please advise where we can start looking to see what may have caused this problem, and what steps we should take to try to eliminate it? Thanks.

P.S. Accounting is not enabled on this particular server, so I don't know what we can look at.

A wise man takes advice.
labadie_1
Honored Contributor

Re: SWAPPER issue

You should first put a tool in place to collect data about CPU, memory, disk I/O usage, locks and so on.

You can always use MONITOR (see sys$examples:submon.com), and there are also ECP, T4, TDC, or even better tools such as HP Perfdat.

Check the basics:
- pool expansion failures
- pagefile utilization (not more than 50% is best)
- process states (RWxxx/MUTEX states come to mind)
- CPU saturation (how many processes are in COM state, and which Alpha/VAX model is it?)
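As a rough illustration of the last two checks, here is a minimal Python sketch that tallies process states. The process list is a hypothetical sample, as one might transcribe it by hand from SHOW SYSTEM or MONITOR PROCESSES output; it is not data from the system in question. COMO (computable, outswapped) is counted alongside COM on the assumption that computable processes waiting to be swapped back in are just as relevant to a swapper problem.

```python
from collections import Counter

# Hypothetical sample of (process name, state) pairs; replace with real data.
SAMPLE = [
    ("SWAPPER", "HIB"),
    ("USER_A", "COM"),
    ("USER_B", "COM"),
    ("USER_C", "COMO"),
    ("USER_D", "RWMPB"),
    ("USER_E", "MUTEX"),
    ("USER_F", "LEF"),
]


def summarize_states(processes):
    """Return per-state counts, the computable total, and suspicious waiters."""
    counts = Counter(state for _, state in processes)
    # COM = computable and resident; COMO = computable but outswapped.
    computable = counts["COM"] + counts["COMO"]
    # Any RWxxx resource wait, or MUTEX, deserves a closer look.
    waiting = sorted(
        name for name, state in processes
        if state.startswith("RW") or state == "MUTEX"
    )
    return counts, computable, waiting


counts, computable, waiting = summarize_states(SAMPLE)
print(f"Computable (COM + COMO): {computable}")
print(f"Resource/MUTEX waits: {', '.join(waiting) or 'none'}")
```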


Joseph Huber_1
Honored Contributor
Solution

Re: SWAPPER issue

This indicates that the system does not have enough memory, or that the page/swap file is too small for the load at the time in question.

From Neil Rieck's page:
============================================
OpenVMS Swapper

* The OpenVMS swapper serves two functions: trimming and swapping.
* When the free list drops below FREELIM, the swapper will start trimming back processes that have borrowed pages, possibly all the way to WSQUO. Idle processes may be trimmed back to WSDEF. Processes idle for longer than DORMANTWAIT might be swapped out.
* When the number of active processes exceeds BALSETCNT, the swapper will need to move processes in and out of memory to meet the needs of the round-robin scheduler. When virtual memory systems swap in this fashion it's usually not pretty, because processes come back into memory with their working sets trimmed to WSDEF, which is sometimes too small. This is one reason why some system managers avoid trimming to WSDEF and just go straight to swapping.
===========================================
( http://www3.sympatico.ca/n.rieck/docs/openvms_notes_system_tuning.html )
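The rules in the quoted notes can be condensed into a small decision sketch. This is a deliberately simplified Python model of the three bullets above, not the actual OpenVMS swapper algorithm; the Proc fields, the example values, and the treatment of "idle" as any non-zero idle time are assumptions for illustration only (DORMANTWAIT is assumed to be in seconds).

```python
from dataclasses import dataclass


@dataclass
class Proc:
    name: str
    ws_size: int          # current working set size, in pages
    wsdef: int            # WSDEF for this process
    wsquo: int            # WSQUO for this process
    idle_seconds: float   # 0 means the process is active (assumption)


def swapper_plan(free_list, freelim, active, balsetcnt, dormantwait, procs):
    """Toy model of the trimming/swapping rules quoted above."""
    actions = []
    if free_list < freelim:
        for p in procs:
            idle = p.idle_seconds > 0
            if idle and p.idle_seconds > dormantwait:
                actions.append((p.name, "outswap candidate"))
            elif idle and p.ws_size > p.wsdef:
                actions.append((p.name, f"trim to WSDEF ({p.wsdef})"))
            elif p.ws_size > p.wsquo:
                # Pages borrowed beyond WSQUO are taken back.
                actions.append((p.name, f"trim to WSQUO ({p.wsquo})"))
    if active > balsetcnt:
        actions.append(("balance set", f"swap out {active - balsetcnt} process(es)"))
    return actions


procs = [
    Proc("BUSY", ws_size=3000, wsdef=1000, wsquo=2000, idle_seconds=0),
    Proc("IDLE", ws_size=1500, wsdef=1000, wsquo=2000, idle_seconds=10),
    Proc("DORMANT", ws_size=1200, wsdef=1000, wsquo=2000, idle_seconds=600),
]
for who, what in swapper_plan(free_list=100, freelim=200, active=5,
                              balsetcnt=4, dormantwait=120, procs=procs):
    print(f"{who:12} {what}")
```

Note that both conditions can fire at once in this model: a free-list shortfall and an oversubscribed balance set compound each other, which is one plausible way for SWAPPER to end up at the top of MON PROC/TOPC.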

As a first reaction, I would run AUTOGEN to see what it would change (system parameters, page/swap file sizes).

http://www.mpp.mpg.de/~huber
Robert Gezelter
Honored Contributor

Re: SWAPPER issue

John,

This can also be a consequence of a memory leak in one or more user processes that results in a large amount of paging.

In short, a good review of the system may be in order. The problem could be a surge in usage, or it could be heavier-than-normal demand for virtual memory.

- Bob Gezelter, http://www.rlgsc.com
John A. Beard
Regular Advisor

Re: SWAPPER issue

Thanks for everyone's input.

I ran AUTOGEN and it recommended increases to both the page file and the swap file. Due to a lack of contiguous free space (we have no defragmenter, and we are not in a position at present to do an image backup/restore), the primary page file could not be increased to the size specified in the report. The swap file took its recommended increase without problems.

There is only one logical disk on this node, so I did not see the point in creating secondary page and swap files on the same volume.

As for the server, it's oooollllld.

AlphaServer 1000 4/233
Multiprocessing is DISABLED. Uniprocessing synchronization image loaded.
Minimum multiprocessing revision levels: CPU = 1

PRIMARY CPU = 00
Active CPUs: 00
Configured CPUs: 00

Existing page and swap files (before reboot)

System Memory Resources on 15-OCT-2008

Paging File Usage (blocks):                   Free  Reservable     Total
  DISK$DISK00:[SYS0.SYSEXE]SWAPFILE.SYS      22016       22016     22016
  DISK$DISK00:[SYS0.SYSEXE]PAGEFILE.SYS     156032      118016    200576
  DISK$DISK00:[000000]PAGEFILE3.SYS;1       170560      118128    200064
  DISK$DISK00:[SYS0.SYSEXE]PAGEFILE1.SYS;1  163280      119616    200576
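To apply labadie's "not more than 50%" guideline to these numbers, utilization can be taken as (Total - Free) / Total for each file. A minimal Python sketch using the figures transcribed from the listing above (the 50% threshold is the thread's rule of thumb, not a hard limit):

```python
# (free, reservable, total) in blocks, transcribed from the listing above.
PAGING_FILES = {
    "SWAPFILE.SYS": (22016, 22016, 22016),
    "PAGEFILE.SYS": (156032, 118016, 200576),
    "PAGEFILE3.SYS;1": (170560, 118128, 200064),
    "PAGEFILE1.SYS;1": (163280, 119616, 200576),
}
THRESHOLD = 0.50  # rule of thumb from the thread


def utilization(free, total):
    """Fraction of a paging file currently in use."""
    return (total - free) / total


for name, (free, _reservable, total) in PAGING_FILES.items():
    used = utilization(free, total)
    status = "ok" if used <= THRESHOLD else "above 50%"
    print(f"{name:16} {used:6.1%}  {status}")
```

With these figures the three page files come out at roughly 22%, 15% and 19% used and the swap file is unused, so none was near the threshold when this snapshot was taken; the snapshot is not necessarily from the moment of the slowdown.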


CS/FCAXP1> sh mem/pool/fu
System Memory Resources on 15-OCT-2008 18:26:42.02

Nonpaged Dynamic Memory (Lists + Variable)
Current Size (bytes) 4153344 Current Size (pagelets) 8112
Initial Size 4153344 Initial Size (pagelets) 8112
Maximum Size 17465344 Maximum Size (pagelets) 34112
Free Space (bytes) 2566784 Space in Use (bytes) 1586560
Largest Variable Block 1860352 Smallest Variable Block 64
Number of Free Blocks 1104 Free Blocks LEQU 64 Bytes 116
Free Blocks on Lookasides 405 Lookaside Space (bytes) 130240

(No Bus Addressable Memory allocated)

Paged Dynamic Memory
Current Size (PAGEDYN) 1843200 Current Size (pagelets) 3600
Free Space (bytes) 920272 Space in Use (bytes) 922928
Largest Variable Block 825696 Smallest Variable Block 16
Number of Free Blocks 151 Free Blocks LEQU 64 Bytes 91


Physical Memory Usage (pages): Total Free In Use Modified
Main Memory (192.00Mb) 24576 10697 13340 539
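The SHOW MEMORY figures can be cross-checked with simple arithmetic. This Python sketch assumes the 8 KB (8192-byte) page size of OpenVMS Alpha and the 512-byte pagelet, and uses only numbers copied from the display above. It checks that the physical-memory columns add up, shows that roughly 43.5% of memory was free, and shows that nonpaged pool had not expanded beyond its initial size at the time of the snapshot.

```python
PAGE_BYTES = 8192      # OpenVMS Alpha page size
PAGELET_BYTES = 512    # one pagelet = 512 bytes


def memory_summary():
    """Cross-check the SHOW MEMORY/POOL/FULL numbers shown above."""
    total, free, in_use, modified = 24576, 10697, 13340, 539
    npp_current, npp_initial, npp_max = 4153344, 4153344, 17465344
    npp_free, npp_used = 2566784, 1586560
    pgd_current, pgd_free, pgd_used = 1843200, 920272, 922928

    # In this display, Free + In Use + Modified add up to Total.
    assert free + in_use + modified == total
    # Free + in-use space should account for each pool's current size.
    assert npp_free + npp_used == npp_current
    assert pgd_free + pgd_used == pgd_current
    return {
        "main_memory_mb": total * PAGE_BYTES / 2**20,
        "free_fraction": free / total,
        "npp_pagelets": npp_current // PAGELET_BYTES,
        "npp_expanded": npp_current > npp_initial,
        "npp_fraction_of_max": npp_current / npp_max,
    }


for key, value in memory_summary().items():
    print(f"{key:20} {value}")
```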

labadie_1
Honored Contributor

Re: SWAPPER issue

Can you run
$ mc agen$feedback
This will silently create a new file, sys$system:agen$feedback.dat.

Then post the output of the command
$ sear sys$system:agen$feedback.dat pagef,fail,mscp
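For readers less familiar with DCL: with several search strings, SEARCH lists every line containing any of them, ignoring case (its defaults are /MATCH=OR and /NOEXACT). If the feedback file is copied to another machine, a rough Python equivalent is sketched below; the local filename agen_feedback.dat is a hypothetical choice, and nothing here assumes anything about the file's internal format.

```python
from pathlib import Path


def search_lines(lines, *needles):
    """Return lines containing any needle, case-insensitively (like SEARCH)."""
    wanted = [n.lower() for n in needles]
    return [line.rstrip("\n") for line in lines
            if any(n in line.lower() for n in wanted)]


# Hypothetical local copy of sys$system:agen$feedback.dat.
feedback = Path("agen_feedback.dat")
if feedback.exists():
    with feedback.open(encoding="latin-1") as f:
        for hit in search_lines(f, "pagef", "fail", "mscp"):
            print(hit)
```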

Joseph Huber_1
Honored Contributor

Re: SWAPPER issue

The swap/page/memory display apparently shows the normal state, and there seems to be plenty of free space. (I would even remove the swap file and use the page files for swapping.)

To catch the abnormal case, you would need to run MONITOR permanently, recording to a file (or use higher-level performance tools).

The abnormal case may also have been caused by an increase in the number of processes (a runaway process creation?), so that the swapper had to trim working sets. Did AUTOGEN say anything about the maximum number of processes or BALSETCNT?
Robert Gezelter
Honored Contributor

Re: SWAPPER issue

John,

I second Joe's comment about MONITOR, and would add one suggestion: run MONITOR in a way that generates a file which can be fed into T4.

If my recollection is correct, the T4 kit is set up for a later release; however, the MONITOR background operation is straightforward to extract and run in batch. The results can then be analyzed using the T4 tools. (Been there, done that.)

Willem Grooters
Honored Contributor

Re: SWAPPER issue

The data you showed seems reasonable and gives no reason for high SWAPPER CPU usage, unless your process working set limits are quite low or your SYSGEN parameters related to memory management are not set correctly. Either of these alone could make SWAPPER run wild.

A question you always have to ask in matters like these:
Did it happen before? If not, what has changed in the system? Do not limit this to SYSGEN parameters, but also consider newly installed software: How does it behave?

If the problem can be reproduced, monitor around that moment. T4 is a tremendous (free!) tool for exactly this kind of examination:

* Start a job running MONITOR in recording mode, with a not-too-large interval (30 seconds?)
* Reproduce the problem
* Stop recording
* Process the MONITOR output using the T4 toolset
* Using TLViz, you can now examine all kinds of data, time-based, in graphical format, and relate them: what causes the high SWAP rate?

It may pinpoint a culprit; if it does, examine its activity and behaviour.
Willem Grooters
OpenVMS Developer & System Manager
Willem Grooters
Honored Contributor

Re: SWAPPER issue

In addition to Robert's entry:
T4 is nothing more than a few executables and command procedures. The two most important in your case would be the image that transforms regular binary MONITOR output into a comma-separated file, which can be copied to a PC and fed into whatever program you like, and the Windows-based viewer named TLViz.

The translation image could well run on VMS 7.1.
John Gillings
Honored Contributor

Re: SWAPPER issue

John,

This symptom can result from an "Idle Process Killer" (IPK) type process under some conditions. If you fill your balance set slots, any excess processes must (by definition) be swapped out. If the way your IPK samples processes involves an inswap, it will force another process to be swapped out. The IPK will therefore "chase" the outswapped processes through the process list: on each IPK scan, every process will be swapped in and probably back out. If the time the swapper needs for all those swaps exceeds your IPK scan interval, you'll see SWAPPER constantly dominating the CPU.

Three ways out:
1) increase balance set
2) reduce the number of processes
3) (my preference) get rid of the IPK or get one that's a bit more intelligent.

(By "more intelligent" I mean one that won't inswap a process in order to sample it. BUT realise that this means the IPK won't touch an outswapped process, and guess what? Idle processes are more likely to be outswapped, so it kind of defeats the purpose!)
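The "chase" can be reproduced with a toy simulation. This Python sketch is a model, not the real swapper: it assumes the IPK walks the process list in order, that sampling a process requires it to be resident, and that the outswap victim is always the process resident longest (FIFO), an assumption made for simplicity. Under those assumptions, once there is even one process more than balance-set slots, every process is inswapped on every scan after the first.

```python
from collections import deque


def inswaps_per_scan(n_processes, balance_slots, scans=3):
    """Count inswaps caused by each sequential IPK scan (toy FIFO model)."""
    if balance_slots < 1:
        raise ValueError("need at least one balance set slot")
    resident = deque(range(min(n_processes, balance_slots)))
    counts = []
    for _ in range(scans):
        inswaps = 0
        for pid in range(n_processes):     # IPK samples processes in order
            if pid in resident:
                continue                   # already in memory: no swap needed
            inswaps += 1                   # sampling forces an inswap...
            if len(resident) >= balance_slots:
                resident.popleft()         # ...which outswaps the oldest resident
            resident.append(pid)
        counts.append(inswaps)
    return counts


def swapper_load(inswaps, seconds_per_swap, scan_interval):
    """Fraction of each scan interval spent swapping.

    Each inswap is paired with one outswap; a value >= 1.0 means the
    swapper never catches up before the next scan starts.
    """
    return inswaps * 2 * seconds_per_swap / scan_interval


print(inswaps_per_scan(5, 4))   # one process too many: [1, 5, 5]
print(inswaps_per_scan(5, 5))   # way out 1, larger balance set: [0, 0, 0]
print(inswaps_per_scan(4, 4))   # way out 2, fewer processes: [0, 0, 0]
```

With hypothetical figures of 0.5 s per swap and a 5 s scan interval, swapper_load(5, 0.5, 5.0) is 1.0: in this model the swapper would be busy for the whole interval, which matches the symptom described above.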
A crucible of informative mistakes