Re: Identifying threshold when paging or even swapping starts

Ralph Grothe · ‎02-23-2004

This is a question I've been asked (see subject).

Users see that the global memory utilisation hasn't yet reached 100%, and want to know from what percentage of memory usage paging or even swapping will begin.

I advise them to look at swapinfo rather than mem util (I configured sudo for them to execute "/usr/sbin/swapinfo -tam" because normally it requires root privileges, and I didn't want to set the suid bit).

Bluntly, there already seems to be paging activity.
I've set up an alarm whenever GBL_MEM_SWAPOUT_RATE is > 0, because in the past even at rates as low as 0.2 the system almost came to a standstill.
Because there hasn't been such an alarm yet, and because "sar -w" and vmstat show no po's, I now assume the used swap space is only claimed for page ins or outs.

# /usr/sbin/swapinfo -tma
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev       20112     379   19733    2%       0       -    1  /dev/vg00/lvol2
dev       12000       0   12000    0%       0       -    2  /dev/vg00/lvol11
reserve       -   18994  -18994
memory    12786    2765   10021   22%
total     44898   22138   22760   49%       -       0    -
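
To make the relation between the rows explicit, here is a minimal Python sketch that recomputes the total line from the figures above. The numbers are copied from this output; the dictionary layout and variable names are only illustrative.

```python
# Rows copied from the swapinfo -tma output above (all values in MB).
rows = {
    "dev lvol2":  {"avail": 20112, "used": 379},
    "dev lvol11": {"avail": 12000, "used": 0},
    "reserve":    {"avail": 0,     "used": 18994},  # reserved, nothing written yet
    "memory":     {"avail": 12786, "used": 2765},   # pseudo swap
}

total_avail = sum(r["avail"] for r in rows.values())
total_used = sum(r["used"] for r in rows.values())
pct_used = round(100 * total_used / total_avail)

# Space actually occupied on the swap devices, as opposed to merely reserved.
paged_to_disk = rows["dev lvol2"]["used"] + rows["dev lvol11"]["used"]

print(total_avail, total_used, pct_used, paged_to_disk)
```

Most of the 49% is reservation: only 379 MB of the device swap holds pages, which fits the absence of page-outs in sar and vmstat.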

The system (N4000) has reached maximum RAM capacity of 16 GB and pseudo swap is enabled, although I had configured twice the size of secondary device swap.

Users report continuous performance problems.

I suspect that the DBAs brought the system to its knees by allowing far too many processes, each with unlimited resource possession.

I only found this shocking evidence.

# grep -E -v '^#|^[ ]*$' /app/oracle/product/9.2.0/dbs/init.ora|grep -e size -e process
db_block_size=8192
db_cache_size=5500000000
java_pool_size=1048576
large_pool_size=8388608
shared_pool_size=1100000000
processes=1300
sort_area_size=10000000

Whoa, they allowed 1300 processes!

Currently we have only 962 oracle procs running, which is why work is still possible.

# ps -u oracle|wc -l
962

To make things worse, I'm not allowed to impose a resource limit on the user.

# su - oracle -c ulimit 2>/dev/null

unlimited
logout

They reserved some 6 GB of shared memory
(I assume that's what they call the System Global Area, SGA)

# ipcs -mob
IPC status from /dev/kmem as of Mon Feb 23 16:51:51 2004
T      ID     KEY        MODE        OWNER     GROUP NATTCH       SEGSZ
Shared Memory:
m   12800 0xe71d289c --rw-rw----    oracle       dba    951  6763302912
m       1 0x4e0c0002 --rw-rw-rw-      root      root      1       31040
m       2 0x411c0042 --rw-rw-rw-      root      root      1        8192
m    4099 0x0c6629c9 --rw-r-----      root       sys      4    19058880
m       4 0x06347849 --rw-rw-rw-      root      root      1       77384
m  106501 0xffffffff --rw-r--rw-      root       sys      0       22908
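
As a rough cross-check, the pool sizes from the init.ora excerpt can be added up and compared with the SEGSZ of the oracle segment. This is only a sketch: it assumes the four pools listed are the main contributors to that segment, and it does not try to explain the remainder (presumably other SGA components).

```python
# Pool sizes in bytes, copied from the init.ora excerpt earlier in this post.
pools = {
    "db_cache_size": 5_500_000_000,
    "shared_pool_size": 1_100_000_000,
    "large_pool_size": 8_388_608,
    "java_pool_size": 1_048_576,
}
segsz = 6_763_302_912  # SEGSZ of the oracle segment in the ipcs output

pool_sum = sum(pools.values())
remainder = segsz - pool_sum  # part of the segment not covered by the four pools

GIB = 1024 ** 3
MIB = 1024 ** 2
print(f"pools:     {pool_sum / GIB:.2f} GiB")
print(f"segment:   {segsz / GIB:.2f} GiB")
print(f"remainder: {remainder / MIB:.1f} MiB")
```

The pools account for all but roughly 147 MiB of the segment, so db_cache_size alone makes up most of the "some 6 GB".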

When I look in glance at any oracle process I can see that virtually every oracle process has these 6 GB claimed as virtual memory.
So summing up PROC_MEM_VIRT doesn't seem to give a realistic total.

I use this wee adviser syntax for summation of memory usage by oracle

# cat oraproc_mem.adv
res_mem = 0
virt_mem = 0
proc loop
  if proc_user_name == "oracle" then
  {
    res_mem = res_mem + proc_mem_res
    virt_mem = virt_mem + proc_mem_virt
  }
print "total resident memory of oracle procs:", res_mem/1024
print "total virtual memory of oracle procs:", virt_mem/1024

Which yields

# glance -iterations 2 -j 20 -adviser_only -syntax oraproc_mem.adv 2>/dev/null
total resident memory of oracle procs: 10540
total virtual memory of oracle procs: 6147594
total resident memory of oracle procs: 10577
total virtual memory of oracle procs: 6167252
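
The size of the virtual total is explained almost entirely by the shared segment being counted once per attached process. The sketch below is an estimate only: it assumes the adviser figures are in MB (PROC_MEM_VIRT in KB divided by 1024) and combines the ipcs and glance numbers even though they were sampled at slightly different moments.

```python
MIB = 1024 ** 2

segsz_bytes = 6_763_302_912   # SEGSZ of the oracle segment (ipcs -mob)
nattch = 951                  # NATTCH: processes attached to that segment
virt_sum_mb = 6_147_594       # first "total virtual memory" figure printed above

shared_mb = segsz_bytes / MIB
counted_shared_mb = nattch * shared_mb              # segment counted once per process
private_estimate_mb = virt_sum_mb - counted_shared_mb
deduplicated_mb = shared_mb + private_estimate_mb   # segment counted only once

print(f"shared segment:          {shared_mb:10.0f} MB")
print(f"counted {nattch} times:       {counted_shared_mb:10.0f} MB")
print(f"private (estimate):      {private_estimate_mb:10.0f} MB")
print(f"de-duplicated estimate:  {deduplicated_mb:10.0f} MB")
```

Under these assumptions the oracle processes claim on the order of 20 GB of virtual memory rather than 6 TB, with about 13.7 GB of it private to the processes.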

I'm not so sure what ps accounts as size, so this summation deviates slightly (from the manpage I read it's the number of pages used).

# UNIX95= ps -u oracle -o sz=|awk '{s+=$1};END{printf"%10.2f\n",s*4/1024}'
54380.75

This looks more like paged out regions being included.

Anyway, I would come to the conclusion that they either would have to buy additional phys. RAM (which is impossible since all banks are stuffed with 512 MB DIMMs), or drastically cut down on their processes.
Would you agree?

Madness, thy name is system administration

RAC_1 · ‎02-23-2004

That was a very long explanation.

Is vhand of any use here? When the vhand process starts, the swapping starts. Now

There is no substitute to HARDWORK

Ralph Grothe · ‎02-23-2004

# ps -efl|grep vhand|grep -v grep
1003 S root 2 0 0 128 20 4d12a600 0 fb6768 Jan 11 ? 11:54 vhand

Madness, thy name is system administration

Sridhar Bhaskarla · ‎02-23-2004

Hi Ralph,

Actually, there are three parameters that affect the paging activity of the system.

lotsfree: As long as the number of free pages is above this value, the system considers itself to have plenty of free memory. It is also the upper bound at which vhand's steal hand becomes active.
desfree: This is the lower bound at which paging begins.
minfree: If the number of free pages falls below this limit, deactivations will occur.

There is a threshold called gpgslim, which is kept at one quarter of the distance between lotsfree and desfree. When the number of free pages falls below gpgslim, vhand starts stealing pages.

Look at the following paper for more information.

MAINTAINING PAGE AVAILABILITY chapter

http://docs.hp.com/hpux/onlinedocs/5965-4641/5965-4641.html

So, the memory utilization doesn't need to be 100% before the paging occurs. These *free values are kernel settings and you can find their defaults in the above site.

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

Ralph Grothe · ‎02-23-2004

Hi Sri,

I haven't yet found time to read the whole whitepaper about memory management.
But I found copies of it in the filesystem as well

# ll /usr/share/doc/mem_mgt.*
-r--r--r-- 1 bin bin 591917 Nov 7 1997 /usr/share/doc/mem_mgt.ps
-r--r--r-- 1 bin bin 147996 Nov 7 1997 /usr/share/doc/mem_mgt.txt

The mentioned kernel tunables aren't defined, so I assume some default algorithm is being used.

# kmtune -q lotsfree -q minfree -q desfree
Parameter Value
===============================================================================
lotsfree 0
minfree 0
desfree 0

Madness, thy name is system administration

Sridhar Bhaskarla · ‎02-23-2004

Hi Ralph,

These are private tunables set automatically by kernel and can be changed.

You can use adb to find out the values.

echo "desfree/D" |adb -k /stand/vmunix /dev/mem

The values you get are in pages.

-Sri

You may be disappointed if you fail, but you are doomed if you don't try

RAC_1 · ‎02-23-2004

Found the following thread. http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=48977

It gives good information on these three parameters. I was thinking that these kernel tunables were obsolete as of 11i.

Thanks, Sri, for the info.

There is no substitute to HARDWORK

Bill Hassell · ‎02-23-2004

Actually, paging will occur without memory pressure from applications if memory mapped files are in use. A system that has 16Gb of RAM and uses 500 megs for applications may actually see some page outs due to memory mapped files. This is by design and due to the nature of memory mapped files, the space is not counted with programs, buffer cache, shared libraries, shared memory segments, etc.

Page-out is the only useful metric. Page-in refers to the startup of new processes as well as page-in from swap space, so it is useless for performance metrics. If page-out stays at single digits (0-9) for long periods, you're fine. As it begins to creep into double digits for long periods, you're short of memory. I don't worry much about the way in which HP-UX handles memory pressure (lotsfree, desfree, vhand, etc.) since there are too many variables to track. One might spend a few weeks tuning (and rebooting a lot) to reduce paging, but in the long run more memory will usually solve the issue.
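
The rule of thumb above could be applied to a series of vmstat "po" samples roughly as follows. This is a sketch, not a tuned monitor: the function name is made up, and sustained_fraction is an assumed stand-in for "long periods", which the post does not quantify.

```python
def assess_page_outs(po_samples, sustained_fraction=0.5):
    """Classify page-out rates: single digits are fine, sustained double digits are not.

    sustained_fraction is an assumption: the share of samples that must be
    10 or more before the series is treated as a memory shortage.
    """
    if not po_samples:
        raise ValueError("need at least one sample")
    double_digit = sum(1 for po in po_samples if po >= 10)
    if double_digit / len(po_samples) >= sustained_fraction:
        return "short of memory"
    return "fine"


print(assess_page_outs([0, 3, 9, 2]))      # all single digits
print(assess_page_outs([0, 0, 15, 0]))     # one isolated spike
print(assess_page_outs([12, 25, 8, 40]))   # mostly double digits
```

An isolated spike is deliberately ignored, since the post only treats double-digit rates over long periods as a sign of trouble.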

Bill Hassell, sysadmin

Ralph Grothe · ‎02-23-2004

Sri,

thanks for the hint how to read the paging "watermarks" from the kernel.
Here's the debugger's output for our kernel:

# for p in desfree lotsfree minfree;do echo "$p/D" |adb -k /stand/vmunix /dev/mem;done
desfree:
desfree: 15360
lotsfree:
lotsfree: 65536
minfree:
minfree: 7424
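
Converted to megabytes, these watermarks give a rough answer to the original question. The sketch below assumes 4 KB pages (the same factor used in the ps one-liner earlier in the thread) and reads "1/4th the distance between lotsfree and desfree" as measured upward from desfree; both are assumptions, and gpgslim is treated here as a fixed value.

```python
PAGE_KB = 4            # assumed page size in KB
RAM_MB = 16 * 1024     # 16 GB of physical memory

# Values in pages, copied from the adb output above.
pages = {"lotsfree": 65536, "desfree": 15360, "minfree": 7424}

# Assumption: one quarter of the way from desfree up to lotsfree.
pages["gpgslim (assumed)"] = pages["desfree"] + (pages["lotsfree"] - pages["desfree"]) // 4


def to_mb(n_pages):
    """Convert a page count to megabytes."""
    return n_pages * PAGE_KB / 1024


for name, n in sorted(pages.items(), key=lambda kv: -kv[1]):
    mb = to_mb(n)
    print(f"{name:18s} {n:6d} pages  {mb:6.1f} MB  {100 * mb / RAM_MB:.2f}% of RAM")
```

On these numbers lotsfree corresponds to 256 MB, so all of the thresholds sit within the last 2% of physical memory on this machine.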

Bill,

many thanks for your competent advice.

Madness, thy name is system administration

john bilyeu · ‎05-05-2004

Likely you can get some easy relief by reducing the ORACLE starting memory footprint,
(db_cache_size=5500000000 looks quite large, ask a DBA to knock it back).

Ralph Grothe · ‎05-05-2004

Thanks JB,

I will pass your suggestion to the DBAs.

Madness, thy name is system administration

Eric Antunes · ‎05-06-2005

Hi Ralph,

I was searching for large_pool_size threads and found your thread. ;)

Tell the DBA(s) to read Metalink Note 30918.1 and Bug 566708 about large sort_area_size (this size applies to EACH SESSION that needs to sort).
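
To put a number on the per-session point, here is a back-of-the-envelope upper bound using the values quoted earlier in the thread. It assumes, purely for illustration, that every process sorts at the same time and uses its full sort area, which would rarely be the case.

```python
sort_area_size = 10_000_000   # bytes, sort_area_size from init.ora
processes_limit = 1300        # processes= from init.ora
processes_now = 962           # figure reported by ps -u oracle | wc -l

GIB = 1024 ** 3
worst_case_limit = sort_area_size * processes_limit
worst_case_now = sort_area_size * processes_now

print(f"at the configured limit: {worst_case_limit / GIB:.1f} GiB")
print(f"at the observed count:   {worst_case_now / GIB:.2f} GiB")
```

Even as a loose upper bound, about 12 GiB of potential sort memory on top of the shared segment is more than a 16 GB machine can hold in RAM.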

Best Regards,

Eric Antunes

Each and every day is a good day to learn.

Eric Antunes · ‎05-06-2005

Exactly one year later!

Each and every day is a good day to learn.
