Operating System - HP-UX

Re: High System or Kernel Memory Usage

SOLVED
Don Morris_1
Honored Contributor

Re: High System or Kernel Memory Usage

You've got 4 issues in the system space.

1) The default overhead of around 8.6% of physical memory, plus filecache_min. This is a known cost in v3 unless you set base_pagesize higher (a setting of 16 would get around 6% of memory back right off the bat). This requires a reboot and should be done in a test environment first, to make sure you don't have applications that assumed the page size was always 4096 instead of using sysconf/getconf or other interfaces as they should.
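As an aside, the safe pattern for applications is to query the page size at runtime rather than hard-coding 4096. A minimal sketch in Python (the C equivalent would be sysconf(_SC_PAGESIZE) or getpagesize()); the 16384 figure in the comment is just what you'd expect to see with base_pagesize set to 16:

```python
import os

# Query the system page size at runtime instead of assuming 4096 bytes.
# On an 11i v3 box with base_pagesize raised to 16, this would report 16384.
page_size = os.sysconf("SC_PAGE_SIZE")

# Any valid page size is a positive power of two.
assert page_size > 0 and page_size & (page_size - 1) == 0
print(page_size)
```

Applications that cache this value once at startup keep working unchanged when the tunable is raised; ones that hard-code 4096 are the ones to find in testing.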

2) Your file related caching layers are big users (spinlock, region, vx global, vx inodes). This implies that vx_ninode may be too high (man 5 vx_ninode). As mentioned in the man page, by default this is rather aggressive and can be reconfigured.

3) Probably related to (2), your Super Page Pool cache layer is significant. This signifies that large page kernel translations are built and then large parts are freed, but not all -- and new allocations may not be able to use what is freed for one reason or another. This is often aggravated by the (2) issue due to the caching in the file system / file system interaction [VM] layers. Fix (2) and (3) may go down (likely not away, but down is good).

4) You've got a lot of memory tied up in Async disk driver structures. This may be required for performance -- but it may be excessive (I can't really say since I don't know how the async disk driver client is configured). Consult your Oracle documentation regarding async disk driver configuration.
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

Hi Don:

Could I see the references and citations for all this?
Support Fatherhood - Stop Family Law
Don Morris_1
Honored Contributor

Re: High System or Kernel Memory Usage

Math/kwdb and looking at the data provided.

sizeof(pfd_t) on v3 - 200 bytes. Base pagesize reported 4096 bytes. (200 * 100) / 4096 = 4.8828125. Hence pfdat mandatory cost [being a per-page structure] is 4.88%. In practice, you pick up a little more fluff for the other levels of the table, so 5% is a good rule of thumb.

The VHPT tries to be about 1% of memory [sizeof(pte_t) / 4096 = 0.78%, but there is a little extra plus rounding].

The Overflow PTEs are another 1% or so [sizeof(ovfl_pte_t) / 4096] and this tries to be one per page plus extra for Alias translations and Memory Mapped I/O.

The system critical pool area works out to about 0.2% (I don't think that heuristic is documented, nor should it be).

The PFN2V area is around 0.4% (sizeof(pfn2v_entry_t) / 4096 = .39%).

So that's 5% + 1% + 1% + .4% + .2% or 7.6%.

There's a metadata cache to help the filecache that's about 1%, so that's 8.6%.

Of all that -- the PFDAT, PFN2V, VHPT and Overflow PTE sizes are all per-page, so raise the base pagesize, you have fewer pages -- the cost is less.
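To make that scaling concrete, here is a small sketch of the arithmetic above. The constants are the rule-of-thumb figures quoted in this post (sizeof(pfd_t) = 200, ~1% VHPT, ~1% Overflow PTE, ~0.4% PFN2V, ~0.2% critical pool, ~1% metadata cache), so treat the output as an estimate, not a measurement:

```python
PFD_T_BYTES = 200  # sizeof(pfd_t) on 11i v3, as quoted above

def kernel_overhead_pct(base_pagesize):
    """Estimated fixed kernel overhead as a percent of physical memory."""
    # pfdat: one pfd_t per physical page.
    pfdat = PFD_T_BYTES * 100.0 / base_pagesize
    # VHPT (~1%), Overflow PTEs (~1%) and PFN2V (~0.4%) are also per-page,
    # so they shrink proportionally as the base page gets bigger.
    per_page_misc = (1.0 + 1.0 + 0.4) * 4096.0 / base_pagesize
    syscrit = 0.2    # system critical pool heuristic (flat)
    metadata = 1.0   # filecache metadata cache (~1%, flat)
    return pfdat + per_page_misc + syscrit + metadata

print(round(kernel_overhead_pct(4096), 1))    # ~8.5, i.e. the ~8.6% above
print(round(kernel_overhead_pct(16384), 1))   # ~3.0 with 16 KB base pages
```

The roughly 5.5 points recovered at a 16 KB base page lines up with the "around 6% of memory back" estimate in the first reply (the small gap is the table-level fluff the rules of thumb round over).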

And filecache_min is a flat reservation, as stated in `man 5 filecache_min`: "The amount of physical memory that is specified by the filecache_min tunable is reserved and guaranteed to be available for file caching." Hence kmeminfo will report it as used (since from the System perspective, that amount is -- even if the particular physical pages aren't chosen yet).

(3) is a matter of the dump provided showing almost 1 GB in the Super Page Pool. Barring this being a ccNUMA-based system with 17 localities and the submitter leaving that little fact out, that's a lot of memory hanging around on a 16 GB system (kmeminfo gives the total SPP layer memory, but there may be multiple distinct caches in play depending on the ccNUMA layout). Memory only hangs around in the SPP layer when some of it is in use, and Free memory is close to 0% (hence Garbage Collection should be in play), which strongly implies the arenas using memory are also preventing SPP coalescing [GC is pushing what it can to the SPP layer, but it isn't going further]. And since the VM white paper both predates the whole Super Page Pool implementation and, I believe, doesn't go into any real detail on Kernel Dynamic Memory anyway, there's nothing external to cite here.

(2) is granted a matter of experience and knowing that all the top arenas other than Async are File System/File Caching related and are all cached above the Arena layer when inodes are cached. Hence if you reduce inode caching, you reduce the caching of these other objects. Reduce the caching of the other objects, the memory gets freed to Arena -- the GC can find it... and hence (3) can clear up as well. The reference to it being aggressive by default is in the man page as I mentioned.

Alternately, you could start from http://docs.hp.com/en/7779/commonMisconfig.pdf and then realize that in v3 there are several additional structures in VM (VAS, pregion, region plus UFC-specific stuff) -- making the section on the VxFS inode tuning more important.

You can ask me how I know, I suppose -- but since the answer is "I read the UFC design and implementation and have had to triage v3 for years now" I don't see how it does much good or is anything beyond an argument from authority at heart.


(4) is purely a matter of that's the arena for the async driver and hence checking the configuration of said driver would be the only way that memory load could be reduced. That load may be appropriate and required for Oracle performance, of course -- hence the recommendation to check said documentation on what Oracle seeks to do with this driver.
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

Don:

If this isn't certified by the manufacturer, then you are putting the box into an unknown state, uncertified by the manufacturer.

Why should anybody want to put their company's box, a box often relied upon by thousands of users, a box often responsible for a million-dollar-a-day payroll, INTO AN UNKNOWN STATE????
Don Morris_1
Honored Contributor

Re: High System or Kernel Memory Usage

I'm sorry -- I'm rather missing your point here.

You asked me to cite my reasoning for the statements I made that "This is likely why your kernel is using memory in this way, this is what you'd want to investigate doing".

I *specifically* said that base_pagesize (which is a documented tunable from HP, mind you) should be validated in a non-production environment as application issues may arise.

I also cited official HP documentation that vx_ninode (the man page, the white paper) is aggressive. Both give HP's recommendations for the tunable -- if a customer wants to hold to Oracle's recommendations instead, that's their business. I'm simply stating that inode caching can cause this sort of kernel memory caching. I said nothing about what to set it to beyond the documents in question.

Same with async -- I don't see how you can construe "consult your documentation regarding this configuration" as "twiddle this knob and pray for the best".
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

Don:

You do see that there's an Oracle database on this box. So besides putting the O/S into an unknown state, you also want to use kernel parameters that are uncertified by Oracle.

Are you even aware that Oracle provides kernel parameter settings for the HP-UX O/S, as this is what they have tested, certified, and recommend for working best with their products?

Do you even have an HP server that you administer? Or is there some virtual box out there that exists on paper and got built based upon assorted university textbooks and internet whitepapers?

Let's take another example, SAP. Ever worked with an SAP server before, or with a team of basis administrators?

kenj_2
Advisor

Re: High System or Kernel Memory Usage

Following up on Don's comment about the asyncdsk arena looking high in the kernel. In fact, there is a known problem in this area with Oracle 11g. The problem is documented in Oracle bug 8965438. A fix is planned by Oracle but is not available at the present time.

On Oracle 10g, when Oracle configures each asyncdsk port for a process, it sets the max_concurrent value for most of the asyncdsk ports to 128. This max_concurrent value limits the number of parallel I/Os to a given asyncdsk port to 128. The asyncdsk driver then allocates a buffer header for each of the potential 128 I/Os. Each buffer header is 896 bytes, resulting in approximately 128*896 bytes, or about 112 KB per asyncdsk port. Typically, each Oracle process (shadow processes, dbwriters, logwriter, etc.) will have one asyncdsk port open. So if there are 1000 processes, the memory used by the asyncdsk driver is ~110 MB.

On Oracle 11g, Oracle uses a max_concurrent value of 4096, which results in 4096*896 bytes, or 3.5 MB per asyncdsk port. So if there are 1000 Oracle processes, the asyncdsk driver can consume ~3.5 GB of memory. Also, due to the large, odd-sized kernel memory allocations, the kernel's Super Page Pool becomes fragmented and also consumes large amounts of memory.
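Those numbers check out with a quick back-of-envelope calculation. The 896-byte buffer header and the per-port max_concurrent values are taken straight from the post above; the 1000-process count is illustrative:

```python
BUF_HDR_BYTES = 896  # size of one asyncdsk buffer header, as quoted

def asyncdsk_mem_bytes(max_concurrent, n_procs):
    # One asyncdsk port per Oracle process; the driver pre-allocates one
    # buffer header per potential concurrent I/O on each port.
    return max_concurrent * BUF_HDR_BYTES * n_procs

MB = 2 ** 20
GB = 2 ** 30
print(round(asyncdsk_mem_bytes(128, 1000) / MB))     # ~109 MB on 10g
print(round(asyncdsk_mem_bytes(4096, 1000) / GB, 1)) # ~3.4 GB on 11g
```

The 32x jump in max_concurrent (128 to 4096) is exactly the 32x jump in driver memory, which is why the arena stands out so sharply on 11g systems.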

Ken Johnson
HP
Michael Steele_2
Honored Contributor

Re: High System or Kernel Memory Usage

And given what HP support is today, citations and references and web links please, else, ....
RahulS
Occasional Advisor

Re: High System or Kernel Memory Usage

For the time being, I have added an extra 16 GB of RAM to each of the DB nodes to accommodate the high memory usage.

Thanks Michael, Dennis, Emil, Wayne, Don for your valuable suggestions.
Horia Chirculescu
Honored Contributor

Re: High System or Kernel Memory Usage

And your problems vanished only by adding extra memory?

Horia.
Best regards from Romania,
Horia.