
Dreadful performance 11.31

 
Brian McKerr
New Member

Dreadful performance 11.31

We have a 2-node (8GB RAM each) Oracle RAC + ASM cluster that performs as expected when we start instances totalling around 1GB of SGA. If we attempt to start around 3GB worth of databases, the server starts swapping like mad. The 'scan rate' column 'sr' in vmstat reaches upwards of 50,000, and page-ins and page-outs also show significant values in the range of 1,000-2,000. HP have provided the kmeminfo utility to help diagnose what is going on. We can see that during this time of excessive swapping the kernel is using almost 4GB of RAM. Here is one kmeminfo output:

Physical memory usage summary (in page/byte/percent):

Physical memory     = 2089399    8.0g 100%
 Free memory        =  207597  810.9m  10%
 User processes     =  709362    2.7g  34%  details with -user
 System             = 1014895    3.9g  49%
  Kernel            = 1014875    3.9g  49%  kernel text and data
   Dynamic Arenas   =  673936    2.6g  32%  details with -arena
    reg_fixed_arena =   72798  284.4m   3%
    vx_inode_kmcach =   72632  283.7m   3%
    misc region are =   59265  231.5m   3%
    FCACHE_ARENA    =   56911  222.3m   3%
    vx_global_kmcac =   51060  199.5m   2%
    Other arenas    =  361270    1.4g  17%  details with -arena
   Super page pool  =  136131  531.8m   7%  details with -kas
   UAREA's          =   29536  115.4m   1%
   Static Tables    =  145636  568.9m   7%  details with -static
    pfdat           =  102021  398.5m   5%
    vhpt            =   16384   64.0m   1%
    text            =   10326   40.3m   0%  vmunix text section
    inode           =    7805   30.5m   0%
    bss             =    5657   22.1m   0%  vmunix bss section
    Other tables    =    3441   13.4m   0%  details with -static
   Buffer cache     =      20   80.0k   0%  details with -bufcache
   UFC file mrg     =   98364  384.2m   5%
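For reference, the paging numbers quoted above came from plain vmstat, something like this (interval is illustrative):

# illustrative -- sample every 5 seconds; under the 'page' group,
# 'pi'/'po' are page-ins/page-outs and 'sr' is the scan rate
vmstat 5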


This was after the Oracle instances crashed and we received 'cannot reserve swap' error messages. During the swapping we were able to run glance, and it showed the system using even more than 4GB of RAM.

We have configured filecache_max to be 5% of RAM (400MB), but the kernel is still using 4GB. I'm now starting to think that there is a hidden kernel tunable 'kernel-aggressively-use-memory' which is set to true.
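For the record, that was set with kctune, roughly as follows (the byte value is 5% of this box's 8GB; yours will differ):

# illustrative -- check the current file cache limits:
kctune | grep filecache
# cap the unified file cache at ~400MB (5% of 8GB, in bytes):
kctune filecache_max=419430400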

These servers have nothing but Oracle running on them, as they are dedicated to the RAC cluster. Also, there is absolutely no other activity on the servers while they are experiencing this issue, as they are not in production yet.

I have calls logged with HP and Oracle and they are not proving to be very helpful at this stage.

Surely this is a generic OS performance issue? We do not see the problem when, for example, only 2 instances are started.
13 REPLIES
Laurent Menase
Honored Contributor

Re: Dreadful performance 11.31

Hi Brian

you said the kmeminfo was taken just after the Oracle instances crashed because they could not reserve ~3GB.

Indeed they can't: the system (kernel) is currently taking almost 4GB, and user processes another 2.7GB, so 6.6GB are already in use on that system. There is no room left for a 3GB chunk.
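In round numbers, from the kmeminfo above:

    System (kernel)    3.9g
  + User processes     2.7g
  ------------------------
    In use             6.6g

    Physical memory    8.0g
  - In use             6.6g
  ------------------------
    Headroom          ~1.4g   (nowhere near the ~3GB of SGA requested)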
Turgay Cavdar
Honored Contributor

Re: Dreadful performance 11.31

Hi,
We have a similar config: 11.31 / 10g RAC with ASM on 8GB RAM. We have a 3GB SGA on the hosts and I have seen no problems so far. Looking at kmeminfo, your kernel memory usage is high (about 1.5x) compared to my config. Was the kmeminfo output taken after the Oracle crash? If so, the user RAM usage of 2.7g is too high; did you check for any other process using a lot of RAM, or for shared memory segments left over from the crashed Oracle instances?
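For example (illustrative):

# illustrative -- list shared memory segments; anything still owned
# by the oracle user after the instances died is a leftover:
ipcs -ma | grep -i oracle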
Don Morris_1
Honored Contributor

Re: Dreadful performance 11.31

There's no such kernel tunable - HP-UX is and always has been aggressive about kernel caching for performance. This isn't a problem as long as the memory comes back when needed (and this obviously looks like one of the times when it doesn't come back in sufficient quantities).

In your case, look at your VxFS tunables (especially vx_ninode). Have you changed them? (The defaults are pretty aggressively tuned towards a "file server" workload, which doesn't sound like yours.) We'd have to track down things within the arenas (to see what objects are in use, preventing the backing large pages from being freed), but from the kmeminfo output shown it looks like lots of files were open. The UFC itself is capped at your 5%, but the file metadata (inodes, plus the VM region and sub-region structures affiliated with the files) is likely mixed between in-use and free... which fragments the kernel backing allocations [super pages] they come from; hence the Super Page Pool layer is rather flush. vhand is trying to force things out, but it can't just remove memory that's in use, and cached VxFS inodes are still "in use" from the memory allocator's point of view. A larger inode cache gives exactly this sort of arena footprint.

The VxFS version is interesting here as well: if I recall correctly, 5.0 and higher responds better to VM memory pressure, handing cached inodes back.

Which version of 11i v3 you're running (especially the VM component) and which version of VxFS may both be pertinent.
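A quick way to check both (illustrative; product names vary a bit by install):

# illustrative -- installed VxFS version:
swlist | grep -i vxfs
# current vx_ninode setting (0 means VxFS auto-sizes the inode cache):
kctune vx_ninode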
Brian McKerr
New Member

Re: Dreadful performance 11.31

Thanks for the feedback guys.

I probably didn't explain the situation clearly in my first post:

This kmeminfo was taken after all 3 running non-ASM instances had crashed. +ASM was still running, as were the clusterware daemons. As I mentioned, during the heavy paging more than 50% of system RAM was used by the kernel alone. I thought it was meant to release this memory when it was needed by existing or new user programs? Also, it is worth noting that 'kmeminfo -arena' does show a lot of memory being used up by vx_* type arenas. Yes, the local filesystems are VxFS, but all the Oracle data is on raw devices under ASM control. There should be practically no need for VxFS to be gobbling upwards of 1GB of RAM at this time.

The system is 11.31, fully patched (according to HP tech support), and Oracle is 11.1.0.7, again fully patched.

I'm *really* tempted to get an IA64 version of Linux to see if we can run the same databases in the 8GB. Common sense says we should have no trouble starting 4GB worth of SGA when we have 8GB of RAM, on any OS.

Another interesting thing is that we originally had only 8GB of swap, as per the clusterware installation recommendations, and the system seemed to be eating into it at an alarming rate. So I increased it by another 8GB, giving a total of 8GB RAM + 16GB swap. Today when I tried to start all 4 instances on top of ASM, the system crashed after producing errors saying "deferred swap reservation failure"... all this for < 4GB of user programs with 24GB of VM available!

How do I check the VM version, or whatever it was?

Thanks for your help.
Brian McKerr
New Member

Re: Dreadful performance 11.31

VxFS is version 4.1

and this may help identify which version of the OS I am running:

QPKBASE B.11.31.0903.334a Base Quality Pack Bundle for HP-UX 11i v3, March 2009
Don Morris_1
Honored Contributor

Re: Dreadful performance 11.31

You should have the 11.31.0903 VM if you installed the Quality Pack, OK. `swlist -l patch | grep "vm cumulative"` should then show PHKL_38651 or higher.

Yes -- the kernel will give back what it can under memory pressure... but it can't give back what it thinks is in use by clients. Hence my concern over the VxFS inode cache, given your arena sizes.

So -- what's vx_ninode set to, then? Again, every time I've seen this style of kernel memory load, it has been open or cached files/inodes and all the metadata that goes with them.

And it doesn't seem like you had 4Gb of user program data total if you hit deferred reservation failures with 16Gb of swap. The kernel cannot use lazy swap, so it is limited to less than the 8Gb of RAM for memory swap/stolen swap. The user load therefore had to be in the 16Gb range, not 4Gb (in swapinfo -atm, the reserve line was almost certainly showing that almost all the swap was reserved at that point). You may intend a 4Gb _physical_ load, but your configuration is obviously much larger virtually.

[And a deferred reservation failure shouldn't crash the system. Perhaps your clustering software rebooted it, but the OS itself will simply fail new virtual allocations at that point and terminate processes that fail the swap reservation (which tends to free swap by its nature).] If you think your virtual load should only be in the 4Gb range, you may need to check the configuration on the application side as well.
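That is, something like:

# illustrative -- the 'reserve' line is swap promised to processes
# but not yet paged out; under deferred reservation failures it
# will be close to the configured total:
swapinfo -atm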

Besides Oracle, is there a backup process running (that could account for the files)? Java monitoring apps (that could give the large virtual load)? There's obviously more going on here than the raw SGA sizes.
Steven E. Protter
Exalted Contributor

Re: Dreadful performance 11.31

Shalom,

Surely not a generic OS performance issue.

It could be that the Oracle memory requirements exceed the supply of memory.

There are things you can do on the OS.

You can reduce the buffer cache.

Set dbc_max_pct and dbc_min_pct to the same figure and set it low. These set the percentage of memory used for the buffer cache.
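Something like this (illustrative; note that, as pointed out further down the thread, these two tunables are deprecated on 11i v3, where filecache_min/filecache_max replace them):

# illustrative, for releases before 11i v3 -- pin the buffer
# cache at 5% of RAM:
kctune dbc_min_pct=5
kctune dbc_max_pct=5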

Oracle has Statspack and some other tools that help it tune its own performance.

See that the DBAs have it properly configured.

Search for programs with memory leaks:
http://www.hpux.ws/?p=8

Free performance gathering information:
http://www.hpux.ws/?p=6

Take a look at the disk section and look for i/o wait and hot spots.

It could be that last Oracle instance tipping things over.

Look at the SGA and see if the requirements can be reduced. Add up the requirements and consider adding memory to the server to meet them.

SEP
Alzhy
Honored Contributor

Re: Dreadful performance 11.31

"starting 4GB worth of SGA when we have 8GB of RAM on any OS."

Not on HP-UX though, my friend.

Do you have Glance?

You should easily see the distribution of your memory per process.

By any chance, can you provide the output of ipcs -a?

Also -- aren't you supposed to run just one DB instance in a RAC configuration? Just recalling what a DBA once told me.


Hakuna Matata.
Brian McKerr
New Member

Re: Dreadful performance 11.31

pd01:~# swlist -l patch |grep "vm cumu"
# PHKL_38369 1.0 kevm cumulative patch
# PHKL_39401 1.0 vm cumulative patch


pd01:~# kctune |grep vx
vx_maxlink                32767  Default
vx_ninode                     0  Default  Immed
vxfs_bc_bufhwm                0  Default  Immed
vxfs_ifree_timelag            0  Default  Immed
vxtask_max_monitors          32  Default


"You can reduced the buffer cache."

as explained I have reduced this by setting filecache_max at 5%. The tuneables you are referring to are deprecated in 11.31.


Yes, I have run glance, and it shows the kernel using 52% of the 8GB.

When I say three instances I mean for example;

1 x accounts DB
1 x HR DB
1 x email DB

and not 3 of the *same* instance!

Don Morris_1
Honored Contributor

Re: Dreadful performance 11.31

Yup -- you've got the default (auto-tuned by VxFS) vx_ninode.

Take a look at http://docs.hp.com/en/5992-5795/apbs02.html. You aren't running on a low-memory machine per se, but you effectively want to keep 4 to 5Gb of physical RAM available for your SGAs, so it works out similarly.

Since you said VxFS isn't supposed to be stressed by your DB load, I'd try setting vx_ninode to 65536 (the 3Gb value in that table) and rebooting.

This will cut down on the cached inodes and on the VM region, pregion, VAS and fcache structures that are kept while they're cached.
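In other words, something like (illustrative; vx_ninode shows as 'Immed' in your kctune output, but the reboot makes sure everything already cached gets released):

# illustrative -- cap the VxFS inode cache, then reboot:
kctune vx_ninode=65536
shutdown -r -y 0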
Brian McKerr
New Member

Re: Dreadful performance 11.31

Thanks Don, setting vx_ninode to a lower value has certainly helped performance a great deal (I've set it to 16384, which corresponds to 1.5GB of RAM according to the link you posted). However, kmeminfo still shows 2.3GB of RAM being used by the system. The biggest culprit seems to be "asyncdsk variab", which is using upwards of 750MB of that 2.3GB. It appears to be related to asynchronous I/O, which seems to default to 'on'. That is good, because our Oracle databases will make use of it; however, whether we want 750MB of 'buffers' for async I/O is another question. It kind of defeats the purpose of using ASM over raw SAN LUNs! I'd rather have the 750MB available for other user processes.

Does anyone have experience with this parameter, and can it be tuned down?

Also, in the kmeminfo output below, pfdat is using ~400MB. Any tips for that one?

Here is the latest kmeminfo:

tool: kmeminfo 9.06 - libp4 9.349 - libhpux 1.241 - HP CONFIDENTIAL
unix: /stand/current/vmunix 11.31 64bit IA64 on host "pd01.md.internal"
core: /dev/kmem live
link: Tue Aug 04 11:54:08 EST 2009
boot: Thu Aug 27 13:09:55 2009
time: Thu Aug 27 13:23:52 2009
nbpg: 4096 bytes



----------------------------------------------------------------------
Physical memory usage summary (in page/byte/percent):

Physical memory     = 2089399    8.0g 100%
 Free memory        =   22922   89.5m   1%
 User processes     = 1442136    5.5g  69%  details with -user
 System             =  533896    2.0g  26%
  Kernel            =  533874    2.0g  26%  kernel text and data
   Dynamic Arenas   =  328212    1.3g  16%  details with -arena
    asyncdsk variab =  177624  693.8m   9%
    vx_global_kmcac =   17097   66.8m   1%
    spinlock_arena  =   10658   41.6m   1%
    BTREE_NODE_OLA_ =    8913   34.8m   0%
    vm_pfn2v_arena  =    8566   33.5m   0%
    Other arenas    =  105354  411.5m   5%  details with -arena
   Super page pool  =   21627   84.5m   1%  details with -kas
   UAREA's          =   12080   47.2m   1%
   Static Tables    =  145636  568.9m   7%  details with -static
    pfdat           =  102021  398.5m   5%
    vhpt            =   16384   64.0m   1%
    text            =   10326   40.3m   0%  vmunix text section
    inode           =    7805   30.5m   0%

Re: Dreadful performance 11.31

Brian,

I'm no expert on the async driver, but it's my understanding that those buffer sizes are effectively set by the processes that open the /dev/async device, via an ioctl. The ioctl specifies the number of concurrent I/Os the process may have outstanding, and setting it high results in larger memory utilisation. As Oracle is the one opening /dev/async, it's effectively the Oracle processes that are requesting this much buffer space.

It would be interesting to see what size these buffers are when Oracle is completely stopped on the system and no processes have /dev/async open (you can use lsof to check which processes have /dev/async open; if you don't have it, you can get it from http://hpux.connect.org.uk/hppd/hpux/Sysadmin/lsof-4.82/ ). I'm not totally sure, but I think the buffers may not get freed until all processes have closed /dev/async.
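e.g. (illustrative):

# illustrative -- any processes still holding /dev/async open:
lsof /dev/async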

HTH

Duncan

I am an HPE Employee
Don Morris_1
Honored Contributor

Re: Dreadful performance 11.31

Yup -- that looks like a variable arena for the async driver, agreed. max_async_ports (man 5 max_async_ports) is the only tunable I know of that affects it. Since you mentioned earlier that this isn't a production environment yet, you should be able to try lowering it and see whether or not it hurts your Oracle async performance.
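Something like this (illustrative value; check how kctune reports the tunable before assuming it is dynamic):

# illustrative -- see the current setting first:
kctune max_async_ports
# then try a lower cap; if kctune marks it static, the new value
# takes effect at the next boot:
kctune max_async_ports=32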

Regarding PFDATs -- no, there's nothing you can do about that. v3 uses 5% of RAM for PFDATs; they're a per-page description structure that VM has to create. It isn't the kind of change that can be patched, so reworking it would require a new release.

The only way to reduce the cost on v3 is to increase the base_pagesize tunable, so that the system uses a larger page size than 4k: one PFDAT per page costs less when there are fewer pages. Again, being non-production, you can try this. [I'd probably try 16kb -- your small applications (shell scripts?) may burn more memory if they only need one page, so you don't want the full 64kb... 16kb may be a good compromise.] (Read more at: http://docs.hp.com/en/5992-4174/ch05s30.html)
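For completeness, the change itself would look something like this (illustrative; I believe the value is in kilobytes on 11.31, but verify against the tunable's manpage first -- it's a static tunable, so it needs a reboot):

# illustrative -- move from 4k to 16k base pages (value assumed to
# be in kilobytes; confirm with 'man 5 base_pagesize' before use):
kctune base_pagesize=16
shutdown -r -y 0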