Page Outs Up - Trousers Down.

Dan_173 · ‎06-18-2003

Yes - we're getting a jolly good beating and I hope you can help!

System:
HPUX11.0 64bit on T600 with 9 cpu and 3.75G Memory using an EMC disk sub-system with 8G cache. dbc_min_pct=10 and max_pct=30. bufpages=224465 and nbuf=289800.

Applications:
Mixed. Several Oracle Data Warehouses +others using Oracle8.0.6 (32bit) and 9.2.0.3 (64bit)

Problem:
I'm seeing PAGE OUTs per the PO value in VMSTAT running at around 250 at peak times. I suspect this is adversely affecting performance.

Question:
I've reviewed many threads already posted here (to try and not waste your time) and have come to the following conclusions. Your feedback/confirmation of my conclusions would be really appreciated.

CONCLUSIONS I'VE DRAWN - needing to be confirmed:

1. With NBUF and BUFPAGES both set at non-zero values, our dynamic caching is disabled.
2. My Buffer cache is ~1G per GPM and will not reduce due to 1) above.
3. that 1G of our total 3.75G is way too high
4. that we should try reducing to a static allocation of around 400-500M. - and not using dynamic until 11i.
5. that I'm unable to set mincache=direct for Oracle data & index files because we don't have OnlineJFS - but if we had it, this would relieve double-buffering issues.
6. that reclaiming ~600M (1.1G to 500M) from the UNIX cache is not likely to noticeably degrade performance (considering WODISCH's comments where HPUX11 will not take advantage of more than 300-400M buffer cache).
7. the reclaimed memory can be used by user processes and is likely to reduce (if not eliminate) current PAGE OUTs and improve overall performance/throughput.

Thanks in advance for your help. I've already learned heaps simply by reading through all your valuable forum responses.

Dan.

Ian Dennison_1 · ‎06-18-2003

Dan,

Some initial observations and further questions,...

Yes, reducing the buffer cache would be good. How are you determining that bufpages is not zero? I always use SAM to look at kernel parameters like this; doing a 'sysdef|grep buf' will show bufpages > 0 even if it is configured as 0 (bufpages in 'sysdef' shows how many are allocated right now).
Hint: Calculate 500MB as a percentage of 3.75GB (Your physical memory) then configure this percentage as your dbc_max_pct value.
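
The arithmetic behind that hint can be sketched as below. This is only an illustration: the helper name dbc_max_pct_for is made up for this example, and it assumes 3.75G means 3.75 x 1024 MB.

```python
def dbc_max_pct_for(target_cache_mb, physical_mem_mb):
    """Return the share of physical memory (in percent) taken by the target cache."""
    return 100.0 * target_cache_mb / physical_mem_mb

# Assumption: 3.75G of physical memory is 3.75 * 1024 MB.
physical_mb = 3.75 * 1024

for target_mb in (400, 500):
    pct = dbc_max_pct_for(target_mb, physical_mb)
    # dbc_max_pct is a whole-number percentage, so round to the nearest integer.
    print(target_mb, "MB ->", round(pct, 1), "% -> dbc_max_pct", round(pct))
```

On these numbers a 500MB ceiling works out to roughly 13%, and a 400MB ceiling to roughly 10%.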

How much swap do you have configured? The command 'swapinfo -ta' will show some information. However for pure magic answers, Glance Plus (available as a 30 day trial on the Apps CD) has a great swap memory breakdown feature (option w).

How are the databases actually performing? Yes you may have some memory pressure, but if the Systems are running OK, why change anything? What is System CPU % like?

90% of Oracle efficiency is derived from good SQL. Do you have a DBA that can examine this and comment on it?

Are you overspecced on the number of oracle Work Processes per instance? Or on the size of the SGA? (DBAs hate to give up any space; disk or memory). Can the Apps be de-tuned better or even switched off?

'ipcs -ma' will show SGA Size
'ps -eafl' will show in the SZ column the memory space allocated to each process.

Could you indulge us and produce some 'vmstat' information and place it in a reply here?

Share and Enjoy! Ian

Building a dumber user

Massimo Bianchi · ‎06-18-2003

Hi,
you are almost a super expert !

Regarding your questions:

1) True, setting these values prevents dynamic caching.
2) True.
3) True. Since you are using Oracle, I recommend our values, taken from a production system: dbc_min_pct 5, dbc_max_pct 8.
Let Oracle do the buffering work.
4) Personally I use dynamic allocation on 11.0 with no problem. Let's wait for other comments.
5) As far as I know, you can use mincache=direct. You need the vxfs file system; OnlineJFS is not needed. Be sure to have the latest patches. If in doubt, try creating a new lvol, run newfs -F vxfs on it and then mount it with the mincache=direct option.

For our little dev server, we use these settings:
-o tmplog,nodatainlog,mincache=direct

(also known as pedal to the metal settings)

6) True. If you use Oracle, the fs should really not matter.

7) May be true. Check also your SGA settings, maybe they are allocating far too much memory.

HTH,
Massimo

Stefan Farrelly · ‎06-18-2003

Yes, 1GB cache is too high on a 3.75GB server.

Yes, the optimal is 300-400MB cache - that's what we set all our servers to.

If you had OnlineJFS and used mincache=direct it would only help performance a little - not a massive amount. In this case double buffering is only reduced depending on your application doing things like reading on block size boundaries, and other requirements. Not all apps do this. In the real world it doesn't help performance that much.

At the moment your server is losing performance trying to administer such a large cache of 1GB - but even more importantly you're out of memory and now paging - which can affect performance by a factor of 100! The goal is to eliminate paging (keep some memory free).

If you reduce cache to say 400MB and you still have paging then use the swapinfo -mt command to see how much DEVICE swap is USED - this figure is how much memory you are short. If it's only 100-200MB I would reduce cache again - it's far more important to have no paging and keep some free RAM than to have a cache at even 300-400MB.

You should be able to work out right now how much memory you are short. The DEVICE USED line from swapinfo -mt will tell you. This is how much you should reduce cache by. If the DEVICE USED total in MB is > 1GB then you are right - you will not eliminate all paging and your server will still perform badly, in which case buy more RAM.
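
A rough sketch of pulling that DEVICE USED figure out of swapinfo output is below. The SAMPLE text is invented for illustration and is not from this server; the script assumes the usual swapinfo column order (TYPE, AVAIL, USED, FREE, ...) and that device swap lines start with "dev". The function name device_swap_used_mb is hypothetical.

```python
def device_swap_used_mb(swapinfo_text):
    """Sum the USED column (MB) over all device swap lines."""
    total = 0
    for line in swapinfo_text.splitlines():
        fields = line.split()
        # Assumed layout: TYPE AVAIL USED FREE ... with TYPE == "dev" for device swap.
        if len(fields) >= 3 and fields[0] == "dev":
            total += int(fields[2])
    return total

# Invented sample, laid out like 'swapinfo -mt' output; values are illustrative only.
SAMPLE = """\
             Mb      Mb      Mb   PCT  START/      Mb
TYPE      AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev        4096     300    3796    7%       0       -    1  /dev/vg00/lvol2
dev        2048     150    1898    7%       0       -    1  /dev/vg01/swap2
reserve       -    1200   -1200
memory     2904    1800    1104   62%
total      9048    3450    5598   38%       -       0    -
"""

print("Device swap used (MB):", device_swap_used_mb(SAMPLE))
```

On a real box the text would come from running swapinfo -mt and feeding its output to the function; the result is the amount to compare against the planned cache reduction.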

Im from Palmerston North, New Zealand, but somehow ended up in London...

Steve Lewis · ‎06-18-2003

1. Dynamic caching is only disabled if you have hard-set the values of nbuf/bufpages in your system file.
2. Yes your cache is too big, especially as Horrorcle and your EMC do the same job.
3. Yes, it's quite high, but not necessarily in all cases. If you also run a non-database app which does lots of filesystem i/o then you will need some cache.
4. Yes, go for it.
5. Not sure whether you need Online/JFS. See previous answers. I would always recommend it for uptime/maintenance issues anyway. I used to have to do without Online/JFS, but now I have it I won't go back.
6. True. vmstat po > 0 values are a symptom of reducing your buffer cache, but it isn't as serious as having si/so > 0, which would definitely be bad. If vhand is taking lots of cpu it's a bad sign. If swapper is taking lots of cpu it's worse.
7. Probably, due to the 3 sets of buffering going on, but likely to help, since those old cpus are having to manage buffer resizes, therefore increasing waits and reducing your usertime in cpu.

T600s were cool 5 years ago. Time to start speccing your replacement! The latest cpus and i/o buses blow the old architecture away, especially rp8400s.

Steven E. Protter · ‎06-18-2003

Collect some performance data with the scripts I am attaching.

I totally endorse the dbc recommendations you got above.

Check inodes and shared memory settings. inodes are expensive.

As far as shmmax, don't increase it to more than 25% of available memory. Available memory is defined as physical memory plus swap. Going past 25% won't help and the system will ignore it anyway.

Check ipcs command when this happens.

I have a link to a performance doc at work and will forward that in an hour.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

John Bolene · ‎06-18-2003

I agree with the min at 5 and max at 8.

Also make sure you have swapmem_on turned on.

shared memory can go as high as 1G

you may need to add more physical memory for this workload

It is always a good day when you are launching rockets! http://tripolioklahoma.org, Mostly Missiles http://mostlymissiles.com

A. Clay Stephenson · ‎06-18-2003

Bottom line: If you are seeing rather high pageout rates (and you are) don't even worry about buffer cache hit rates - that's at least an order of magnitude (that's Physics talk for 10x) less significant than the swapping impact.

Because the T-boxes' CPUs are not that hefty, I would relieve them of all the extra stuff their little brains have to deal with and not use dynamic buffer cache but instead set bufpages=102400 (400MB) BUT make sure that you set nbuf=0 - that's almost always nearly optimal for static buffer cache. I suspect the non-zero nbuf value was your really big problem - aside from it being too large. You may find that still smaller is better. It was never really the "double-buffering" that was the problem but rather that by reducing buffer cache, SGA could be increased. If you still see pageouts then reduce bufpages still further. On 11.0, you would probably see about a 1.1 increase from bypassing the buffer cache with the OnlineJFS options.
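
To check the bufpages figures, here is a small conversion sketch. It assumes a 4 KB page size, which is what makes 102400 pages equal 400MB; the function names are made up for this example.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, consistent with 102400 pages = 400MB

def bufpages_to_mb(bufpages):
    """Convert a bufpages count to megabytes."""
    return bufpages * PAGE_SIZE_KB / 1024

def mb_to_bufpages(mb):
    """Convert a cache size in megabytes to a bufpages count."""
    return int(mb * 1024 // PAGE_SIZE_KB)

print(bufpages_to_mb(102400))         # the suggested static setting, in MB
print(round(bufpages_to_mb(224465)))  # the value Dan read from sysdef, in MB
print(mb_to_bufpages(300))            # a smaller cache, if 400MB still pages
```

Under that assumption the bufpages=224465 reported in the first post comes to a bit under 900MB, which lines up with the ~1G cache seen in GPM.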

If you find that you are still seeing pageouts, it's time to reduce the size of the SGA. No matter what it is, it ain't as bad as swapping.

If it ain't broke, I can fix that.

Steve Lewis · ‎06-18-2003

As for your trousers, what about:

$ troff

or maybe

$ strip

:-))) 0 points pls.

Dan_173 · ‎06-18-2003

Firstly - thank-you to each of you for taking the time to give your advice. I'm truly grateful.

Here are my answers to your questions (hope I haven't missed any) and a refinement of my conclusions drawn....

1.1 I used sysdef to get the BUFPAGES and NBUF values, which reflect the current value of pages in use and not the setting provided to build the kernel image. I must use SAM to get the parm value actually set. Therefore, with the info I've provided, we can't actually tell if dynamic caching is enabled or disabled. I'll check SAM's values.

1.2 swapinfo -tm |tail -1 shows:
total 9048 5565 3483 62%

1.3 I'm using VMSTAT and taking the 'po' value as the PAGE OUTs per second. Local techs have thrown this metric into doubt as being inaccurate - and say that another tool should be used. However, no-one here has raised 'po' as being a poor quality metric. Q: Am I safe to consider it as an acceptably accurate metric? If not, what should be used?

1.4 The box supports a mixed load of about six Oracle instances and non-trivial outside-Oracle processing too. The instances already have their SGAs wound right down to the point of showing poor hit ratios in the Library Caches and the Data Dictionary caches (areas within the shared SQL pool). This is causing significant sql reloads, leading to increased I/O and CPU to perform hard parses on reloaded SQL. The data buffer hit ratio is also poor, but this is of lesser concern to me - being a data warehouse.
Q: I believe my first priority is to stop PAGE OUTs, and only then, if there is memory to spare, introduce SGA increases (slowly). Right?

1.5 You've generally confirmed that a unix buffer cache of 1G (from total real 3.75G) is too high and that this cache size itself introduces perf problems. A cache size of circa bufpages=102400 (400M) is better - for my set of conditions.

1.6 RE: A.Clay's notes "if..still seeing pageout's,...reduce SGAs...ain't as bad as swapping." Please pardon my ignorance here - but this raises more questions for me. I believed swapping was the next level worse than paging, where (for swapping) all pages associated with a process were paged/swapped out - whereas paging implied only pageouts on demand and not the full resident set for the process(es). Also, that swapping no longer happens in 11.0 and is replaced by deallocations - being handled differently. So..
Q: are the ~200 pageouts/sec to be considered a severe impact upon performance?

1.7 Lastly (a *very important point*) - a local tech suggests that by decreasing the buffer cache, we may stop paging I/Os but simply "relocate" the phys I/Os downstream to a buffer cache miss and subsequently a physical I/O to satisfy the cache miss. The assertion here is that with the reduced buffer cache (say 400M), performance/throughput will be worse.
I believed (rightly or wrongly) that non-zero PAGEOUTs were far worse than a cache miss (which -may- equate to a phys i/o or instead, could be satisfied by the disk sub-system cache with a logical i/o). If you consider winding down all SGAs on the box to absolute screaming minimum, then letting SGA misses hit the UNIX buffer cache and fight it out for free buffers, can we then consider the current PAGEOUTs "the best we can do" for the little real memory we have? So...
Q: Which is the lesser evil (PAGEOUTs or cache misses), by how much and why?

Thanks again for your great feedback. I'm trying hard to digest & consolidate all this feedback, then get confirmation that I've got it right.

Dan
