Excessive page-outs, even after memory upgrade

Andrew Scott_3 · ‎12-31-2009

We have an rx4640 with 4 single-core Itaniums running some Oracle standby instances, and are having some serious performance issues with it.

Originally, we were running with 12GB of RAM. We were seeing massive paging activity and were several gigabytes into swap.

We have since increased the system RAM to 32GB. We are no longer into physical disk-based swap, but swapinfo is still showing 1000-1500 page-outs a second, and total page-faults are running in the 8-10 thousand range.

Since we are using 0 blocks of physical swap, I'm guessing the page-outs are going to pseudo-swap.

Glance is showing 30.4GB of pseudo-swap available, and 7.2GB of it used on the system.

How can I get this sucker to stop paging out so much? We're only 24% utilized on physical memory at the current time. Our shmmax is 4GB and I have the buffer cache constrained to 10% of available memory. What other kernel parameters do we need to check?

Also, I know that page-in counts include code loaded from disk to be run from memory. Do page-out counts include data written to disk by programs? The manpage for vmstat isn't clear on that.

Thanks!
Andrew

Steven E. Protter · ‎12-31-2009

Shalom Andrew,

Let's take a look at application usage.

Check the Oracle SGAs and such. Trim your buffer pool if possible, to free up memory.

Let's take a look at glance and see what processes are using how much memory. Let's identify the process actually using the memory (top can do this) and take action on that basis.

Remember, an application will reserve swap even if it never uses it.

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Andrew Scott_3 · ‎12-31-2009

Here's my glance process list output. The three stdbypr01 processes are my biggest offenders. An occasional gzip will run out of a cron job and cripple the system, but I haven't been able to capture that running in glance yet.

Dennis Handly · ‎12-31-2009

>swapinfo is still showing 1000-1500 page-outs a second, and total page-faults are running in the 8-10 thousand range.

swapinfo(1m) doesn't show that. (Please provide the "swapinfo -tam" and vmstat output.)
Did you mean vmstat(1)?

>Since we are using 0 blocks of physical swap, I'm guessing the page-outs are going to pseudo-swap.

I'm not sure how useful this is for the kernel to do this?

>Do page-out counts include data written to disk by programs?

It depends on whether you have mapped files.
I'm not sure if it applies to the file cache on 11.31?

Andrew Scott_3 · ‎12-31-2009

Attached is a process detail on one of our big Oracle processes.

>swapinfo(1m) doesn't show that. (Please provide the "swapinfo -tam" and vmstat output.)
>Did you mean vmstat(1)?

Yes, I did. Sorry.

>>Since we are using 0 blocks of physical swap, I'm guessing the page-outs are going to pseudo-swap.

>I'm not sure how useful this is for the kernel to do this?

It's not useful at all, but that doesn't stop the system from doing it. The way I understand pseudo swap is the VM requires there to be a page of swap for every page of memory allocated, and they came up with pseudo swap so that they could create the pages in memory instead of on disk when available RAM allowed it.

And perhaps my page-outs are simply page creations in pseudo swap and not actual page outs? I don't know, and I don't know how to tell.

>>Do page-out counts include data written to disk by programs?

>It depends on whether you have mapped files.
>I'm not sure if it applies to the file cache on 11.31?

I don't know, either.

Andrew Scott_3 · ‎12-31-2009

Ooops, forgot the attachment

Dennis Handly · ‎12-31-2009

>It's not useful at all, but that doesn't stop the system from doing it.

I would assume it isn't doing it and there is some other cause.

>they came up with pseudo swap so that they could create the pages in memory instead of on disk when available RAM allowed it.

Yes. The page is its own swap area.

>perhaps my page-outs are simply page creations in pseudo swap and not actual page outs?

I would assume they would suppress these as confusing. Or put them under some other statistic.

>I don't know, either.

Are you using 11.31? Do you have all the latest VM patches?

Your attachment says it is waiting for I/O.
It has FS Reads/Writes. No VM Reads/Writes. Some System Writes.

Andrew Scott_3 · ‎12-31-2009

Yes, I'm on 11.31, and I'm patched up to March 2009.

As for the process being in IO Wait, yes. Everything is almost always in IO wait on this system. That's the core problem I'm trying to solve: why is everything in IO Wait all the time?

The system has two 2GB fiber channel cards and has its own EVA 8400 with 91 spindles (all in one disk group). We have CA eHealth watching the fabric and the server, and it's telling me the fiber channel cards are under 10% utilized, but my disk is grinding away at 30-70%. My CPU loads are hovering around 70%, mostly due to IO wait.

I ran evaperf against the EVA and processed the results through TLVIZ and found a few nasty spikes, but on average we're not hurting for storage speed and my controller utilization is really low. My write latencies spike to 9 or 10ms during a few really heavy spots where multiple instances are applying transaction logs at the same time, but on average are hovering around 4ms.

I have a bottleneck somewhere, I just can't find it. These excessive page-outs seemed like a good place to start looking.

I checked with the DBAs, and the total SGA allocated for all of the database instances was just 4GB, as they had not yet increased them from before the memory upgrade.

Would having the SGA size set too low cause excessive paging?

Oh, and this system is ~84% WRITE on its data disks. So consider that in any parameter change recommendations.

Emil Velez · ‎12-31-2009

If you are only using 24% of physical memory you do not have any pageouts.

You might have pageins which are ok but not pageouts. You may be misunderstanding the output of the vmstat memory report. You will not get pageouts unless you are low on memory and all of it is used.

You might want to increase the amount of filecache_max

How are your Oracle databases set up? JFS filesystem, RAC with CFS, or raw LVs?

Dennis Handly · ‎12-31-2009

>why is everything in IO Wait all the time?
>My CPU loads are hovering around 70%, mostly due to IO wait.

These are separate/opposite. You could be up to 30% IO wait.

>I have a bottleneck somewhere, I just can't find it. These excessive page-outs seemed like a good place to start looking.

How do you know the page-outs are for these processes?

>Would having the SGA size set too low cause excessive paging?

Or lots of I/O.

Michael Steele_2 · ‎12-31-2009

Hi

Just out of curiosity, were you using secure path in contention with native load balancing provided in 11.31?

vxfsd is online jfs, I don't suppose you have any contention with raw logical volumes and online jfs file systems?

Since this is a HW product, HP will have to take full responsibility. Stop sweating it and throw it back as incompatible / bugged.

Support Fatherhood - Stop Family Law

Andrew Scott_3 · ‎01-05-2010

Quote:
--------------
If you are only using 24% of physical memory you do not have any pageouts.

You might have pageins which are ok but not pageouts. You may be misunderstanding the output of the vmstat memory report. You will not get pageouts unless you are low on memory and all of it is used.
--------------

I am absolutely seeing pageouts. Thousands per second, on a system with a low memory load but a very high I/O load.

Quote:
--------------
You might want to increase the amount of filecache_max
--------------

It's currently set at 10%, which is 3.2GB. The transaction files being applied here seldom exceed 500MB, and the actual filecache size hovers around 2.8-3GB depending on the time of day. Does it make sense to increase it when it isn't using everything it's already allowed to use?

Quote:
-------------
How are your Oracle databases set up? JFS filesystem, RAC with CFS, or raw LVs?
-------------

Oracle data files on VXFS running over LVM on an EVA.

Quote:
-------------
These are separate/opposite. You could be up to 30% IO wait.
-------------
Yes, that's what I was describing. The CPU can't go past 70% because it's sitting around waiting on I/O.

Quote:
-------------
How do you know the page-outs are for these processes?
-------------
I don't know, that's what I'm trying to find out: what are the page-outs, and are they significant?

Quote:
-------------
Just out of curiosity, were you using secure path in contention with native load balancing provided in 11.31?
-------------
Interesting question. I wasn't aware SecurePath even ran on 11.31; we're using the native multipath I/O.

We aren't using any raw volumes, everything is on a filesystem. Is vxfsd known to become a bottleneck in heavily laden systems?

Quote:
-------------
Since this is a HW product, HP will have to take full responsibility. Stop sweating it and throw it back as incompatible / bugged.
-------------

This statement confuses me. Are you suggesting I box it up and send it all back? Where do I transfer my workload?

Don Morris_1 · ‎01-05-2010

vmstat reports file cache flushing of dirty pages as pageouts (because they are from a memory management point of view -- the memory management is simply being applied to a subset of the system [only the cache] instead of the system as a whole, and the backing store is your file system, not a swap device).

So with 0 actual swap used and high page outs -- it sounds to me that you've got high file cache utilization, causing a large amount of dirty page pushout. (Clean pages wouldn't register as pageouts nor cause I/O -- they'd just get dropped). Increasing your file cache might help to some extent -- but at the rates you're talking about it sounds more like you have workloads chewing through and dirtying significantly more memory than your file cache. Still -- since you say you're only 25% utilized, it certainly would be worth a shot to raise filecache_max and let those dirty pages stay in the cache longer.

And just to settle that topic -- pseudo-swap never, ever gets any type of page-out. Remember that it is an accounting trick at the reservation layer, not some sort of device. The pageout mechanism simply proceeds through the pageable set -- and anything that is all reserved from pseudo-swap gets skipped in consideration, anything partial will be considered but if the actual swap allocation fails, the pager daemon moves along.
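
As a rough sanity check on the numbers in this thread, the sketch below converts a vmstat page-out rate into write bandwidth. It assumes a 4 KB base page size, which is an assumption and not something stated above, and the function name is made up purely for illustration.

```python
def pageout_bandwidth_mb_s(pages_per_sec, page_size_bytes=4096):
    """Convert a page-out rate into decimal megabytes per second.

    page_size_bytes=4096 is an ASSUMED base page size, not a value
    taken from the thread; adjust it if the system is tuned differently.
    """
    return pages_per_sec * page_size_bytes / 1_000_000


# Page-out rates reported earlier in the thread (pages per second).
for rate in (1000, 1500):
    print(f"{rate} page-outs/s ~ {pageout_bandwidth_mb_s(rate):.1f} MB/s")
```

Under that assumption, 1000-1500 page-outs a second works out to roughly 4 to 6 MB/s of dirty pages being pushed from the file cache to the filesystem.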

Andrew Scott_3 · ‎01-05-2010

Awesome information. I'll increase the filecache_max and see what that does to the system.

Thank you!

Andrew Scott_3 · ‎01-05-2010

I increased filecache_max from 10% to 15%. Immediately the page-outs dropped to 0 and the filecache started growing. Once it hit the new 4.8GB limit, the page-outs came back.
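
For anyone following along, here is a minimal sketch of the arithmetic behind those limits, assuming filecache_max is expressed as a simple percentage of the 32GB of installed RAM. The helper name is hypothetical.

```python
def filecache_limit_gb(ram_gb, filecache_max_percent):
    """Return the file cache ceiling in GB for a percentage-based filecache_max.

    Assumes the percentage applies to the full installed RAM figure.
    """
    return ram_gb * filecache_max_percent / 100


# The two settings discussed in the thread, on a 32GB system.
for pct in (10, 15):
    print(f"filecache_max={pct}% of 32GB -> {filecache_limit_gb(32, pct):.1f}GB")
```

This reproduces the 3.2GB and 4.8GB ceilings quoted above.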

I think my question has been answered, thanks!

Rita C Workman · ‎01-05-2010

I hope I didn't miss this ... but I didn't see some requested info.

Did you mention how much swapdisk you set up?

Can you provide the output for the following:

vmstat -nS 1 10
sar -v 1 10
swapinfo -tam

I'm also curious about parm values. You mentioned you have 32GB mem and filecache_max set to 10%. What are the values for:

filecache_min
ninode
vx_ninode
Semaphore parm values
shmmax (you mention 4GB, could you post the exact value)
maxdsiz

/rcw

Michael Steele_2 · ‎01-05-2010

Can you attach the page out report?

Support Fatherhood - Stop Family Law

Duncan Edmonstone · ‎01-05-2010

Andrew,

for standby databases which I presume are just doing redo apply (essentially a slow synchronous write as the redo arrives over the network followed by a faster synchronous read as the redo is applied), one wonders whether you are gaining much by using the filesystem cache at all for the redo log and archive redo log filesystems...

if you have online jfs you might want to try mounting those filesystems configured for direct io only (filesystem mount options mincache=direct,convosync=direct) and see if that changes anything... you could even try the filesystems with the datafiles as well (oracle doesn't usually benefit that much from filesystem cache)

HTH

Duncan

I am an HPE Employee

Andrew Scott_3 · ‎01-05-2010

Rita:
The DBAs just shut down the instances to fiddle with something, so I don't have anything valid to give you for vmstat, sar, or swapinfo.

The rest:
filecache_min   1632595968  Default     Auto
ninode          8192        Default
vx_ninode       0           Default     Immed
shmmax          8192000000  8192000000  Immed
maxdsiz         1073741824  Default     Immed
maxdsiz_64bit   4294967296  Default     Immed

shmmax was increased from 4GB on a recommendation from Oracle.

Michael, I'll get something when they restart the databases.

Duncan,
There are three standby instances on this machine. One is a true standby that receives transactions from the primary asynchronously and applies them directly to the database.

The other two receive transaction logs from other machines via FTP, and then apply them directly to their respective standby instances. There are no redo or archive logs on this particular system, and the transaction logs are dumped onto the same filesystem as the standby database.
