Re: Serious kernel vm parameter concerns.

Werner_31 · ‎04-28-2005

Hi.

I have some serious questions about how certain kernel parameters are supposed to be set up. I am here because we are having some serious performance issues on a Tru64 machine which I don't have admin rights to, but whose problem I think I have identified.

The machine has 12 CPUs, 32 GB of memory and a fiber storage facility with massive RAID arrays etc., but it has serious performance issues. I have determined that this might be related to certain kernel parameters that are not set up correctly. What I have realized is that these parameters are probably supposed to be set according to the specific type of work going on on the box, so here is a short description:

This machine is running a fairly large Oracle database in conjunction with concurrent processes feeding the Oracle database its data. Oracle plus the programs feeding Oracle are therefore the two things happening on this machine.

The programs feeding Oracle constantly generate a huge amount of IO, 50% read and 50% write, 100% of the time. The problem I'm having is with the 50% read part, where I am unable to load anything into resident memory at speeds of more than 1Mb/s. The programs that are failing to read the data are the programs that have large RSZ values. They keep going to sleep and load at ridiculous rates. The programs writing to disk, though, have small RSZ values and are always busy flooding the system with IO writes, never asleep. The large RSZ values are roughly 2048 MB, and it is into these memory regions that the files need to be loaded. Also keep in mind that the IO writes are newly generated data, so they cannot already be inside the UBC and are not paged in from disk somewhere.

Now, 32 GB - 12 GB (for Oracle) leaves 20 GB for everything else. This 20 GB is shared by the UBC and the applications, if I am not mistaken. My theory is that, because of the massive IO writes going on in the background, the UBC is consuming all the virtual memory pages. The kernel parameter vm_maxpercent is set to 70% of the system memory, which in the first place should be based on 20 GB and not 32 GB. (If I am not mistaken, Oracle bypasses the UBC with direct writes, which means it robs the system of that 12 GB, making it seem as if there is a lot of memory for the UBC when there is not.) Furthermore, the borrow percent is set to 20% and, worst of all, vm_ubcseqstartpercent = 80% with vm_maxdirtymetadata_pcnt = 70%. To me this translates into the UBC getting pumped up until it passes 18 GB, at which point it realizes it has consumed all the available memory. Then the thrashing starts… ( read next post this thing is giving me MIME err
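
As a rough check of the arithmetic above, here is a minimal Python sketch. It assumes, as the post does, that the 70% ceiling is taken from all 32 GB of physical memory. The function name is made up for illustration and is not a Tru64 interface.

```python
# Illustrative arithmetic only; nothing here is a Tru64 API.
PHYSICAL_GB = 32   # total memory, from the post
ORACLE_GB = 12     # memory given to Oracle, from the post


def ubc_ceiling_gb(physical_gb, max_percent):
    """UBC ceiling if the percentage applies to all physical memory (assumption)."""
    return physical_gb * max_percent / 100


left_for_rest_gb = PHYSICAL_GB - ORACLE_GB     # shared by the UBC and the applications
ceiling_gb = ubc_ceiling_gb(PHYSICAL_GB, 70)   # the 70% setting quoted in the post

print(left_for_rest_gb)                # 20
print(ceiling_gb)                      # 22.4
print(ceiling_gb > left_for_rest_gb)   # True
```

Under that assumption the ceiling is larger than everything Oracle leaves behind, which is the core of the concern.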

Werner_31 · ‎04-28-2005

The theory on thrashing goes like this: the writes instantly consume all virtual memory pages, and when the reads have to take place they are told to wait while the VMM tries to reclaim pages from the UBC. It cannot do that in the time required, because the data needs to be flushed to disk first. Because it is unable to reclaim memory from the UBC, the second step is to start swapping out the programs using the most memory, which are the programs trying to read that data in the first place. Therefore:
1. The read programs allocate 2048 MB each with malloc()
2. The OS returns OK on the malloc, since it has lots of virtual memory
3. The read programs start to read the files into their respective memory regions
4. The OS now gets a lot of page faults (the zero-fill-on-demand ones, I think)
5. The OS looks in the free page buffer but finds it to be very low
6. The OS requests that the UBC free up some pages, vm_prewrite_target=4096 -> 32 MB
7. The UBC cannot free pages because they are ALL dirty pages that need to be committed to disk first. With the IO already saturated, this takes too long.
8. The OS now turns to hard swapping out all programs that have huge RSZ values to free up some pages. Even more IO occurs…
9. The read programs get swapped out because they have big RSZ values

Now, to make things even worse, the sleep time for swapped-out programs is set to 1 sec and vm_rss_wakeup_target = vm_rss_block_target = 1039 pages = 8312 KB! This causes the programs to wake up again immediately, only to find that the pages freed for them by the VM got stolen by the programs that are doing the excessive writes. They get put back to sleep again and so it continues.
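
The page-to-size conversions quoted above can be checked with a small Python sketch. The 8 KB page size is an assumption inferred from the post's own figures (4096 pages -> 32 MB, 1039 pages -> 8312 KB), and the helper name is hypothetical.

```python
PAGE_KB = 8  # assumed page size in KB, inferred from the figures in the post


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count to kilobytes."""
    return pages * page_kb


prewrite_kb = pages_to_kb(4096)   # vm_prewrite_target value from the post
wakeup_kb = pages_to_kb(1039)     # vm_rss_wakeup_target / vm_rss_block_target
rsz_kb = 2048 * 1024              # RSZ of one read program (2048 MB)

print(prewrite_kb // 1024)         # 32   (MB)
print(wakeup_kb)                   # 8312 (KB)
print(wakeup_kb / rsz_kb < 0.01)   # True: the target is under 1% of one program's RSZ
```

The last line puts the wakeup target next to the 2048 MB each read program needs, which shows how small it is in comparison.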

What sort of vm kernel parameters are we looking at here to maybe fix this problem? I was thinking of a lower vm_maxpercent of, say, 43% (70% of 20 GB) with 0% borrow, vm_ubcseqstartpercent = 50% and vm_maxdirtymetadata_pcnt = 10%.
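
The 43% figure comes from re-expressing "70% of 20 GB" as a share of the full 32 GB. A minimal Python sketch of that conversion, with a hypothetical helper name and the sizes taken from the posts:

```python
def percent_of_physical(target_gb, physical_gb):
    """Express an absolute size as a percentage of physical memory."""
    return 100.0 * target_gb / physical_gb


available_gb = 32 - 12                      # memory left after Oracle
ubc_target_gb = available_gb * 70 / 100     # 70% of what is actually available

print(ubc_target_gb)                            # 14.0
print(percent_of_physical(ubc_target_gb, 32))   # 43.75
print(32 * 43 / 100)                            # 13.76 (GB that a 43% setting allows)
```

So a setting of 43% of 32 GB lands just under the 14 GB target.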

I only read through the Tru64 UNIX documents on HP's site this afternoon and I'm not 100% sure that my interpretation is right. I desperately need confirmation before I go telling people that they did something wrong.

Am I completely off or not!? And sorry for the long post :P If anything, just let me know whether kernel parameters like these need to be fine-tuned for every scenario, or are better left to the defau

Hamel_2 · ‎04-28-2005

Hello,

Have you read chapter 4 (Tuning Oracle) in the "System Configuration and Tuning" guide?

http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/ARH9GCTE/TITLE.HTM

Han Pilmeyer · ‎04-28-2005

You give a lot of detail, but don't really say what kind of system it is or what version of the software you're running. It is also not clear whether you are talking about normal Oracle (background) programs or other programs.

My suspicion is that you are completely on the wrong track (assuming that this is just an Oracle database server). Oracle uses Direct IO on V5.1 and newer. This means that the reads and writes from Oracle don't go through the UBC at all.

For an Oracle system you shouldn't have to touch the UBC parameters at all. Assuming that this is a relatively recent version of Oracle.

You are on a NUMA based system. Depending on what system it is (GS160/GS320 versus GS1280) you may need to take some actions to better configure the system for the load.

You keep mentioning "thrashing", but you don't say what the symptoms are. Is the system actually paging out?

Werner_31 · ‎04-29-2005

aah thanks for the reply.

The secondary processes I am talking about have nothing to do with oracle. They are programs that gather data and enter them into the database.

The thrashing I refer to is the continuous swapping of these programs (in particular the programs doing reads only but which have a large RSZ) in and out of memory.

The reason I am here is that I don't have experience with these sorts of parameters. How closely does the UNIX OS follow these rules? The reason I am asking is this: say, for instance, you have an external storage solution that is not actually on the server but somewhere else, with its own cache etc. If you know Oracle uses its own cache, and the secondary programs are not going to benefit from a huge UBC cache, then you obviously don't want the UBC parameters set so that the dirty pages (resulting from IO writes) take up all the system's memory. This could result in disk writes only occurring when the page stealer daemon is trying to free up some pages. The problem with this is that there are parameters that throttle IO when the OS prewrites the pages. We would rather want the pages to be expedited straight to the external storage system. Would a small UBC setup achieve this sort of behaviour or not? And how big should it be? 6 GB maybe? Would that be enough?

Werner_31 · ‎04-29-2005

And yes, I read the entire paper. It assumes you would only run Oracle on the machine. But what if you have other programs running alongside Oracle? This is where I cannot seem to find any advice.

Michael Schulte zur Sur · ‎04-29-2005

Hi,
Is this a new installation, meaning: did it ever run better?
Have you checked on the performance of the Oracle database?

greetings,

Michael

Werner_31 · ‎04-30-2005

Yes, it did run better with 16 GB of memory and Oracle set to take 8 GB of memory. This is not a new installation and I also have no idea what version of the OS is running on it. I do assume, however, that it is the latest. They did mention upgrading the OS a month or two ago.

Michael Schulte zur Sur · ‎04-30-2005

Werner,

can you please post
uname -a
dupatch -track -type kit
sysconfig -q ipc
sysconfig -q proc
sysconfig -q vm
?

thanks,

Michael

Hein van den Heuvel · ‎05-01-2005

>> I also have no idea what version of the OS is running on it.

show us:
uname -a
psrinfo -v 0

>> I am here because we are having some serious performance issues on a Tru64 machine which I don't have admin rights to

IMHO you first and foremost need to work with the 'admin'. You cannot tackle what seems to be a system problem in isolation.
This needs a 'holistic' approach: DBA + Admin + Application + performance engineer.

If this is a serious problem, then surely the admin and DBA are interested in working with you, no? If you do not (cannot) get them involved in the analysis phase, then what are the chances that they will listen to requests for changes based on that analysis?

Next, some qualification / quantification of 'serious' might help us (and yourself). Is there a response time not being reached? A throughput level? What is the CPU load? System time vs user time? Something... anything.

>> Yes, it did run better with 16 GB of memory and Oracle set to take 8 GB of memory.

Now that is really interesting, and relatively uncommon. It would be kinda interesting, but probably too late, to compare system measurements like vmstat. Maybe the admin has historical 'collect' files, or sar data, to be able to compare how it was with how it is?

I like the 'too much vm being dirtied by irrelevant data' suggestion. It should be relatively easy to just reduce ubc-max to say 30% or 40% for a while and monitor vmstat before and after.

>> I am unable to load anything into resident memory at speeds of more than 1Mb/s.

I do not understand that line. Is the production data coming in at that rate? Or is this an independent test program reading from a file? From a socket?

>> fairly large Oracle database in conjunction with concurrent processes feeding the Oracle database

What SQL*Net protocol was chosen? TCP? IPC? BEQ/local? The first would be the slowest, the last the fastest. Still, that has not changed recently, no?

What are the IO rates in MB/sec? DB datafile IO? DB redo IO? Those application writes? Those application reads?

Probably irrelevant, but I had one 'odd' oracle experience where more memory was hurting when rollback segments were also set too large. For that application the bulk (3/4) of the write IOs were to the RBS tablespace, never to be read back, so basically all wasted. When we reduced the undo files/segments a lot Oracle started to 'WRAP' and re-use segments over and over. With that a much smaller SGA was sufficient, and the IO was drastically reduced.

Also, in this space, how many concurrent connections to Oracle? Coming and going, or staying? Several processes (un)mapping a 12 GB SGA may be significant overhead if GH (Granularity Hints) is not in place.

Like Han suggests, this is likely to be a NUMA machine. This may or may not be relevant. Is the application trying to deal with that? For example, you might want to start Oracle in selected, adjacent RADs (cells) to minimize memory latency. (runon -r 1 -r 2 sqlplus ...)

good luck,
Hein
