Operating System - HP-UX

Re: Performance Question???

 
SOLVED
Shaun Aldrich
Frequent Advisor

Performance Question???

I have recently been told by one of our DBAs that the system seems to be running very slowly. It is a K570 running HP-UX 10.20.

He had mentioned that it looks like the box is heavily loaded. Is there anything we can do?

I have run the top command and there are four CPUs on the system with an average load of 3.25. There are a number of oracle jobs which constantly run on this system.
Are there any suggestions from the wide range of experts in here?

Thanks for your help...

Shaun Aldrich
SAldrich@chaptersinc.com
19 REPLIES
Stefan Farrelly
Honored Contributor

Re: Performance Question???


Using sar, what's the wio% (wait-on-I/O %)? And how much physical memory and free memory do you have?
Im from Palmerston North, New Zealand, but somehow ended up in London...
Alan Riggs
Honored Contributor

Re: Performance Question???

Tuning is a complex issue. As a beginning, make sure that your kernel parameters are set appropriately.

Oracle produces a set of recommended parameters for HPUX 10.20. Other values should be set according to other usage/load on the server.

A load of 3.25 on a 4 CPU box is not particularly excessive in itself, but that also depends on what is running at the time. Use glance, top, vmstat, sar, swapinfo to look at various system resources and see whether one or more of them is heavily utilized. Standard resource binds are:

disk I/O (sar, glance, iostat)
cpu (top, sar, vmstat)
swapspace (swapinfo, glance)
memory (glance, sar)
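
To put the load figure in context, it can help to normalize it by the number of CPUs. The following is only a minimal sketch: the function name is made up for illustration, and dividing load by CPU count is a rough rule of thumb rather than HP-UX guidance.

```python
def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Return the load average normalized by the number of CPUs."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


# Figures from the original question: load average 3.25 on a 4-CPU box.
per_cpu = load_per_cpu(3.25, 4)
print(f"load per CPU: {per_cpu:.2f}")
```

A value below 1.0 suggests the run queue alone is not the whole story, which is why the other resources listed above are worth checking.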
Mark Mitchell
Trusted Contributor

Re: Performance Question???

Do you have Glance? There is a 30-day eval on the 10.20 CDs. Basically you need more information. You need to see which processes are running the hardest, how the hard disk I/O throughput is doing, or whether you are processor bound. I would start there, then look at how users are affecting the system too.
Rick Garland
Honored Contributor

Re: Performance Question???

An item I have seen slow down oracle boxes is the presence of defunct shells (sh).
The oracle connection has been discontinued but the sh processes hang out there.
James R. Ferguson
Acclaimed Contributor

Re: Performance Question???

Shaun:

If you have GLANCE, look at the global (the bar graphs at the top give a good summary), CPU, and I/O metrics on the detailed screens. A load of 3.5 is getting high if throughput (response time) is poor.

...JRF...
Anthony deRito
Respected Contributor

Re: Performance Question???

Shaun, take a look at this problem from the application angle. Find out from top which processes are actually doing all the work. Use Glance and check out the wait states that these processes are spending time in. Look for wait states like priority, IPC time delays, local network traffic.

Is your database using the local UNIX domain protocol or the high overhead TCP/IP to do its local database access? (This is a common problem with local database access.)

Take a look at your buffer cache hit rates if you are doing filesystem I/O. Use sar -b for this.

How much time is being spent in user mode vs kernel mode? Check this with sar -u.

Also, check the %wio parameter from sar -u. This may lead you to an I/O bottleneck.
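
As a rough illustration of the user vs kernel comparison, the sketch below parses one data line in the usual sar -u column order (time, %usr, %sys, %wio, %idle) and computes the kernel share of busy time. The sample line is hypothetical, not taken from Shaun's system, and the helper names are invented for this example.

```python
from typing import NamedTuple


class CpuSample(NamedTuple):
    time: str
    usr: float
    sys: float
    wio: float
    idle: float


def parse_sar_u_line(line: str) -> CpuSample:
    """Parse one data line laid out as: time %usr %sys %wio %idle."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    time, usr, sys_, wio, idle = fields
    return CpuSample(time, float(usr), float(sys_), float(wio), float(idle))


def kernel_share(sample: CpuSample) -> float:
    """Fraction of busy CPU time spent in kernel mode (0.0 if no busy time)."""
    busy = sample.usr + sample.sys
    return sample.sys / busy if busy else 0.0


# Hypothetical sample line, for illustration only.
sample = parse_sar_u_line("10:20:00      60      20      15       5")
print(sample)
print(f"kernel share of busy time: {kernel_share(sample):.0%}")
```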

Tony
Antoanetta Naghiu
Esteemed Contributor

Re: Performance Question???

1. Unix Kernel params.
2. performance tools.
3. Get "Oracle and UNIX Performance Tuning" by Ahmed Alomari, Format: Paperback, 400pp.
ISBN: 0138491674 Publisher: Prentice Hall Edition Desc: BK&CD ROM

CHRIS_ANORUO
Honored Contributor

Re: Performance Question???

Hi Shaun,
The documents in the link will help you determine where the system slowness or system bottleneck is coming from, if you don't have glance+.
http://europe-support2.external.hp.com/cki/bin/doc.pl/sid=21485163197106e47a/screen=ckiSearchResults
This doc is S3100002312A, you can get S3100002312B and S3100002312C from the same site by using the Doc Id as the search string.
When We Seek To Discover The Best In Others, We Somehow Bring Out The Best In Ourselves.
Shaun Aldrich
Frequent Advisor

Re: Performance Question???

 
Alan Riggs
Honored Contributor

Re: Performance Question???

Well, your read cache hit rate is low, but the CPU usage seems to be more of a culprit. You have 0% idle time in top and for significant periods in the sar history. Your top CPU consumers are all oracle processes. It looks to me like database tuning is in order. You may also need to educate end users on how to minimize the impact of their SQL. I have often found that much can be accomplished simply by training users to utilize indexed fields properly and eliminate the largest data set first when making compound searches. It may also be that your DBA needs to run stats against his databases and add an index or two. Finally, do check your kernel parameters to make sure they meet oracle recommendations.
Stefan Farrelly
Honored Contributor
Solution

Re: Performance Question???


There look to be two issues here. Alan is correct that one is lack of free CPU time (idle): from 9:40 till 13:40 someone/something was maxing out all the CPUs (idle time = 0). You need to find out who/what it was and what they were running. Perhaps it was a runaway process, or some number-crunching report, seeing as wio% at these times was around 0 which means almost no i/o was taking place.

Second, your wio% is too high. Before and after these times, even though you have some idle CPU time, your wio% is running very high; anything greater than 5% isn't good. What kind of disk subsystem do you have? Do you use striping? Some sort of cached disk subsystem? It looks like improvements can be made here too. On any database server there are always I/O improvements that can be made.
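
The two rules of thumb in this post (zero idle time means the CPUs are saturated, and wio% above 5% deserves attention) can be sketched as a small classifier over sar -u intervals. This is an illustration only: the sample rows are hypothetical, and treating "less than 1% idle" as saturated is an assumption added here.

```python
WIO_THRESHOLD = 5.0  # percent; the rule of thumb quoted in the post above
IDLE_FLOOR = 1.0     # percent; assumed cutoff for "no idle time"


def classify(idle_pct: float, wio_pct: float) -> str:
    """Label one sar -u interval using the two rules of thumb above."""
    if idle_pct < IDLE_FLOOR:
        return "cpu-saturated"
    if wio_pct > WIO_THRESHOLD:
        return "io-wait"
    return "ok"


# Hypothetical (time, %idle, %wio) rows that mirror the pattern described.
samples = [
    ("09:00", 20.0, 30.0),
    ("10:00", 0.0, 0.0),
    ("12:00", 0.0, 1.0),
    ("14:00", 25.0, 28.0),
]

for time, idle, wio in samples:
    print(time, classify(idle, wio))
```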

Im from Palmerston North, New Zealand, but somehow ended up in London...
Anthony deRito
Respected Contributor

Re: Performance Question???

Shaun, you have processes that are running in the "NICE" processor state. Did someone change the default priority of these processes? This is evident from several parameters I noticed. Take a look at your top output. Check out the NICE column at the top. Generally this column should read 0.0% unless there are processes running that were "niced". Also check out the NICE value for the oraclertkp processes. They are at NICE value 35, which puts them at a low priority compared to other processes. The priority seems to be 255, which is the lowest you can get for user processes. I do not see any other problem. Ask the DBAs if they have tampered with the priority of the processes.

Tony
CHRIS_ANORUO
Honored Contributor

Re: Performance Question???

Hi Shaun,

If you read through those documents that I mentioned earlier, they will guide you to understand the bottlenecks your system is experiencing.
Alan, Stefan and most especially Anthony hit the nail on the head.
Like Tony pointed out, the NICE values should be 20 with a priority range of 154 to 158. Ask your DBA to renice the processes owned by users oracle and rtkprd65. With sar -b, %rcache and %wcache are okay, averaging 86% and 83%. For your disk subsystem use sar -d.
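
A quick way to spot niced processes is to compare each nice value against the default of 20 mentioned above. The sketch below works on a hand-built list; the rows, PIDs and command names other than oraclertkp are hypothetical, and in practice the values would come from the process listing on the system.

```python
DEFAULT_NICE = 20  # default nice value cited in the post above

# (pid, user, command, nice) -- hypothetical rows, not the poster's data
processes = [
    (1201, "oracle", "oraclertkp", 35),
    (1202, "rtkprd65", "report", 35),
    (1300, "root", "sh", 20),
]


def find_niced(rows, default=DEFAULT_NICE):
    """Return rows whose nice value is above the default (lower priority)."""
    return [row for row in rows if row[3] > default]


for pid, user, command, nice in find_niced(processes):
    print(f"pid {pid} ({user}, {command}) has nice {nice}, default is {DEFAULT_NICE}")
```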

Cheers!
When We Seek To Discover The Best In Others, We Somehow Bring Out The Best In Ourselves.
Rhonda Thorne
Frequent Advisor

Re: Performance Question???

What are dbc_max_pct and dbc_min_pct set to in the kernel? Are you using these parameters or bufpages?

Also, how many log/DB writers does the DBA have configured? Only one may be too few and part of the cause of I/O waits. Everyone has to wait for the DB writer to come back to the SGA before attending to another oracle process.

Also, if you are running oracle apps, the priorities set within the app for recurring jobs may cause some jobs to take total control over the CPU and put others on wait.

Rhonda
Sharing my knowledge of UNIX flavors
Shaun Aldrich
Frequent Advisor

Re: Performance Question???

Thanks for all your responses...

Here is some information you requested.

dbc_max_pct = 20 (in our case this translates to ~300MB)
dbc_min_pct = 5 (translates to ~75MB on Chap3)
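
As a sanity check on those figures, the percentages can be converted to sizes. The sketch below assumes the percentages are taken against physical memory and back-computes the implied memory size from the ~300MB figure; the helper name is made up for illustration.

```python
def buffer_cache_mb(physical_mb: float, pct: float) -> float:
    """Size of a buffer cache limit given as a percentage of physical memory."""
    return physical_mb * pct / 100.0


# Back out the implied physical memory from "20% is about 300MB".
implied_ram_mb = 300 * 100 / 20
print(f"implied physical memory: {implied_ram_mb:.0f} MB")

# Check the minimum against the posted figure of about 75MB.
print(f"minimum cache at 5%: {buffer_cache_mb(implied_ram_mb, 5):.0f} MB")
```

The two posted sizes are consistent with each other and point to roughly 1.5GB of physical memory.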

What is the suggested level we should renice the oracle and rtkprd65 procs?

Can you clarify the real impact of the following, if it were to happen?

"If you are running oracle apps, the priorities set within the app for recurring jobs may cause some jobs to take total control over the CPU and put others on wait."

What is the highest priority that we can renice a process to (for example 15 as opposed to the default 20) without it interfering with system processes which run at a certain nice value?

Any help is appreciated.

Shaun Aldrich
SAldrich@chaptersinc.com


Alan Riggs
Honored Contributor

Re: Performance Question???

Be very careful renicing processes. You never (well, almost never) want to give any processes, particularly database processes, more priority than system processes. In general, renicing should be done to lower the priority of less important processes.

Now, in your own case, what is the driver behind renicing the oracle processes in the first place? Raising the nice value means individual processes are going to be more likely to yield CPU access to other processes. This applies only to access from the run queue, not to processes waiting on I/O (often a database bottleneck). So, lowering the priority (increasing nice value) of oracle processes means each individual process is likely to take longer to complete. If you run in an interactive environment, this is likely to lead to a greater number of processes running concurrently == higher CPU load and more time lost to process switching.

It is hard to say more without some benchmarks from your particular environment and without knowing what else runs on this server.

As for cache. The values you are using are fine for a generic environment. It may be possible to tune them, however. Does your database get more read or more write access? Oracle does its own caching--how much space do you have set aside for that?
Anthony deRito
Respected Contributor

Re: Performance Question???

Shaun, I think you need to understand the implication of changing process priority.

It has been determined by your posts that the application or some physical person has tampered with the priority of the Oracle processes. Have you verified this to be true?? If the answer is yes, the next question to ask is why??

When a process is put into the runnable state, it has 1/10 of a second to run. This is because your processes are on the time-share run queue - (unlike real time processes which always have a very high priority.) After it runs for 1/10 of a sec, it then has to wait for its next turn in the run queue.

The determination of whether the process will run again during its next turn is based on its priority and the priorities of other jobs on the run queue. When you decrease the priority of these processes (using the nice command) the kernel will sacrifice your job for jobs of higher priority and your job will not get its chance to run. This is called a "Context Switch". A context switch is the CPU dispatcher changing from one running process to another.

I bet you will see a high number of "Context Switches" occurring on your system. You need GlancePlus to see this.

The question is still, why would you want to decrease the priority of your processes?

Tony
john strumila
Occasional Advisor

Re: Performance Question???

howdy gurus,

I'm new to unix but I've been doing performance for many years. I saw a statement which looks wrong to me so I thought I'd get educated.

A previous post said: "...seeing as wio% at these times was around 0 which means almost no i/o was taking place." Is this wrong?

My understanding is that wio implies the processor was idle while waiting for the interrupt. If usr & sys are very high (nearly 100) then i/o may indeed be occurring but wio isn't incremented.

Only if user/sys were low and wio was 0 could you say no i/o was taking place.
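
The accounting I have in mind can be sketched as a toy model. It is only an illustration of my understanding, not how the kernel is actually implemented: each tick is charged to busy (usr/sys) if the CPU is running something, to wio only if the CPU is idle with I/O outstanding, and to idle otherwise. The function and field names are made up.

```python
def account(ticks):
    """Charge each (cpu_busy, io_pending) tick to busy, wio or idle, in percent."""
    counts = {"busy": 0, "wio": 0, "idle": 0}
    for cpu_busy, io_pending in ticks:
        if cpu_busy:
            counts["busy"] += 1  # usr/sys time, even if I/O is outstanding
        elif io_pending:
            counts["wio"] += 1   # CPU idle, but only because it waits on I/O
        else:
            counts["idle"] += 1
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: 100.0 * value / total for name, value in counts.items()}


# CPU always busy while I/O is outstanding the whole time: wio stays at 0.
print(account([(True, True)] * 10))

# CPU mostly idle with I/O outstanding part of the time: wio shows up.
print(account([(False, True)] * 3 + [(False, False)] * 7))
```

In the first case wio is 0 even though I/O never stops, which is the situation I am asking about.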

thanks