
Performance Issue on Tru64 Unix

Bobcat_1
Advisor

Performance Issue on Tru64 Unix

H/W - Alpha GS320 with a hard partition of 4 QBBs.
OS - Tru64 Unix v5.1 PK5.
Memory - 32GB
#CPU - 20
Storage - SAN storage on HSG80s

Pattern observed:

Slow performance impacts batch jobs and sometimes interactive users as well.
During these periods the "ps -ef" command takes 5 to 6 minutes, sometimes even longer, to return output.
vmstat output shows:

50% to 65% of CPU time is in user mode, 25% in kernel mode, and about 10% to 15% idle.
"ps -ef | wc -l" returns about 2000 to 3000 processes.
The top 10 processes by CPU time are Oracle processes.

Collect data has been captured. What is very evident is that the CPU run queue is about 15 to 30 when the system is experiencing slow behaviour.
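As a rough sanity check on these numbers, the sketch below relates the run queue to the CPU count and computes the kernel-mode share of busy CPU time. The ranges are the ones quoted above; the two ratios are only illustrative arithmetic (not values vmstat or Collect report directly), and reading a run queue per CPU above 1 as CPU contention is a rule-of-thumb assumption.

```python
"""Back-of-envelope ratios for the vmstat/Collect figures quoted in the post.

Assumption: these ratios are illustrative arithmetic, not values that
vmstat or Collect report directly.
"""

NCPU = 20  # CPUs in the partition, as reported in the post


def runq_per_cpu(runq: float, ncpu: int = NCPU) -> float:
    """Runnable threads per CPU; >1 is a rule-of-thumb sign of CPU contention."""
    return runq / ncpu


def sys_share_of_busy(user_pct: float, sys_pct: float) -> float:
    """Fraction of non-idle CPU time spent in kernel (system) mode."""
    busy = user_pct + sys_pct
    if busy <= 0:
        raise ValueError("no busy CPU time to split")
    return sys_pct / busy


if __name__ == "__main__":
    # Run queue range observed during the slow periods.
    for runq in (15, 30):
        print(f"runq={runq:2d} -> {runq_per_cpu(runq):.2f} runnable threads per CPU")
    # User-mode range from vmstat, with kernel mode at about 25%.
    for user in (50, 65):
        share = sys_share_of_busy(user, 25)
        print(f"user={user}%, sys=25% -> {share:.0%} of busy time is kernel time")
```

With these figures roughly 28% to 33% of the busy CPU time is kernel time.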

SAN storage I/O statistics have not been examined yet.

I know that with the above data it would be difficult to pinpoint the exact cause, but I would appreciate any help.

Thanks.
Hein van den Heuvel
Honored Contributor

Re: Performance Issue on Tru64 Unix

Really slow performance often seems to come from swapping being active.
How is the memory situation?
What is the per-RAD picture?
Could one of the RADs (QBBs) be low on memory?
Tru64 'mostly' does the right thing, but there have been some glitches.
Commands to use:
vmstat -R 10 10
vmstat -P

If there may be memory concerns, be sure to check several prior discussions here.
Google for: +tru64 +numa +site:itrc.hp.com

Check for topics by Alexey, Joerg and Florent.
Some of those discussions were 'unpleasant', but valuable information was exchanged.


http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=859125
http://forums1.itrc.hp.com/service/forums/questionanswer.do?threadId=667865


hth,
Hein.

Bobcat_1
Advisor

Re: Performance Issue on Tru64 Unix

I can't see memory as a resource issue here.

However, please see the attachment for more information on the current behaviour of the system.

Thx
Hein van den Heuvel
Honored Contributor

Re: Performance Issue on Tru64 Unix

Agreed, until otherwise indicated there seems to be enough memory in each rad.
There does seem to be some 'pout' activity, but not too bad.

Pretty high system call rate on some RADs.
Do you have Oracle statistics set to more than basic ('Typical'?)? What effect does changing that have?
If you have Oracle system stats going, did you make a timedev to alleviate the gettimeofday costs?

That system time seems even higher than you warned for. Time to make a profile (dcpi, kprofile) to get more visibility into exactly where the system time is spent (everywhere? locks? network stack? memory management? scheduling?).

The memory/system config is a little 'odd': 4*6+8
The GS320 really benefits from maximum interleaved memory.
At the risk of pissing off your system designers, you may want to consider scaling back to 16 processors in 4 QBBs, each with 8GB fully interleaved:
- there appears to be enough idle time,
- the Oracle license could be cheaper
- the local vs global memory access rate will be marginally better, and with that the average latency.
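To put "marginally better" in perspective, here is a deliberately naive model: assume each CPU's memory references land on the RADs in proportion to how much memory each RAD holds (Tru64's NUMA-aware placement tries to do better than this). The even split of 4 CPUs per QBB is an assumption (20 CPUs over 5 QBBs, 16 over 4); the 4*6+8 and 4*8 memory layouts come from this reply.

```python
"""Naive estimate of the fraction of memory references that stay QBB-local.

Assumption: a CPU's references are spread over RADs in proportion to the
memory installed in each RAD. This ignores Tru64's NUMA-aware placement,
so it only illustrates the trend, not real local hit rates.
"""
from fractions import Fraction


def expected_local_fraction(cpus_per_rad: list[int], mem_gb_per_rad: list[int]) -> Fraction:
    if len(cpus_per_rad) != len(mem_gb_per_rad):
        raise ValueError("need one CPU count and one memory size per RAD")
    total_cpus = sum(cpus_per_rad)
    total_mem = sum(mem_gb_per_rad)
    # Weight each RAD's share of memory by the share of CPUs that sit next to it.
    return sum(
        (Fraction(c, total_cpus) * Fraction(m, total_mem)
         for c, m in zip(cpus_per_rad, mem_gb_per_rad)),
        Fraction(0),
    )


if __name__ == "__main__":
    current = expected_local_fraction([4, 4, 4, 4, 4], [6, 6, 6, 6, 8])  # 20 CPUs, 32GB
    proposed = expected_local_fraction([4, 4, 4, 4], [8, 8, 8, 8])      # 16 CPUs, 32GB
    print(f"current : {float(current):.0%} local")
    print(f"proposed: {float(proposed):.0%} local")
```

Under this model the local share only moves from 1/5 to 1/4, in line with calling the gain marginal; the benefit of fuller interleaving is not modelled here.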

Is this customer/system in Malaysia? I haven't been there in years. Wouldn't mind a good excuse (work) to get back there :-).
(family in Miri on Sarawak and KL).

Cheers,
Hein.
Bobcat_1
Advisor

Re: Performance Issue on Tru64 Unix

Thanks.

I made an error in the configuration.

The system has 5 QBBs with 32GB memory, with interleaving set as shown in the attachment.
Unix v5.1 with PK6.

Oracle statistics have been set to basic.

Besides kernel profiling, are there any other areas I should focus on?

Yes, this is from Malaysia, KL.


Archunan Muthiah
Honored Contributor

Re: Performance Issue on Tru64 Unix

Bobcat,

As you said, the top 10 processes eating system resources are Oracle processes, so I suspect there may be several Oracle-related issues involved.

Apart from making sure all the queries are well tuned: starting from Tru64 5.1, Oracle uses direct I/O (bypassing system caching), so most poorly written SQL queries will end up in an I/O wait state. So I would like to make sure you have a large enough SGA (DB_BLOCK_BUFFERS); try increasing this parameter 5 to 10 times.

As a trial, you can disable direct I/O usage by the Oracle RDBMS using the TRU64_DIRECTIO_DISABLE parameter.

Archunan


Bobcat_1
Advisor

Re: Performance Issue on Tru64 Unix

Thanks.

Attached is some information on Oracle parameters. Does it help?
Hein van den Heuvel
Honored Contributor

Re: Performance Issue on Tru64 Unix

>> Apart from making sure all the queries are well tuned

How would bad queries increase system time?
They result in excessive IO and user time, but are rarely directly responsible for system time. They may cause some extra context switching of course.

>> starting from Tru64 5.1, as Oracle starts using direct IO (avoiding system caching)

So what? That reduces system time.

>> most poorly written SQL queries will end up in an IO wait state.

Of which there is no evidence.

>> So I would like to make sure you have a large enough SGA (DB_BLOCK_BUFFERS); try increasing this parameter 5 to 10 times.

How can you suggest a 5x increase with nothing to go on? No usage data, no current settings? As we now see, there is 2.8GB allocated. Do you really want to try 14GB without a lot more homework?
Furthermore, the vmstat -P output shows GH (granularity hints) is being used. That's cool (hot!), but it means one should be extra careful turning SGA knobs.
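The arithmetic behind that warning, as a quick sketch: it scales the 2.8GB SGA by the suggested 5x and 10x factors and compares the result with the 32GB of physical memory in the partition. Both figures come from this thread; the helper names are only illustrative.

```python
"""Scale the current SGA by the suggested factors and compare with RAM."""

PHYS_MEM_GB = 32.0    # physical memory in the partition (from the thread)
CURRENT_SGA_GB = 2.8  # SGA currently allocated (from the reply above)


def scaled_sga_gb(factor: float, current_gb: float = CURRENT_SGA_GB) -> float:
    """SGA size after multiplying the current allocation by `factor`."""
    return current_gb * factor


def share_of_physical(sga_gb: float, phys_gb: float = PHYS_MEM_GB) -> float:
    """Fraction of physical memory the SGA alone would occupy."""
    return sga_gb / phys_gb


if __name__ == "__main__":
    for factor in (5, 10):
        sga = scaled_sga_gb(factor)
        print(f"{factor:2d}x -> {sga:.1f} GB SGA = {share_of_physical(sga):.0%} of physical memory")
```

At 10x the SGA alone would take 28GB, leaving roughly 4GB for the kernel, the 2000 to 3000 processes, and everything else.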

>> As a trial, you can disable direct I/O usage by the Oracle RDBMS using the TRU64_DIRECTIO_DISABLE parameter.

Please help me understand how that may help.

>> Attached is some information on Oracle parameters. Does it help?

It does raise a little question about GH sizing.
There is 5*1536MB = 7.5GB GH set aside.
The SGA you show uses less than 1/2.
Is that the only usage on this box?
If so, you could either increase the SGA a lot with no physical memory cost, or you could reduce the GH allocation, reboot and get some more fluid pages back for general purpose usage.
Still, as I argued earlier in this reply, why start turning knobs without a clear indication?
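The GH figures above, worked through as a small sketch. It reads "5*1536MB" as 1536MB of granularity-hint memory reserved in each of the 5 RADs (an interpretation of the shorthand) and takes the SGA as the 2.8GB mentioned earlier in this reply.

```python
"""How much of the granularity-hint (GH) reservation the SGA actually uses."""

GH_PER_RAD_MB = 1536  # GH memory per RAD (assumed reading of "5*1536MB")
NUM_RADS = 5          # QBBs/RADs in the partition
SGA_GB = 2.8          # SGA size quoted earlier in this reply


def gh_total_gb(per_rad_mb: int = GH_PER_RAD_MB, rads: int = NUM_RADS) -> float:
    """Total GH reservation across all RADs, in GB."""
    return per_rad_mb * rads / 1024


def gh_unused_gb(sga_gb: float = SGA_GB) -> float:
    """GH memory reserved but not covered by the SGA."""
    return gh_total_gb() - sga_gb


if __name__ == "__main__":
    total = gh_total_gb()
    print(f"GH reserved : {total:.1f} GB")
    print(f"SGA uses    : {SGA_GB / total:.0%} of it")
    print(f"GH unused   : {gh_unused_gb():.1f} GB")
```

The unused ~4.7GB is the room a larger SGA could take at no extra physical memory cost, or what a smaller GH allocation would hand back as fluid pages.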

I think the amount of system time in your case is sufficiently far from the norm that it needs a detailed explanation (profile). Maybe context switches; more likely a network stack issue.
Is the application network-intensive?

Have you installed (and run) statspack?

What are the top Oracle wait events?

Good luck,
Hein.