Re: D380 Performance Issue

Bejoy C Alias · ‎10-18-2005

We have one D380 server with 256 MB RAM / 512 MB swap. For the last 20 days we have been facing performance problems on this server. It runs an Oracle database and is also used for email. Nowadays the load average is 15 to 19 most of the time, because of which mail is getting rejected. I tried increasing sendmail's refuse load average, but the load then rises to above 19 again.
swapinfo -atm
              Mb      Mb      Mb   PCT  START/      Mb
TYPE       AVAIL    USED    FREE  USED   LIMIT RESERVE  PRI  NAME
dev          512      28     484    5%       0       -    1  /dev/vg00/lvol2
reserve        -     158    -158
memory       184      83     101   45%
total        696     269     427   39%       -       0    -
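A quick way to read the swapinfo table is to check that the total row is just the sum of the rows above it. This is a minimal Python sketch using the figures from the post; treating the "-" under AVAIL for the reserve row as 0 is an assumption made for the arithmetic.

```python
# (avail, used, free) in MB, copied from the swapinfo -atm output in the post.
# The "-" under AVAIL on the reserve row is treated as 0 (an assumption).
rows = {
    "dev":     (512, 28, 484),
    "reserve": (0, 158, -158),
    "memory":  (184, 83, 101),
}

avail = sum(r[0] for r in rows.values())
used = sum(r[1] for r in rows.values())
free = sum(r[2] for r in rows.values())
pct_used = round(100 * used / avail)

# These should match the "total" row: 696 269 427 39%
print(avail, used, free, pct_used)
```

Only 28 MB of the device swap is actually in use; most of the "used" figure is reservation (158 MB) and pseudo-swap (83 MB).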

No. of users during normal working hours: ~25
No. of processes: ~300

Is there anything else that should be checked?

Be Always Joy ......

RAC_1 · ‎10-18-2005

Load average by itself is not a correct or reliable way to decide how the system is performing.

What matters is the "priority job queue".
You can check it with glance/gpm.

What about the following values?

vmstat 1 4
(Check po columns. Are you swapping??)
mailq -d -v

Are mails getting queued for some reason?

In glance, what bottleneck do you observe?

There is no substitute to HARDWORK

Arunvijai_4 · ‎10-18-2005

If you want to measure performance of a system, take a look at this link from HP DSPP,

http://h21007.www2.hp.com/dspp/dld/dld_DownloadsListingPage_IDX/1,2381,11169,00.html

-Arun

"A ship in the harbor is safe, but that is not what ships are built for"

Steven E. Protter · ‎10-18-2005

That server has barely enough memory to boot the OS, let alone run the oracle application. If you run vmstat you will probably see a lot of swap paging as various processes get swapped in and out of main memory.

sar -w and sar -d will show similar data, with the latter probably showing that your system has high I/O wait times due to the swapping.

I assume you've thought of replacing the server, but if you must use this lovable old workhorse, it needs to be upgraded to at least 1 GB of RAM.

system perf util:
http://www.hpux.ws/system.perf.sh

SEP

Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com

Bejoy C Alias · ‎10-18-2005

We don't have glance on this server.
The po values in vmstat are 0.

What is the normal number of processes supported, for good performance, on a D380 server with 256 MB RAM / 512 MB swap running an Oracle database? We have another D380 with 200 processes and 20 users running an Oracle database without any performance issues.

Be Always Joy ......

Jayasuntar · ‎10-18-2005

Dear Bejoy,

Please check for unnecessary subsystems and services loaded on the system. Since you have one more system running, you can compare which services and subsystems are required for operation.

This will release a lot of memory for use.

Check the kernel parameters on both systems.

Regards

jay

Steven E. Protter · ‎10-18-2005

What's different about the two servers?

How oracle performs depends a lot on the nature of the oracle application. If there is a lot of bad sql then it will bring even a mighty superdome down to its knees.

metalink.oracle.com has documentation for the database that if followed should get you good oracle performance.

Some reading for you:
http://www2.itrc.hp.com/service/james/dispDoc.do?docURL=http%3A%2F%2Fsearch.hp.com%2Fredirect.html%3Furl%3Dhttp%253A%2F%2Fforums1.itrc.hp.com%2Fservice%2Fforums%2Fbizsupport%2Fquestionanswer.do%253FthreadId%253D964527%26qt%3D%252Boracle%2B%252Bperformance%2B%26hit%3D1&aid=SEARCH_FORUMS&pil=1&serStr=oracle+performance&pir=1

http://www2.itrc.hp.com/service/iv/node.do?node=prodITRC%2FWW_Start%2FN1%7C20%7C9

http://www2.itrc.hp.com/service/james/dispDoc.do?docURL=%2Fservice%2Fcki%2FdocDisplay.do%3FdocLocale%3Den_US%26docId%3D200000080087007&aid=SEARCH_CKI&pil=6&serStr=oracle+performance

SEP


Bill Hassell · ‎10-19-2005

If you see sendmail running a lot, you'll simply have to move mail to another server. If you do not have a spam filter appliance between the Internet and your D380, you may be getting dozens of junk emails per second and this will definitely affect everyone on the system. Since vmstat doesn't report any swapping (po=0), your memory is adequate. And since the only difference between this server and the other one running Oracle is email, that is likely the problem.

You can verify this by turning off sendmail for a while. If performance is normal for the database, you'll have to get another server for email, or better yet, a front-end appliance between the Internet and your mail server to filter all the junk.

Bill Hassell, sysadmin

Kent Ostby · ‎10-19-2005

Bejoy --

What size RAM do you have on your other D380?

"Well, actually, she is a rocket scientist" -- Steve Martin in "Roxanne"

TwoProc · ‎10-19-2005

I tend to agree with Bill: if this email server is getting mail that isn't spam filtered, then I would think that is your problem. This type of load keeps growing on everyone's servers nowadays, and is not insignificant to process.

Since you've got another similar server running Oracle with more users, but without an email server on it, I'd say the big delta between the two is the email services.

I also agree with Steven that you need to up the memory on this. Yeah, it may be running OK on the other box, but if you put more RAM in the box and increased the shared pool and the buffer cache (db_block_buffers before 9i, db_cache_size from 9i on), you'd probably see a HUGE performance increase. All for the price of $300-$500 in used RAM from a used HP dealer. This is because you've GOT to be I/O bound on both of these boxes (unless your application has only 30 small tables that you regularly purge).

We are the people our parents warned us about --Jimmy Buffett

Bejoy C Alias · ‎10-20-2005

This server is for internal email use only and is not connected to external servers, so spam may not be the problem. I think there may be some problems with the Oracle applications that are making the load high, and memory may also be a bottleneck. So I am going to increase sendmail's refuse load average to 30, as the load average is currently 16-19 all the time. Let me see whether this increases the load average.

Be Always Joy ......

Bill Hassell · ‎10-21-2005

The load average definitely needs some checking. 256 megs is far too small, but if vmstat doesn't show a lot of po (page outs), then additional memory won't help. What is happening with the high load is that there are many more processes eligible to run but not enough processors. If the system seems to be running OK, then the processes are very short-lived, that is, they run for a very short time and then go back to a wait state. This is not normally a good thing so I would investigate what's happening. The OS is changing programs far too fast for you to see so you may have to run ps into a file to see what may be happening.
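One way to use ps snapshots saved to files is to count how many different PIDs each command name shows up with: a command that keeps appearing under new PIDs is being started over and over. This is a minimal Python sketch, assuming each snapshot is the text of a plain ps -e listing (PID, TTY, TIME, COMMAND). The two snapshots and the command name "poller" are made up for illustration and are not from this system.

```python
from collections import defaultdict

def distinct_pids_per_command(snapshots):
    """Map command name -> number of distinct PIDs seen across all snapshots."""
    pids = defaultdict(set)
    for text in snapshots:
        for line in text.splitlines():
            fields = line.split(None, 3)
            # Skip the header line and anything that is not a process row.
            if len(fields) < 4 or not fields[0].isdigit():
                continue
            pid, _tty, _time, command = fields
            pids[command.strip()].add(int(pid))
    return {cmd: len(seen) for cmd, seen in pids.items()}

# Illustrative snapshots standing in for files written by "ps -e > file"
# a second or so apart (hypothetical data).
snap1 = """   PID TTY       TIME COMMAND
   101 ?         0:05 sendmail
   202 ?         0:00 poller
   203 ?         0:00 poller
"""
snap2 = """   PID TTY       TIME COMMAND
   101 ?         0:05 sendmail
   240 ?         0:00 poller
   241 ?         0:00 poller
"""

counts = distinct_pids_per_command([snap1, snap2])
# Commands with the most distinct PIDs first.
print(sorted(counts.items(), key=lambda kv: -kv[1]))
```

Here sendmail keeps one PID across both snapshots while the hypothetical poller appears under four, which is the pattern a short-lived, frequently restarted process would leave.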

Bill Hassell, sysadmin

TwoProc · ‎10-21-2005

Bill, that's a good point. What about context switch rates, or better yet, forced Cswitch rates? Are they high?

We are the people our parents warned us about --Jimmy Buffett

Bill Hassell · ‎10-21-2005

Context switching is a fancy term that defines what happens when the OS needs to switch to another program, either to timeshare among CPU-bound programs, or that the current program has gone into a wait state and it's time to resume another program. It is unimportant what the numbers are except that a very high context switch rate means that there is a lot for the OS to do. You don't fix the context switch rate, you analyze the reason why it is so high - it may be absolutely normal because you are running several dozen programs that are polling certain conditions and the poll takes only a few dozen microseconds then the program goes into a short sleep.

The reason that a high context switch rate was mentioned is that your load is so high. The load (from top, uptime, etc) is the runqueue depth. The runqueue is the quantity of programs that are running (one per CPU) and those that could run immediately if there was another CPU. If your D380 has 2 processors, then a load factor of 2 means both CPUs are busy. A runqueue of 4 means that on average, 2 programs are running and two are ready to run during the measurement period. These 4 programs could be polling at the rate of 50 times per second and after a poll, they go into a short wait. That would mean 50 x 4 or 200 context switches per second which might be considered to be high--unless this is what you have designed the programs to do. And then it is normal.
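The arithmetic in this paragraph can be written out as a small Python sketch. The 2-CPU count is the hypothetical figure used above ("if your D380 has 2 processors"), not a confirmed fact about this machine, and the function names are illustrative only.

```python
def load_per_cpu(load_average, cpus):
    """Average run queue depth divided across the CPUs."""
    return load_average / cpus

def waiting_processes(load_average, cpus):
    """Runnable processes that, on average, have no CPU to run on."""
    return max(0.0, load_average - cpus)

def poll_context_switches(programs, polls_per_second):
    """Context switches per second if each poll ends in a short wait."""
    return programs * polls_per_second

CPUS = 2  # assumption taken from the example above

# Run queue of 4 on 2 CPUs: 2 running, 2 waiting.
print(waiting_processes(4, CPUS))
# 4 programs polling 50 times per second.
print(poll_context_switches(4, 50))
# The reported load of 16 to 19 under the same 2-CPU assumption.
print(load_per_cpu(16, CPUS), load_per_cpu(19, CPUS))
```

Under the 2-CPU assumption, a load of 16 to 19 works out to roughly 8 to 9.5 runnable processes per processor.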

If you have Glance, you'll at least have a chance to see heavy context switching (and probably high system overhead) and can figure out which programs are causing this load. If you're running Java and the code is set up for multi-threading, you might see a high load and context switch rate.

Bill Hassell, sysadmin

Bejoy C Alias · ‎10-23-2005

After increasing sendmail's refuse load average there was no increase in the load average, and sendmail started working without any problem. But the load average is still at 16-19. What should be done next?
# vmstat -S 1 10
   procs     memory                     page                     faults          cpu
  r    b  w    avm  free      si      so  pi  po  fr  de  sr   in    sy   cs  us  sy   id
  0  246  0  16216  1352  109373  109437   2   0   0   0   2  186  1432  170  48  45    7
  0  246  0  16216  1322       0       0   3   0   0   0   0  229   386   98   0   1   99
  0  246  0  16216  1319       5       5   3   0   0   0   0  216   334   91   0   0  100
  0  246  0  16216  1319       0       0   2   0   0   0   0  211   289   85   0   0  100
  0  246  0  16216  1319       0       0   2   0   0   0   0  205   248   81   0   0  100
  1  246  0  16190  1318       0       0   2   0   0   0   0  202   225   81   5   2   93
  1  246  0  16190  1203       0       0   4   0   0   0   0  216   302   96  18   2   80
  1  246  0  16190  1302       5       5   4   0   0   0   0  250   538  107   0   0  100
  1  246  0  16190  1302       0       0   3   0   0   0   0  237   447   98   3   3   94
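The vmstat samples above can be checked mechanically. This is a minimal Python sketch, assuming the 18 values of each sample follow the header order (the second sy, the CPU one, is renamed sys here to avoid a name clash) and that the first row is the usual since-boot average rather than a 1-second sample.

```python
COLUMNS = ["r", "b", "w", "avm", "free", "si", "so", "pi", "po",
           "fr", "de", "sr", "in", "sy", "cs", "us", "sys", "id"]

# The nine samples from the post, each rejoined onto one line.
RAW = """\
0 246 0 16216 1352 109373 109437 2 0 0 0 2 186 1432 170 48 45 7
0 246 0 16216 1322 0 0 3 0 0 0 0 229 386 98 0 1 99
0 246 0 16216 1319 5 5 3 0 0 0 0 216 334 91 0 0 100
0 246 0 16216 1319 0 0 2 0 0 0 0 211 289 85 0 0 100
0 246 0 16216 1319 0 0 2 0 0 0 0 205 248 81 0 0 100
1 246 0 16190 1318 0 0 2 0 0 0 0 202 225 81 5 2 93
1 246 0 16190 1203 0 0 4 0 0 0 0 216 302 96 18 2 80
1 246 0 16190 1302 5 5 4 0 0 0 0 250 538 107 0 0 100
1 246 0 16190 1302 0 0 3 0 0 0 0 237 447 98 3 3 94
"""

def parse(raw):
    """Turn whitespace-separated vmstat rows into dicts keyed by column name."""
    rows = []
    for line in raw.splitlines():
        values = [int(v) for v in line.split()]
        if len(values) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, values)))
    return rows

rows = parse(RAW)
interval = rows[1:]  # drop the first row (assumed to be the since-boot average)

max_po = max(r["po"] for r in interval)
max_r = max(r["r"] for r in interval)
avg_cs = sum(r["cs"] for r in interval) / len(interval)
avg_idle = sum(r["id"] for r in interval) / len(interval)

print("max po:", max_po, "max r:", max_r)
print("avg cs/s:", avg_cs, "avg idle %:", avg_idle)
```

In these samples po stays at 0 and the CPU is mostly idle, so neither paging nor CPU saturation shows up during this particular window.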

Be Always Joy ......

Tim Sanko · ‎10-24-2005

ADD RAM.

Eliminate the buffer cache issue: set the buffer cache to 1% min, 1% max.

Tim

TwoProc · ‎10-24-2005

Give us some data from sar at the peak period.

sar -d 15 5

We are the people our parents warned us about --Jimmy Buffett

Bejoy C Alias · ‎10-24-2005

# sar -d 15 5

HP-UX unix B.10.20 A 9000/810 10/25/105

11:01:15   device   %busy   avque   r+w/s  blks/s  avwait  avserv
11:01:30   c0t5d0   23.87    0.52      24     441    4.85   23.55
           c0t8d0    2.67    0.50      13     206    5.61    4.79
           c0t9d0    5.13    0.50       6      89    4.74   10.03
11:01:45   c0t5d0   18.13    1.89      21     475   14.93   28.44
           c0t8d0    2.07    0.50       6      96    4.96    6.99
           c0t9d0    3.40    0.50       4      59    3.70   10.33
11:02:00   c0t5d0   10.40    1.27      13     181   10.00   17.30
           c0t8d0    4.13    0.50      22     351    4.58    3.59
           c0t9d0    3.53    0.50       4      69    5.02    9.10
11:02:15   c0t5d0    9.13    5.83      11     132   13.37   13.84
           c0t8d0   11.07    0.50      50     795    5.04    4.19
           c0t9d0    1.80    0.50       1      23    5.52   11.49
11:02:30   c0t5d0   12.47    0.76      14     222   17.26   24.26
           c0t8d0    2.00    0.50       2      36    5.24    8.42
           c0t9d0    0.60    0.50       1       9    2.84   12.62

Average    c0t5d0   14.80    1.72      17     290   11.44   22.63
Average    c0t8d0    4.39    0.50      19     297    5.01    4.43
Average    c0t9d0    2.89    0.50       3      50    4.58   10.07

Current dbc_max_pct = 50, dbc_min_pct = 5. Should I change these? We are also planning to add more memory.

Be Always Joy ......

TwoProc · ‎10-25-2005

Well, as far as I/O is concerned, you really had only one period in that minute or so of peak time where you were a little bottlenecked on disk I/O, but the backlog cleared.
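The "one period" is presumably the 11:02:15 sample, where the queue on c0t5d0 briefly grew. This is a minimal Python sketch that picks such intervals out of the c0t5d0 rows posted earlier; the threshold of 3.0 is a hypothetical cut-off chosen for illustration, not an HP-UX rule.

```python
# (time, device, %busy, avque, avwait, avserv) for c0t5d0,
# copied from the sar -d output posted earlier in the thread.
samples = [
    ("11:01:30", "c0t5d0", 23.87, 0.52, 4.85, 23.55),
    ("11:01:45", "c0t5d0", 18.13, 1.89, 14.93, 28.44),
    ("11:02:00", "c0t5d0", 10.40, 1.27, 10.00, 17.30),
    ("11:02:15", "c0t5d0", 9.13, 5.83, 13.37, 13.84),
    ("11:02:30", "c0t5d0", 12.47, 0.76, 17.26, 24.26),
]

QUEUE_THRESHOLD = 3.0  # hypothetical cut-off for "requests are backing up"

def queued_intervals(rows, threshold=QUEUE_THRESHOLD):
    """Return the sample times where the average queue length exceeds the threshold."""
    return [time for time, _dev, _busy, avque, _wait, _serv in rows
            if avque > threshold]

print(queued_intervals(samples))
```

Only the 11:02:15 sample crosses the cut-off, and the queue is back under 1 in the next sample, which fits the "backlog cleared" reading.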

Yes, your dbc_max_pct is set too high. You need to bring that down. I think you were using the OS I/O system for buffering instead of letting Oracle do it for you.

Bring this down to 10% and increase your SGA size (the buffer cache) by the same amount of memory that this frees up.
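To put rough numbers on that, here is a minimal Python sketch. It assumes dbc_max_pct is a percentage of physical RAM and uses the 256 MB figure from the first post. The result is a ceiling, so the memory actually freed depends on how much the dynamic buffer cache was really using.

```python
RAM_MB = 256       # physical memory, from the first post
DBC_MAX_NOW = 50   # current dbc_max_pct
DBC_MAX_NEW = 10   # suggested dbc_max_pct

def buffer_cache_ceiling_mb(ram_mb, dbc_max_pct):
    """Upper limit of the OS buffer cache in MB for a given dbc_max_pct."""
    return ram_mb * dbc_max_pct / 100

now = buffer_cache_ceiling_mb(RAM_MB, DBC_MAX_NOW)
new = buffer_cache_ceiling_mb(RAM_MB, DBC_MAX_NEW)
freed = now - new

print(f"ceiling now: {now:.1f} MB, after change: {new:.1f} MB, "
      f"available for the SGA: up to {freed:.1f} MB")
```

So the change could hand up to roughly 100 MB from the OS buffer cache to Oracle, before any RAM is added.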

You may need some tuning on your database.

Can you load Oracle statspack into the database (with hourly snapshots) and see what's consuming your oracle time? This tool will identify which pieces of code (and connections) are running away from you, and therefore need to be tuned.

Since you've got another system that runs well and this one doesn't: are they running the same (or basically the same) data? Code?
One or the other should be different enough to explain the problem.

We are the people our parents warned us about --Jimmy Buffett

Bejoy C Alias · ‎10-28-2005

I will change the dbc % and see what happens.
Both servers are running totally different applications, so I think there is no point in comparing them.
I know only some basics of Oracle, so I can't do any DBA work on this server, as it is an old (Oracle 7.1.4 database) "working" server. The main problem the users were complaining about was email, so we moved the email setup to the second server, which has less load.
We don't have any plan to upgrade this server, as its applications will be moved one by one to our head office server (rp5470) in 2-3 months, and then this old one will be discarded.
As the main issue (email) is solved, I think this thread should be closed. I will try all the options (except Oracle :) ) you people suggested and watch for any performance increase.
Thanks to all for your valuable suggestions.

Be Always Joy ......
