John Kittel
Trusted Contributor

hpux 11.11 on rp7410, oracle, disk i/o performance

We are running Oracle 9i on HP-UX 11.11. Oracle data is on 7 pairs of mirrored disks in 2 fiber-attached HP 2405 disk enclosures. 18 GB of memory, 4 CPUs, dbc_max_pct = 8, dbc_min_pct = 4. This is one node in a ServiceGuard active-active cluster. We have had increasing performance problems, i.e. slow application response, over the past year or more, as the size of the DB has increased and additional application programs have been implemented.

The system seems to me to be I/O bound. For example, using glance with a 12-second scan interval, 4 of the disks consistently show 1 to 2 I/O requests pending in more than 20% of the intervals. There is no paging out going on. CPU utilization (normalized for 4 CPUs) is usually between 25% and 75%. I have been trying to deal with this I/O bottleneck primarily by identifying specific application programs for the application group to optimize the DB queries, and am starting to see some success there.

Our DBA has contacted a consultant who has looked at the system and is recommending some tuning. Below is posted the contents of an email from the consultant.

Can you help me by commenting on the consultant's recommendations? Which recommendations sound good, bad, or in need of further study? What other information do I need in order to evaluate this advice? etc.

I am not a highly experienced HP-UX system administrator. I am somewhat familiar with many of the items mentioned by the consultant, mostly through reading threads in this forum, but I don't have a lot of practical experience with them. (...but the application on the other node uses 5-7 GB of memory, so I think not all of what the consultant calls the "free" 10 GB of memory on the Oracle system is available for Oracle; it must be held in reserve so the other node's application can run here if necessary.)

-----------------------------------------

consultant wrote:

First and foremost we will need to do some kernel tuning on the system, to the SHMMAX parameter. Currently it is only 2 GB, so you will not be able to make use of the memory on the server for the SGA unless this is changed. At version 11.11 this parameter is dynamic, meaning we can make it larger and we won't have to reboot the system.
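On 11.11 the tool for this is kmtune (kctune replaced it in 11.23); something like the following, using 8 GB as an example (value in bytes):

kmtune -q shmmax                 # show the current setting
kmtune -u -s shmmax=8589934592   # 8 GB; -u applies it to the running kernel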

At present there are 10.2 GB of RAM free on the server that we are not taking advantage of. I would recommend setting SHMMAX as high as 6 or 8 GB in order to take advantage of the memory on the server for use within the Oracle RDBMS SGA. This would still leave 2 to 4 GB of RAM free for use by the OS or other processes on the server. The current SGA size is only 2 GB, which is as large as the current setting of SHMMAX allows.

The system does not look CPU bound; it appears to be I/O bound. The blocked-processes column bears this out: there are many processes waiting to perform I/O. A look at the I/O performed on the database servers shows that much of the I/O is physical I/O as opposed to logical I/O. Hence the disks are being hit very hard.

If we are able to leverage the memory and hold more blocks in the database block buffer cache, we could perform more logical I/O. On an HP-UX system using Veritas filesystems we also have another option.

If this server is hooked up to a large enterprise-class array we could think about using direct I/O; what is required is a large cache on the array. If this is a smaller HP array then we may not have that option, but we could still test it out in dev and/or QA by changing the Veritas mount-point options to use convosync=direct and mincache=direct. The VxFS filesystem will then bypass the filesystem buffer cache and perform I/O directly between the array's cache and the Oracle SGA, alleviating the double-buffering curse.
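With OnlineJFS these options can even be applied with a remount rather than a reboot; a sketch, using a hypothetical /dev/vg01/lvora volume mounted at /oradata:

mount -F vxfs -o remount,mincache=direct,convosync=direct /dev/vg01/lvora /oradata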

What I would first like to do is change SHMMAX and then run through making the SGA larger, using quite a few parameter changes. First and foremost, this HP-UX server uses cooked filesystems, so it can only use synchronous I/O; it cannot use asynchronous I/O. Hence on HP-UX you set disk_asynch_io=false and filesystemio_options=SETALL so you can use any type of I/O option. This necessitates using multiple db_writer_processes; typically I start with 4 and work my way up. Since this system has 4 CPUs, let's start with 4 for now.
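In init.ora terms that would be, roughly (the writer count being the starting point just mentioned):

disk_asynch_io = false
filesystemio_options = setall
db_writer_processes = 4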

HP-UX systems do have a kernel setting, fs_async, which you currently have set to 0, the default, meaning it is turned off. You can change this to 1 to turn it on, but only when using a specialized /dev/async device with raw volumes, not cooked filesystems.
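(That /dev/async device, used only with raw volumes, is typically created along these lines; the ownership line assumes the instance runs as user oracle:)

/sbin/mknod /dev/async c 101 0x0
chown oracle:dba /dev/async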

John Kittel
Trusted Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

Sorry, I forgot to mention: it is late Friday here and I am heading home. So busy. This email just came in over the transom from the consultant, so I wanted to post ASAP. But I am going home now and will probably not be back until Monday to read and respond to replies. Thank you in advance for your expert advice.

- John
Solution

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

John,

A few points:

1. The size of the Oracle SGA hasn't been limited by the size of a shared memory segment (SHMMAX) since Oracle 8i; if Oracle wants a bigger SGA it will simply use more than one shared memory segment (you can count them with ipcs - see the commands after this list). Of course it's more efficient to use just one segment, so you might as well go ahead and raise SHMMAX, then have your DBA increase the size of his SGA.

2. You aren't hooked up to a large enterprise array - just a JBOD.

3. The VxFS filesystem options mentioned will only work if you have OnlineJFS installed. Check using the swlist command (see below). Even if it is installed, I wouldn't expect setting these options to make more than a 5-10% difference at the absolute most.

4. Using multiple DBWR processes *might* help some, but I would simply add 1 to the number you currently have for starters - if that makes no difference, then adding more isn't going to help either.

5. If neither you nor your DBA is that experienced, I wouldn't look at using raw volumes at this point.

6. You don't tell us much about how your disks are laid out - 7 pairs of mirrored disks, but are you also striping across those disks (probably using extent-based striping)? The output of vgdisplay -v for the volume group holding the database filesystems would be interesting.

7. Some simple proof of which disks are busy would also help - the output of sar -d 10 10 during a period when the system is under-performing would be interesting (commands below). From that you may be able to determine the busy disks, and - if your DBA knows how to look at I/O stats within Oracle - trace them back to busy datafiles/tablespaces/tables and try moving some of those to less busy disks...
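Putting commands to points 1, 3, 6 and 7 - a minimal sketch, where vg01 is a guess at your volume group name and the instance is assumed to run as the "oracle" user:

ipcs -mb | grep oracle            # point 1: one line per SGA shared memory segment; SEGSZ is the size
swlist -l product | grep -i jfs   # point 3: is OnlineJFS installed?
vgdisplay -v /dev/vg01            # point 6: LV-to-PV layout and any striping
sar -d 10 10                      # point 7: ten 10-second samples; watch %busy and avque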


Some food for thought...

HTH

Duncan

I am an HPE Employee
Hein van den Heuvel
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

The recommendations you shared (or at least this part of them) are not bad, but not great either.

Judging by your comments, the current parameter settings, and the unforgiving nature of direct-attached storage (no controller cache to even out the bumps), this application and its management (DBA + SA) is NOT ready to deploy async I/O.
Let's focus on getting the basics right, and then worry (or not) about eking out the final few percent using RAW storage to some extent (sic).

A 2 GB SGA for an 18 GB system might be optimal, but more likely it is too small... especially if there is no significant outside buffer (DBC).

Indeed, those 10 GB free should be allowed to be put to good use, notably for filesystem cache, but someone erroneously chose to constrain the DBC to just 8% = 1.5 GB.

The DBC min=4 and max=8 is a common recommendation, and often a good one, but it is a lousy choice with the current SGA setting and available free memory.

Without the benefit of Oracle stats to check, I would say double the SGA to 4 GB, set DBC_MAX to 20 or so, and try again.
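(Unlike shmmax, dbc_max_pct on 11.11 is, I believe, not dynamically tunable, so that change means a kernel rebuild and a reboot; roughly:

kmtune -s dbc_max_pct=20
mk_kernel          # build the new kernel under /stand/build
kmupdate           # stage it for installation at shutdown
shutdown -r -y 0

The shmmax change, by contrast, can be made on the running system as sketched earlier in the thread.)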

I applaud you for the focus on application query tuning to impact the IO load. Excellent!

I applaud you and the DBA for the system tuning review and for getting (outside) help to do so.

I frown at the lack of Oracle stats interpretation.
Oracle knows what it is doing, Oracle knows where it hurts, and it is perfectly happy to share that with you. Exploit that!
Run STATSPACK or whatever performance tools you have for Oracle. Look at the top wait events: log_file_sync? read I/O?
Look at the SGA recommendation table: what impact does Oracle suggest a 50% or 100% SGA increase would have?
Did you give it enough shared SQL cache to avoid hard parsing? (A sketch of the STATSPACK steps follows.)
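If STATSPACK is installed, those answers are a two-snapshot exercise; a sketch, assuming the standard perfstat account (the password is a placeholder):

sqlplus perfstat/yourpassword <<'EOF'
execute statspack.snap
EOF
# ... let a representative busy period pass, take a second snapshot, then:
sqlplus perfstat/yourpassword @?/rdbms/admin/spreport.sql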

Hope this helps some already.

Feel free to contact me offline if you think I can help further,

Regards,
Hein van den Heuvel ( at gmail dot com )
HvdH Performance Consulting
TwoProc
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

I agree that you need to allocate more to the SGA,
allocating more to the db_cache_size parameter. You and the tuning guy are OK with this.

He is wrong that SHMMAX is going to stop you from getting a bigger SGA; it's just going to make the SGA cut itself up into pieces, so a 10 GB SGA would show up as 5 2-GB segments, but it wouldn't be broken. Since a single segment is more efficient, go ahead and raise it. However, until you get past 6 or 7 segments you probably wouldn't notice a degradation, even a measurable one, running with SHMMAX set at 2 GB and a 10 GB SGA.

Also, I've found that in Oracle 9.5 on HP-UX you get much better total throughput turning off pga_aggregate_target (about 50% more throughput). However, in Oracle 9.4 this was not the case, so pay attention to the version you're on.

I would not increase dbc_max_pct. Put your efforts into buffering at the Oracle level. The suggestions re: convosync won't work unless you've got OnlineJFS - AND they really won't get you anywhere on 9i unless you've got db_file_multiblock_read_count set higher than 32, and almost no one does unless the database is a data-warehouse application.

Re: buffering at the frame on an array, well you don't have an array... so don't bother with that.

Re: setting fs_async... you're not doing anything with raw files... so don't bother with that.

What I'm most surprised about is that the consultant didn't mention tuning the running code. In most cases this is your main problem. You've got the RAM, and clearly you should take advantage of it to get some of this I/O off the SCSI busses and cached up in the Oracle engine - but this should be coupled with an effort to identify the worst-offending code and tune it.

Using statspack, as well as just watching the database: look at the longest-running active queries, check them out, and examine the execution-plan costs (this is easy with Toad). That will go a long way toward identifying your problems. Use this data to recommend code changes (unnecessary column data transformations are common, like the use of NVL() even when not needed), to create missing indexes (or, when necessary, function-based indexes), to question code complexity, to create histograms when possible, and for an overall review of query methodology and design, all of which yield better run times.

Naturally, tuning takes longer, but it does the most to help these systems retain their value, and it makes for happier end users. Keep in mind that an rp7410 can go all the way up to 32 GB, and used HP-certified RAM for an old server like that is VERY inexpensive, so don't be afraid of using even larger buffer caches if the data shows you can still get more I/O off the disks.

Make sure to analyze the code in your shared pool, and pin the code that has high reload counts. It is best to script this as part of your database startup scripts, just after starting the database.
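A sketch of that check and of the pin itself - the package name is purely illustrative, and dbms_shared_pool must have been installed (dbmspool.sql):

sqlplus "/ as sysdba" <<'EOF'
select owner, name, type, loads
  from v$db_object_cache
 where loads > 3 and type in ('PACKAGE','PROCEDURE','FUNCTION');
execute dbms_shared_pool.keep('APPOWNER.HOT_PACKAGE')
EOF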
We are the people our parents warned us about --Jimmy Buffett
John Kittel
Trusted Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

Thank you for your responses. It will take me a while to consider these recommendations, provide more information about my environment, think of additional questions, and assign points.

- John
Rita C Workman
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

May I just pipe in with this thought:

Your problem, you say, is I/O - so drop back to basics and let's look at the disk. Ask yourself:

Oracle:
Just where are the Oracle overhead files - logfiles, archive files, exports, etc. - writing out to?
>>Sometimes DBAs get into the bad habit of putting high-I/O activity on the same lousy disk. Make sure your Oracle files are positioned so that heavy Oracle overhead writes don't impact data writes. Make sure your Oracle log files are not all positioned on the same disk... you will thrash, and things will slow down.

Disk:
Next, check on your dbf files and where folks just drop files. Ask the DBA: which dbf files get hit the hardest? Ask yourself: which disks are these dbf files on? Adjust accordingly. See if your users are doing a lot of writes on drives that contain logfiles or dbf files - and make them write elsewhere if they are. You may have to move things around, or add a couple of disks to spread out the load.

Disk2:
Are you mirrored? What kind of disks do you have? (I didn't catch that in your comments.) If you have an array, is there any chance of getting something so you can round-robin (or better) the writes to disk and use both HBA paths to double your I/O performance? We have an array with EMC PowerPath - the performance increase was remarkable. Something similar may be an option for you to check into.

SHMMAX - others have already given good thoughts. There are other kernel parms to watch out for being set too high. Sometimes you get more by setting things smaller, too! But those are parms I'd address once my I/O was more acceptable. One change at a time.

Rgrds,
Rita


Geoff Wild
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

I have a somewhat similar layout...

Though I have just upgraded to 11.23 this weekend (rp7410s, ServiceGuard, Oracle and SAP).

As far as I/O goes, do you have multiple paths to the disks? If so, you can manually load-balance them.

As far as the disks go - are they striped?

Here's some info on my setup:

# memdetail
Memory Stat        total      used     avail  %used
physical         16128.0   15382.8     745.2    95%
active virtual   17685.4    9323.1    8362.2    53%
active real      11771.5    5935.6    5835.8    50%
memory swap      16128.0    3826.0   12302.0    24%
device swap      26528.0   16455.6   10072.4    62%

I'd say your dbc_max_pct is too high - drop it to 5... and increase SHMMAX...

# kctune
Tunable Value Expression Changes
NSTREVENT 50 Default
NSTRPUSH 16 Default
NSTRSCHED 0 Default
STRCTLSZ 1024 Default
STRMSGSZ 65535 65535
acctresume 4 Default
acctsuspend 2 Default
aio_listio_max 256 Default Immed
aio_max_ops 2048 Default Immed
aio_monitor_run_sec 30 Default Immed
aio_physmem_pct 10 Default
aio_prio_delta_max 20 Default Immed
aio_proc_thread_pct 70 Default Immed
aio_proc_threads 1024 Default Immed
aio_req_per_thread 1 Default Immed
allocate_fs_swapmap 0 Default
alwaysdump 0 Default Immed
chanq_hash_locks 256 Default
core_addshmem_read 0 Default Immed
core_addshmem_write 0 Default Immed
create_fastlinks 0 Default
dbc_max_pct 5 5 Immed
dbc_min_pct 4 4 Immed
default_disk_ir 0 Default
disksort_seconds 0 Default
dmp_rootdev_is_vol 0 Default
dmp_swapdev_is_vol 0 Default
dnlc_hash_locks 512 512
dontdump 0 Default Immed
dst 1 Default
dump_compress_on 1 Default Immed
enable_idds 0 Default
eqmemsize 15 Default
executable_stack 0 0 Immed
fr_statemax 800000 Default Immed
fr_tcpidletimeout 86400 Default Immed
fs_async 0 Default
fs_symlinks 20 Default Immed
ftable_hash_locks 64 Default
gvid_no_claim_dev 0 Default
hdlpreg_hash_locks 128 Default
hires_timeout_enable 0 Default Immed
hp_hfs_mtra_enabled 1 Default
intr_strobe_ics_pct 100 Default Auto
io_ports_hash_locks 64 Default
ioforw_timeout 0 Default Auto
ksi_alloc_max 33600 33600 Immed
ksi_send_max 32 Default
max_acct_file_size 2560000 Default Immed
max_async_ports 50 Default
max_mem_window 128 128
max_thread_proc 3001 3001 Immed
maxdsiz 1073741824 0X40000000 Immed
maxdsiz_64bit 17179869184 0X400000000 Immed
maxfiles 4096 4096
maxfiles_lim 4096 4096 Immed
maxssiz 134217728 134217728 Immed
maxssiz_64bit 1073741824 0X40000000 Immed
maxtsiz 67108864 0X4000000 Immed
maxtsiz_64bit 1073741824 0X40000000 Immed
maxuprc 3780 3780 Immed
maxvgs 256 256
msgmap 1026 Default
msgmax 8192 Default Immed
msgmnb 65535 65535 Immed
msgmni 4200 4200
msgseg 32767 32767
msgssz 96 96
msgtql 4200 4200
ncdnode 150 Default
nclist 8292 8292
ncsize 35648 35648
nfile 189090 189090 Imm (auto disabled)
nflocks 4096 4096 Imm (auto disabled)
ninode 35648 35648
nkthread 10000 10000 Immed
nproc 4200 4200 Immed
npty 60 Default
nstrpty 60 60
nstrtel 60 Default
nswapdev 10 Default
nswapfs 10 Default
nsysmap 8400 Default
nsysmap64 8400 Default
o_sync_is_o_dsync 1 1
pagezero_daemon_enabled 1 Default Immed
pfdat_hash_locks 128 Default
physical_io_buffers 1280 Default Auto
region_hash_locks 128 Default
remote_nfs_swap 0 Default
rng_bitvals 9876543210 Default
rng_sleeptime 2 Default
rtsched_numpri 32 Default
sched_thread_affinity 6 Default Immed
scroll_lines 100 Default
scsi_max_qdepth 8 Default Immed
scsi_maxphys 1048576 Default
secure_sid_scripts 1 Default Immed
semaem 16384 Default
semmni 4096 4096
semmns 8192 8192
semmnu 4196 4196
semmsl 2048 Default Immed
semume 100 100
semvmx 32767 Default
sendfile_max 0 Default
shmmax 17179869184 17179869184 Immed
shmmni 512 512 Immed
shmseg 200 200 Immed
st_ats_enabled 1 1
st_fail_overruns 0 Default
st_large_recs 0 Default
st_san_safe 0 Default Immed
streampipes 0 Default
swapmem_on 1 Default
swchunk 4096 4096
sysv_hash_locks 128 Default
tcphashsz 2048 Default
timeslice 10 Default
timezone 420 Default
unlockable_mem 0 Default
vnode_cd_hash_locks 128 Default
vnode_hash_locks 128 Default
vol_checkpt_default 10240 Default
vol_dcm_replay_size 262144 Default
vol_default_iodelay 50 Default
vol_fmr_logsz 4 Default
vol_max_bchain 32 Default
vol_max_nconfigs 20 Default
vol_max_nlogs 20 Default
vol_max_nmpool_sz 4194304 Default Immed
vol_max_prm_dgs 1024 Default
vol_max_rdback_sz 4194304 Default Immed
vol_max_vol 8388608 Default
vol_max_wrspool_sz 4194304 Default Immed
vol_maxio 256 Default
vol_maxioctl 32768 Default
vol_maxkiocount 2048 Default
vol_maxparallelio 256 Default
vol_maxspecialio 256 Default
vol_maxstablebufsize 256 Default
vol_min_lowmem_sz 532480 Default Immed
vol_mvr_maxround 256 Default
vol_nm_hb_timeout 10 Default
vol_rootdev_is_vol 0 Default
vol_rvio_maxpool_sz 4194304 Default Immed
vol_subdisk_num 4096 Default
vol_swapdev_is_vol 0 Default
vol_vvr_transport 1 Default
vol_vvr_use_nat 0 Default
volcvm_cluster_size 16 Default
volcvm_smartsync 1 Default
voldrl_max_drtregs 2048 Default
voldrl_min_regionsz 512 Default
voliomem_chunk_size 65536 Default
voliomem_maxpool_sz 4194304 Default
voliot_errbuf_dflt 16384 Default
voliot_iobuf_default 8192 Default
voliot_iobuf_limit 131072 Default
voliot_iobuf_max 65536 Default
voliot_max_open 32 Default
volpagemod_max_memsz 6144 Default Immed
volraid_rsrtransmax 1 Default
vps_ceiling 64 64
vps_chatr_ceiling 1048576 Default
vps_pagesize 4 Default
vx_maxlink 32767 Default
vx_ninode 0 Default Immed
vxfs_bc_bufhwm 0 Default Immed
vxfs_ifree_timelag 0 Default Immed
vxtask_max_monitors 32 Default


We use EMC arrays, so our disks are striped wide...also, we use powerpath - so it does the load balancing for us.
For our filesystems that contain database files, I also striped them like so:

lvcreate -L 531328 -n lvsapd01 -i 7 -I 128 /dev/vg61

The vg was created like so:

vgcreate -s 16 -p 128 -e 9552 /dev/vg61 `cat syminq.52.sort6.6.uniq.devs.vg61`
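(Flag decoder, for readers unfamiliar with HP-UX LVM: in the lvcreate, -i 7 stripes the LV across 7 LUNs and -I 128 sets a 128 KB stripe size; in the vgcreate, -s 16 is a 16 MB extent size, -p 128 the maximum number of PVs in the VG, and -e 9552 the maximum extents per PV.)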

The vg is made up of 35 x 72 GB LUNs.
The lvcreate command works out to 7 of those LUNs in that lvol. I have no overlapping - i.e. sapdat01 is on 7 LUNs and nothing else is there.



Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.
TwoProc
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

John,

Keep in mind that you don't need a special tuner to come in and set shmmax to 10 GB in your kernel. And you won't need one to set db_cache_size to, say, 7 GB or so in the init.ora file; seeing whether that makes a difference is easy enough. All you need is a bit of downtime. I wouldn't pay a tuning consultant just for that. Don't you have a test server with this system cloned on it that you could play with to see if you can get it done? If you hire a tuning person, let them handle the harder stuff, and grab the easy low-hanging fruit yourself. If just those easy moves fix the issue, then send the consultant a check for the hours he or she has put into the analysis and save yourself further costs, unless you want or need more analysis and help. Provided that you pay for the assistance you've already received, the consultant should have no problem with that.
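For what it's worth, the init.ora side of that test is a one-liner (the size is illustrative):

db_cache_size = 7168M

and shmmax itself can be raised with kmtune as sketched earlier in the thread.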
We are the people our parents warned us about --Jimmy Buffett
Hein van den Heuvel
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance


My friend Geoff wrote:

"I'd say your dbc_max_pct is too high - drop to 5...."

I respectfully disagree.

(That's long for 'bulls*#t!')

The notion of setting dbc_max_pct low, which I often recommended myself, stems from trying to avoid double-buffering (the DBC and Oracle) and from the fear that the system will not give back the memory in time when pressure is building.

It is at best a tweak to attempt to reduce overhead by a few percent.

The purpose of a high dbc_max_pct is to give 'idle' memory to the OS as a large cache for (read) buffers, since it has nothing better to do with the memory.

It is particularly helpful (10x performance) when:
1) there is no significant hardware cache, as is the case with direct-attached disks, and
2) the application is restricted to limited internal caching, such as a 2 GB SGA for Oracle on an 18 GB box, and
3) thus there is lots of free memory available.

All three conditions appear to exist here.

By setting dbc_max_pct low just because there is a rule-of-the-dumb (:-) out there, one runs the risk of losing out on a 2x - 10x improvement potential in a non-ideal setting, for fear of a 5% - 10% overhead in an otherwise ideal setting.

That would appear to be a poor recommendation to me.

Geoff>> We use EMC arrays, so our disks are striped wide...also, we use powerpath - so it does the load balancing for us.

Ah, you see, you have maximum HW caching in place, where John appears to have to work with minimal caches. Apples and oranges... both fruits, but different rules apply.

Worry about dbc_max_pct when everything else is perfectly understood and arranged; that's when it might have a positive and quantifiable effect.

Of course, if you are on 11.31 you may be able to stop worrying about it altogether, as the algorithms have much improved.

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting


"si on ne mets pas les gros cailloux en premier dans le pot, on ne pourra jamais les faire entrer tous, ensuite".


John Kittel
Trusted Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

Thank you everyone for your quality responses. It will take a while longer for me to evaluate your recommendations, implement them, and get back to you with the requested information and data. I wish I could dedicate myself to just this one important problem, but like most sysadmins I have so much else to do.

I will leave the thread open for now. When I'm ready for more help I may append to this thread, or close it and open a new one.

In any case I will let you know what I do as far as tuning and the effects.

- John
Geoff Wild
Honored Contributor

Re: hpux 11.11 on rp7410, oracle, disk i/o performance

As far as dbc_max_pct goes - on systems with large memory, if you do happen to come close to 100% utilization, having too high a buffer cache can put a drain on system resources - i.e., it takes a while for the buffer cache to shrink - which could cause swapping just when you least need it:

ftp://hprc.external.hp.com/memory.htm#review_buffer_cache_size

In 11.31 this has been replaced with filecache_max:

filecache_max, filecache_min - maximum or minimum amount of physical memory used for caching file I/O data
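On 11.31 these are set with kctune and take effect immediately; for example, to cap the file cache at 2 GB (value in bytes):

kctune filecache_max=2147483648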


Again, Oracle bypasses the OS buffer cache anyway - so on an Oracle server there is almost no need for an OS one at all - better to give as much RAM as possible to Oracle and let it handle the I/O.

Rgds...Geoff
Proverbs 3:5,6 Trust in the Lord with all your heart and lean not on your own understanding; in all your ways acknowledge him, and he will make all your paths straight.