Performance problem in disk IO

chetan a · ‎03-06-2007

Hello,

I have a problem scaling disk I/O. The machine is an HP-UX 11i v2 Itanium application server with two HP EVAs (75 GB, 15k rpm each).

Average values from the sar output are given below:

device     %busy  avque  r+w/s  blks/s  avwait  avserv
c0t6d0      8.67   0.67     11     113    0.16   16.62
c127t0d7   78.73   0.52    716   24672    0.06    3.44
c80t0d3    77.73   0.51    700   24256    0.03    3.44
c57t1d2    77.03   0.51    722   24917    0.01    3.12
c32t0d6    78.71   0.51    737   25241    0.02    3.40
c95t0d2    77.69   0.51    721   24975    0.02    3.30
c102t1d1   79.77   0.51    973   39190    0.01    2.45
c53t0d5    78.73   0.51    726   24764    0.02    3.45
c16t0d1    77.59   0.51    718   24686    0.02    3.35
c102t1d3   70.10   0.50   3385   54153    0.00    0.21
c65t1d0     0.02   0.50      0       0    0.00    0.24

Average CPU usage from sar:

HP-UX rx8640f B.11.23 U ia64 03/05/07

13:02:38    %usr    %sys    %wio   %idle
Average       30      13      32      25

I tried tweaking scsi_max_qdepth, but it was of no use. My CPU consumption is only around 40%, but I am not able to scale up my I/O capacity.

How can I determine what the I/O problem is?

Cheers,
Chetan

I can implement the switch, but switch will be risky :-)

Steve Lewis · ‎03-06-2007

You have discovered the problem with scsi_max_qdepth. It simply pushes the queue into the arrays, away from the o/s, and does not always improve the performance.
You have many options.
Option 1:
tune your system and/or database buffer cache to avoid the need to go to i/o in the first place. This may need more memory in the server.
Option 2:
tune your application to not require so much i/o, or tune your database indexes or your database statistics.
Option 3:
Spread the load over even more storage spindles, consider your RAID policy (is it RAID5/6/Autoraid, if so try RAID 1/0).
Option 4:
A bigger disk array.
Option 5:
Consider evening out the load on controllers by striping the data.

Chan 007 · ‎03-06-2007

Hi,

Try

IOSTAT and Glance.

Check your kernel parameters w.r.t. the database, e.g. Oracle / Sybase.

What RAID level have you implemented?

Chan

Yogeeraj_1 · ‎03-06-2007

hi Chetan,

if you are running Oracle, try to run a statspack report and see if you have any waits relative to IO.

e.g.
Wait Events for DB: MYDB Instance: mydb Snaps: 17 -18
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)

                                                                   Avg
                                                     Total Wait   wait    Waits
Event                               Waits   Timeouts   Time (s)   (ms)     /txn
---------------------------- ------------ ---------- ---------- ------ --------
db file parallel write                764        382         37     48    254.7
log file parallel write               394        392         32     81    131.3
control file parallel write           391          0         29     73    130.3
db file scattered read                 89          0          2     19     29.7
log file switch completion              4          0          1    209      1.3
async disk IO                          13          0          1     42      4.3
log file sync                           1          0          0    211      0.3
process startup                         3          0          0     49      1.0
db file sequential read                21          0          0      1      7.0
db file single write                    1          0          0     23      0.3
log file single write                   2          0          0     11      0.7
control file sequential read          238          0          0      0     79.3
latch free                              1          0          0      1      0.3
LGWR wait for redo copy                 1          0          0      0      0.3
log file sequential read                2          0          0      0      0.7
virtual circuit status                 41         41      1,202  29311     13.7
jobq slave wait                        66         63        197   2978     22.0

also take a look at:
http://technet.oracle.com/deploy/availability/pdf/oow2000_sane.pdf

hope this helps!

kind regards
yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)

Hein van den Heuvel · ‎03-06-2007

Chetan,

You are running at 9,400 IO/sec, 130 MB/sec.
This may well be all there is, in which case you need to focus on reducing the IOs needed, even more so than normal. Can you get more filesystem or database caches going?
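As a rough check on those totals, the per-device averages from the sar listing can be summed in a few lines of Python. This is only a sketch: it assumes the two columns used are r+w/s and blks/s, and that sar counts 512-byte blocks. The names SAR_AVERAGES and totals are made up for illustration.

```python
# (device, r+w/s, blks/s) copied from the sar averages posted at the top of the thread.
SAR_AVERAGES = [
    ("c0t6d0", 11, 113),
    ("c127t0d7", 716, 24672),
    ("c80t0d3", 700, 24256),
    ("c57t1d2", 722, 24917),
    ("c32t0d6", 737, 25241),
    ("c95t0d2", 721, 24975),
    ("c102t1d1", 973, 39190),
    ("c53t0d5", 726, 24764),
    ("c16t0d1", 718, 24686),
    ("c102t1d3", 3385, 54153),
    ("c65t1d0", 0, 0),
]

BLOCK_BYTES = 512  # assumption: sar reports blks/s in 512-byte units


def totals(rows):
    """Return (total IO/sec, total MiB/sec) across all devices."""
    total_iops = sum(iops for _, iops, _ in rows)
    total_blocks = sum(blks for _, _, blks in rows)
    mib_per_sec = total_blocks * BLOCK_BYTES / (1024 * 1024)
    return total_iops, mib_per_sec


iops, mib = totals(SAR_AVERAGES)
print(f"{iops} IO/sec, {mib:.0f} MiB/sec")
```

With the figures above this comes out at roughly 9,400 IO/sec and 130 MB/sec.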

You need to provide more context, and some explanation for the 'odd' numbers.

What is the application doing? Oracle? NFS? Read-write ratio?

What is happening to c102t1d3? It is reporting 4x more IO/sec at 1/2 the IO size, with sub-millisecond response. That means they are NOT real IOs, but cache activity (backed up by real IO).
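The IO-size comparison can be reproduced by dividing blks/s by r+w/s for each device. A minimal sketch, again assuming 512-byte sar blocks; the helper name avg_io_kib is hypothetical and only three of the devices are shown:

```python
BLOCK_BYTES = 512  # assumption: sar reports blks/s in 512-byte units


def avg_io_kib(iops, blks_per_sec):
    """Average transfer size per IO in KiB, or None when the device is idle."""
    if iops == 0:
        return None
    return blks_per_sec * BLOCK_BYTES / iops / 1024


# (device, r+w/s, blks/s) taken from the sar averages in the first post.
for device, iops, blks in [
    ("c127t0d7", 716, 24672),
    ("c102t1d1", 973, 39190),
    ("c102t1d3", 3385, 54153),
]:
    print(f"{device}: {avg_io_kib(iops, blks):.1f} KiB per IO")
```

c102t1d3 works out to about 8 KiB per IO, against roughly 17 KiB for a typical VxFS LUN such as c127t0d7.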

For example, the 75 GB, 15K rpm is per disk, right? But how many drives? If we remove c102t1d3 from the equation, then sar shows 6,000 IO/sec, and at a generous 150 IO/sec per spindle that suggests you need at least 40 disks, or else your threshold is the number of spindles. If we add those 3,000 IO/sec for c102t1d3 to the mix, then you need 60+ spindles. How many do you have?

How many disks, groups, disks per group on the EVA? RAID-5 or RAID-0+1? Read-write ratio? How many fibres? Switches?

Are some IOs waiting for others? Again, that c102t1d3 activity, with its 8 KB IOs, may well be the absolute max to a single logical unit over a single fibre, through a single HBA.
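The spindle arithmetic is just the IO rate divided by what one disk can sustain, rounded up. A small sketch, where 150 IO/sec per spindle is the generous rule-of-thumb figure from above rather than a measured value, and spindles_needed is a made-up helper name:

```python
import math


def spindles_needed(iops, iops_per_spindle=150):
    """Smallest whole number of disks that can absorb the given IO rate."""
    return math.ceil(iops / iops_per_spindle)


print(spindles_needed(6000))  # rounded load without c102t1d3 -> 40
print(spindles_needed(9409))  # sum of r+w/s over all devices in the sar listing -> 63
```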

Regards,
Hein van den Heuvel
HvdH Performance Consulting

chetan a · ‎03-09-2007

Hi,

Thanks for all your answers. I have more information this time.

1.) Hardware Configuration,

A 32-core HP-UX 11.23 IPF server with 4 Fibre Channel connections to 3 EVAs (HP EVA8000: 28 disks, 75 GB each) through two switches. Two EVAs have 3 LUNs and the remaining one has 5 LUNs. RAID 0 is implemented on all the EVAs.

2.) Software Configuration,

The server has an application which queries (60% read, 40% write) an Oracle 10g database running on the same server. Oracle has not been configured to use ASM or SAME; it uses 3 raw LUNs for 3 log files, and the remaining LUNs have VxFS file systems on them.

The following list shows the file system present on each device.
c0t6d0 VxFS:BootVolume
c32t0d6 VXFS
c16t0d1 VXFS
c57t1d2 VXFS
c53t0d5 VXFS
c65t1d0 VXFS
c102t1d3 VXFS
c80t0d3 VXFS
c102t1d1 RAW
c95t0d2 VXFS
c127t0d7 VXFS

I have also attached the wait events, which I got from statspack.

Thanks,
Chetan


chetan a · ‎03-09-2007

I am sorry about the rabbit :( My problem is still not solved.


Hein van den Heuvel · ‎03-09-2007

That helps some more. Looks like plenty of CPU cycles to spare. A little high on the system CPU usage, but not too surprising given the IO/sec load.

Good to see you have statspack data.
It's trimmed down a bit much though.
Sure looks like you could use some more read IO power, or more effective (SGA) caching.

It also looks like you want to increase your SQL*Net buffers, as 3/4 of your response messages need a second packet. Of course this could also be a distorted average with little room for improvement.

And the library cache may need tweaking.

RAID-0 huh? Should be fast, but scary!

Perhaps you want to Email me a full statspack and I can help some more?

Regards,

Hein van den Heuvel
HvdH Performance Consulting
