Operating System - HP-UX

Re: Stripe Size and Number of Disks

 
steven Burgess_2
Honored Contributor

Stripe Size and Number of Disks

Hi All

At the moment we are running an Oracle 9i database connected to IBM FAStT900 storage. I initially presented 2 x 270 GB LUNs for Oracle. The workload is OLTP. At the moment all I am seeing is disk utilisation at 100% with horrendous throughput (9 MB/s) for Oracle transactions (all waiting on I/O). If I dd a 1 GB file in the same area I get up to 120 MB/s for sequential I/O.
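For reference, the test was along these lines (path and block size illustrative):

# write a 1 GB file with large sequential blocks
dd if=/dev/zero of=/u01/oradata/testfile bs=1024k count=1024
# read it back sequentially
dd if=/u01/oradata/testfile of=/dev/null bs=1024k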

The LUNs are carved from a RAID 5 array of 5 x 72 GB disks spread across 5 drawers. Each LUN is presented over a separate controller.

I am thinking that I am going to gain significant throughput by doing the following:

Creating 4 RAID 5 arrays and presenting 4 LUNs of 60 GB to the server, one LUN from each array. From these I will create new striped lvols using the 4 LUNs (sketch below).
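A rough sketch of the LVM side, with hypothetical device files and a placeholder 64 KB stripe size (the stripe size is exactly what I'm asking about):

# prepare the LUNs and the VG group file (device files and minor number illustrative)
pvcreate /dev/rdsk/c10t0d0    # repeat for the other 3 LUNs
mkdir /dev/vgora
mknod /dev/vgora/group c 64 0x010000
vgcreate /dev/vgora /dev/dsk/c10t0d0 /dev/dsk/c11t0d0 /dev/dsk/c12t0d0 /dev/dsk/c13t0d0
# one 60 GB lvol striped across all 4 LUNs; 64 KB stripe as a placeholder
lvcreate -i 4 -I 64 -L 61440 -n lvol1 /dev/vgora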

Given this configuration, can anyone give advice on the best stripe size with Oracle in mind?

If in the future I need to increase the size of the lvols in the event that I run out of disk, will I have to add disks in sets of 4 LUNs to each VG to satisfy the allocation policy?

Is it good practice to have multiple striped filesystems all using the same 4 LUNs?

i.e. 4 lvols of 60 GB, each striped across the 4 LUNs.

Can anyone see anything wrong or have any reservations about the proposal ?

TIA

Steven
take your time and think things through
9 REPLIES
Hein van den Heuvel
Honored Contributor
Solution

Re: Stripe Size and Number of Disks

Steven,

When you did the dd test for throughput, you probably used bs=1024k or larger, no?

What is your (average) Oracle page size? 8K?
In that case the 9 MB/sec represents 9*1024 / 8 = 1152 IO/sec. That may or may not be a limit on your IBM storage. If that is all going to one 5-disk LUN, then it really is pretty high, and likely a limit: more than 200 IO/sec per disk.

You want to check a statspack report to see whether your Oracle IO is mostly single-page or not. While there, please also check and let us know about the read/write ratio. Is this mostly reads (in which case RAID 5 is fine) or heavy on writes (temp, undo)?
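If statspack is installed, something along these lines gives the report (the perfstat password and snapshot IDs are yours to fill in):

# take a snapshot, wait through a busy period, take another
sqlplus perfstat/yourpassword <<'EOF'
execute statspack.snap;
EOF
# then generate the report between the two snap IDs it prompts for
sqlplus perfstat/yourpassword @?/rdbms/admin/spreport.sql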

You could re-check with dd and bs=8k, but that will still be sequential IO and may still show a seemingly reasonable number thanks to read-ahead tricks working.
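i.e. something like (same illustrative path as your earlier test):

# 8 KB blocks, still sequential, so read-ahead may flatter the number
dd if=/u01/oradata/testfile of=/dev/null bs=8k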

If you are write-intensive and have excess space, then please consider RAID 1. But you knew this.

Oracle likes SAME (Stripe And Mirror Everything) for good reasons. It tends to work.

You want to get more spindles behind your LUNs.
Your striping plan should get you there.
Personally I favor large stripes, even simple LVM extent striping at, say, 1 MB or 4 MB, as that spreads the load enough and does not risk fragmenting individual IOs across spindles (sketch below).
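A minimal sketch of what I mean, assuming 4 MB extents and illustrative device names; distributed allocation round-robins extents across the LUNs:

# PVG-based VG with 4 MB extents (-s 4) and one PV group (-g)
vgcreate -s 4 -g pvg0 /dev/vgora /dev/dsk/c10t0d0 /dev/dsk/c11t0d0 /dev/dsk/c12t0d0 /dev/dsk/c13t0d0
# distributed (-D y), PVG-strict (-s g) lvol: each 4 MB extent lands on the next LUN in turn
lvcreate -D y -s g -L 61440 -n lvora1 /dev/vgora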

Many others prefer small stripes like 64 KB.

Hope this helps some,
Hein.
Pat Lieberg
Valued Contributor

Re: Stripe Size and Number of Disks

I remember doing this on AIX and DB2 in the past, so it may not be relevant to HP-UX/Oracle, but we matched the extent size of the DB tables to the stripe size of the array.

As mentioned though, having a DB that is write-intensive on RAID 5 isn't the best for performance. If your configuration is very write-intensive, you might consider mirroring over striping. Of course mirroring always requires more disk to get the same amount of usable space.

If I remember right, the IBM FAStT only supports RAID 5 for configuring its disks on the back end. So you're stuck there, but IBM assured us that wasn't an issue because of caching. Unfortunately, we never got a chance to verify that since the company was shut down and we were all laid off before the project got very far.
Victor BERRIDGE
Honored Contributor

Re: Stripe Size and Number of Disks

Hi Steven,
I tried all sorts of stripe sizes and haven't found much difference unless you go beyond normal usage, but what is "normal"?
I tried everything from 8 KB to 1 MB.
Here is one of the maddest ones:
--- Logical volumes ---
LV Name                     /dev/s4vg01/lvol2
VG Name                     /dev/s4vg01
LV Permission               read/write
LV Status                   available/syncd
Mirror copies               0
Consistency Recovery        MWC
Schedule                    striped
LV Size (Mbytes)            4096
Current LE                  1024
Allocated PE                1024
Stripes                     4
Stripe Size (Kbytes)        256
Bad block                   on
Allocation                  strict
IO Timeout (Seconds)        default

Yep, it's for swap... and I have to live with it (production box)...
This size would be a good candidate for Oracle... but the most suitable for most purposes, in my opinion, would be 64K, unless you know what you are doing (1 MB was fine for very intensive heavy read/write of gigantic files...).
I'm sure IBM would go for 1 MB, since AIX 5.x will let you set the IO buffer size to 1 MB (not sure, not tested, but heard of from IBM folk).

I'm not sure of the IO kernel parameter tunables for VxFS, but remember the max size is 32K, so I would stick to 64K (but you could do like me: create different lvols with different sizes and find out for yourself...).
After that you can change the block size of the file system and match the Oracle block size to it (example below)...
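For instance (raw device illustrative; 8 KB is the largest VxFS block size, which happens to match a common Oracle block size):

# build the filesystem with an 8 KB block size
mkfs -F vxfs -o bsize=8192 /dev/vgora/rlvol1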

>Creating 4 RAID5 arrays and presenting to the server 4 LUNS of 60gb, one LUN from each array. From this I will create new striped lvols using the 4 LUNS
Now, yes, 4 LUNs, but why not 2 x 4 LUNs of 30 GB for 2 separate VGs?
One VG for data tablespaces and exports,
the other for indexes and temporary tablespaces.


>If in the future I need to increase the size of the lvols in the event that I run out of disk, I will have to add disk in LUNS of 4 to each vg to satisfy the allocation policy?
Yes...
So a good compromise would be to go by 2?
A slight change in performance is noticeable though...


How many interfaces do you have on the HP box side?


All the best
Victor
Leif Halvarsson_2
Honored Contributor

Re: Stripe Size and Number of Disks

Hi,
Have you considered reconfiguring the RAID instead, using RAID 1+0 (hardware striping and mirroring)? I believe that would be a better idea.
Tim Nelson
Honored Contributor

Re: Stripe Size and Number of Disks

There is another open thread talking about striping across array volumes, also called plaiding.

I agree that 1100 IO/s to a single 5-disk RAID group is pretty high, but there are many other items to review.

What are the disk queues at, if any?
Response times?
From the OS's point of view?
From the array's point of view?

If the array stats do not jibe with the OS stats, i.e. the disks are not overly busy and are responding well but the OS shows large disk queues, you may be stuck in a driver throttle situation. HP-UX defaults to 8 jobs in the disk queue. If this looks like a driver throttle problem, up the limit using the scsictl command (see the man page).
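Something along these lines (device file illustrative):

# show current settings, including the queue depth
scsictl -a /dev/rdsk/c10t0d0
# raise the maximum queued IOs from the default of 8
scsictl -m queue_depth=16 /dev/rdsk/c10t0d0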

If your stats match and the bottleneck seems to be all the way through to the array, I would then look at either larger LUNs with more spindles or, as you mentioned, striping across multiple array LUNs with the OS. Keep in mind that configuring a plaid (striping a stripe) may defeat the array cache's read-ahead benefits.
Test each way first.

Sandman!
Honored Contributor

Re: Stripe Size and Number of Disks

Steven,

Is your OLTP application read- or write-intensive? You should design the stripe size with that in mind, and also consider each of the links in the I/O chain, i.e. the caching capability of the RAID controller, the HBA, the LVM I/O block size, and others. Consider software mirroring with LVM instead of going for a huge stripe size, which might become a bottleneck owing to packet fragmentation.

regards!
steven Burgess_2
Honored Contributor

Re: Stripe Size and Number of Disks

Hi everyone

Really appreciate the replies. I haven't had a chance to go over these yet, as when I hit the post button one of our customers lost half a datacenter, which we have just recovered from.

We have a conference call with some IBM SAN and Oracle guys tomorrow, so hopefully this may help us out a little. I will try the suggestions posted and get back as soon as I can.

I am on holiday from tomorrow till Monday, so I will reply then.

Thanks again

Steven
take your time and think things through
TwoProc
Honored Contributor

Re: Stripe Size and Number of Disks

Steven - I'd stay away from putting together a large Oracle system with just two giant LUNs. And I'd not use RAID 5.

If you're using only two LUNs then your SCSI queue depth is probably large. Please review sar -d and look at the average queue depth (sample below). Also, make undo, temp, redo logs, and archive logs separate LUNs on RAID 0/1, and if I were you I'd make sure they don't share disks on the storage array. Index and data areas can be considered for RAID 5 if your I/O is overwhelmingly reads, but for OLTP I prefer RAID 0/1 pretty much regardless. Make sure that data and index areas are also on separate LUNs, and preferably on different physical disks on the storage server.
These suggestions are not hard and fast, they are preferences - if you have to bend in some areas, try to minimize them.
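A sample of the sar check I mean:

# 12 samples at 5-second intervals; watch avque, avwait and avserv per device
sar -d 5 12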

Look at statspack reports, and examine carefully the top I/O-consuming SQL statements. Look also at how your cache hit ratios are holding up (both on data cache and code: shared_pool, large_pool and db_cache_buffers).

If your cache hit ratios are low, you'll have to start increasing the SGA footprint in those key areas.
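One classic way to eyeball the data cache hit ratio between statspack runs (a rough indicator only):

sqlplus -s "/ as sysdba" <<'EOF'
select 1 - (phy.value / (db.value + cons.value)) "buffer cache hit ratio"
  from v$sysstat phy, v$sysstat db, v$sysstat cons
 where phy.name  = 'physical reads'
   and db.name   = 'db block gets'
   and cons.name = 'consistent gets';
EOF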

Make sure to interleave the redo logs, interleave the archive log output areas, and interleave redo space when possible (this requires multiple mount points for each if you are trying to achieve it).
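For example, multiplexed redo members spread over two mount points (paths illustrative):

sqlplus -s "/ as sysdba" <<'EOF'
-- each group gets one member on each mount point
alter database add logfile group 5
  ('/u01/oraredo/redo05a.log', '/u02/oraredo/redo05b.log') size 100m;
EOF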

I think there are three things that are going to pull you out.
1) First and best one - identification and tuning of crud code. This takes the longest but has the biggest impact if the amount of code that needs tuning is large.
2) RAID 0/1 instead of RAID 5. This will be your second largest impact, but is conceptually quicker to do.
3) After you've tuned your code, if your hit ratios aren't in the high, high 90% numbers (like 98% and staying there), then you should consider bigger cache areas, possibly bigger pool sizes (statspack will guide you on this).
We are the people our parents warned us about --Jimmy Buffett
steven Burgess_2
Honored Contributor

Re: Stripe Size and Number of Disks

Hi All

Sorry for the late response, have been away on holiday

We have essentially presented smaller LUNs to the Oracle data areas and increased the Oracle SGA to 4 GB. We decided not to stripe the stripe, as we felt that this might have an adverse effect on the disks and essentially make things slower.

Customer is happy

Thanks again

Steve

take your time and think things through