
Tim Medford
Valued Contributor

LUN sizes

I have an HP rp5470 connected to an IBM Shark storage array via fibre channel.

On the Shark there is 8 GB of cache and a RAID 5 array striped across 8 drives. When I write data to the Shark it generally goes into cache and is flushed to disk as needed. Performance is excellent most of the time.

We are getting ready to install a new Shark array and remove the old one. I have 400 GB of storage available, and the question is how to split that up. The easiest scenario from my side is simply 2 x 200 GB LUNs.

I use LVM, volume groups, logical volumes and all that good stuff. I have primary and alternate paths to each LUN defined in my VGs.

Is there any reason from an LVM perspective to partition that 400 GB into smaller chunks? For example, 4 x 100 GB or 8 x 50 GB?

Thanks,
Tim
12 REPLIES
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: LUN sizes

The answer is to divide into as many LUNs as you have SCSI paths. If you've got 2 Fibre Channel paths then 2 LUNs is it; if you have 4, then 4.

Assuming the simplest case of 2 LUNs, the idea is that the primary path to LUN0 would use SCSI path A (alternate path B) and LUN1's primary path would use SCSI path A (alternate B). You then stripe all LVOLs across both LUNs.

The downside to using only a few LUNs is that from the perspective of host-based performance tools like Glance, it will appear that a tremendous amount of I/O is going through one disk -- Glance has no way of knowing that what it sees as one device might be many physical devices.

If you can persuade yourself to take the capacity hit (or buy more disk), you would also see performance improvements by going from RAID 5 to RAID 1/0. I've seen cases where that made a 3x-5x improvement, but I've also seen cases where it made little difference because the array was lightly loaded.
If it ain't broke, I can fix that.
Robert Bennett_3
Respected Contributor

Re: LUN sizes

Clay -

Wouldn't you alternate the primary paths for the LUNs?

LUN0 Path A - alternate B
LUN1 Path B - alternate A

I just figured that's what you meant, as that's what I do, but now I'm not sure I'm correct. You are a fount of knowledge, so I can only question my own.

Thanks

B
"All there is to thinking is seeing something noticeable which makes you see something you weren't noticing which makes you see something that isn't even visible." - Norman Maclean
A. Clay Stephenson
Acclaimed Contributor

Re: LUN sizes

Yes, I'm an idiot. That's what I meant although it's clearly not what I wrote.

It should be:
LUN0 - Primary SCSI PATH A (Alternate SCSI PATH B)
LUN1 - Primary SCSI PATH B (Alternate SCSI PATH A)

Each LVOL should then be striped across both of these LUNs; for most applications/arrays 64K is a good stripe size.
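
If it helps, here's a minimal sketch of how that layout is built with PV links; the device files (c4/c6 for the two HBAs, vg06 for the VG) are only placeholders for whatever ioscan shows on your box:

pvcreate /dev/rdsk/c4t0d0                # LUN0, seen via path A
pvcreate /dev/rdsk/c6t0d1                # LUN1, seen via path B
mkdir /dev/vg06
mknod /dev/vg06/group c 64 0x060000
# The first link added for a PV becomes its primary path, the next its alternate:
vgcreate /dev/vg06 /dev/dsk/c4t0d0       # LUN0 primary   = path A
vgextend /dev/vg06 /dev/dsk/c6t0d0       # LUN0 alternate = path B
vgextend /dev/vg06 /dev/dsk/c6t0d1       # LUN1 primary   = path B
vgextend /dev/vg06 /dev/dsk/c4t0d1       # LUN1 alternate = path A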


If it ain't broke, I can fix that.
Gary L. Paveza, Jr.
Trusted Contributor

Re: LUN sizes

Personally, I would create twice as many LUNs as I have channels and make half the LUNs on each channel the alternate for the other side. That way 50% of your space is active on each channel at once, with 100% available on a single channel if you lose one.

If you create the same number of LUNs as channels, all I/O will go down one channel - and only use the second channel if the first fails.
Robert Bennett_3
Respected Contributor

Re: LUN sizes

Clearly not an idiot - but you had me wondering about myself.

B

"All there is to thinking is seeing something noticeable which makes you see something you weren't noticing which makes you see something that isn't even visible." - Norman Maclean
Tim Medford
Valued Contributor

Re: LUN sizes

Thanks for all the info guys!

Clay - How exactly do you set up that striping across the LVs? I used to do extent-based striping on the VGs, but with 8 MB extents that really doesn't accomplish much.

The Shark is going to stripe across 8 drives anyway, so I usually just let it handle that. In the old days, when I had a bunch of HP JBODs, I used to worry about this, but not so much anymore.

A. Clay Stephenson
Acclaimed Contributor

Re: LUN sizes

Extent-based striping is all but worthless because even the smallest possible PE (1MB) is much too large to be a good strip.

Sticking with our 2-LUN scenario for the VG, it's simply:
lvcreate -i 2 -I 64 -L 32000 -n lvol10 /dev/vg06

This will create a 32000 MB LVOL striped across both LUNs with a 64 KB stripe size.

The idea is to spread the I/O over as many SCSI channels as you have connected to the host. By alternating the primary/alternate pipes to each LUN in the VG, you spread the I/O around as much as possible.

You repeat this with every LVOL and don't worry about the paths to each LUN. The idea is to throw the data as fast as possible to the array and let the array and God figure out which disks it really goes to.
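
To sanity-check the result, lvdisplay shows the stripe count, stripe size, and how the extents are spread over the PVs (LVOL and VG names as in the example above):

lvdisplay -v /dev/vg06/lvol10 | more
vgdisplay -v /dev/vg06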
If it ain't broke, I can fix that.
Tim Medford
Valued Contributor

Re: LUN sizes

OK, thanks again for the info.

I see what you are doing now. I will research this a bit further.

With dual 2 Gb Fibre Channel connections I've never even come close to saturating that bandwidth, so this has not been much of an issue for us so far.

There's so much cache on the Shark too that 90% of the time we're just writing to cache anyway.
A. Clay Stephenson
Acclaimed Contributor

Re: LUN sizes

It's not really a bandwidth issue with Fibre, but from an LVM perspective, the more we can spread the I/O across the channels the better.
If it ain't broke, I can fix that.
Hein van den Heuvel
Honored Contributor

Re: LUN sizes

Clay>> Extent-based striping is all but worthless because even the smallest possible PE (1MB) is much too large to be a good strip.

Hmmm....

I find that Fibre Channel links and the attached controllers are generally fast enough that you cannot tell the difference between, say, a 128 KB I/O and a 64 KB I/O. At close to 200 MB/sec for the actual transfer on 2 Gb, that's 200 KB per millisecond, so both transfers take well under a millisecond... for the transfer alone. Not much to parallelize/optimize there. One would like to think that the setup cost for a 64 KB I/O and a 128 KB I/O is much the same, and maybe higher than the actual transfer (talking HBA and controller cache access here).

Sure, there is still a good reason to go striping for high-bandwidth, large-I/O situations.
But for many (OLTP) applications the pattern is lots of smallish I/Os over random-ish seek ranges. In that case the major objective becomes balancing the I/Os, and extent-based striping does just fine for that, with minimal overhead and minimal risk of splitting an already small I/O into two smaller ones because the target happens to straddle two stripes.
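
To make that balancing concrete, here's a rough, hand-rolled sketch of spreading an LVOL over the two LUNs in alternating chunks (device files and sizes are only placeholders, and this is coarse round-robin rather than true per-extent interleaving):

lvcreate -n lvol11 /dev/vg06                        # lvcreate defaults to a 0 MB LVOL
lvextend -L 1024 /dev/vg06/lvol11 /dev/dsk/c4t0d0   # first 1 GB taken from LUN0
lvextend -L 2048 /dev/vg06/lvol11 /dev/dsk/c6t0d1   # next 1 GB taken from LUN1
# ...keep alternating PVs until the LVOL reaches the size you need.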

fwiw,
Hein.
A. Clay Stephenson
Acclaimed Contributor

Re: LUN sizes

I'll admit that my blanket pooh-pooh'ing of extent-based striping is a generalization; however, I have never seen a case where extent-based striping, even using 1 MB PEs, made any noticeable improvement. On the other hand, I have seen many cases, using a variety of disk arrays (copper and fibre SCSI), where LVOL striping and spreading the primary SCSI paths over the LUNs made very significant differences. On newer arrays, larger strip sizes may help, but I've never been tempted to exceed 256K, and even then I saw no improvement over 64K. Under 10.20 and 11.0 VxFS, 64K seemed to be the magic size, and I've stuck with it into 11.x although I have tried 128K. In most cases the goal is to get the data to the array's cache as fast as possible and then let the array deal with it.

If it ain't broke, I can fix that.
Tim Medford
Valued Contributor

Re: LUN sizes

Clay - Sorry to keep beating this dead horse, but just a couple more questions.

I seem to recall some "gotchas" when using LVM striping. For example, it was difficult or impossible to extend a logical volume once it was established. Can you confirm or deny this?

Also, when we cut over to the new array I am planning to use vgreduce to eliminate the alternate paths. I will have all I/O going through one of the Fibre Channel cards. I will then use fcmsutil to disable the inactive card, unplug it and plug it into the new system.

Once everything is copied to the new system I'll disconnect the other card and put it into the new Shark as well. Then I can re-establish the primary/alternate links.

I'm currently using physical volume groups and extent-based striping. Do you think I'll have any trouble using vgreduce to shut off the alternate links? Will I need to edit /etc/lvmpvg to remove the alternate links there too?
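
For reference, the vgreduce step I have in mind looks roughly like this (device files are just placeholders for the alternate links on the card being retired):

vgreduce /dev/vg06 /dev/dsk/c6t0d0    # drop LUN0's alternate link
vgreduce /dev/vg06 /dev/dsk/c4t0d1    # drop LUN1's alternate link
# ...plus, presumably, removing those same paths from /etc/lvmpvg by hand.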

Thanks again for all your help.