
Tuning Faculties on LVM Abstraction Layer

SOLVED
Ralph Grothe
Honored Contributor


Hi Performance Tweakers,

I was asked by our Informix DBA if there were (layout) parameters during the creation of volumes that could have a performance impact on later disk I/O.

I replied that in my opinion the tuning possibilities are pretty limited, probably unlike filesystem creation and mount options (which don't play a significant role here anyway, as Informix performs raw disk I/O).

I think it boils down to the two options max number of PEs and PE size (i.e. -e and -s) during a vgcreate.
The question was whether fewer but larger PEs are preferable to more smaller ones,
especially with regard to the restricted size of the volume header for this metadata,
which sometimes leaves you no choice but to select a bigger PE size.
But even if the anticipated maximum PV size of the VG would allow for smaller PEs,
my feeling is that bigger PEs would be more advantageous.
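The trade-off can be put in rough numbers. A minimal sketch, assuming HP-UX vgcreate semantics; the 1016 default for max PEs per PV is my assumption here, not something stated in the thread:

```shell
#!/bin/sh
# vgcreate -e sets the max PEs per PV, -s sets the PE size (MB).
# Largest addressable PV = max_pe * pe_size, which is why a big LUN
# can force you onto a bigger PE size.
max_pe=1016                          # assumed vgcreate -e default
for pe_size_mb in 4 16 32 64; do
    echo "PE size ${pe_size_mb} MB -> max PV size $((max_pe * pe_size_mb)) MB"
done
```

So with 32 MB extents and the assumed default extent count, a PV can only grow to about 32 GB; a larger LUN needs either a bigger -e or a bigger -s at vgcreate time.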

Another issue that came to my mind was how (if at all) one could align with the exact spindle boundaries of the SAN disk subsystem,
because, as far as I know, our SAN admin only provides me with what I would call virtual disks,
viz. some sort of disk chunk that appears as a PV to the OS in an ioscan.

The value of LVM striping also falls under this heading.
My suspicion is that as long as I cannot guarantee that every LVM stripe actually maps to a different spindle, I am better off without any LVM striping.

I would be interested to hear your views
(ouch, one of my English-improvement asides: can one really say "hear your views", or is that some kind of oxymoron or paradox?)

Regards
Ralph
Madness, thy name is system administration
RAC_1
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

The number of PEs and the PE size are, as you know, the two related knobs here. But what to go for? More PEs or a bigger PE size should depend on the size of the volume.

When the volume size is very big you will have to go for a bigger PE size, and vice versa.

I personally have never gone beyond 32 MB.
There is no substitute to HARDWORK
Pete Randall
Outstanding Contributor

Re: Tuning Faculties on LVM Abstraction Layer

Ralph,

OK, here is an "opinion" for you to hear.

It would seem to me that a larger PE size would function like a read-ahead buffer, making more data available in memory, assuming the application can take advantage of it. Mostly-sequential applications would benefit, while truly random applications probably would not. On the other hand, truly random applications would probably benefit from not having to fetch so much data if the PE size were smaller. So, in this theory, it depends on the application.


Pete
Devender Khatana
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

Hi,

Although the mount options and filesystem creation options do not apply here, since as you mentioned the volume is used only for raw devices, striping can still play a big role. The reason is that most LUNs allocated from storage arrays are built from 3 or 4 physical disks, depending on the configuration (although newer arrays also support a 7D+1P configuration, where a LUN spans 8 disks). Sequential I/O can additionally be spread in this case by doing OS-level striping across multiple LUNs.

But for this, care must be taken that the two or more LUNs you are striping across come from different RAID groups, so that the data is actually spread across multiple disks.

PE size and the number of PEs depend on the situation, as stated by others. The considerations include not only the type of application but also the size of the LUNs, since with a small PE size the max-PEs-per-PV limit may not let you address a big PV.
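Devender's point can be sketched as a command sequence. Everything here is hypothetical for illustration: the device paths, the VG name, the sizes, and the claim that each LUN comes from a different RAID group (only your SAN admin can confirm that):

```shell
#!/bin/sh
# Host-level striping only spreads I/O if the PVs behind the stripe sit on
# different array RAID groups; otherwise the stripes fold back onto the
# same spindles. Paths and sizes below are made up.
PV1=/dev/dsk/c4t0d1     # LUN carved from RAID group A (assumption)
PV2=/dev/dsk/c6t0d1     # LUN carved from RAID group B (assumption)
# vgcreate /dev/vgifx $PV1 $PV2
# lvcreate -i 2 -I 64 -L 4096 -n ifx_data /dev/vgifx  # 2-way stripe, 64 KB
echo "stripe set: $PV1 $PV2"
```

The commented-out vgcreate/lvcreate lines show the shape of the host-level stripe; -i is the number of stripes and -I the stripe size in KB.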

HTH,
Devender

Impossible itself mentions "I m possible"
Steve Lewis
Honored Contributor
Solution

Re: Tuning Faculties on LVM Abstraction Layer

Informix does tend towards lots of small random operations compared with a filesystem workload. I tried it using cooked files on VxFS with the mount options convosync=direct and mincache=direct and it crawled; I also tried it with cooked files and 800 MB of VxFS buffers and it still crawled. With Informix, raw LVs are definitely the way to go.

Here are my views:
Don't create chunks greater than 4 GB unless you are forced to, because the Informix cleaners can struggle.
Don't use Informix mirroring for anything.
Keep the physical and logical logs apart from the database, especially the first chunk, and especially if you use unbuffered logging.
Striping a logical volume/chunk over 2-4 LUNs is better than distributed allocation, and better than 1:1 chunk:LUN.
You still need to map out your storage in advance.
KAIO is good, but set IFMX_HPKAIO_NUM_REQ >= 2000, and maybe up to 4000.
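The chunk-size advice above implies a quick bit of arithmetic for planning, sketched here; the 500 GB database size is an arbitrary example, not a figure from the thread:

```shell
#!/bin/sh
# How many <= 4 GB chunks does a database of a given size need?
# (Rounding up, since a partial chunk still needs a chunk.)
db_gb=500                # arbitrary example size
chunk_gb=4               # Steve's suggested ceiling per chunk
chunks=$(( (db_gb + chunk_gb - 1) / chunk_gb ))
echo "${db_gb} GB at <= ${chunk_gb} GB/chunk -> ${chunks} chunks"

# The KAIO setting from the post, as an environment variable in the
# environment that starts the Informix server:
IFMX_HPKAIO_NUM_REQ=2000
export IFMX_HPKAIO_NUM_REQ
```

Worth noting: lots of small chunks means lots of raw LVs to lay out, which is part of why "map out your storage in advance" matters.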

Disk arrays these days have variable read-ahead, and this can make tuning of Informix's RA_ parameters difficult. Keep your firmware up to date in this respect (XP12000 at 50-04-31).

Last but by no means least...it depends on the application.


David Child_1
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

Ralph,

On the LVM striping query: we are primarily an EMC shop, so I will base my response on EMC examples. I'm sure other high-end arrays are similar.

On an EMC Symmetrix frame you can verify that each LUN in your stripe set is on a different spindle. A quick way to be fairly sure is to use sequential LUN numbers (for meta-devices see below). If the frame was set up correctly, then each hypervolume (aka slice) should have been created in order on different physical drives. Example:

/dev/rdsk/c4t0d1 EMC SYMMETRIX 5670 3854E000 8838720
/dev/rdsk/c4t0d2 EMC SYMMETRIX 5670 3854F000 8838720
/dev/rdsk/c4t0d3 EMC SYMMETRIX 5670 38550000 8838720

Symm devices 54E, 54F, and 550 should all be on different spindles. Of course it will eventually wrap around and use the first spindle again, etc. If you wish to double-check, you could run 'symdev show 54F' and look for the back-end disk information. In my example above I found that hyper 54F is using the physical drives at "16D, C, 0" and "02C, D, 0" (a mirrored set). Hyper 550 is using the physical drives at "15D, C, 1" and "01C, D, 1". So I can be sure those are not on the same spindles.

Now, if striped meta-devices are used on the frame, then you have to ask yourself whether striping on both the frame and the host is the way to go. For a long time (and probably still) Oracle recommended striping on both host and frame. In recent years EMC found this often hurt performance. True striping in LVM is also a pain, as you cannot mirror logical volumes that are striped, which matters if you want to use certain migration methods. I have used EMC's striped meta-devices, and the performance over host-based striping of single hypervolumes seems to be very good. Note: when using meta-devices the Symm device numbers will not be completely sequential. If they are 4-way metas then the device numbers will go in steps of 4 (540, 544, 548, 54C, etc.).

Lastly, on picking spindle locations: the storage guys should be able to provide some information to assist in locating the sweet spots, but personally I don't think it's worth the effort. Don't forget that there is a large amount of cache on the front end of these arrays; there is no direct write/read to/from the spindles.

Other arrays may or may not have these features, etc. so you will need to look at what you have.

David
Ralph Grothe
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

Dear repliers,

many thanks for your valuable suggestions and opinions.
I will share your views with our Informix and SAN admins.
I am confident we will derive a sound solution from them.

Kind regards
Ralph
Madness, thy name is system administration
Bill Hassell
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

It turns out that volume parameters such as the extent size and number of extents are only used as indexes (bookkeeping). A volume with 4 MB extents will perform the same as one with 64 MB extents. The extent lists are kept in RAM and serve to translate a given block number in the lvol to a physical address--read: very low overhead.
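The bookkeeping point can be illustrated with a toy calculation: mapping an lvol offset to an extent is one divide and one modulo, and that cost is the same whatever the extent size. The numbers below are arbitrary, chosen only for the illustration:

```shell
#!/bin/sh
# Toy model of the extent-table lookup Bill describes: lvol offset ->
# (extent number, offset within extent). Extent size does not change
# the amount of work, only the resulting index values.
pe_size_kb=$((32 * 1024))       # 32 MB extents, as an example
lv_offset_kb=1234567            # arbitrary lvol offset in KB
extent=$((lv_offset_kb / pe_size_kb))
off=$((lv_offset_kb % pe_size_kb))
echo "offset ${lv_offset_kb} KB -> extent ${extent}, +${off} KB"
```

The actual disk I/O issued for that offset is identical either way, which is why extent size is a capacity-planning knob rather than a performance knob.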

You are correct in assuming that the SAN has eliminated any possibility of tuning your data layout to match physical disks. That's why you pay so much for the array controller--you want a very fast, special purpose box to optimize the disks. And when you take into account a large disk cache in the controller, any striping done at the HP-UX level is meaningless since the data may be partially (or fully) in controller memory.

So once you get a SAN, you treat the disk as a commodity and let the SAN do all the work to optimize things. Think about striping and physical disk layouts with JBODs only.


Bill Hassell, sysadmin
Ted Buis
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

I agree with Bill Hassell if the array is an EVA or VA, or if EMC Symmetrix meta-devices are being used. However, in some cases with older arrays I have seen performance improvements from drilling into the array, paying attention to where the DB index files are placed, and using host-based LVM striping to get more spindles involved in the back-end I/O. The number of back-end spindles matters for sustained high I/O performance, so if you have a bottleneck on a particular lvol it can be worth checking how many spindles are really involved for that lvol, and whether there is contention from other lvols, or even other computers, for space on those disks. I have even had to waste disk space at times to get the spindle count where it needed to be to hit the desired performance level. The other reason some people use host-based striping is to increase the effective queue depth, since the default is only 8. I disagree with this approach and suggest it is better just to increase the queue depth on the particular device.
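Ted's alternative to striping-for-queue-depth can be sketched as follows. On HP-UX the per-device queue depth is adjusted with scsictl; the device path and target depth here are hypothetical, and the real command is left commented out since it needs root on a live device:

```shell
#!/bin/sh
# Instead of striping across LUNs just to multiply the default queue
# depth of 8, raise the depth on the busy device directly.
DEV=/dev/rdsk/c4t0d1     # hypothetical device path
DEPTH=32                 # hypothetical target depth
# scsictl -m queue_depth=$DEPTH $DEV    # run as root on the real device
echo "would set queue_depth=$DEPTH on $DEV"
```

Whether a deeper queue actually helps depends on how well the array behind that LUN handles concurrent outstanding requests.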
Alzhy
Honored Contributor

Re: Tuning Faculties on LVM Abstraction Layer

I have dealt with "assembling" UNIX volumes (LVM or VxVM, for both raw usage and filesystem/cooked usage) whose components are either simple disks (JBODs), cache-centric arrays (XP, EMC, Shark, etc.), or controller-centric arrays (EVA, Celerra, CLARiiON), and the one common thing that matters most is HOW you build and use those components.

For LVM: PE size does not matter, period.
For VxVM: no such parameter exists.

As regards how you "assemble" these components (disks, virtual or real):

For JBODS:
You've no choice: for protection's sake you have to stripe and mirror (or mirror and stripe). You can even use RAID-5 if your application's I/O needs are not that great.

For Cache-Centric Arrays:
Always stripe, period. Stripe wide and thin for OLTP; stripe wide and mid-thin for DSS/warehouses. How wide? No fewer than 4 and no more than 8 "components", with each component sourced from a different area (controllers/domains) inside your array. How "thin" (stripe width)? It depends on your application; I find 64 KB to be the most neutral for mixed OLTP/DSS use.

For Controller-Centric Arrays:
Forget about any striping; striping is done on the back end for you. Except of course if you have several controller-centric arrays (e.g. 8 EVAs, each on its own pair of fibre links) -- then I would stripe across those 8 arrays.



Hakuna Matata.