01-02-2012 05:25 AM
Re: strict vs pvgstrict vs distributed pvgstrict
These are combinations of settings, related to how LVM allocates extents when you're creating, extending or mirroring logical volumes.
These allocation policies are defined in terms of extents. An extent is essentially just a piece of disk space with a fixed size, defined at VG creation time.
We must also talk about physical extents vs. logical extents. When an LV is not mirrored, each logical extent of the LV has a one-to-one mapping to a corresponding physical extent. When an LV is mirrored, each logical extent will have two (or three, if three-way mirroring is used) physical extents, each holding a copy of the same data.
- strict (the default, lvcreate/lvchange options "-D n -s y"): if the VG has multiple PVs with free disk space, the LVM generally tries to use all the free space on one PV before moving to the next, with one restriction: with mirrored LVs, the physical extents must be chosen so that for each logical extent, no two copies of it are on the same PV. (The idea is to make it impossible to lose both copies of any extent if any one PV fails.)
- PVG-strict (lvcreate/lvchange options "-D n -s g"): to be useful, this requires that the VG has more than two PVs, the /etc/lvmpvg file has been created and the PVs of the VG are listed in it. The lvmpvg file is normally used to split the PVs of the VG into two or more groups (PVGs). Compared to the default "strict", the PVG-strict mode limits the extent allocation with mirrored LVs still further: the physical extents must be chosen so that for each logical extent, no two copies of it are in the same PVG. With old parallel-SCSI disk systems, this can be used to make sure the mirror copies are on separate SCSI buses even if each bus has multiple disks belonging to the same VG; with modern FC/SAS disks this is usually not an issue.
Example: you have an old system with a data VG that contains a number of small LVs on two old small PVs. Due to some recent removal of obsolete software, both PVs are currently about 50% used. The disks have not been mirrored because someone forgot to configure the system with enough disks for that, and the disks are now so old you expect them to fail any time now. You add one new PV that has more capacity than the sum of the old PVs (fortunately the previous admin had prepared for future larger disks at VG creation time...). You would want the data to be mirrored so that one copy of each extent is on the new disk and another on either of the two old disks.
Bad solution: you just add the new PV to the VG and explicitly specify the PV when running "lvextend -m 1" for each LV. Some time later, a junior sysadmin must respond to an urgent request to extend one of the LVs. He does it successfully, but later you find that the LVM allocated the extents so that both copies of the extended portion are on the old small slow disks. (Easy to fix with pvmove, but a bit scary because both the old disks have now started to show some early warning signs of failure...)
Good solution: create the /etc/lvmpvg file (see "man lvmpvg") and define two PVGs for this VG: one will contain the new PV, and the other the two old PVs. Before mirroring the LVs, you use "lvchange -s g" to switch the LVs to PVG-strict mode. Now, even if you don't explicitly specify the mirror destination PVs, the LVM will automatically make sure that the already-existing LVs on the old disks will be mirrored to the new disk only. Furthermore, the LVM will reject any commands that would violate this restriction, so the junior sysadmin cannot accidentally place your data at risk.
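To make the difference between the two rules concrete, here is a minimal Python sketch of the checks described above. It is only an illustration, not how LVM itself is implemented; the PV names (old1, old2, new1), the PVG names and the function names are all hypothetical.

```python
# Hypothetical PV and PVG names, modelled on the example above.
PVGS = {
    "pvg_old": {"old1", "old2"},
    "pvg_new": {"new1"},
}

def pvg_of(pv):
    """Return the name of the PVG that contains the given PV."""
    for name, members in PVGS.items():
        if pv in members:
            return name
    raise KeyError(pv)

def satisfies_strict(copies):
    """strict: no two copies of a logical extent are on the same PV."""
    return len(set(copies)) == len(copies)

def satisfies_pvg_strict(copies):
    """PVG-strict: no two copies of a logical extent are in the same PVG."""
    groups = [pvg_of(pv) for pv in copies]
    return len(set(groups)) == len(groups)

# Each tuple lists the PVs holding the copies of one logical extent.
intended = ("old1", "new1")  # one copy on an old disk, one on the new disk
accident = ("old1", "old2")  # what the junior sysadmin's extension produced

for name, copies in (("intended", intended), ("accident", accident)):
    print(name, satisfies_strict(copies), satisfies_pvg_strict(copies))
```

The accidental placement passes the strict check (two different PVs) but fails the PVG-strict check (both PVs are in the same PVG), which is why only PVG-strict protects against the "bad solution" scenario.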
- Distributed pvgstrict (lvcreate/lvchange options "-D y -s g"): like pvgstrict, but instead of using all the free space on each PV before moving to the next one when extending/creating LVs, LVM attempts to allocate each extent using a different PV than the previous extent. With non-mirrored LVs, this is roughly the equivalent of RAID 0 with a large stripe size: if doing large disk operations (larger than the extent size) this ensures the operation will be split across at least two disks, so the operations can be somewhat parallelized. However, the size of modern disks tends to require large extent sizes, making this less worthwhile (when striping, you would want the stripe size to be small, like 4 kB). With mirrored LVs, this is called "extent-based mirrored striping".
Example: with four disks (A, B, C and D) with a capacity of four physical extents each, the list of allocated physical extents for a single unmirrored 16-extent LV would look like this in each case:
- strict: AAAABBBBCCCCDDDD
- PVG-strict: AAAABBBBCCCCDDDD (no difference to strict, if LV is not mirrored)
- distributed pvgstrict: ABCDABCDABCDABCD
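The two allocation orders listed above can be reproduced with a small Python sketch. This is a simplified model under stated assumptions (every PV starts empty with the same number of free extents, and the 4 MB extent size is just an assumed value for illustration); the function names are hypothetical and this is not LVM's actual allocator.

```python
def allocate(pvs, free_per_pv, n_extents, distributed=False):
    """Return the PV chosen for each logical extent of an unmirrored LV.

    pvs is a string of one-letter PV names, e.g. "ABCD".
    """
    if n_extents > len(pvs) * free_per_pv:
        raise ValueError("not enough free extents in the VG")
    free = {pv: free_per_pv for pv in pvs}
    layout = []
    i = 0  # index of the PV to try next
    while len(layout) < n_extents:
        pv = pvs[i % len(pvs)]
        if free[pv] == 0:
            i += 1  # this PV is full, try the next one
            continue
        free[pv] -= 1
        layout.append(pv)
        if distributed:
            i += 1  # distributed: switch PV after every extent
    return "".join(layout)

def pvs_touched(layout, extent_mb, offset_mb, length_mb):
    """Return the set of PVs that one I/O operation has to access."""
    first = offset_mb // extent_mb
    last = (offset_mb + length_mb - 1) // extent_mb
    return set(layout[first:last + 1])

contiguous = allocate("ABCD", 4, 16)
distributed = allocate("ABCD", 4, 16, distributed=True)
print(contiguous)
print(distributed)

# A 6 MB operation at offset 0, assuming 4 MB extents:
print(sorted(pvs_touched(contiguous, 4, 0, 6)))
print(sorted(pvs_touched(distributed, 4, 0, 6)))
```

In this model, an operation larger than one extent stays on a single disk with the contiguous layout but is spread over two disks with the distributed layout, which is the parallelization effect described above.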
If the LV is only 8 logical extents in size and mirrored, a typical allocation immediately after creation would be either:
- strict, mirror half #1: AAAABBBB
- strict, mirror half #2: CCCCDDDD
or:
- strict, mirror half #1: AAAACCCC
- strict, mirror half #2: BBBBDDDD
... but after many extend/reduce/pvmove rearrangements, this could degrade into a complete mess:
- strict, mirror half #1: ADBCBCDD
- strict, mirror half #2: BCAADBCA
With PVG-strict (PVG 1 = A+B, PVG 2 = C+D), there is only one possible way for the system to create it:
- PVG-strict, mirror half #1: AAAABBBB
- PVG-strict, mirror half #2: CCCCDDDD
No matter how much the LV is extended/reduced/moved/rearranged, LVM will maintain some amount of order:
- mirror half #1: AABBCCDD
- mirror half #2: CDCDABAB
(Note: with PVG-strict, you have a guarantee that you can remove either A+B or C+D and the LV will still work; with strict, you don't have such a guarantee.)
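The guarantee in the note can be checked against the example layouts with a short Python sketch. The layout strings are the ones listed above; the function name `survives` is hypothetical, and the model simply asks whether every logical extent keeps at least one copy on a PV that has not been removed.

```python
def survives(half1, half2, removed):
    """True if every logical extent still has a copy outside the removed PVs."""
    return all(a not in removed or b not in removed
               for a, b in zip(half1, half2))

layouts = {
    "strict, after many rearrangements": ("ADBCBCDD", "BCAADBCA"),
    "PVG-strict, after many rearrangements": ("AABBCCDD", "CDCDABAB"),
}

for name, (half1, half2) in layouts.items():
    for removed in ("AB", "CD"):
        ok = survives(half1, half2, removed)
        print(f"{name}: remove {removed} -> {'OK' if ok else 'data lost'}")
```

With the degraded strict layout, removing either pair of disks loses at least one logical extent, while the PVG-strict layout survives both removals.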
A distributed PVG-strict LV with mirroring would look like this immediately after creation:
- mirror half #1: ABABABAB
- mirror half #2: CDCDCDCD
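As a rough illustration of how the mirrored layouts come about, the following Python sketch builds each mirror half from the PVs of one PVG, either filling one PV at a time or rotating between PVs after every extent. It assumes empty PVs of equal size, and the function name is hypothetical; the real allocator also has to cope with partially used PVs.

```python
from itertools import cycle, islice

def mirror_half(pvg, n_extents, free_per_pv, distributed):
    """Lay out one mirror half on the PVs of a single PVG.

    pvg is a string of one-letter PV names, e.g. "AB".
    """
    if n_extents > len(pvg) * free_per_pv:
        raise ValueError("not enough free extents in this PVG")
    if distributed:
        # Rotate through the PVs, one extent at a time.
        return "".join(islice(cycle(pvg), n_extents))
    # Fill each PV completely before moving on to the next one.
    return "".join(pv * free_per_pv for pv in pvg)[:n_extents]

for distributed in (False, True):
    label = "distributed PVG-strict" if distributed else "PVG-strict"
    print(label, "half #1:", mirror_half("AB", 8, 4, distributed))
    print(label, "half #2:", mirror_half("CD", 8, 4, distributed))
```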
It is also possible to configure a non-strict allocation policy (lvcreate/lvchange options "-D n -s n"), but this should be avoided if possible. It could be used as a very temporary workaround for exceptional situations only: it's definitely not for normal production use. (If you use this and lose your data that was supposed to have been mirrored, don't say we did not warn you.) It would allow allocation arrangements like this:
- mirror half #1: AAACCCCD
- mirror half #2: ABBBBDDD
(In the above case, if disk A fails, the first logical extent will be lost because both mirror copies of it are on disk A; if disk D fails, the last logical extent will be lost.)
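The failure analysis above can be verified with a few lines of Python. This sketch takes the two mirror halves from the non-strict example and lists, for each single-disk failure, the logical extents (numbered from 1) that have both copies on the failed disk; the function name is hypothetical.

```python
def lost_extents(half1, half2, failed_pv):
    """Return the logical extent numbers whose copies are all on failed_pv."""
    return [n for n, (a, b) in enumerate(zip(half1, half2), start=1)
            if a == failed_pv and b == failed_pv]

half1 = "AAACCCCD"
half2 = "ABBBBDDD"

for pv in "ABCD":
    print(f"disk {pv} fails: lost logical extents {lost_extents(half1, half2, pv)}")
```

Only disks A and D take a logical extent with them in this layout, but under the strict policy the list would be empty for every disk.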