
SOLVED
brian_31
Super Advisor

LVM and striping

In RHEL5, how do these two work together? What are the standards that others follow? Also, on shared storage, how do we make sure an LV created on one machine is not activated on the other machine?

Thanks

Brian.
7 REPLIES
Matti_Kurkela
Honored Contributor
Solution

Re: LVM and striping

Striping in general:
- If you have a hardware RAID controller, as is common on PC server hardware, you can use it to do some striping in RAID0, RAID5 or RAID6 modes. The available stripe size options will depend on your RAID controller type.

- If you use software RAID (/dev/md*), you have RAID0, RAID4, RAID5, RAID6 and RAID10 modes to provide striping if you want it. This gives you a fully adjustable stripe size (known as "chunk size" in mdadm). The default chunk size is 512 KiB, but it's adjustable (see the example commands after this list).

- In 2.6.* kernel series, it's technically possible to pile multiple software RAID layers on top of each other, should you really wish to do so.

- Linux 2.6.* LVM (i.e. Linux LVM2) can do striping too, if you have multiple PVs available. The stripe size must be 2^n KiB: the minimum is 4 KiB and the maximum is the physical extent size of the VG.
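
For example, here is a minimal sketch of both approaches (the device, VG and LV names are placeholders, not anything from this thread; the two approaches are alternatives, not meant to be combined):

# mdadm-level striping (RAID0) with an explicit chunk size:
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=256 /dev/sdb1 /dev/sdc1

# LVM2-level striping across two PVs (-i = number of stripes, -I = stripe size in KiB):
pvcreate /dev/sdb1 /dev/sdc1
vgcreate datavg /dev/sdb1 /dev/sdc1
lvcreate -L 10G -i 2 -I 64 -n striped_lv datavg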


Personally, I find multiple layers of striping are just a way to spend a lot of effort for diminishing returns. If you're using a SAN, the LUNs you see are usually not actual physical disks. They are logical constructs of the storage system, typically pieces of a large RAID array that has already been striped. Any I/O requests to the storage system will first hit the storage controller's cache RAM, and the controller will attempt to predict your access patterns.

If you implement an extra layer of striping, you're making it harder for the storage system to predict your I/O operations and may lose some of the benefit of predictive caching at the storage system level: you may end up "robbing Peter to pay Paul", gaining nothing but extra work hours. The same can happen with a non-SAN hardware RAID too, although in a smaller scale.

If you need maximum performance and your access pattern is read-mostly, you can get much better performance by switching to SSDs than by fine-tuning an array of spinning disks. But remember that an access pattern consisting mostly of small, non-contiguous writes is typically the worst case for SSDs. If you need SSDs for such an application, budget for a huge battery-backed write cache too.

If you decide to use an SSD (or an array of them), pay attention to data alignment issues and to the sustained sequential and random write speeds your (array of) SSD(s) can achieve.
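
For instance, as a hedged example of checking alignment (requires a reasonably recent GNU Parted; the device and partition number are placeholders), you could verify that partition 1 starts on an optimal boundary with:

parted /dev/sdb align-check optimal 1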

----------
VG interlocks with shared storage:

Both HP Serviceguard for Linux and RedHat Cluster Suite ended up using essentially the same technology for basic VG interlocks. In RedHat cluster documentation, CLVM is usually presented first, but it's more appropriate for cluster filesystems (although in RHEL 5.6+ and RHEL 6, CLVM can do the job for failover VGs too).

The interlock setup for basic Serviceguard-style package failover is known as HA-LVM in RedHat documentation. If you have RedHat Network access, you can read the instructions directly from the RedHat Knowledge Base:
https://access.redhat.com/kb/docs/DOC-3068

But the essential idea is that you edit /etc/lvm/lvm.conf to make LVM require a specific tag in your VG metadata before it allows the VG to be activated. Typically the tag equals the hostname of the cluster node (this is true in both RedHat Cluster Suite and Serviceguard for Linux).
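
As a rough sketch of what that lvm.conf restriction looks like (the VG name and hostname below are examples only; follow the RedHat document above for the complete procedure):

activation {
    # Only the listed VGs, plus VGs/LVs carrying this node's tag, may be activated
    volume_list = [ "rootvg", "@node1.example.com" ]
}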

Your cluster infrastructure (RHCS or Serviceguard) will then automatically add the necessary tag after positive confirmation that the VG is not active on any other cluster node, and will remove it when cleanly shutting down a package.

If a node has crashed, the internal rules of the cluster infrastructure will determine when it's safe to override the tag of another node on a shared VG, so the cluster infrastructure will generally do the right thing automatically.

Nothing prevents a knowledgeable sysadmin from adding/deleting the VG tags manually... but a knowledgeable sysadmin should also understand the state of his/her cluster before doing so, and take responsibility for his/her actions.
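
For illustration, the manual equivalent looks roughly like this (the VG and tag names are examples):

vgchange --addtag node1.example.com sharedvg   # claim the VG for this node
vgchange -a y sharedvg                         # activation is now permitted here
vgchange -a n sharedvg                         # finished with the VG
vgchange --deltag node1.example.com sharedvg   # release the claim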

(In practice, the HA-LVM configuration works as an "anti-ignorance interlock" too: even regular Linux-savvy sysadmins who haven't Read The Fine Manual about cluster configurations are usually completely unaware of VG tags and their effects... and as a consequence, they'll usually find they cannot operate shared VGs manually at all :)

If you have shared storage without any cluster infrastructure (also known as Poor Man's Cluster), it's entirely the sysadmin's responsibility to keep track of the activation of shared LVs and to avoid doing harmful things. Of course, just reading the documentation on HA-LVM configuration may give you some useful ideas...

MK
brian_31
Super Advisor

Re: LVM and striping

Thanks MK! Great explanation. One question I have is about the danger of having the same LV names in a cluster scenario. Any ideas for working around that case?

Thanks

Brian
Matti_Kurkela
Honored Contributor

Re: LVM and striping

A VG must exist before you can create an LV in it. When the system knows a VG exists, it also knows the names of the already-existing LVs in it. If you try to create an LV using a name that already exists in the same VG, I'd expect the LV creation won't happen and you'll get an error message.

On the other hand, it's possible to end up with two VGs with the same name. But that is not a big problem either: the primary identification for a Linux VG is not its name, but its VG UUID. It's a long identifier string, created using an algorithm that mixes in information like the hostname, creation time and some random data... so it's extremely unlikely to have two VGs with the same UUID unless you're cloning disks on purpose.

You can see the VG UUIDs with this command:
vgs -o +vg_uuid

If you have two VGs with the same name, you can rename one of them using vgrename:

vgrename <old VG name or UUID> <new VG name>

It's even one of the examples on the vgrename(8) man page.
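
A worked sketch (the UUID below is made up; use whatever vgs reports for the VG you want to rename):

vgs -o +vg_uuid
vgrename AAAAAA-BBBB-CCCC-DDDD-EEEE-FFFF-GGGGGG new_vg_name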

MK
Alzhy
Honored Contributor

Re: LVM and striping

LVM and striping: how do they interplay?

It depends primarily on the SAN storage array one is using. If you still use one of the previous generations of Tier-1 arrays from HP/Hitachi/EMC -- those have limited LUN virtualization/RAIDing on the back end -- then it will still augur well if you do some striping on the host end with LVM.

What about controller-centric, highly virtualized arrays like a NetApp or EVA -- is there a need to do another layer of striping on the host end, whether via LVM or Oracle ASM? Yes, absolutely -- and the reason is so you have LUNs distributed across controllers and across controller front ends.

Hakuna Matata.
Alzhy
Honored Contributor

Re: LVM and striping

Matti mentions "cache coherency" and "array optimisation" issues with additional striping on the host end. Yes, that is true if your front-end presentations (i.e. on Hitachi-technology arrays like the XP/P9000 line) lose "symmetry" -- meaning, say, CHA1A and CHA1B do not have equal numbers of client connections. But in most cases (there have been several white papers on this), additional striping using aptly-sized LUNs versus MEGA-sized LUNs will always have some performance advantage.

Hakuna Matata.
brian_31
Super Advisor

Re: LVM and striping

Thanks much for the Great Posts!
I tried to rename the VG but am running into an issue with the device for the LV. I have pasted the steps below. I think I am missing something simple.

[root@al21 ~]# vgs -o +vg_uuid
VG      #PV #LV #SN Attr   VSize   VFree   VG UUID
app1_vg   1   1   0 wz--n-   9.97G      0  ACfxeP-10PB-labg-Uc5m-JZkC-0y7S-7gPn7I
test_vg   1   3   0 wz--n- 150.00G 145.70G xLwGtc-6GzT-COlt-0fS0-tVfn-rI4U-kFof4C
[root@al21 ~]#
[root@al21 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
app1_vg 1 1 0 wz--n- 9.97G 0
test_vg 1 3 0 wz--n- 150.00G 145.70G
[root@al21 ~]# vgrename xLwGtc-6GzT-COlt-0fS0-tVfn-rI4U-kFo
Old and new volume group names need specifying
Run `vgrename --help' for more information.
[root@al21 ~]# vgrename xLwGtc-6GzT-COlt-0fS0-tVfn-rI4U-kFo
Old and new volume group names need specifying
Run `vgrename --help' for more information.
[root@al21 ~]# vgs -o +vg_uuid
VG      #PV #LV #SN Attr   VSize   VFree   VG UUID
app1_vg   1   1   0 wz--n-   9.97G      0  ACfxeP-10PB-labg-Uc5m-JZkC-0y7S-7gPn7I
test_vg   1   3   0 wz--n- 150.00G 145.70G xLwGtc-6GzT-COlt-0fS0-tVfn-rI4U-kFof4C
[root@al21 ~]# vgs
VG #PV #LV #SN Attr VSize VFree
app1_vg 1 1 0 wz--n- 9.97G 0
test_vg 1 3 0 wz--n- 150.00G 145.70G
[root@al21 ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/mapper/mpath1 test_vg lvm2 a- 150.00G 145.70G
/dev/mapper/mpath2 lvm2 -- 9.77G 9.77G
/dev/sda7 app1_vg lvm2 a- 9.97G 0
[root@al21 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 16253956 603316 14811648 4% /
/dev/sda9 67771192 184348 64088692 1% /home
/dev/sda8 5080796 141500 4677040 3% /tmp
/dev/mapper/app1_vg-app1_lv
10125560 154164 9448748 2% /opt/app/al21
/dev/sda6 15235040 1142564 13306096 8% /usr
/dev/sda5 15235040 371908 14076752 3% /var
/dev/sda1 256666 26775 216639 11% /boot
tmpfs 4088752 0 4088752 0% /dev/shm
[root@al21 ~]# cd /new
[root@al21 new]# ll
total 0
[root@al21 new]# cd ..
[root@al21 /]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
app1_lv app1_vg -wi-ao 9.97G
lvol0 test_vg -wi-a- 100.00M
lvol1 test_vg -wi-a- 200.00M
testvol test_vg -wi-a- 4.00G
[root@al21 /]# mount /dev/test_vg/testvol /new
[root@al21 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda2 16253956 603316 14811648 4% /
/dev/sda9 67771192 184348 64088692 1% /home
/dev/sda8 5080796 141500 4677040 3% /tmp
/dev/mapper/app1_vg-app1_lv
10125560 154164 9448748 2% /opt/app/al21
/dev/sda6 15235040 1142564 13306096 8% /usr
/dev/sda5 15235040 371908 14076752 3% /var
/dev/sda1 256666 26775 216639 11% /boot
tmpfs 4088752 0 4088752 0% /dev/shm
/dev/mapper/test_vg-testvol
4129984 50076 3875132 2% /new
[root@al21 /]# vgrename xLwGtc-6GzT-COlt-0fS0-tVfn-rI4U-kFof4C newvg
Volume group "test_vg" still has active LVs
Internal error: Volume Group newvg was not unlocked
Device '/dev/mapper/mpath1' has been left open.
Device '/dev/mapper/mpath1' has been left open.
Device '/dev/mapper/mpath1' has been left open.
[root@al21 /]# vgchange -a n test_vg
Can't deactivate volume group "test_vg" with 1 open logical volume(s)
[root@al21 /]# umount /new
[root@al21 /]# vgchange -a n test_vg
0 logical volume(s) in volume group "test_vg" now active
[root@al21 /]# vgrename xLwGtc-6GzT-COlt-0fS0-tVfn-rI4U-kFof4C newvg
Volume group "test_vg" successfully renamed to "newvg"
[root@al21 /]# mount /dev/newvg/testvol /new
mount: special device /dev/newvg/testvol does not exist
[root@al21 /]# vgs
VG #PV #LV #SN Attr VSize VFree
app1_vg 1 1 0 wz--n- 9.97G 0
newvg 1 3 0 wz--n- 150.00G 145.70G
[root@al21 /]# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
app1_lv app1_vg -wi-ao 9.97G
lvol0 newvg -wi--- 100.00M
lvol1 newvg -wi--- 200.00M
testvol newvg -wi--- 4.00G
[root@al21 /]#

Thanks

Brian
Reiner  Rottmann
Frequent Advisor

Re: LVM and striping

You did a vgchange -a n volgroup.

Therefore you need to re-activate the volume group with vgchange -ay volgroup.
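
In Brian's transcript that would be, roughly:

vgchange -a y newvg
mount /dev/newvg/testvol /new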

man 8 vgchange:

-a, --available [e|l]{y|n}
Controls the availability of the logical volumes in the volume group for input/output. In other words, makes the logical volumes known/unknown to the kernel.

If clustered locking is enabled, add 'e' to activate/deactivate exclusively on one node or 'l' to activate/deactivate only on the local node. Logical volumes with single-host snapshots are always activated exclusively because they can only be used on one node at once.