Abdul Wahab
Advisor

creation of same volume group for two systems

Hello,
We have two blade servers with Red Hat 4 Update 4 installed. A storage partition from the EVA is presented to both servers.
However, the configuration is done such that the storage partition is mounted manually on one server at a time.
The mount command shows:
/dev/mapper/vgac-lvol0 on /apps
The fstab entry has:
/dev/vgac/lvol0 /apps ....
If server 1 is down, we can mount the same partition on the other server with the commands:
vgchange -a y vgazac
mount /dev/vgazac/lvol0
We don't know how this volume configuration was done by the consultants, and now they are out of reach.
We are upgrading to a new EVA and want the same kind of configuration to be done.
Can someone help us by giving us the steps to do this?
I researched a lot on the Red Hat site for volume group creation, but the methods shown there do not match the existing setup. What is confusing me is that device-mapper-multipath is not installed. The multipath.conf file also does not exist. Yet a volume group is created under /dev/mapper?!
Even fdisk shows multiple partition names and also names like dm1 .. etc
Calandrello
Trusted Contributor

Re: creation of same volume group for two systems

Hello,
I have just one question: is the VG mounted on both servers at the same time?
Also, on Linux the device mapper seems to be set up for this, but the correct path would be
/dev/vgname
Abdul Wahab
Advisor

Re: creation of same volume group for two systems

Dear Calandrello,
I could not make out what you mean. The partition is not mounted on both servers at the same time! It is mounted on only one of the servers at a time.
Matti_Kurkela
Honored Contributor
Solution

Re: creation of same volume group for two systems

Device-mapper-multipath is not the *only* thing that uses /dev/mapper.

In Linux 2.6 kernel series, a feature called "device-mapper" was introduced: it is a building block for many storage features, like multipath, LVM2, RAID and snapshots. All these use /dev/mapper and /dev/dm* devices.

The /dev/dm* devices are primarily for device-mapper subsystem's internal use: these names are dynamically generated and a disk/LUN may get a different /dev/dm* name after a reboot. The names in /dev/mapper should be used for configuration, as they are (more) persistent.

With LVM2, the /dev/vg* directories contain only symbolic links to respective /dev/mapper/vg* devices.

The storage configuration you described is similar to what ServiceGuard for Linux uses, but without any automation and safety checks.

It is rather easy to create:

1.) configure the storage system to present the necessary LUN(s) to both system1 and system2

2.) create the VG and the filesystem on system1 as normal

3.) make VERY SURE that the system1 will NOT try to auto-mount this filesystem when it boots:
- use the "noauto" option in /etc/fstab
- the RedHat system startup will attempt to auto-activate all VGs early in boot sequence; make a simple custom startup script that de-activates this VG to prevent accidental access while the VG is used by the other system.

(The script does not need to be very complex: it only needs to run the "vgchange -a n vgazac" or equivalent command.)

4.) reboot system2 to make sure it detects the SAN devices (and their partition tables) correctly

5.) run "vgscan -vv" on system2 to make sure it detects the shared VG that was created on system1; there is no need to use pvcreate, vgcreate, lvcreate or mkfs again.

6.) make VERY SURE that the system2 will NOT try to auto-mount this filesystem when it boots (see step 3: copy the /etc/fstab entry and the custom startup script from system1 to system2.)

7.) create the mount-point directory on system2

8.) test it
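
For step 8, the manual hand-over between the two systems could be sketched as a pair of shell functions like the ones below. This is only a sketch under assumptions: the VG/LV names and the /apps mount point are taken from the original question, the ext3 type in the fstab comment is a guess, and the DRY_RUN switch and the function names are made up for illustration. It does not add any of the safety checks a real cluster product would provide.

```shell
#!/bin/sh
# Manual failover helpers for a VG shared between two hosts (sketch).
# Assumed /etc/fstab entry on both systems (filesystem type is a guess):
#   /dev/vgazac/lvol0  /apps  ext3  noauto  0 0
VG="${VG:-vgazac}"
LV="${LV:-lvol0}"
MOUNTPOINT="${MOUNTPOINT:-/apps}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the system that is giving up the filesystem.
release_storage() {
    run umount "$MOUNTPOINT" &&
    run vgchange -a n "$VG"
}

# Run on the system that is taking over, and only after the other
# system has released the VG or is confirmed to be powered off.
takeover_storage() {
    run vgchange -a y "$VG" &&
    run mount "/dev/$VG/$LV" "$MOUNTPOINT"
}
```

Setting DRY_RUN=1 before calling either function only prints the commands, which is a cheap way to review them before touching the real storage.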

WARNING for others that would like to try this setup: unless you use a real cluster filesystem (e.g. RedHat GFS or Oracle OCFS), mounting the filesystem on both systems simultaneously will quickly cause data corruption.

Nightmare example 1:
If both systems mount the same partition in read/write mode, the data on the disk is corrupted and fsck cannot save it.

1.) System1 reads a file, caching it and the filesystem metadata associated with it

2.) System2 reads the same file too, also caching the file and its metadata.

3.) System1 re-writes the file, causing it to be stored to a different physical location on the disk. The cache on system2 is now stale, but system2 does not know it.

4.) System2 writes to the file, using the stale filesystem metadata in its cache... causing the write operation to target wrong disk blocks.
*Data is lost or corrupted on disk!!!*


Even mounting the filesystem in read-only mode is not safe while the other system has the filesystem mounted in read/write mode.
Nightmare example 2:
System1 is read/write, System2 is read only.

1.) System2 reads something from the disk, caching it.

2.) System1 makes changes to the data.

3.) Application on System2 needs to re-read the same file as in step 1; it gets old data from System2's disk cache.
*Stale data processed on System2*

4.) Application on System2 reads more data than is already in the cache. System2 reads more data from disk to cache, including some of the changes made by System1.
*Incoherent mixture of old and new data processed on System2*

MK
Abdul Wahab
Advisor

Re: creation of same volume group for two systems

Dear Matti,
Thanks a lot for your answer, which is very informative indeed. But it would be great if you could give me a few more steps
and clear up a few more doubts:

1. I still don't understand how the device files were created in /dev/mapper. The device-mapper-multipath RPM is not installed on the system.
The multipath.conf file is also not present.
2. When the proper LUNs are presented to these servers, what commands shall I use to create the volume group? What device path shall I use in these
commands? After presenting the LUNs and running partprobe, will the /dev/dm* device files be created automatically?
3. How shall we remove the present volume groups from the existing SAN (which is to be removed)? When we disconnect the old SAN and connect to the new SAN, can the partitions perhaps appear with the same old names?

Thanks a lot for your helpful answer. Waiting for your next answer.

A.Wahab
Matti_Kurkela
Honored Contributor

Re: creation of same volume group for two systems

1.) LVM creates files in /dev/mapper too, with or without device-mapper-multipath. This is because the device-mapper kernel sub-system is used to implement LVM in kernel 2.6.* series.

Device-mapper kernel sub-system is always present in RHEL 4 standard kernels. The device-mapper-multipath RPM installs extra tools that allow the use of that sub-system to handle multi-pathed disks. Without the device-mapper-multipath RPM, the kernel sub-system can still be used for other purposes, e.g. for LVM.

If you have a VG named "vgazac" which has one LV named "lvol0", LVM creates /dev/mapper/vgazac-lvol0 when the VG is activated.

For reasons of convenience and backwards compatibility, LVM also creates the directory /dev/vgazac and a symbolic link /dev/vgazac/lvol0 that points to /dev/mapper/vgazac-lvol0.
For the internal use of the device-mapper sub-system, it will also create a /dev/dm-* device.

When a filesystem is mounted, the mount command will examine the device file it's instructed to use. If it's a symbolic link (like /dev/vgazac/lvol0), the mount command resolves it and uses the name of the actual disk device, not the name of the symbolic link. This is why the /dev/vgazac/lvol0 gets converted to /dev/mapper/vgazac-lvol0 in the "mount" command output.
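
The naming rule above can be written down as a small shell function. This is an illustrative sketch, and the function name is made up. One detail it adds that the text above does not cover: LVM doubles any hyphen that occurs inside a VG or LV name, so that the single hyphen separating the two parts stays unambiguous.

```shell
#!/bin/sh
# Build the /dev/mapper path LVM uses for a given VG and LV.
# Hyphens inside either name are doubled; the single hyphen
# between the two parts is the separator.
lv_mapper_path() {
    vg=$(printf '%s' "$1" | sed 's/-/--/g')
    lv=$(printf '%s' "$2" | sed 's/-/--/g')
    printf '/dev/mapper/%s-%s\n' "$vg" "$lv"
}

# Example:
#   lv_mapper_path vgazac lvol0
```

On a live system, "readlink -f /dev/vgazac/lvol0" shows where the symbolic link really points. On some newer distributions the /dev/mapper entry may itself be a link to the /dev/dm-* node, so the result can differ from what RHEL 4 shows.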

2.) You must first choose whether you want to create a traditional partition table on the new LUNs or not.

It is easier to just use the entire LUN as a LVM PV, but in a multi-architecture environment it might be useful to set up a partition table, so that other systems can recognize that the disk is already in use.

Imagine a Windows sysadmin setting up a new LUN he's just received from a SAN admin: "Huh. There is already a partition table there, but the partition type is something that Windows does not recognize... Wait a minute. Did the SAN admin just present me a LUN that is already used by some Linux system, by mistake?" One phone call to the SAN admin: "Oops, you're right. Looks like I made a typo in SAN configuration. Just a moment, I'll fix it." One service outage and filesystem corruption avoided!

The new LUNs will usually appear as /dev/sd* devices. The exact device names will depend on your SAN and hardware configuration. There may be EVA-specific tools to make the identification of LUNs easier. (I'm more familiar with EMC storage systems; I have not worked with EVA.)

The creation of the VG + LV + filesystem (the step 2 of the procedure I outlined in my previous answer):

Let's assume that you're getting three LUNs for your VG, and they're named /dev/sdx, /dev/sdy and /dev/sdz.

Step 1.) (Optional)
Use fdisk, parted or any suitable partitioning tool to set up a single partition on each LUN, taking up 100% of the disk space. The partition type should be "Linux LVM".

The names of the partitions will be /dev/sdx1, /dev/sdy1 and /dev/sdz1.

Step 2.)
Use pvcreate on each partition (or directly on the LUN, if you omitted step 1) to prepare the LUNs for LVM use:

Either
pvcreate /dev/sdx1 /dev/sdy1 /dev/sdz1
or
pvcreate /dev/sdx /dev/sdy /dev/sdz

The partitions/LUNs will now be LVM PVs, ready for use with LVM.

Step 3.)
Use vgcreate to create a new VG that uses the PVs:

If you did step 1:
vgcreate vgazac /dev/sdx1 /dev/sdy1 /dev/sdz1

If you omitted step 1:
vgcreate vgazac /dev/sdx /dev/sdy /dev/sdz

I think the VG is activated automatically as it is created. If not, use
vgchange -a y vgazac

to activate it.

Step 4.)
Use lvcreate to set up a LV of desired size:

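lvcreate -L SIZE vgazac

(Replace SIZE with the desired size, for example 10G. Without the -n option, the first LV in a VG is named lvol0 by default.)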

Step 5.)
Create a filesystem on the LV. You can use any filesystem type you think is suitable for the purpose of this filesystem: my default choice is ext3 unless there is a specific reason to use something else.

mke2fs -j /dev/vgazac/lvol0

The filesystem is now ready to be mounted!
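
To review steps 2 to 5 in one place before running anything, a small helper that only prints the commands might look like this. It is a sketch: the function name is made up, the 10G size in the example is arbitrary, and "-n lvol0" is added to make the LV name explicit instead of relying on the default.

```shell
#!/bin/sh
# Print the commands of steps 2 to 5 for review; nothing is executed.
# Usage: print_vg_plan VG SIZE PV [PV...]
print_vg_plan() {
    vg="$1"
    size="$2"
    shift 2
    echo "pvcreate $*"                      # step 2: prepare the PVs
    echo "vgcreate $vg $*"                  # step 3: create the VG
    echo "lvcreate -L $size -n lvol0 $vg"   # step 4: create the LV
    echo "mke2fs -j /dev/$vg/lvol0"         # step 5: create an ext3 filesystem
}

# Example:
#   print_vg_plan vgazac 10G /dev/sdx1 /dev/sdy1 /dev/sdz1
```

Once the printed commands look right for your device names, they can be run one at a time as root.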

In this procedure, the /dev/vgazac/lvol0, /dev/mapper/vgazac-lvol0 and /dev/dm-* will be created automatically at step 4, as the LV is created. These device files will disappear whenever the VG is deactivated and re-appear when the VG is activated again. The /dev/dm-* device name may change, depending on what other VGs, encrypted filesystems, snapshots etc. etc. you have active at the moment of VG activation. Don't rely on the /dev/dm-* device names.

You really should read the chapter on LVM in the RHEL 4 System Administration Guide:

https://www.redhat.com/docs/manuals/enterprise/RHEL-4-Manual/en-US/System_Administration_Guide_/Logical_Volume_Manager_LVM.html

The LVM HOWTO is not as pretty, but very informative too:
http://tldp.org/HOWTO/LVM-HOWTO/

3.)
To destroy an existing VG, just unmount the filesystem(s) on that VG, deactivate the VG with "vgchange -a n", and then use the vgremove command to remove the VG.
As the VG still has LVs in it at this point, the command will ask for a confirmation. Press Y if you're sure you're removing the correct VG.
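
The removal sequence could be wrapped roughly as follows. Again a sketch with made-up helper names and a hypothetical DRY_RUN switch; vgremove will still ask for its confirmation when run for real.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Unmount the filesystems, deactivate the VG, then remove it.
# Usage: remove_vg VG MOUNTPOINT [MOUNTPOINT...]
remove_vg() {
    vg="$1"
    shift
    for mp in "$@"; do
        run umount "$mp" || return 1
    done
    run vgchange -a n "$vg" &&
    run vgremove "$vg"
}

# Example (dry run first):
#   DRY_RUN=1; remove_vg vgazac /apps
```

The function stops at the first failing step, so a filesystem that is still busy prevents the VG from being removed.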

Yes, the VG and LV names and mount points can be the same as with the old SAN. The /dev/sd* device names may be different, but the LVM makes that difference irrelevant to applications.

If you can connect your system to both old and new SAN simultaneously, you could even transfer the data to the new SAN without a service outage, while the filesystem is mounted. This would be called a "SAN data migration". If you search the forums with these key words, you'll find quite a few answers.
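
One common LVM way to do such a migration (not necessarily the method described in those forum answers) is to add the new LUN to the VG, move the data with pvmove, and then drop the old LUN from the VG. A minimal sketch, assuming the old PV is /dev/sdx1 and the new LUN shows up as the hypothetical /dev/sdn1; the helper names and the DRY_RUN switch are made up:

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Move all data of one PV to a new PV inside the same VG.
# Usage: migrate_pv VG OLD_PV NEW_PV
migrate_pv() {
    run pvcreate "$3" &&        # prepare the new LUN for LVM
    run vgextend "$1" "$3" &&   # add it to the existing VG
    run pvmove "$2" "$3" &&     # copy the extents to the new PV
    run vgreduce "$1" "$2"      # take the old PV out of the VG
}

# Example (dry run first):
#   DRY_RUN=1; migrate_pv vgazac /dev/sdx1 /dev/sdn1
```

Repeat this for each old PV. Test it on non-production data first, and have a backup before moving anything that matters.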

MK
Abdul Wahab
Advisor

Re: creation of same volume group for two systems

Hello,
Thanks a lot, man. The answers to my questions could not have been better than what you gave.
Did you mean /dev/sdx1, /dev/sdy1 and /dev/sdz1 to be the multiple paths shown by the SAN for a single LUN?

Thanks a lot again. Can I have your email ID?
Abdul Wahab
Advisor

Re: creation of same volume group for two systems

Dear Matti,
Thanks for the links; I read them.
In your last update, after step 5, when the VG has been created on system 1, do I deactivate it and run vgscan -vv on system 2? Do I need the extra steps of vgexport or vgimport? Will all the required information about these existing volume groups come into system2 through vgscan?
Can you please tell me about the custom startup script for the VG?
Matti_Kurkela
Honored Contributor

Re: creation of same volume group for two systems

You did not say what kind of multipathing you're using, only that you *don't* use device-mapper-multipath. So I assumed you're able to deal with it yourself if you have one :)

There are at least 3 possibilities for multipathing in RHEL 4:

1.) multipath integrated into Qlogic HBA driver (the default driver cannot do it; you must use the driver downloadable from http://www.hp.com/go/support or maybe from www.Qlogic.com)
- when configured, this driver hides all the complexity of the multipathing from the OS: the OS sees only a single /dev/sd* device per LUN (+ any partitions, if created)
- harder to monitor/diagnose specific paths

2.) Device-mapper-multipath, available as an optional RPM in the standard RHEL4 distribution.
- this is the solution recommended by RedHat
- used to be rather buggy, but much improved during the RHEL4 release cycle; get the latest version from RedHat Network if you wish to use this.
- each individual path will show as a /dev/sd* device, and then there will be extra /dev/mapper/mpath* devices for multipathing (/dev/mapper/mpathXpY for partitions)
- you can use standard tools with /dev/sd* devices to diagnose/monitor the state of each path
- Documentation is a bit sparse: effective use may require some tricks that are documented only in the RedHat Knowledge Base. Please see:

http://kbase.redhat.com/faq/docs/DOC-3691
http://kbase.redhat.com/faq/docs/DOC-4255
http://kbase.redhat.com/faq/docs/DOC-4042
http://kbase.redhat.com/faq/docs/DOC-3651

3.) SAN-manufacturer-specific multipathing solution (I know PowerPath for EMC, I think EVA has SecurePath which may offer similar functionality...?)
- see the documentation published by your SAN manufacturer.

----

Vgexport/vgimport is *not* needed. Vgscan on system2 will pick up all the VG information automatically.
(vgexport on Linux is *very* different from vgexport in HP-UX.)

----

For the startup script, the requirement is exactly the same as when using the "HP Serviceguard for Linux" product: the automatic activation of all volume groups at system boot must be overridden, and this VG must be activated manually only.

The HP ServiceGuard manual seems to suggest adding the necessary command (in your case, probably "vgchange -a n vgazac") to the end of /etc/rc.d/rc.sysinit script.

See the sub-title "Preventing Boot-Time vgscan and Ensuring Serviceguard Volume Groups Are Deactivated" in this ServiceGuard manual:

http://docs.hp.com/en/B9903-90051/ch01s03.html#bgeecfaf
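
As an illustration, the lines added to the end of /etc/rc.d/rc.sysinit might look something like this. It is a sketch, not the text from the ServiceGuard manual: the variable and function names are made up, and it assumes vgchange can be found through PATH at that point of the boot sequence (set VGCHANGE to the full path if it cannot).

```shell
# Sketch of lines for the end of /etc/rc.d/rc.sysinit.
# SHARED_VGS is a space-separated list of VGs that must stay
# inactive after boot; VGCHANGE can be set to a full path.
SHARED_VGS="${SHARED_VGS:-vgazac}"
VGCHANGE="${VGCHANGE:-vgchange}"

deactivate_shared_vgs() {
    for vg in $SHARED_VGS; do
        "$VGCHANGE" -a n "$vg" ||
            echo "WARNING: could not deactivate $vg" >&2
    done
}
```

Call deactivate_shared_vgs as the last line of the script, and copy the same lines to both systems.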

----

I really would not like to give out my email address, because I don't have the time to provide any sort of private consulting service: my day job at Fujitsu comes first, after all.

This week I've been on sick leave because of minor surgery, so I had plenty of time to write extra-wordy answers to this forum. But on Monday I'll be back at work...

MK