Linux Multipath with DRBD & TGT stack

Occasional Advisor

Trying out SAN replication over iSCSI with DRBD.  I have:


RAC1    <----->   RAC2
         OCFS2              (note: ocfs2 for ocr, votedisk and asm spfile only)
CentOS            CentOS
Linux1            Linux2
TGT               TGT
DRBD    <--->     DRBD      (primary/primary LUN replication using DRBD by Linbit)
Debian            Debian
Linux1            Linux2


Unfortunately, something is wrong, possibly with my udev rules, because the LUN device assignments are getting switched around.


Does anyone know of commands to figure out which /dev/sd* devices are getting assigned to which /dev/dm-* devices?  I need to debug where the device assignments are going wrong.  I've used scsi_id -gus to check the WWIDs, and I built the RAC to use the aliases in /dev/mapper (the partition aliases, e.g. asm1p1, asm2p1).  I thought the /dev/mapper aliases were persistent across reboots.  Is that true?  Apparently not, judging by the unexpected loss of Oracle functionality on some reboots.




Gilbert Standen
Honored Contributor

Re: Linux Multipath with DRBD & TGT stack

"multipath -ll" will describe the current assignments.


Which version of CentOS are you using? RHEL (and thus probably CentOS too) had a nasty little trap in versions 4.x and 5.x. The multipath subsystem starts up rather early in the boot process, and if it does not see the current configuration files at that point, you may get surprises later.

  1. If your /var is a separate filesystem and you're using the default "friendly names", the friendly name information file /var/lib/multipath/bindings is not accessible when multipathing is started, because at that point, /var is not yet mounted. So the multipath system assigns the names from scratch. Later, when /var is mounted, various things will trigger the system to reread the multipath configuration... but if the multipath devices are already in use (by LVM, ASM or whatever), then the multipath subsystem cannot change the mappings to match the bindings file, resulting in much confusion if the bindings file would assign different names to the already-assigned devices.
  2. If your system boots from a multipathed device, the multipath configuration is included in the initrd file. Therefore, you must always remember to run mkinitrd after changing the multipath configuration, otherwise you'll risk the same kind of confusion as above when the configuration in the initrd does not match with the configuration in the real /etc/multipath.conf.
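
Whether point 1 can apply to a given system is easy to check. A rough sketch, assuming GNU stat is available; the helper name on_separate_fs is hypothetical, and the comparison of device numbers will not notice bind mounts of the same filesystem:

```shell
#!/bin/sh
# on_separate_fs DIR
# Succeeds (returns 0) if DIR sits on a different filesystem than its
# parent directory, judged by the device numbers reported by GNU stat.
# Returns 2 if DIR cannot be examined.
on_separate_fs() {
    dir_dev=$(stat -c %d "$1") || return 2
    parent_dev=$(stat -c %d "$1/..") || return 2
    [ "$dir_dev" != "$parent_dev" ]
}

if on_separate_fs /var; then
    echo "/var is a separate filesystem: /var/lib/multipath/bindings may be unreadable early in boot"
else
    echo "/var is on the same filesystem as /"
fi
```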

The standard workaround for point 1 above is to move the bindings file to /etc/multipath/bindings:

mkdir /etc/multipath
mv /var/lib/multipath/bindings /etc/multipath/bindings
ln -s /etc/multipath/bindings /var/lib/multipath/bindings

# edit /etc/multipath.conf. 
# add to the "defaults" section this line:
#     bindings_file "/etc/multipath/bindings"
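
To confirm which friendly name the bindings file gives to a particular LUN, the file can be searched by WWID. This is a sketch only: it assumes the usual bindings layout of one "alias wwid" pair per line with # comment lines, and the function name alias_for_wwid is invented for this example.

```shell
#!/bin/sh
# alias_for_wwid WWID [BINDINGS_FILE]
# Print the friendly name bound to WWID, or return non-zero if the
# bindings file has no entry for it.
alias_for_wwid() {
    awk -v wwid="$1" '
        $1 ~ /^#/  { next }                       # skip comment lines
        $2 == wwid { print $1; found = 1; exit }  # first match wins
        END        { exit !found }
    ' "${2:-/etc/multipath/bindings}"
}
```

Feeding it the WWID that scsi_id reports for a path device shows the name the bindings file expects, which can then be compared with what "multipath -ll" actually shows. Aliases set explicitly in the multipaths section of /etc/multipath.conf are not stored in the bindings file, so they will not be found this way.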


For diagnostic purposes, or after making major changes to multipath configuration, you might want to completely flush the in-kernel multipath configuration. To make the system flush and re-detect the multipath devices, you must:

  1. Stop everything that uses the multipath devices. Stop applications, unmount filesystems, deactivate VGs, shutdown ASM.
  2. Stop multipathd ("service multipathd stop"). This ensures it won't do anything behind your back, and that it will be using the latest configuration when restarted. (Note: this won't cause the multipath devices to vanish; the multipathd daemon only monitors the multipath devices and re-activates failed paths when they start working again.)
  3. Run "multipath -F". This will "flush" all inactive multipath devices from the kernel. If your system disk is multipathed, the multipath device for your root filesystem won't go away when you do this.
  4. Run "multipath" (or if you want to see what is going on, "multipath -v2"). This will re-establish the multipath devices, according to the current configuration available to the multipath tools. If you see error messages, something is wrong; read the messages and fix their causes.
  5. Restart multipathd. ("service multipathd start")
  6. Now the multipath devices should exactly match the current configuration. Restart everything you stopped at step 1. (Activate VGs, start ASM, mount filesystems, start applications.)
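
Steps 2 to 5 of the list above can be wrapped in a small script. This is a sketch under assumptions: the run helper, the reset_multipath name and the DRY_RUN variable are made up here, and steps 1 and 6 are left out because they depend on what is using the devices.

```shell
#!/bin/sh
# run CMD [ARGS...]
# Execute the command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reset_multipath
# Steps 2-5 only. Everything using the multipath devices must already
# be stopped (step 1) before this is called.
reset_multipath() {
    run service multipathd stop || return 1
    # Flush unused maps; maps still in use (e.g. a multipathed root
    # device) stay in place, so the exit status is not checked here.
    run multipath -F
    # Re-create the maps from the current configuration, verbosely.
    run multipath -v2 || return 1
    run service multipathd start
}
```

Running it once with DRY_RUN=1 set prints the four commands without touching anything, which is a cheap way to review them before doing it for real as root.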
Occasional Advisor

Re: Linux Multipath with DRBD & TGT stack

Just for the sake of completeness, let me follow up here.

Thanks for the pointers, gtk!  In my setup, /var was not on a separate mount point so the notes did not apply.

It turned out that my DRBD was not set up correctly.  I had OCFS2 running up at the RAC level, but of course, silly me, OCFS2 had to be running at the Debian Linux level too (where the DRBD replication takes place), because DRBD will replicate in primary/primary if and only if a cluster filesystem (such as OCFS2) is running on the DRBD devices.  So I made sure my DRBD r1 resource was set up correctly with respect to drbd.conf etc.

So, I had to boot up all 4 "boxes" (virtual machines all on one development machine). 

Next, I shut down all Oracle processes (ASM, nodeapps, crs/evmd/css),

copied the votedisk, ocr/ocrm, and spfile+ASM.ora to a non-ocfs2 mount point,

shut down both RAC VMs,

and then ran the following on one of the Debian boxes only:

sudo mkfs.ocfs2 -b 4K -C 32K -N 8 -L oradatafiles /dev/drbd1

Note that I gave it 8 node slots (-N 8) to leave plenty of room for mounts - why not?

Next I mounted the file system on the CentOS Oracle RAC boxes:

mount -t ocfs2 -o _netdev,datavolume,nointr /dev/mapper/ocfs2 /u02

Next I mounted the file system on the Debian "SAN" boxes:

sudo mount -t ocfs2 /dev/drbd1 /media

Note that the mount on Debian is different, because the options specified on the CentOS boxes ("datavolume", "nointr") are not supported in the Debian version of ocfs2 that I am using (the one from the repository). These options aren't needed on the SAN side because I'm just mounting it for DRBD to be able to use and so that I can see what's on the LUN from any of the 4 VMs.

With all this done, the DRBD LUN supporting the OCFS2 filesystem was properly replicating between the 2 DRBD devices.  The problem I had originally was simply that the DRBD LUNs (the "mirrored devices") were not replicating, and the ocr, votedisk, and asm spfile were only on one side of the DRBD setup. So when multipath happened to choose the path to the "blank" LUN, Oracle did not start.
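
A quick way to catch this kind of failure is to check the replication state on each storage node before starting RAC. The sketch below is an assumption-laden example, not part of my setup: it parses the status line for one minor in /proc/drbd, expecting the DRBD 8.3-style fields cs:, ro: and ds: in that order (older 8.x releases print st: instead of ro:), and the function name drbd_minor_healthy is invented.

```shell
#!/bin/sh
# drbd_minor_healthy MINOR [STATUS_FILE]
# Succeeds only if the given DRBD minor is connected, primary on both
# sides, and up to date on both sides. STATUS_FILE defaults to
# /proc/drbd and is a parameter only so a saved copy can be checked.
drbd_minor_healthy() {
    awk -v minor="$1:" '
        $1 == minor {
            found = 1
            ok = ($2 == "cs:Connected" &&
                  $3 == "ro:Primary/Primary" &&
                  $4 == "ds:UpToDate/UpToDate")
            exit
        }
        END { exit !(found && ok) }
    ' "${2:-/proc/drbd}"
}
```

For the /dev/drbd1 device used here, "drbd_minor_healthy 1" returning non-zero on either Debian box would have pointed at the real problem straight away.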


Gilbert Standen
Occasional Advisor

Re: Linux Multipath with DRBD & TGT stack

OK, life and Linux are a learning experience, so I'm updating my post above to say that the part about having OCFS2 running at the Debian level turns out to be nonsense and, in fact, not a good idea. My "stable" DRBD RAC storage setup has the DRBD OCFS2 LUNs in primary/primary (active/active) mode at the Debian (storage) level, but the OCFS2 cluster runs ONLY at the RAC level.

I had a problem with one of the Debian nodes being evicted from the Debian "OCFS2 cluster", and it was because the two Debian VMs were run from different devices (one on an external USB device and one on the laptop hard drive). I read recently that devices with different "speeds" can lead to fencing problems, causing one node to panic and reboot. I have found that the whole setup is very stable when I run OCFS2 only at the RAC node level. It all works very nicely now. My entire set of RAC LUNs is replicated by DRBD to a second disk device (an external USB drive), so in the event of a device failure I should have a mirrored copy of the entire RAC DB at the physical block level (at the "SAN" level, if you will).
Gilbert Standen