
replacing shared drive under service guard question

Frequent Advisor


Trying to keep this as simple as possible...

I have a shared volume group ($VGNAME) with two LUNs ($LUN1, $LUN2). Both LUNs are on an EMC Clariion, each on a separate physical JBOD drive. The volume group is shared among 3 servers running MC/Serviceguard (A.11.14) and Oracle.

The lvols in the volume group are used for Oracle redo logs (raw devices). Since Oracle mirrors (multiplexes) the redo logs itself, the two lvols of each mirrored pair sit on separate LUNs within the same volume group.
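A common HP-UX LVM idiom for pinning each redo lvol to one LUN is to create the lvol empty and then extend it onto a single named physical volume. The sketch below is a dry run that only prints the commands; the VG, lvol names, device files and 200 MB size are hypothetical examples, not taken from the post.

```sh
#!/bin/sh
# Dry-run sketch: print the LVM commands that place one redo lvol on one LUN.
# All names and sizes are hypothetical; nothing is executed.

# place_redo VG LVNAME SIZE_MB PV
#   Create an lvol with no extents, then extend it onto exactly one PV,
#   so none of its extents can land on the other LUN.
place_redo() {
    vg=$1 lv=$2 size=$3 pv=$4
    echo "lvcreate -n $lv $vg"
    echo "lvextend -L $size $vg/$lv $pv"
}

# Member "a" of the redo pair on the first LUN, member "b" on the second.
place_redo /dev/vgora01 redo01a 200 /dev/dsk/c5t0d1
place_redo /dev/vgora01 redo01b 200 /dev/dsk/c6t0d1
# Oracle would then use the raw device files, e.g. /dev/vgora01/rredo01a
```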

Soft errors are being reported by EMC and they would like to swap out the drive.

I understand that I may need to take the Oracle "mirrored" redo logs offline while the drive is being replaced. Is it possible to restore the disk configuration on one server while the cluster is up and running? If so, do I restore it the way I would for a non-shared drive, for example:

vgcfgrestore -n $VGNAME $LUN2    # write the saved LVM config onto the new disk
vgchange -a s $VGNAME            # reactivate the VG in shared mode
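For a volume group that is active in shared mode on several nodes, a more conservative ordering of those same two commands is sketched below. It assumes (this thread does not confirm it) that the VG should be deactivated on every node before `vgcfgrestore` runs on one of them, and that `vgchange -a s` is only possible while the cluster is running and the VG is already cluster-aware. Node names, the VG and the device file are hypothetical; the function only prints commands.

```sh
#!/bin/sh
# Dry-run sketch: print a conservative restore sequence for ONE shared VG.
# Output lines look like "<node>: <command>"; nothing is executed.

# plan_shared_restore VG RAW_PV NODE...
#   VG      volume group path, e.g. /dev/vgora01 (hypothetical)
#   RAW_PV  raw device file of the replaced LUN, e.g. /dev/rdsk/c6t0d1
#   NODE... every node that activates VG in shared mode; the first one
#           runs vgcfgrestore (it reads /etc/lvmconf/<vg>.conf by default)
plan_shared_restore() {
    vg=$1 pv=$2
    shift 2
    first=$1
    # 1. Deactivate on every node (Oracle must already be off this VG).
    for n in "$@"; do echo "$n: vgchange -a n $vg"; done
    # 2. Rewrite the LVM metadata on the new disk from a single node.
    echo "$first: vgcfgrestore -n $vg $pv"
    # 3. Reactivate in shared mode on every node.
    for n in "$@"; do echo "$n: vgchange -a s $vg"; done
}

plan_shared_restore /dev/vgora01 /dev/rdsk/c6t0d1 node1 node2 node3
```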

It gets more complicated: there are 16 volume groups in total (one for each Oracle instance), all set up the same way, so each JBOD drive may hold as many as 16 LUNs.
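With 16 volume groups laid out the same way, the first job is listing which VGs have a physical volume on the LUNs behind the failing drive. The sketch below reduces `vgdisplay -v` output to `VG PV` pairs; it assumes the usual `VG Name` / `PV Name` lines in that output and skips entries marked as alternate links. Mapping device files back to physical Clariion drives still has to come from the EMC side, and the device file in the usage comment is hypothetical.

```sh
#!/bin/sh
# Sketch: turn "vgdisplay -v" output (read on stdin) into "VG PV" pairs.
# Assumes lines like "VG Name  /dev/vgXX" and "PV Name  /dev/dsk/cXtYdZ";
# PV lines flagged as alternate links are ignored.
vg_pv_pairs() {
    awk '
        $1 == "VG" && $2 == "Name" { vg = $3 }
        $1 == "PV" && $2 == "Name" && $0 !~ /Alternate/ { print vg, $3 }
    '
}

# Usage on HP-UX (hypothetical device file for one suspect LUN):
#   vgdisplay -v | vg_pv_pairs | grep 'c6t0d1$'
```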

I'm assuming the "worst case" is to stop Oracle, bring the whole cluster down, replace the drive, restore the configuration for each of the volume groups, and then bring the cluster and Oracle back up.
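The full-outage procedure can be written down as an ordered checklist. The sketch below reads one `VG BLOCK_PV` pair per line (the VGs that had a PV on the replaced drive) and prints the commands in order, converting each `/dev/dsk/...` path to the raw `/dev/rdsk/...` file that `vgcfgrestore` expects. Treat it as a dry-run outline under these assumptions, not a tested runbook; the Oracle and EMC steps appear only as comments.

```sh
#!/bin/sh
# Dry-run sketch of the full-outage ("worst case") procedure.
# stdin: "VG BLOCK_PV" pairs, one per VG on the replaced drive.
# Prints the commands in order; nothing is executed.
plan_full_outage() {
    pairs=$(cat)
    echo "# 1. stop every Oracle instance that uses these VGs (DBA step)"
    echo "$pairs" | while read -r vg pv; do
        [ -n "$vg" ] || continue
        echo "vgchange -a n $vg    # on every node where still active"
    done
    echo "cmhaltcluster -f"
    echo "# 2. replace the drive on the Clariion (EMC step)"
    echo "$pairs" | while read -r vg pv; do
        [ -n "$vg" ] || continue
        # vgcfgrestore wants the raw (character) device file
        rpv=$(echo "$pv" | sed 's#^/dev/dsk/#/dev/rdsk/#')
        echo "vgcfgrestore -n $vg $rpv"
    done
    echo "cmruncluster"
    echo "# 3. reactivate the VGs in shared mode on each node, then restart Oracle"
}
```

For example, `printf '/dev/vgora01 /dev/dsk/c6t0d1\n' | plan_full_outage` prints the checklist for a single (hypothetical) volume group.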

Thoughts or experiences would be helpful. As always, thanks for your input.
It's only a flesh wound...
Steven E. Protter
Exalted Contributor

Re: replacing shared drive under service guard question


This configuration appears to have a single point of failure: this volume group and its disks, activated in shared mode.

Your situation is complex, but I can't see a way to replace a bad disk without bringing down the database.

Volume groups activated in shared mode limit even your ability to use MirrorDisk/UX to help.

You might be able to use MirrorDisk/UX to mirror this logical volume to another location. That way, when you pull the disk, the system should stay running. But that other location probably needs to be shared storage, and if the volume group involved holds the cluster lock disk, down goes the cluster.

Still, to avoid downtime, you might try mirroring the raw device to an emergency location. Warn the users first; downtime is probable.
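The emergency-mirror idea maps onto standard MirrorDisk/UX commands: add a mirror copy on a spare LUN, then drop the copy on the failing disk. The dry-run sketch below assumes the VG can be taken out of shared activation long enough to change its configuration (the reply above warns that shared mode limits this) and uses hypothetical VG, lvol and device names. The other nodes' LVM definitions would also need refreshing afterwards, which is not shown.

```sh
#!/bin/sh
# Dry-run sketch: move one raw redo lvol off a failing disk by mirroring it
# to a spare LUN and dropping the old copy (requires MirrorDisk/UX).
# plan_evacuate VG LV OLD_PV NEW_PV   (block device files, /dev/dsk/...)
plan_evacuate() {
    vg=$1 lv=$2 old=$3 new=$4
    rnew=$(echo "$new" | sed 's#^/dev/dsk/#/dev/rdsk/#')
    echo "pvcreate $rnew"                  # initialise the spare LUN
    echo "vgextend $vg $new"               # add it to the volume group
    echo "lvextend -m 1 $vg/$lv $new"      # add a mirror copy on the spare
    echo "lvreduce -m 0 $vg/$lv $old"      # drop the copy on the failing disk
    echo "vgreduce $vg $old"               # only once no lvol uses the old PV
}

plan_evacuate /dev/vgora01 redo01b /dev/dsk/c6t0d1 /dev/dsk/c8t0d1
```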

Steven E Protter
Owner of ISN Corporation
Frequent Advisor

Re: replacing shared drive under service guard question

After way too much thinking, here's my plan to solve the problem...

Create new LUNs using RAID-5, and new shared volume groups on those LUNs

Create new raw devices (logical volumes)

Have the Oracle DBAs add the new raw devices for redo and drop the old redo logs

Then I'll manually update the Serviceguard package control scripts with the new volume groups and remove the old ones.

Remove the old logical volumes, volume groups, LUNs, etc.

Deallocate the LUNs/drives on the EMC side, replace the drive, and reuse the drives for something other than RAID-0/JBOD.
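The first step of this plan (a new shared VG on a new LUN, known to all three nodes) could look roughly like the dry run below. It assumes the classic HP-UX LVM sequence: `mknod` of a `group` file with a unique minor number of the form `0xNN0000`, `vgexport -p -s -m` to write a map file without removing the VG, and `vgimport -s -m` on the other nodes. VG names, minor numbers, nodes and device files are hypothetical, and the exact `vgchange` flags needed for shared activation on your release should be checked in vgchange(1M).

```sh
#!/bin/sh
# Dry-run sketch: create a new VG on a new LUN from one node and import it
# on the other nodes. All names are hypothetical; nothing is executed.

# group_minor N  -> LVM group-file minor number, e.g. 17 -> 0x110000
group_minor() {
    printf '0x%02x0000\n' "$1"
}

# plan_new_shared_vg VGNAME N PV NODE...
#   N must be unused as a group minor number on every node;
#   PV is the block device of the new LUN; the first NODE creates the VG.
plan_new_shared_vg() {
    name=$1 minor=$(group_minor "$2") pv=$3
    shift 3
    first=$1
    shift
    rpv=$(echo "$pv" | sed 's#^/dev/dsk/#/dev/rdsk/#')
    echo "$first: pvcreate $rpv"
    echo "$first: mkdir /dev/$name"
    echo "$first: mknod /dev/$name/group c 64 $minor"
    echo "$first: vgcreate /dev/$name $pv"
    echo "$first: # create the raw redo lvols here"
    echo "$first: vgchange -a n /dev/$name"
    echo "$first: vgexport -p -s -m /tmp/$name.map /dev/$name"
    for n in "$@"; do
        echo "$first: rcp /tmp/$name.map $n:/tmp/$name.map"
        echo "$n: mkdir /dev/$name"
        echo "$n: mknod /dev/$name/group c 64 $minor"
        echo "$n: vgimport -s -m /tmp/$name.map /dev/$name"
    done
    echo "$first: vgchange -c y /dev/$name    # cluster must be running"
}

plan_new_shared_vg vgredo01 17 /dev/dsk/c9t0d1 node1 node2 node3
```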