Operating System - Linux

Re: vgchange activation error

 
SOLVED
MikeL_4
Super Advisor

vgchange activation error

When I try to activate a Volume Group on db01 I am receiving this activation filter error....

We have a db02 server that can see and use the same SAN DASD, and I can activate the VG and mount the file system there OK...

The issue started when db01 crashed and rebooted, which must have set a flag somewhere that is now preventing it from starting up again....



I've tried doing a vgexport on db02 and an import on db01, but I still get the same error...



Any ideas ??



[root@db01 ~]# /sbin/vgchange -a y datavg1

Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2

Not activating datavg1/lvol01 since it does not pass activation filter.

0 logical volume(s) in volume group "datavg1" now active

[root@db01 ~]#

[root@db01 ~]# lvdisplay -v /dev/datavg1/lvol01

Using logical volume(s) on command line

Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2

--- Logical volume ---

LV Name /dev/datavg1/lvol01

VG Name datavg1

LV UUID e2DFlG-CweU-zVsV-wzfs-oDwY-IpUF-JKgHKt

LV Write Access read/write

LV Status NOT available

LV Size 1000.00 GB

Current LE 256000

Segments 4

Allocation inherit

Read ahead sectors auto

[root@awopdb01 ~]#



[root@db01 ~]# vgimport datavg1

Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2

Volume group "datavg1" successfully imported

[root@db01 ~]# /sbin/vgchange -a y datavg1

Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2

Not activating datavg1/lvol01 since it does not pass activation filter.

0 logical volume(s) in volume group "datavg1" now active

[root@db01 ~]#







But when I activate it on the second server it works ok:



[root@db02 ~]# /sbin/vgchange -a y datavg1

Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2

1 logical volume(s) in volume group "datavg1" now active

[root@db02 ~]#

[root@db02 ~]# ls -al /dev/datavg1

total 0

drwxr-xr-x 2 root root 60 May 18 19:45 .

drwxr-xr-x 17 root root 7280 May 18 19:45 ..

lrwxrwxrwx 1 root root 26 May 18 19:45 lvol01 -> /dev/mapper/datavg1-lvol01

[root@db02 ~]#



Matti_Kurkela
Honored Contributor
Solution

Re: vgchange activation error

You seem to have some HP-UX experience. Beware: vgimport and vgexport on Linux do not work at all the way you may be used to from HP-UX.

Does this system have any kind of cluster suite installed? (Serviceguard? RedHat Cluster Suite? DB2 cluster? Something else?)

What is the output of these commands:

grep -e filter -e volume_list /etc/lvm/lvm.conf
vgs -o +tags
lvs
pvs

If /etc/lvm/lvm.conf contains an uncommented filter expression that is different from the default value:

filter = [ "a/.*/" ]

... or an uncommented "volume_list" definition, then it's probably been added there for a reason: don't change it until you understand why the current value is there.

The activation filter and/or VG tags are often used as part of a cluster interlock mechanism that stops a cluster node from activating a VG that is in use by another cluster node. (If the particular VG is supposed to be accessed by more than one node simultaneously, then the lockout is designed to prevent *uncoordinated* access: cluster nodes must be able to communicate with each other to be aware of what the other nodes are doing. The nodes must coordinate their actions so that one node does not accidentally use a stale cached copy of some record when another node has just updated it.)

If this is what is stopping you from activating the VG, it probably means that some sort of cluster infrastructure process did not automatically start up when db01 was rebooted; when you find it and start it, it might automatically fix this problem for you.
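
For illustration, a host-tag lockout of this kind usually boils down to two pieces; the names below are placeholders, not values taken from your systems:

# In /etc/lvm/lvm.conf: only the root VG and VGs tagged with this node's
# hostname may be activated locally
volume_list = [ "VolGroup00", "@db01" ]

# On the shared VG: a tag naming the node that currently "owns" it, which
# the cluster software sets and clears with commands like
vgchange --addtag db01 datavg1
vgchange --deltag db01 datavg1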

MK
MikeL_4
Super Advisor

Re: vgchange activation error

The two servers, db01 and db02, are running in a Red Hat Cluster...

The problem started when db01 failed after losing contact with the quorum disk...

I believe the cluster then tried to start on db02 but failed with the same quorum disk issue, since it had also lost connectivity; that must have set some kind of lock or tag that is now preventing activation on db01..

I was able to mount it manually on db02 as the SAN group resolves the issue with the quorum disk..

Just need to figure out what is preventing it from starting up on db01 so we can get the cluster back up and going again...

[root@awopdb01 ~]# grep -e filter -e volume_list /etc/lvm/lvm.conf
# A filter that tells LVM2 to only use a restricted set of devices.
# The filter consists of an array of regular expressions. These
# Don't have more than one filter line active at once: only one gets used.
#filter = [ "a/.*/" ]
#filter = [ "r|/dev/sdr/|", "r|/dev/sdi/|" ]
#filter = [ "a|/dev/sda.*|", "a|/dev/mpath/.*|", "r/.*/" ]
# filter = [ "r|/dev/cdrom|" ]
# filter = [ "a/loop/", "r/.*/" ]
# filter =[ "a|loop|", "r|/dev/hdc|", "a|/dev/ide|", "r|.*|" ]
# filter = [ "a|^/dev/hda8$|", "r/.*/" ]
# The results of the filtering are cached on disk to avoid
# If volume_list is defined, each LV is only activated if there is a
# volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
volume_list = [ "VolGroup00", "@awopdb01" ]
[root@awopdb01 ~]# vgs -o +tags
Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2
VG #PV #LV #SN Attr VSize VFree VG Tags
VolGroup00 2 6 0 wz--n- 680.34G 80.25G
datavg1 4 1 0 wz--n- 1000.00G 0 awopdb02
[root@awopdb01 ~]# lvs
Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
LogVol00 VolGroup00 -wi-ao 100.00G
LogVol01 VolGroup00 -wi-ao 192.00G
LogVol02 VolGroup00 -wi-ao 192.00G
lvol1 VolGroup00 -wi-ao 46.09G
lvol2 VolGroup00 -wi-ao 50.00G
lvol3 VolGroup00 -wi-ao 20.00G
lvol01 datavg1 -wi--- 1000.00G
[root@awopdb01 ~]# pvs
Found duplicate PV tbU5yWceVhgPgS6RIvj0M2TxChiLp61b: using /dev/sdr2 not /dev/sdb2
PV VG Fmt Attr PSize PFree
/dev/mpath/mpath11 datavg1 lvm2 a- 250.00G 0
/dev/mpath/mpath12 datavg1 lvm2 a- 250.00G 0
/dev/mpath/mpath13 datavg1 lvm2 a- 250.00G 0
/dev/mpath/mpath14 datavg1 lvm2 a- 250.00G 0
/dev/mpath/mpath1p2 lvm2 a- 267.75G 267.75G
/dev/sda2 VolGroup00 lvm2 a- 408.09G 0
/dev/sda3 VolGroup00 lvm2 a- 272.25G 80.25G
[root@awopdb01 ~]#
Matti_Kurkela
Honored Contributor

Re: vgchange activation error

OK, this looks like a HA LVM configuration of RedHat Cluster Suite. (In other words, very much like what later versions of Serviceguard for Linux use as a substitute for HP-UX "vgchange -a e".)

In this configuration, the cluster VG can only be active on one of the cluster nodes at a time, or on none at all: never on two or more nodes simultaneously. The activation of the cluster VG is controlled by the cluster suite.

Which version of RHEL? The RedHat Cluster Suite has changed a lot between versions.
The instructions below assume RHEL 5, but should be mostly compatible with RHEL 4 or 6 too. The HA LVM mode is available on RHEL 4.5 and newer.

Your datavg1 volume group currently has a tag "awopdb02" on it - meaning the VG is currently in use on awopdb02 (or the cluster suite had it active there when the system crashed).

It is a cluster volume group, so *you should not activate it* manually in a normal situation - the cluster suite will activate it if (and only if) appropriate checks are successful. *It is not an error* that you cannot activate the VG - that is the cluster safety system doing its job.

If both nodes failed to reach the quorum disk, that means both nodes should have noticed they've lost quorum and rebooted - is this what happened? That's what a cluster *should* have done in that situation.

First, you should run "clustat" and "cman_tool status" on both nodes and check the following (a quick way to collect everything in one go is sketched after this list):

- Are both nodes "online" in the clustat listing? (if not, the node that is not "online" should not activate datavg1 unless the cluster daemons have been completely stopped on both nodes AND the sysadmin has verified the other node does not have it active.)

- Does "clustat" say "Member Status: Quorate" on both nodes? (If not, the node that is not quorate should not activate datavg1...[see above])

- What's the state of the cluster services in the clustat listing? (If the nodes are online but the services are stopped, then datavg1 should not be activated anywhere.)

- In the "cman_tool status" listings, are the values of "Config Version", "Cluster Id" and "Cluster Generation" the same in both nodes?
(If not, and both nodes are online as per clustat, then *you're in a split-brain situation*: both nodes are thinking "I'm OK, the other node is not.")
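
To collect all of the above on each node, run the same read-only commands on both and compare the output side by side; nothing here changes any state:

clustat              # node membership, quorum status, service states
cman_tool status     # compare Config Version, Cluster Id, Cluster Generation
vgs -o +tags         # which node currently holds the tag on datavg1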


My recommendation:
1.) Undo all your manual activation steps on awopdb02. If you started the database manually, stop it. If you mounted the disks manually, unmount them. Deactivate the VG.

2.) If the cluster services are not running on one or both nodes, start them: qdiskd, cman and rgmanager. If HA LVM-style configuration is used, you shouldn't need clvmd; but starting it too won't hurt anything. The fact that datavg1 is not activated should not prevent starting the cluster daemons.

3.) Make sure both cluster nodes are quorate and communicating with each other (see the "clustat" and "cman_tool status" checks above).

4.) If your database service is configured to start up automatically, rgmanager should start it; if not, use "clusvcadm -e <service name>" to start it.

If your cluster is properly configured, this should take care of the VG activation and all the necessary application start-up actions.
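
Very roughly, and assuming RHEL 5 style init scripts plus a manual mount point of /data on awopdb02 (substitute your real mount point and cluster service name), the sequence would look something like this:

# on awopdb02: undo the manual steps
umount /data
vgchange -a n datavg1

# on both nodes: start the cluster daemons listed above
service qdiskd start
service cman start
service rgmanager start

# verify membership and quorum on both nodes
clustat
cman_tool status

# if the service does not come up by itself
clusvcadm -e <service name>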

If you really need to override the cluster suite's control on VG activation, you should understand how the HA LVM configuration works, and then read the vgchange(8) man page, paying attention to the --addtag and --deltag options.

MK
Zinky
Honored Contributor

Re: vgchange activation error

Post your "filter" line in your /etc/lvm/lvm.conf. It is very possible it was changed "recently" by someone who does not fully understand the implications.
MikeL_4
Super Advisor

Re: vgchange activation error

# grep filter /etc/lvm/lvm.conf
# A filter that tells LVM2 to only use a restricted set of devices.
# The filter consists of an array of regular expressions. These
# Don't have more than one filter line active at once: only one gets used.
filter = [ "a/.*/" ]
# filter = [ "r|/dev/cdrom|" ]
# filter = [ "a/loop/", "r/.*/" ]
# filter =[ "a|loop|", "r|/dev/hdc|", "a|/dev/ide|", "r|.*|" ]
# filter = [ "a|^/dev/hda8$|", "r/.*/" ]
# The results of the filtering are cached on disk to avoid
#
MikeL_4
Super Advisor

Re: vgchange activation error

Was able to resolve the issue by changing the following in /etc/lvm/lvm.conf

from:
volume_list = [ "VolGroup00", "@db01" ]

to:
volume_list = [ "VolGroup00", "@db01", "datavg1/lvol1" ]
Matti_Kurkela
Honored Contributor

Re: vgchange activation error

> volume_list = [ "VolGroup00", "@db01", "datavg1/lvol1" ]

You've now effectively disabled the HA LVM protection: datavg1 can now be activated on this node even if it has a tag that indicates it may currently be active on another node.

If db02 is currently running the service and db01 is rebooted, this change allows db01 to activate datavg1 at boot time and perhaps perform an automatic filesystem check on datavg1/lvol1... while the filesystem is active on db02. This will *certainly* cause filesystem corruption, because db01's fsck will see db02's ongoing operations as "corruption" and will attempt to fix them.

At that point, db02 will see problems like "WTF??? I just changed this directory entry from X to Y, but now it's back at X again?" This will typically cause the filesystem to become read-only at db02.

Let me emphasise: In a HA LVM configuration, it is important that the shared VGs *must not* be activated before the cluster services are started and communicating with the other node(s). The shared VGs *must not* be activated, filesystem-checked nor mounted by the regular start-up procedure: they must be controlled entirely by the cluster mechanisms.

If the shared filesystem is mentioned in /etc/fstab at all (you could omit it completely), it *must* have mount option "noauto" and the filesystem check pass number at the 6th column of fstab set to 0. Otherwise your system will fail to boot if the HA LVM locking mechanism works, or may corrupt your shared filesystem if the locking mechanism fails.
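
For example, an fstab entry like this would satisfy both requirements (the /data mount point and ext3 type are just assumptions for the sake of the example):

/dev/datavg1/lvol01   /data   ext3   noauto   0 0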

If your cluster configuration requires that the shared VG is activated on one or the other node before the cluster daemons are started, then your cluster configuration is misdesigned.

The correct procedure for manually activating a HA LVM-configured shared VG is as follows:

(Note: this procedure is for emergency/maintenance use only. In normal use, the cluster should handle all this automatically - if it doesn't, your cluster may not be able to perform an automatic failover in a real failure situation.)

1.) Use "vgs -o +tags" to see if the VG currently has a tag on it.

2.) If the VG has no tag, or a tag that matches the name of the host you wish to activate the VG on, you can go directly to step 7.

3.) If the VG has a tag that matches the hostname of another node, *you must* first make sure that node does not have the VG currently activated.

4.) When you're sure the VG is not currently active on any node, you can use "vgchange --deltag" to remove the VG tag of the other node:

vgchange --deltag db02 datavg1

5.) At this point, say to yourself: "I am definitely certain this VG is not active on any cluster node, and I understand I will be held responsible for any damage to the data if this is not true." You are saying you know better than the cluster here.

6.) Then add a new tag that matches the hostname of the node you wish to activate the VG in:

vgchange --addtag db01 datavg1

7.) Activate the VG as normal.

vgchange -a y datavg1

8.) If applicable, run a filesystem check on the LV(s):

fsck -C0 /dev/mapper/datavg1-lvol1

9.) If applicable, mount the filesystem(s).

If the LV contains a raw database instead of a filesystem, steps 8 and 9 will not be applicable; instead, the database engine may be started at that point.
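
Put together as a single session (using the hostnames and the mapper device name from the earlier outputs, plus an assumed /data mount point), the override looks roughly like this:

# steps 1-3: confirm the current tag and make sure the VG is inactive everywhere
vgs -o +tags datavg1

# steps 4-6: move the tag to this node (ONLY if the VG is not active anywhere else)
vgchange --deltag db02 datavg1
vgchange --addtag db01 datavg1

# step 7: activate
vgchange -a y datavg1

# steps 8-9: check and mount the filesystem (skip for a raw database LV)
fsck -C0 /dev/mapper/datavg1-lvol01
mount /dev/mapper/datavg1-lvol01 /data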

MK