Operating System - HP-UX

Asad Malik
Frequent Advisor

service guard failover

Hi,
We run ServiceGuard (A.11.08) on two K220 nodes running HP-UX 11.00, with a Sybase database on top. The database was shut down abnormally on the primary node and the secondary node did not take over. The log file on the secondary node says that one of the VGs failed to activate.
For a couple of PVs, vgdisplay gives the message:
"Couldn't query physical volume. The specified path does not correspond to the PV attached to this VG."

pvdisplay on both disks gives a similar message:
"Couldn't query physical volumes. Could not retrieve the names of the PVs belonging to the VG."

Any help would be appreciated. Is this the likely reason the package did not switch to the secondary node?

Thanks
James R. Ferguson
Acclaimed Contributor

Re: service guard failover

Hi:

The first thing I'd do is make sure that your cmclconfig file is current and the same on all nodes.

...JRF...
John Palmer
Honored Contributor

Re: service guard failover

It sounds as though this VG is not correctly configured on the second node.

The easiest way to fix it is to remove it from that system with 'vgexport' and reimport it with 'vgimport'.

If you need any help with the specifics for doing this please repost.

Regards,
John
Rita C Workman
Honored Contributor

Re: service guard failover

My question would be: did this fail over properly before? If it did, has anything changed in the volume group on the first node?
To explain: if you added a disk or made changes to the volume group on the first node, you must then run
vgexport -p -v -s -m /etc/lvmconf/vg.map volgrp
and rcp that map file over to the second node.
On the second node you would then need to completely remove /dev/volgrp/group and all files under /dev/volgrp, recreate the group file with mknod, and import the information back with
vgimport -v -s -m /etc/lvmconf/vg.map volgrp
This fixes your /etc/lvmtab (so all the drives come up on the second node) and puts your /dev/volgrp files back out there too.
If you added any new logical volumes or filesystems, so that you changed the package configuration file on the first node, you must change that information on the other node as well.
One last note: I usually take my packages down on all affected nodes while I'm working. So I would do a cmhaltpkg, then a vgchange -c n /dev/volgrp, then a vgchange -a y /dev/volgrp before starting any changes; when all done and ready to put everything back, I would reverse these three steps (vgchange -a n /dev/volgrp, then vgchange -c y /dev/volgrp, then cmrunpkg).
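Put together, that whole sequence looks like the sketch below. The VG, package, and node names and the map path are example values, not real configuration, and DRY_RUN=1 just echoes each command rather than executing it (the cm*/vg* tools only exist on HP-UX):

```shell
#!/bin/sh
# Sketch of the resync sequence above. VG name, package name, node name
# and map path are example values, not real configuration.
VG=volgrp
PKG=mypkg
NODE2=node2
MAP=/etc/lvmconf/${VG}.map
DRY_RUN=${DRY_RUN:-1}          # leave at 1 anywhere but a real HP-UX box

run() {                        # echo instead of execute when dry-running
    if [ "$DRY_RUN" -eq 1 ]; then echo "$@"; else "$@"; fi
}

# 1. Quiesce: halt the package, drop the cluster bit, activate locally.
run cmhaltpkg $PKG
run vgchange -c n /dev/$VG
run vgchange -a y /dev/$VG

# ... make the LVM changes on the first node here ...

# 2. Write a map file (preview mode) and copy it to the second node.
run vgexport -p -v -s -m $MAP /dev/$VG
run rcp $MAP ${NODE2}:$MAP

# 3. Commands to run ON THE SECOND NODE: remove the stale /dev/$VG files,
#    rebuild the group file, and import from the map.
run mkdir /dev/$VG
run mknod /dev/$VG/group c 64 0x060000   # pick an unused minor number
run vgimport -v -s -m $MAP /dev/$VG

# 4. Reverse the first three steps and restart the package.
run vgchange -a n /dev/$VG
run vgchange -c y /dev/$VG
run cmrunpkg $PKG
```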

My guess is that a change was done to the primary node, but the secondary node did not get the vgimport mapfile...so your /etc/lvmtab is not current. I'd run strings on /etc/lvmtab on the second node to check this.

Just a thought,
melvyn burnard
Honored Contributor

Re: service guard failover

What does
strings /etc/lvmtab
show on each node?
Has this ever worked, or been tested lately?
If the PVs do not match what the system knows about, then yes, the VG will fail to activate.
Another question: what does ioscan show? Do you perhaps have a hardware error, in that a path to the disks is down?
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Asad Malik
Frequent Advisor

Re: service guard failover

On both nodes, strings /etc/lvmtab shows the same number of VGs and the respective PVs for each VG. ioscan displays the disks and they are in CLAIMED state.
I have seen the failover work before.
James R. Ferguson
Acclaimed Contributor

Re: service guard failover

Hi:

OK, from your latest reply we will assume that NO LVM changes have been made to either node (?) and that the cmclconfig file is the same on both nodes (?).

Next, did the package actually halt correctly on the primary node? Look at the /etc/cmcluster//control.sh.log on the primary node to see if there were any problems deactivating the volume group. If the volume group didn't deactivate then it can't be adopted by the other node.

Also, please post the cmviewcl output.

...JRF...
Stephen Doud
Honored Contributor

Re: service guard failover

If the package failed to halt properly on the primary server, its VG would not be activatable (or displayable) on the secondary server. Check the primary package control log file to see just what happened; it registers the package startup and shutdown messages. If the package's DB failed abnormally, the package may not have shut down all the way due to open data files. Always review the package log files and syslog.log for clues regarding package adoption problems.
Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
cmclconfig is the same on both nodes.
The package halted correctly on the primary node; all three VGs were deactivated successfully. On the secondary node, two of the three VGs were activated successfully; one failed.
The output of cmviewcl is as follows:

CLUSTER STATUS
tang_cl1 up

NODE STATUS STATE
tang1 up running

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 10/4/8 lan1
PRIMARY up 10/12/6 lan0
STANDBY up 10/4/16 lan2

PACKAGE STATUS STATE PKG_SWITCH NODE
sybase_pkg up running enabled tang1

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up Unlimited 0 sybase0
Subnet up 10.14.0.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled tang1 (current)
Alternate up enabled tang2

NODE STATUS STATE
tang2 up running

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 10/4/8 lan0
STANDBY up 10/4/16 lan1
PRIMARY up 10/12/6 lan2
James R. Ferguson
Acclaimed Contributor

Re: service guard failover

Hi Asad:

On the secondary node, on which the package failed to start, would you please post the /etc/cmcluster//control.sh.log?
Thanks.

...JRF...
Asad Malik
Frequent Advisor

Re: service guard failover

Hi
output of cntl.log is attached
John Palmer
Honored Contributor

Re: service guard failover

Asad,

Please do the following on your backup node:

ll /dev/vg*/group

and check that each volume group has a unique minor number.

For example:-

crw-r----- 1 root sys 64 0x000000 Jul 6 08:17 /dev/vg00/group
crw-r----- 1 root dba 64 0x010000 Sep 12 10:38 /dev/vg01/group
crw-r----- 1 root dba 64 0x030000 Sep 21 11:04 /dev/vg03/group

the above groups are 0x00, 0x01 and 0x03.

Your problem could be due to having configured two groups with the same number.

If your vg_cl3 has the same minor number, then you will have to vgexport it to remove it, then repeat your original vgimport process, using a unique value in your 'mknod group ...' command.
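A quick way to spot such a clash without eyeballing the listing is to key an awk one-liner on the minor-number field. The here-string below holds made-up sample `ll` output (two entries deliberately share a minor); on a live node you would pipe `ll /dev/vg*/group` straight into the awk instead:

```shell
# Sample 'll /dev/vg*/group' output (invented VG names; two of them
# deliberately share minor number 0x050000).
sample='crw-r----- 1 root sys 64 0x000000 May  7  1998 /dev/vg00/group
crw-rw-rw- 1 root sys 64 0x050000 Aug 24  1998 /dev/vg01/group
crw-rw-rw- 1 root sys 64 0x030000 Jun  2  1998 /dev/vg_cl1/group
crw-rw-rw- 1 root sys 64 0x050000 May 27  1998 /dev/vg_cl3/group'

# Field 6 is the minor number, $NF the group file path.
# Live usage: ll /dev/vg*/group | awk '...'
dups=$(echo "$sample" | awk '
    { seen[$6]++; vgs[$6] = vgs[$6] " " $NF }
    END { for (m in seen) if (seen[m] > 1)
              print "minor " m " shared by:" vgs[m] }')
echo "$dups"
```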

Hope this helps,
John
Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
You are absolutely right: the minor number is the same as another VG's. The output of
ll /dev/vg*/group
is shown below for both the primary and secondary nodes.

primary node
crw-r--r-- 1 root sys 64 0x000000 Jul 10 1997 /dev/vg00/group
crw-r--r-- 1 root sys 64 0x020000 May 26 1998 /dev/vg01/group
crw-r----- 1 sybase sybdba 64 0x030000 Oct 23 1999 /dev/vg_cl1/group
crw-r----- 1 sybase sybdba 64 0x040000 Oct 23 1999 /dev/vg_cl2/group
crw-r----- 1 sybase sybdba 64 0x050000 Oct 23 1999 /dev/vg_cl3/group
crw-rw-rw- 1 root sys 64 0x010000 Aug 26 1997 /dev/vg_sybase/group

output of bdf on primary node
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 103413 71613 21458 77% /
/dev/vg00/lvol1 47829 28732 14314 67% /stand
/dev/vg00/lvol8 598357 304093 234428 56% /var
/dev/vg00/lvol7 646229 489279 92327 84% /usr
/dev/vg01/lv_syb 4190208 2645219 1448746 65% /u1
/dev/vg00/lvol6 299157 80361 188880 30% /tmp
/dev/vg00/lvol5 498645 424490 24290 95% /opt
/dev/vg00/lvol4 19861 14388 3486 80% /home
/dev/vg_cl1/lv_syb 4190208 2992703 1122686 73% /u1_cl


secondary node, where the failover failed

crw-r----- 1 root sys 64 0x000000 May 7 1998 /dev/vg00/group
crw-rw-rw- 1 root sys 64 0x050000 Aug 24 1998 /dev/vg01/group
crw-rw-rw- 1 root sys 64 0x030000 Jun 2 1998 /dev/vg_cl1/group
crw-rw-rw- 1 root sys 64 0x040000 Jun 2 1998 /dev/vg_cl2/group
crw-rw-rw- 1 root sys 64 0x050000 May 27 1998 /dev/vg_cl3/group

output of bdf on secondary node
Filesystem kbytes used avail %used Mounted on
/dev/root 115605 43303 60741 42% /
/dev/vg00/lvol1 47829 28042 15004 65% /stand
/dev/vg00/lvol8 626413 269849 293922 48% /var
/dev/vg00/lvol7 650261 465757 119477 80% /usr
/dev/vg00/lvol13 299157 17930 251311 7% /tmp
/dev/vg00/lvol6 749973 526873 148102 78% /opt
/dev/vg00/lvol5 19861 10536 7338 59% /home
/dev/vg01/lvol1 2048000 1655795 367853 82% /u2

/dev/vg01 was activated, and its filesystem mounted, on the secondary node recently.

As I am not very strong in LVM, I would greatly appreciate a step-by-step procedure for how to proceed from here, and on which node.

Thanks a lot
John Palmer
Honored Contributor

Re: service guard failover

Asad,

Your problem on the second node appears to have been caused by vg01 having been created with minor number 0x05 since you last failed over, so it now clashes with vg_cl3.

It is easily solved, however. Proceed as follows on the secondary node:

vgexport vg_cl3

This will remove vg_cl3 from the system.

On the primary node do:-
vgexport -p -v -s -m /tmp/map vg_cl3
then copy the map file '/tmp/map' to your secondary node with rcp or ftp.

On the secondary node do:
mkdir /dev/vg_cl3
mknod /dev/vg_cl3/group c 64 0x060000
vgimport -m /tmp/map -s -v vg_cl3

That's it. You will be able to do this with the cluster running; I tested this earlier in the week.
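If you would rather compute a safe minor number than eyeball one, a small awk sketch can report the lowest unused value. The "used" list below holds the two significant hex digits of the secondary node's existing group files (vg00, vg01, vg_cl1..vg_cl3); note it finds the lowest free slot, whereas 0x060000 above is simply the next value past the highest in use. Either choice works as long as it is unique:

```shell
# Lowest unused LVM group minor number, given the minors already in use.
# "00 05 03 04 05" are the two significant hex digits taken from the
# secondary node's group files (vg00, vg01, vg_cl1..vg_cl3).
next=$(awk -v used="00 05 03 04 05" 'BEGIN {
    for (i = 0; i < 256; i++) {                  # scan 0x00 .. 0xff
        h = sprintf("%02x", i)
        if (index(" " used " ", " " h " ") == 0) {
            print "0x" h "0000"; exit
        }
    }
}')
echo "$next"    # plug into: mknod /dev/<vg>/group c 64 $next
```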

Regards,
John


Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
Could this be an alternate solution?

On the secondary node, /dev/vg01 was activated recently and it is not part of the cluster. If the filesystem is unmounted and vg01 is deactivated, can vgexport and vgimport be applied to this VG on the secondary node only? Like so, on the secondary node:

umount /u2
vgchange -a n /dev/vg01
vgexport -m /tmp/mapfile /dev/vg01
(recreate /dev/vg01/group with a minor number different from /dev/vg_cl3's)
vgimport -m /tmp/mapfile /dev/vg01

Now the two VGs would have different minor numbers.

I'm just checking whether this procedure can be applied in this situation.
Thanks
John Palmer
Honored Contributor

Re: service guard failover

Yes, this will work as well. It's just that reimporting vg_cl3 didn't require you to unmount any filesystems etc.
Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
I shall try the fix this coming weekend, but a question: how did two VGs end up with the same minor number? Is it because one VG was deactivated and a new one was created, or is there something else?

Thanks
John Palmer
Honored Contributor
Solution

Re: service guard failover

How did you create vg01?

If manually, then someone didn't check for unique group minor numbers (ll /dev/vg*/group).

If you go for the solution I posted above (reimporting vg_cl3 rather than vg01), you can do it now with no downtime. No need to wait until the weekend.
Asad Malik
Frequent Advisor

Re: service guard failover

crw-r----- 1 root sys 64 0x000000 May 7 1998 /dev/vg00/group
crw-rw-rw- 1 root sys 64 0x050000 Aug 24 1998 /dev/vg01/group
crw-rw-rw- 1 root sys 64 0x030000 Jun 2 1998 /dev/vg_cl1/group
crw-rw-rw- 1 root sys 64 0x040000 Jun 2 1998 /dev/vg_cl2/group
crw-rw-rw- 1 root sys 64 0x050000 May 27 1998 /dev/vg_cl3/group

Hi,
This is the output of ll /dev/vg*/group from the secondary server. The dates shown for the two VGs in question are from 1998, yet the package failed over successfully a few times in 1999 and 2000. If the minor numbers were already the same then, did the recent activation of vg01 contribute to this failover failure?
I shall do it on the weekend, per the customer's wishes.
Chris Garman
Frequent Advisor

Re: service guard failover

An explanation I can think of is that this directory was copied from another system, using the copy option that preserves the time attributes.

When I read through this it sounded odd, because I think on HP-UX 11 vgcreate will check that the minor number is unique. Has the OS been upgraded since these volume groups were created?
Asad Malik
Frequent Advisor

Re: service guard failover

The OS was upgraded from 10.20 to 11.00 in October 1999.