Operating System - HP-UX

Asad Malik
Frequent Advisor

service guard failover

Hi,
We run ServiceGuard (A.11.08) on two K220 nodes running HP-UX 11.00, with a Sybase database on top. The database was shut down abnormally on the primary node and the secondary node did not take over. The log file on the secondary node says that one of the VGs failed to activate.
For a couple of PVs, vgdisplay gives the message:
"Couldn't query physical volume. The specified path does not correspond to the PV attached to this VG."

pvdisplay on both disks gives a similar message:
"Couldn't query physical volumes. Could not retrieve the names of the PVs belonging to the VG."

Any help would be appreciated. Is this the likely reason the package did not switch to the secondary node?

Thanks
James R. Ferguson
Acclaimed Contributor

Re: service guard failover

Hi:

The first thing I'd do is make sure that your cmclconfig file is current and the same on all nodes.

...JRF...
John Palmer
Honored Contributor

Re: service guard failover

It sounds as though this VG is not correctly configured on the second node.

The easiest way to fix it is to remove it from that system with 'vgexport' and reimport it with 'vgimport'.

If you need any help with the specifics for doing this please repost.

Regards,
John
Rita C Workman
Honored Contributor

Re: service guard failover

My question would be: did this fail over properly before? If it did, has anything changed in the volume group on the first node?
To explain: if you added a disk or made changes to the volume group on the first node, you must then run
vgexport -p -v -s -m /etc/lvmconf/vg.map volgrp
and rcp that map file over to the second node.
On the second node you would then need to completely remove /dev/volgrp/group and all files under /dev/volgrp, recreate the group file with mknod, and import the information back with
vgimport -v -s -m /etc/lvmconf/vg.map volgrp
This fixes your /etc/lvmtab (so all the drives come up on the second node) and puts your /dev/volgrp files back out there too.
If you added any new logical volumes or filesystems, so that you changed the package configuration file on the first node, you must change that information on the other node as well.
One last note: I usually take my packages down on all affected nodes while I'm working. So I would do a cmhaltpkg, then a vgchange -c n /dev/volgrp, then a vgchange -a y /dev/volgrp before starting any changes; when all done and ready to put everything back, I would reverse these three steps (vgchange -a n /dev/volgrp, then vgchange -c y /dev/volgrp, then cmrunpkg).
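Put together, that whole sequence looks like the sketch below. The VG, package, and node names and the map path are example values, not real configuration, and DRY_RUN=1 just echoes each command rather than executing it (the cm*/vg* tools only exist on HP-UX):

```shell
#!/bin/sh
# Sketch of the resync sequence above. VG name, package name, node name
# and map path are example values, not real configuration.
VG=volgrp
PKG=mypkg
NODE2=node2
MAP=/etc/lvmconf/${VG}.map
DRY_RUN=${DRY_RUN:-1}          # leave at 1 anywhere but a real HP-UX box

run() {                        # echo instead of execute when dry-running
    if [ "$DRY_RUN" -eq 1 ]; then echo "$@"; else "$@"; fi
}

# 1. Quiesce: halt the package, drop the cluster bit, activate locally.
run cmhaltpkg $PKG
run vgchange -c n /dev/$VG
run vgchange -a y /dev/$VG

# ... make the LVM changes on the first node here ...

# 2. Write a map file (preview mode) and copy it to the second node.
run vgexport -p -v -s -m $MAP /dev/$VG
run rcp $MAP ${NODE2}:$MAP

# 3. Commands to run ON THE SECOND NODE: remove the stale /dev/$VG files,
#    rebuild the group file, and import from the map.
run mkdir /dev/$VG
run mknod /dev/$VG/group c 64 0x060000   # pick an unused minor number
run vgimport -v -s -m $MAP /dev/$VG

# 4. Reverse the first three steps and restart the package.
run vgchange -a n /dev/$VG
run vgchange -c y /dev/$VG
run cmrunpkg $PKG
```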

My guess is that a change was done to the primary node, but the secondary node did not get the vgimport mapfile...so your /etc/lvmtab is not current. I'd run strings on /etc/lvmtab on the second node to check this.

Just a thought,
melvyn burnard
Honored Contributor

Re: service guard failover

What does
strings /etc/lvmtab
show on each node?
Has this ever worked, or been tested lately?
If the PVs do not match what the system knows about, then yes, the VG will fail to activate.
Another question: what does ioscan show? Do you perhaps have a hardware error, in that a path to the disks is down?
My house is the bank's, my money the wife's, But my opinions belong to me, not HP!
Asad Malik
Frequent Advisor

Re: service guard failover

On both nodes, strings /etc/lvmtab shows the same number of VGs and the respective PVs for each VG. ioscan displays the disks and they are in CLAIMED state.
I have seen the failover work before.
James R. Ferguson
Acclaimed Contributor

Re: service guard failover

Hi:

OK, from your latest reply we will assume that NO LVM changes have been made to either node (?) and that the cmclconfig file is the same on both nodes (?).

Next, did the package actually halt correctly on the primary node? Look at the /etc/cmcluster//control.sh.log on the primary node to see if there were any problems deactivating the volume group. If the volume group didn't deactivate then it can't be adopted by the other node.

Also, please post the cmviewcl output.

...JRF...
Stephen Doud
Honored Contributor

Re: service guard failover

If the package failed to halt properly on the primary server, its VG would not be activatable (or displayable) on the secondary server. Check the primary package control log file to see just what happened; it registers the package startup and shutdown messages. If the package's DB failed abnormally, the package may not have shut down all the way due to open data files. Always review the package log files and syslog.log for clues regarding package adoption problems.
Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
cmclconfig is the same on both nodes.
The package halted correctly on the primary node; all three VGs were deactivated successfully. On the secondary node, two of the three VGs were activated successfully; one failed.
The output of cmviewcl is as follows:

CLUSTER STATUS
tang_cl1 up

NODE STATUS STATE
tang1 up running

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 10/4/8 lan1
PRIMARY up 10/12/6 lan0
STANDBY up 10/4/16 lan2

PACKAGE STATUS STATE PKG_SWITCH NODE
sybase_pkg up running enabled tang1

Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual

Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service up Unlimited 0 sybase0
Subnet up 10.14.0.0

Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled tang1 (current)
Alternate up enabled tang2

NODE STATUS STATE
tang2 up running

Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 10/4/8 lan0
STANDBY up 10/4/16 lan1
PRIMARY up 10/12/6 lan2
James R. Ferguson
Acclaimed Contributor

Re: service guard failover

Hi Asad:

On the secondary node, on which the package failed to start, would you please post the /etc/cmcluster//control.sh.log?
Thanks.

...JRF...
Asad Malik
Frequent Advisor

Re: service guard failover

Hi
output of cntl.log is attached
John Palmer
Honored Contributor

Re: service guard failover

Asad,

Please do the following on your backup node:

ll /dev/vg*/group

and check that each volume group has a unique minor number.

For example:-

crw-r----- 1 root sys 64 0x000000 Jul 6 08:17 /dev/vg00/group
crw-r----- 1 root dba 64 0x010000 Sep 12 10:38 /dev/vg01/group
crw-r----- 1 root dba 64 0x030000 Sep 21 11:04 /dev/vg03/group

the above groups are 0x00, 0x01 and 0x03.

Your problem could be due to having configured two groups with the same number.

If your vg_cl3 has the same minor number, then you will have to vgexport it to remove it, then repeat your original vgimport process, using a unique value in your 'mknod group ...' command.
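A quick way to spot such a clash without eyeballing the listing is to key an awk one-liner on the minor-number field. The here-string below holds made-up sample `ll` output (two entries deliberately share a minor); on a live node you would pipe `ll /dev/vg*/group` straight into the awk instead:

```shell
# Sample 'll /dev/vg*/group' output (invented VG names; two of them
# deliberately share minor number 0x050000).
sample='crw-r----- 1 root sys 64 0x000000 May  7  1998 /dev/vg00/group
crw-rw-rw- 1 root sys 64 0x050000 Aug 24  1998 /dev/vg01/group
crw-rw-rw- 1 root sys 64 0x030000 Jun  2  1998 /dev/vg_cl1/group
crw-rw-rw- 1 root sys 64 0x050000 May 27  1998 /dev/vg_cl3/group'

# Field 6 is the minor number, $NF the group file path.
# Live usage: ll /dev/vg*/group | awk '...'
dups=$(echo "$sample" | awk '
    { seen[$6]++; vgs[$6] = vgs[$6] " " $NF }
    END { for (m in seen) if (seen[m] > 1)
              print "minor " m " shared by:" vgs[m] }')
echo "$dups"
```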

Hope this helps,
John
Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
You are absolutely right: the minor number is the same as another VG's. The output of
ll /dev/vg*/group
is shown below for both the primary and secondary nodes.

primary node
crw-r--r-- 1 root sys 64 0x000000 Jul 10 1997 /dev/vg00/group
crw-r--r-- 1 root sys 64 0x020000 May 26 1998 /dev/vg01/group
crw-r----- 1 sybase sybdba 64 0x030000 Oct 23 1999 /dev/vg_cl1/group
crw-r----- 1 sybase sybdba 64 0x040000 Oct 23 1999 /dev/vg_cl2/group
crw-r----- 1 sybase sybdba 64 0x050000 Oct 23 1999 /dev/vg_cl3/group
crw-rw-rw- 1 root sys 64 0x010000 Aug 26 1997 /dev/vg_sybase/group

output of bdf on primary node
Filesystem kbytes used avail %used Mounted on
/dev/vg00/lvol3 103413 71613 21458 77% /
/dev/vg00/lvol1 47829 28732 14314 67% /stand
/dev/vg00/lvol8 598357 304093 234428 56% /var
/dev/vg00/lvol7 646229 489279 92327 84% /usr
/dev/vg01/lv_syb 4190208 2645219 1448746 65% /u1
/dev/vg00/lvol6 299157 80361 188880 30% /tmp
/dev/vg00/lvol5 498645 424490 24290 95% /opt
/dev/vg00/lvol4 19861 14388 3486 80% /home
/dev/vg_cl1/lv_syb 4190208 2992703 1122686 73% /u1_cl


secondary node, where the failover failed

crw-r----- 1 root sys 64 0x000000 May 7 1998 /dev/vg00/group
crw-rw-rw- 1 root sys 64 0x050000 Aug 24 1998 /dev/vg01/group
crw-rw-rw- 1 root sys 64 0x030000 Jun 2 1998 /dev/vg_cl1/group
crw-rw-rw- 1 root sys 64 0x040000 Jun 2 1998 /dev/vg_cl2/group
crw-rw-rw- 1 root sys 64 0x050000 May 27 1998 /dev/vg_cl3/group

output of bdf on secondary node
Filesystem kbytes used avail %used Mounted on
/dev/root 115605 43303 60741 42% /
/dev/vg00/lvol1 47829 28042 15004 65% /stand
/dev/vg00/lvol8 626413 269849 293922 48% /var
/dev/vg00/lvol7 650261 465757 119477 80% /usr
/dev/vg00/lvol13 299157 17930 251311 7% /tmp
/dev/vg00/lvol6 749973 526873 148102 78% /opt
/dev/vg00/lvol5 19861 10536 7338 59% /home
/dev/vg01/lvol1 2048000 1655795 367853 82% /u2

/dev/vg01 was activated, and its filesystem mounted, on the secondary node recently.

As I am not very strong in LVM, I would greatly appreciate a step-by-step procedure for how to proceed from here, and on which node.

Thanks a lot
John Palmer
Honored Contributor

Re: service guard failover

Asad,

Your problem on the second node appears to have been caused by vg01 having been created with minor number 0x05 since you last failed over, so it now clashes with vg_cl3.

It is easily solved, however. Proceed as follows on the secondary node:

vgexport vg_cl3

This will remove vg_cl3 from the system.

On the primary node do:-
vgexport -p -v -s -m /tmp/map vg_cl3
then copy the map file '/tmp/map' to your secondary node with rcp or ftp.

On the secondary node do:
mkdir /dev/vg_cl3
mknod /dev/vg_cl3/group c 64 0x060000
vgimport -m /tmp/map -s -v vg_cl3

That's it. You will be able to do this with the cluster running; I tested this earlier in the week.
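If you would rather compute a safe minor number than eyeball one, a small awk sketch can report the lowest unused value. The "used" list below holds the two significant hex digits of the secondary node's existing group files (vg00, vg01, vg_cl1..vg_cl3); note it finds the lowest free slot, whereas 0x060000 above is simply the next value past the highest in use. Either choice works as long as it is unique:

```shell
# Lowest unused LVM group minor number, given the minors already in use.
# "00 05 03 04 05" are the two significant hex digits taken from the
# secondary node's group files (vg00, vg01, vg_cl1..vg_cl3).
next=$(awk -v used="00 05 03 04 05" 'BEGIN {
    for (i = 0; i < 256; i++) {                  # scan 0x00 .. 0xff
        h = sprintf("%02x", i)
        if (index(" " used " ", " " h " ") == 0) {
            print "0x" h "0000"; exit
        }
    }
}')
echo "$next"    # plug into: mknod /dev/<vg>/group c 64 $next
```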

Regards,
John


Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
Could this be an alternate solution?

On the secondary node, /dev/vg01 was activated recently and it is not part of the cluster. If the filesystem is unmounted and vg01 is deactivated, can vgexport and vgimport be applied to this VG on the secondary node only? Like so, on the secondary node:

umount /u2
vgchange -a n /dev/vg01
vgexport -m /tmp/mapfile /dev/vg01
(recreate /dev/vg01/group with a minor number different from /dev/vg_cl3's)
vgimport -m /tmp/mapfile /dev/vg01

Now the two VGs would have different minor numbers.

I'm just checking whether this procedure can be applied in this situation.
Thanks
John Palmer
Honored Contributor

Re: service guard failover

Yes, this will work as well. It's just that reimporting vg_cl3 didn't require you to unmount any filesystems etc.
Asad Malik
Frequent Advisor

Re: service guard failover

Hi,
I shall try the fix this coming weekend, but a question: how did two VGs end up with the same minor number? Is it because one VG was deactivated and a new one was created, or is there something else?

Thanks
John Palmer
Honored Contributor
Solution

Re: service guard failover

How did you create vg01?

If manually, then someone didn't check for unique group minor numbers (ll /dev/vg*/group).

If you go for the solution I posted above (reimporting vg_cl3 rather than vg01), you can do it now with no downtime. No need to wait until the weekend.
Asad Malik
Frequent Advisor

Re: service guard failover

crw-r----- 1 root sys 64 0x000000 May 7 1998 /dev/vg00/group
crw-rw-rw- 1 root sys 64 0x050000 Aug 24 1998 /dev/vg01/group
crw-rw-rw- 1 root sys 64 0x030000 Jun 2 1998 /dev/vg_cl1/group
crw-rw-rw- 1 root sys 64 0x040000 Jun 2 1998 /dev/vg_cl2/group
crw-rw-rw- 1 root sys 64 0x050000 May 27 1998 /dev/vg_cl3/group

Hi,
This is the output of ll /dev/vg*/group from the secondary server. The dates shown for the two VGs in question are from 1998, yet the package failed over successfully a few times in 1999 and 2000. If the minor numbers were already the same then, did the recent activation of vg01 contribute to this failover failure?
I shall do it on the weekend, per the customer's wishes.
Chris Garman
Frequent Advisor

Re: service guard failover

An explanation I can think of is that this directory was copied from another system, using the copy option that preserves the time attributes.

When I read through this it sounded odd, because I think on HP-UX 11 vgcreate will check that the minor number is unique. Has the OS been upgraded since these volume groups were created?
Asad Malik
Frequent Advisor

Re: service guard failover

The OS was upgraded from 10.20 to 11.00 in October 1999.