Operating System - HP-UX

Reg cluster package failover

 
dattu_1
Regular Advisor

Hi guys,
I have to go live for production on Sunday night.
The cmviewcl output is shown below:

CLUSTER        STATUS
dev            up

  NODE         STATUS       STATE
  ge1          up           running

    PACKAGE    STATUS    STATE      AUTO_RUN    NODE
    x          up        running    enabled     ge1
    y          up        running    enabled     ge1

  NODE         STATUS       STATE
  ge2          up           running

    PACKAGE    STATUS    STATE      AUTO_RUN    NODE
    z          up        running    disabled    ge2


I was testing my cluster by removing both of my network cables, after which a TOC happened and both of my packages, x and y, halted. Package x gave an error saying that two VGs were busy (Device busy), and as a result global switching was disabled on node ge2.
For the other package, y, on node ge2 I got:

Volume group "/dev/vg_mcgb_prod_fone" does not exist in the "/etc/lvmtab" file.
ERROR: Function activate_volume_group
ERROR: Failed to activate vg_mcgb_prod_fone
The same error appeared for the other volume group.

What should I do now?

Fabian Briseño
Esteemed Contributor

Re: Reg cluster package failover

dattu.

What is the output of cmviewcl -v?

Please post it here.
Knowledge is power.
dattu_1
Regular Advisor

Re: Reg cluster package failover

# cmviewcl -v

CLUSTER        STATUS
cedgedev       up

  NODE         STATUS       STATE
  cedge1       up           running

    Cluster_Lock_LVM:
    VOLUME_GROUP            PHYSICAL_VOLUME    STATUS
    /dev/vg_cluster_lock    /dev/dsk/c7t0d1    up

    Network_Parameters:
    INTERFACE    STATUS    PATH       NAME
    PRIMARY      up        0/1/2/0    lan0
    PRIMARY      up        0/5/1/0    lan2
    STANDBY      up        0/1/2/1    lan1

    PACKAGE              STATUS    STATE      AUTO_RUN    NODE
    mcgb_bancs           up        running    enabled     cedge1

      Policy_Parameters:
      POLICY_NAME    CONFIGURED_VALUE
      Failover       configured_node
      Failback       manual

      Script_Parameters:
      ITEM      STATUS    MAX_RESTARTS    RESTARTS    NAME
      Subnet    up                                    10.1.1.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      cedge1 (current)
      Alternate    up        enabled      cedge2

    PACKAGE              STATUS    STATE      AUTO_RUN    NODE
    mcgb_bancs_live      up        running    enabled     cedge1

      Policy_Parameters:
      POLICY_NAME    CONFIGURED_VALUE
      Failover       configured_node
      Failback       manual

      Script_Parameters:
      ITEM      STATUS    MAX_RESTARTS    RESTARTS    NAME
      Subnet    up                                    10.1.1.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      cedge1 (current)
      Alternate    up        enabled      cedge2

  NODE         STATUS       STATE
  cedge2       up           running

    Cluster_Lock_LVM:
    VOLUME_GROUP            PHYSICAL_VOLUME    STATUS
    /dev/vg_cluster_lock    /dev/dsk/c7t0d1    up

    Network_Parameters:
    INTERFACE    STATUS    PATH       NAME
    PRIMARY      up        0/1/2/0    lan0
    PRIMARY      up        0/5/1/0    lan2
    STANDBY      up        0/1/2/1    lan1

    PACKAGE              STATUS    STATE      AUTO_RUN    NODE
    mcgb_fone            up        running    disabled    cedge2

      Policy_Parameters:
      POLICY_NAME    CONFIGURED_VALUE
      Failover       configured_node
      Failback       manual

      Script_Parameters:
      ITEM      STATUS    MAX_RESTARTS    RESTARTS    NAME
      Subnet    up                                    10.1.1.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      cedge2 (current)
      Alternate    up        enabled      cedge1

UNOWNED_PACKAGES

    PACKAGE              STATUS    STATE      AUTO_RUN    NODE
    mcgb_bancs_ref       down      halted     disabled    unowned

      Policy_Parameters:
      POLICY_NAME    CONFIGURED_VALUE
      Failover       configured_node
      Failback       manual

      Script_Parameters:
      ITEM      STATUS    NODE_NAME    NAME
      Subnet    up        cedge2       10.1.1.0
      Subnet    up        cedge1       10.1.1.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      cedge2
      Alternate    up        enabled      cedge1

    PACKAGE              STATUS    STATE      AUTO_RUN    NODE
    mcgb_exim            down      halted     disabled    unowned

      Policy_Parameters:
      POLICY_NAME    CONFIGURED_VALUE
      Failover       configured_node
      Failback       manual

      Script_Parameters:
      ITEM      STATUS    NODE_NAME    NAME
      Subnet    up        cedge1       10.1.1.0
      Subnet    up        cedge2       10.1.1.0

      Node_Switching_Parameters:
      NODE_TYPE    STATUS    SWITCHING    NAME
      Primary      up        enabled      cedge1
      Alternate    up        enabled      cedge2
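
For a listing this long, a small filter can pull out the packages that need attention. This is only a sketch: the function name list_problem_packages is made up for illustration, and it assumes each package row carries the five columns shown above (name, status, state, AUTO_RUN, node) directly after a PACKAGE header line.

```shell
#!/bin/sh
# list_problem_packages (hypothetical helper): read `cmviewcl` or
# `cmviewcl -v` text on stdin and print every package that is not "up"
# or whose AUTO_RUN column is not "enabled".
list_problem_packages() {
    awk '
        # A PACKAGE header line starts a block of package rows.
        $1 == "PACKAGE" && $2 == "STATUS" { in_pkgs = 1; next }
        # A blank line ends the block.
        NF == 0 { in_pkgs = 0; next }
        # Assumed row layout: name status state auto_run node
        in_pkgs && NF == 5 {
            if ($2 != "up" || $4 != "enabled")
                printf "%s status=%s state=%s auto_run=%s node=%s\n", $1, $2, $3, $4, $5
        }
    '
}
```

Usage would be something like `cmviewcl -v | list_problem_packages`. On the listing above it would flag mcgb_fone (AUTO_RUN disabled) and the two unowned packages, mcgb_bancs_ref and mcgb_exim.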
dattu_1
Regular Advisor

Re: Reg cluster package failover

Attached herewith are the log files from both nodes



dattu_1
Regular Advisor

Re: Reg cluster package failover

The attachment from node 2 is included herewith.
dattu_1
Regular Advisor

Re: Reg cluster package failover

Actually,
It is possible that when a ServiceGuard package unmounts filesystems in umount_fs, not all filesystems are unmounted, and the volume group deactivation then fails with "device busy".

I think my problem is exactly what the statement above describes.

So what should I do now to ensure that the cluster unmounts all the filesystems even if they are busy?
Stuart Urquhart
Frequent Advisor

Re: Reg cluster package failover

Looks like you've got two problems.
On cedge1, not all the filesystems on the shared volume groups are being unmounted. Check which ones are left: look at the currently mounted filesystems on cedge1 and make sure they're all in the package control script, in an order that allows them to be unmounted.
Rather than forcing a panic, start off by making sure cmhaltpkg/cmrunpkg execute cleanly and don't leave anything mounted.
On cedge2, there's a volume group, vg_mcgb_prod_fone, missing. You'll need to "vgexport -s -v -m vg_mcgb_prod_fone.map vg_mcgb_prod_fone" on cedge1, copy the .map file to cedge2, and import it on cedge2.
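
The map-file procedure can be laid out as a command list to review before running anything. What follows is a sketch under assumptions, not a tested procedure for this cluster: print_vg_copy_plan is a made-up helper, the /tmp map path and the group-file minor number are placeholders, and the -p (preview) flag on vgexport is an addition that the reply's command does not have. It is included so that the map file is written while the volume group definition stays in place on the source node.

```shell
#!/bin/sh
# print_vg_copy_plan VG MINOR SRC DST (hypothetical helper)
# Prints, without executing, the commands for copying an LVM volume group
# definition from node SRC to node DST.
#   VG    - volume group name, e.g. vg_mcgb_prod_fone
#   MINOR - minor number for the group file; it must be unused on DST
print_vg_copy_plan() {
    vg=$1
    minor=$2
    src=$3
    dst=$4
    map=/tmp/$vg.map          # placeholder location for the map file
    cat <<EOF
# on $src: write the map file; -p (preview) leaves the VG definition in place
vgexport -p -s -v -m $map /dev/$vg
# copy $map from $src to $dst (rcp, scp, or any other means)
# on $dst: list the group files already in use, then create one for this VG
ls -l /dev/*/group
mkdir /dev/$vg
mknod /dev/$vg/group c 64 $minor
vgimport -s -v -m $map /dev/$vg
EOF
}
```

For example, `print_vg_copy_plan vg_mcgb_prod_fone 0x050000 cedge1 cedge2` prints the sequence for this thread's volume group. The value 0x050000 is purely illustrative: pick a minor number that the `ls -l /dev/*/group` output on cedge2 shows to be free.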
Stuart Urquhart
Frequent Advisor

Re: Reg cluster package failover

If policy allows, it might be worth posting the control scripts and a "mount -v" when the package is halted.
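
To narrow a "mount -v" listing down to what matters after the package is halted, a filter along these lines may help. It is a sketch that assumes each line of mount -v output has the shape "device on mount-point type fstype ..."; the function name mounted_from_vg is made up for illustration.

```shell
#!/bin/sh
# mounted_from_vg VG (hypothetical helper): read `mount -v` text on stdin
# and print the mount points whose device file lives under /dev/VG/.
mounted_from_vg() {
    awk -v vg="$1" '
        # Assumed line shape: <device> on <mount point> type <fstype> ...
        # The trailing "/" keeps vg_a from also matching vg_a2.
        $2 == "on" && index($1, "/dev/" vg "/") == 1 { print $3 }
    '
}
```

After a cmhaltpkg, `mount -v | mounted_from_vg vg_mcgb_bancs_code` should print nothing. Any mount point it does print is a filesystem the package left behind.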
dattu_1
Regular Advisor

Re: Reg cluster package failover

Thanks Stuart,
Yes, I have also found the cause for my second package, mcgb_bancs_live: the vgimport issue. I am going to do it after 06:00 pm IST today, after halting the cluster, because I will have to deactivate the vg_mcgb_prod_fone VG and then export the map file to the second node.

What about the 2 VGs that were not forcibly unmounted during package halting? What should I do about those?
The control script is attached herewith.
Let me know ASAP.
dattu_1
Regular Advisor

Re: Reg cluster package failover

The mcgb_bancs control file is attached herewith.
Stuart Urquhart
Frequent Advisor

Re: Reg cluster package failover

I think the order of the LVs looks OK. I've noticed that on line 91 vg_mcgb_bancs_arch is commented out, but the next VG has index 4, which should probably be 3. Try fixing that and increasing FS_UMOUNT_COUNT to 2 on line 249. Then manually umount any filesystems left mounted and deactivate the vg_mcgb_bancs_code volume group with vgchange -a n. Start and then stop the package and see what happens. If it doesn't stop cleanly, try posting a mount -v to show what's been left.
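
An index gap like the one described here is easy to miss by eye, so a quick check can help. The sketch below assumes the control script declares volume groups one per line as VG[n]="name" and that commented-out entries start with "#". The function name check_vg_indexes is made up, and whether a gap actually breaks the script is not something this check can tell you; it only reports numbering that does not run 0, 1, 2, ...

```shell
#!/bin/sh
# check_vg_indexes FILE (hypothetical helper): report uncommented VG[n]=
# entries whose index is not the next one in the sequence 0, 1, 2, ...
# Returns non-zero if any entry is out of sequence.
check_vg_indexes() {
    awk '
        /^[ \t]*#/ { next }                       # skip commented-out entries
        match($0, /^[ \t]*VG\[[0-9]+\]=/) {
            idx = substr($0, RSTART, RLENGTH)
            sub(/^[ \t]*VG\[/, "", idx)           # strip everything before the number
            sub(/\]=$/, "", idx)                  # strip everything after it
            if (idx + 0 != expected + 0) {
                printf "line %d: VG[%d] found, VG[%d] expected\n", NR, idx, expected
                bad = 1
            }
            expected++
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Run it against the package control script, for example `check_vg_indexes mcgb_bancs.cntl` (the file name is a guess based on the package name). No output and a zero exit status mean the numbering is contiguous.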
dattu_1
Regular Advisor

Re: Reg cluster package failover

OK Stuart,
then I will just increase FS_UMOUNT_COUNT to 2. I want to check whether my packages will shift to node cedge2 automatically if node cedge1 goes down.
What is the need for starting and stopping the package after deactivating the VG?
Will the changes take effect after doing this?
Stuart Urquhart
Frequent Advisor

Re: Reg cluster package failover

I'd take things step by step. First of all, make sure cmhaltpkg and cmrunpkg run cleanly on the primary node. Then copy the .cntl and .conf files over to the adoptive node and make sure cmhaltpkg/cmrunpkg run cleanly there. Then finally go for the heroic stuff: pulling LAN cables, RS'ing nodes, etc. It's easier to debug if you take it stage by stage.
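
One stage of that sequence could be scripted roughly as below. Treat it as a sketch: run and cycle_package are made-up helper names, and DRY_RUN defaults to 1 so that the commands are only printed until you have read them and decided to run them for real. The cmmodpkg -e step is an addition to what the reply lists, included because halting a package with cmhaltpkg leaves its switching disabled.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN is 1 (the
# default here); execute it when DRY_RUN is set to anything else.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# cycle_package PKG NODE (hypothetical helper): halt PKG, start it on NODE,
# re-enable package switching, then show the package status.
# Stops at the first step that fails.
cycle_package() {
    run cmhaltpkg "$1" &&
    run cmrunpkg -n "$2" "$1" &&
    run cmmodpkg -e "$1" &&
    run cmviewcl -v -p "$1"
}
```

Start with `cycle_package mcgb_bancs cedge1` on the primary node, then repeat with cedge2 once the control files have been copied over. Check that nothing is left mounted after the halt before moving on to the cable-pulling tests.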
dattu_1
Regular Advisor

Re: Reg cluster package failover

Actually Stuart,
my cluster has been running fine on the primary node for the past year with no issues.
I just wanted to test, for my own knowledge, whether my packages would shift to node 2 or not.
So I pulled the LAN cables (not heartbeat) and then checked whether the packages were running on the adoptive node.
That is when I got the "device busy" error: the 2 filesystems of my main production package, mcgb_bancs, could not be unmounted.
Stuart Urquhart
Frequent Advisor

Re: Reg cluster package failover

It sounds like some filesystems have been added to the volume group and mounted, but not added to the package control file. Try stopping the package and seeing what filesystems are left, add them into the control script, copy the script to the adoptive node and give it a try.
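
That comparison can be done mechanically. The sketch below assumes the control script lists filesystems as FS[n]="/mount/point" with double quotes, possibly sharing a line with the matching LV[n] and FS_MOUNT_OPT[n] entries; if your script uses a different layout, the pattern needs adjusting. The function name unlisted_mounts is made up for illustration.

```shell
#!/bin/sh
# unlisted_mounts CONTROL_SCRIPT (hypothetical helper): read mount points on
# stdin, one per line, and print those with no uncommented FS[n]="..." entry
# in CONTROL_SCRIPT.
unlisted_mounts() {
    # Collect the mount points the control script already knows about.
    listed=$(grep -v '^[[:space:]]*#' "$1" |
        sed -n 's/.*FS\[[0-9][0-9]*]="\([^"]*\)".*/\1/p')
    while read -r mp; do
        [ -n "$mp" ] || continue
        # -F: fixed string, -x: whole line, -q: quiet
        printf '%s\n' "$listed" | grep -Fxq -- "$mp" || printf '%s\n' "$mp"
    done
}
```

Feed it only the mount points that belong to the package's volume groups, for example `mount -v | awk '$2 == "on" && index($1, "/dev/vg_mcgb_bancs_code/") == 1 { print $3 }' | unlisted_mounts mcgb_bancs.cntl` (the control script name is a guess). Anything printed is mounted from the shared volume group but absent from the control script.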
dattu_1
Regular Advisor

Re: Reg cluster package failover

Hi Stuart,
When I carried out the stress test again, i.e. removed both LAN cables from my primary node, I found that on the adoptive node (cedge2) the main package log file had an error of fsck aborting for two raw logical volumes, which belong to the VGs I recently brought over from the first node, cedge1, with vgimport.
Will rebooting the adoptive node solve this issue? I'm not able to run fsck manually either, and as a result I'm not able to mount those 2 VGs.
What should I do?