Package move/failover

Dee Jacobs · 07-01-2010

I have a two-node cluster (faadev1, faadev2) running a package (djtest). Moving the package works well in every case but one. Please advise.

The package creates a log file that I need to examine over an ssh connection from either cluster member. The disk (/data1a) moves with the package. In my case, I am running tail -f /data1a/log/djtest.log. When the package moves away from the cluster member I am logged into, cmhaltpkg kills the process running the tail. (Note: I have not cd'd to the logging disk.) The package then fails to start on the other node (error output below) because it cannot get the disk. I checked after the failure, and /data1a was still mounted. I unmounted it, but the package still refuses to come up on the other node (same error).
I have attached the script that does the moves for me. How can I avoid this problem? When I move to the full application, I will have many gigabytes of log files that I need to scan and analyze periodically. That should work well unless a failover occurs, and then I will no longer have an HA system.

Thanks

Jul 1 12:34:42 root@faadev2.duats.com master_control_script.sh[5348]: ###### Starting package djtest ######
Jul 1 12:34:42 root@faadev2.duats.com volume_group.sh[5373]: Activating volume group /dev/vg01 with exclusive option.
vgchange: Activation of volume group "/dev/vg01" denied by another node in the cluster.
Request on this system conflicts with Activation Mode on remote system.
Jul 1 12:34:42 root@faadev2.duats.com volume_group.sh[5373]: ERROR: Function sg_activate_volume_group
Jul 1 12:34:42 root@faadev2.duats.com volume_group.sh[5373]: ERROR: Failed to activate /dev/vg01
Jul 1 12:34:42 root@faadev2.duats.com master_control_script.sh[5348]: ##### Failed to start package djtest, rollback steps #####
Jul 1 12:34:42 root@faadev2.duats.com volume_group.sh[5413]: Deactivating volume group /dev/vg01
Volume group "/dev/vg01" has been successfully changed.
Jul 1 12:34:42 root@faadev2.duats.com master_control_script.sh[5348]: ###### Failed to start package for djtest ######

Kapil Jha · 07-01-2010

Hi,
The most probable reason is that your VG is not deactivating correctly because you had a tail process running on it.
So it is better to deactivate the VG with vgchange and then start the package on the other machine.
You need to look through the pkg.cntl file to see exactly what is happening.

BR,
Kapil+

I am in this small bowl, I want to see the real world...

S. Ney · 07-01-2010

Usually the package log files are on the local disk of each node in the cluster. Because the log file you are tailing is in the volume group that fails over between systems, cmhaltpkg will kill the tail, and that is expected behavior. All activity on the failover disks has to stop so that the disks can fail over cleanly. Your package logs should be on a local disk on each system; that way you can run tail -f on the package control script log under /etc/cmcluster/package/ on each system. As control is transferred between systems, the local logs record the activity promptly.

As to the volume group errors:
Make sure /dev/vg01 has been made cluster aware with vgchange -c y /dev/vg01. I am not sure whether you are using legacy or modular package scripts, but here is an example of the parameters in a legacy control script:
VGCHANGE="vgchange -a e" # Default
VG[0]=vg01
LV[0]=/dev/vg01/lvol1; FS[0]=/u01
FS_MOUNT_OPT[0]="-o delaylog,largefiles"
FS_UMOUNT_OPT[0]=""; FS_FSCK_OPT[0]=""; FS_TYPE[0]=vxfs
# FILESYSTEM UNMOUNT COUNT
# Specify the number of unmount attempts for each filesystem during package
# shutdown. The default is set to 1.
FS_UMOUNT_COUNT=5
# NOTE: If the FS_MOUNT_RETRY_COUNT > 0, the script will execute
# "fuser -ku" to freeup busy mount point.
FS_MOUNT_RETRY_COUNT=1
CONCURRENT_VGCHANGE_OPERATIONS=1
CONCURRENT_FSCK_OPERATIONS=1

Rita C Workman · 07-01-2010

As part of the shutdown process, i.e. 'function customer_defined_halt_cmds', some form of shutdown script is generally run: shut down the application, shut down the database, etc.

You can also put commands in a script and run it here to clean up processes holding files open. Just some simple command lines, like the ones you might use:
who -u
ps -ef | grep <pattern>
fuser -cu <mount point>
...and so forth.

An example of a script to kill 'tail' processes might be:

# '[t]ail' matches "tail" without matching this grep command itself
for a in `ps -ef | grep '[t]ail' | awk '{print $2}'`
do
    kill $a    # or, if you feel you need to, kill -9 $a
done

Run the command as part of the package halt section, so that everything is cleaned up before the mount points are taken down.
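The loop above kills every process on the node whose ps line mentions "tail", including tails that have nothing to do with the package. A narrower variant, sketched here under stated assumptions, picks out only the tail processes reading files under the package mount point (/data1a in this thread). The function name tail_pids_for_path is hypothetical, and it assumes "PID ARGS" input such as the output of ps -e -o pid= -o args= (on HP-UX the -o option usually needs XPG4 mode, i.e. UNIX95= ps ...). It only matches absolute paths on the command line, so a tail started with a relative path after cd'ing into the mount point is missed; fuser does not have that blind spot.

```shell
#!/bin/sh
# tail_pids_for_path MOUNTPOINT   (hypothetical helper)
# Reads "PID ARGS" lines on stdin, for example from:
#     ps -e -o pid= -o args=
# and prints the PID of each "tail" process that has an argument equal to
# MOUNTPOINT or starting with "MOUNTPOINT/". Pass MOUNTPOINT without a
# trailing slash. Relative paths on the command line are not detected.
tail_pids_for_path() {
    awk -v mp="$1" '
        # $1 = PID, $2 = command, $3 onwards = arguments
        $2 == "tail" || $2 ~ /\/tail$/ {
            for (i = 3; i <= NF; i++) {
                if ($i == mp || index($i, mp "/") == 1) {
                    print $1
                    break
                }
            }
        }'
}

# Hypothetical use in the package halt section:
#   for pid in $(UNIX95= ps -e -o pid= -o args= | tail_pids_for_path /data1a)
#   do
#       kill "$pid"
#   done
```

Matching on the mount point rather than on the word "tail" alone keeps the cleanup from touching other users' unrelated tail sessions on the same node.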

Just a thought,
Rita

Dee Jacobs · 07-06-2010

Thanks, Rita.

It seems that letting Serviceguard (SG) kill the processes does not get it done at the right time. When I explicitly checked the VG's mount point with fuser, I found the processes and killed them in my halt script. This takes care of it at the right time and eliminates the timing glitch. THANKS.
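The fuser approach described above could look roughly like the helper below. This is a minimal sketch, not the script from the thread: the name free_mount_point and the 5-second grace period are hypothetical, and it assumes an fuser that accepts -c (report on the file system mount point), -u and -k, as the HP-UX and Linux psmisc versions do. It sends SIGTERM first and only then uses fuser -k, which sends SIGKILL. Call it only with the package's own mount point: on Linux, fuser -c on a directory that is not a mount point reports every process using the file system that contains it.

```shell
#!/bin/sh
# free_mount_point MOUNTPOINT [GRACE_SECONDS]   (hypothetical helper)
# Stops every process that still uses the file system mounted on
# MOUNTPOINT so it can be unmounted and its volume group deactivated.
# Assumes fuser supports -c, -u and -k (HP-UX and Linux psmisc do).
free_mount_point() {
    mp=$1
    grace=${2:-5}            # assumed default grace period, in seconds

    if [ ! -d "$mp" ]; then
        echo "free_mount_point: $mp is not a directory" >&2
        return 1
    fi

    # fuser writes only the PIDs to stdout; everything else goes to stderr.
    pids=$(fuser -c "$mp" 2>/dev/null)
    if [ -z "$pids" ]; then
        return 0             # nothing is holding the mount point
    fi

    # Record which processes (and login names) were holding it.
    echo "free_mount_point: processes still using $mp:" >&2
    fuser -cu "$mp" >&2

    # Ask the processes to exit, then force whatever is left (SIGKILL).
    kill $pids 2>/dev/null   # unquoted on purpose: one argument per PID
    sleep "$grace"
    fuser -ck "$mp" >/dev/null 2>&1
    return 0
}

# Hypothetical call from the package halt commands (for example inside
# customer_defined_halt_cmds in a legacy control script), before the
# file systems are unmounted:
#   free_mount_point /data1a 5
```

Running this from the halt commands, rather than relying on the retry logic that fires at mount time, frees the mount point on the node that is giving up the package, which matches the timing problem described in the thread.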