
Pkg Switch Problem

 
Filosofo
Regular Advisor

Pkg Switch Problem

Hello guys,
I have a server with HP-UX 11i installed, Service Guard 11.14 and Oracle RAC 9.2.0.3.
We have 2 packages: the first for the DB and the second for the archive and OFA filesystems.
We are running tests to verify high availability.
When we halt the packages and start them on the other node, it works without problems.
But when we shut down the node that is running the packages, they do not switch, and in the cntl.log file we see that they can't unmount a filesystem because the device is busy.
What can I do?

Please help me.

Thanks

Filo
System engineer expert
16 REPLIES
Karthik S S
Honored Contributor

Re: Pkg Switch Problem

In the package configuration file, configure these parameters:

HALT_SCRIPT_TIMEOUT
NODE_FAIL_FAST_ENABLED YES

Regards,
Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
Massimo Bianchi
Honored Contributor

Re: Pkg Switch Problem

Hi,
when you see that the device is busy, there is usually a list of the offending PIDs.

So check what those PIDs are and take proper action: kill them, stop the other processes, or simply add a longer timeout in the umount phase or a higher number of retries.

There are situations where a killed process takes over a minute to properly release all its resources, including the filesystem.


HTH,
Massimo
V.Tamilvanan
Honored Contributor

Re: Pkg Switch Problem

Hi,
Try changing the parameter FS_UMOUNT_COUNT
in the package control files. The default number of filesystem unmount attempts is 1. Change it to 6 or 7; most of the time that solves it.
Check whether all the processes are cleared before the unmount happens.
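
For reference, the setting is a plain shell variable in the package control script. A minimal sketch, assuming a legacy-style control script; the value 6 is only the suggestion above, not a tested recommendation. As far as I know the control script is not distributed automatically, so the same change has to be made on every node that can run the package.

```shell
# Fragment of the package control script (its path is site-specific).
# Number of umount attempts per filesystem when the package is halted.
FS_UMOUNT_COUNT=6
```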

HTH
Filosofo
Regular Advisor

Re: Pkg Switch Problem

HALT_SCRIPT_TIMEOUT is set to NO_TIMEOUT and FS_UMOUNT_COUNT is set to 3.

This is the cntl.log:

########### Node "h3rmh129": Halting package at Thu Jul 17 18:17:05 METDST 2003 ###########
Jul 17 18:17:05 - Node "h3rmh129": Remove IP address 10.195.66.110 from subnet 10.195.66.0
Jul 17 18:17:05 - Node "h3rmh129": Remove IP address 10.195.164.78 from subnet 10.195.164.0
Jul 17 18:17:05 - Node "h3rmh129": Unmounting filesystem on /dev/vg_GBCUAT2/lv_ofa_GBCUAT2
umount: cannot unmount /ofa_GBCUAT2 : Device busy
WARNING: Running fuser to remove anyone using the file system directly.
/dev/vg_GBCUAT2/lv_ofa_GBCUAT2: 17748o(oracle) 21049o(oracle) 17746o(oracle) 17734o(oracle) 17740o(oracle)
21047o(oracle) 17742o(oracle) 21261o(oracle) 17752o(oracle) 17758o(oracle) 21239o(oracle)
21195o(oracle) 21231o(oracle) 21128o(oracle) 21193o(oracle) 17750o(oracle) 17761o(oracle) 21265o(oracle)
21241o(oracle) 21051o(oracle) 21187o(oracle) 17744o(oracle) 17738o(oracle) 21263o(oracle) 21243o(oracle)
21233o(oracle) 21255o(oracle) 21219o(oracle) 21229o(oracle) 21235o(oracle) 21120o(oracle)
21259o(oracle) 17756o(oracle) 21132o(oracle) 21045o(oracle) 17736o(oracle) 21116o(oracle) 17754o(oracle)
21237o(oracle) 21221o(oracle) 21215o(oracle) 21199o(oracle) 21257o(oracle) 21189o(oracle)
21253o(oracle) 21245o(oracle) 21043o(oracle) 21124o(oracle)

Jul 17 18:17:07 - Node "h3rmh129": Unmounting filesystem on /dev/vg_GBCUAT2/lv_archive_GBCUAT2
Jul 17 18:17:08 - Node "h3rmh129": Deactivating volume group /dev/vg_GBCUAT2
Deactivated volume group in Exclusive Mode.
Volume group "/dev/vg_GBCUAT2" has been successfully changed.

########### Node "h3rmh129": Package halt completed at Thu Jul 17 18:17:09 METDST 2003 ###########


Please help

Thanks

Filo
System engineer expert
Filosofo
Regular Advisor

Re: Pkg Switch Problem

Hello,
do I have to stop the package to change FS_UMOUNT_COUNT from 3 to 6?

Filo
System engineer expert
Massimo Bianchi
Honored Contributor

Re: Pkg Switch Problem

Hi,
it looks like you are not shutting down Oracle properly.
This is not good. Check your halt script before making any other change.

Massimo
Filosofo
Regular Advisor

Re: Pkg Switch Problem

Hi Massimo,
to halt Oracle, we run lsnrctl stop LISTENERnAME and a shutdown immediate.
What do you think we should do?

Thanks

Filo
System engineer expert
Bernhard Mueller
Honored Contributor

Re: Pkg Switch Problem

Hello,

according to the log this does not look like a umount problem, since the VG was deactivated successfully and the package halt seems to have completed successfully.

So the most likely reason for the package not starting on the other node is that it was not enabled to run there (maybe you did some testing and *forced* a switch before?). In that case you have to re-enable the package on the first node if you want to allow the package to "switch back".

You should check syslog.log on both nodes: grep for cmcld messages and see when the package was enabled or disabled for running on each host, and whether it was re-enabled after a forced switch.
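
A minimal sketch of that check, assuming the usual HP-UX syslog directory /var/adm/syslog/ (syslog.log and the older OLDsyslog.log). cmcld_msgs is a made-up helper name, not a Serviceguard command, and pkgname is a placeholder for your package.

```shell
# cmcld_msgs FILE [PATTERN]
# Print the cmcld lines from FILE, optionally only those that also
# match PATTERN (for example a package name). Hypothetical helper.
cmcld_msgs() {
    grep 'cmcld' "$1" | grep -e "${2:-}"
}

# Run on each node, for example:
#   cmcld_msgs /var/adm/syslog/syslog.log pkgname
#   cmcld_msgs /var/adm/syslog/OLDsyslog.log pkgname
```

Comparing the timestamps of the "enable" and "disable" requests on both nodes should show whether switching was turned off before the failover test.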

Regards,
Bernhard
Filosofo
Regular Advisor

Re: Pkg Switch Problem

I found this in the OLDsyslog.log file:
Jul 17 18:17:04 h3rmh129 cmcld: Request from node h3rmh129 to disable global switching for package dSVW15SNGVEW.

Please help

Filo
System engineer expert
Bernhard Mueller
Honored Contributor

Re: Pkg Switch Problem

man cmmodpkg

read carefully.

next time you switch, always thoroughly check the status using
cmviewcl -v

before you attempt to do anything else.

important to use -v to see not only AUTO_RUN but also the individual nodes enabled / disabled

Regards
Bernhard
Massimo Bianchi
Honored Contributor

Re: Pkg Switch Problem

Hi Filosofo,
ok, so you are shutting down the Oracle RDBMS server and the listener.

So what are all those oracle processes with open files?
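
One way to find out is to look at those PIDs while they still exist. A rough sketch: extract_pids is a made-up helper that pulls the PIDs out of fuser-style text such as the line in the cntl.log, and the commented loop assumes you run it on the node before the package is halted.

```shell
# Extract the PIDs from fuser-style text such as "17748o(oracle) 21049o(oracle)".
# extract_pids is a hypothetical helper name, not an HP-UX command.
extract_pids() {
    tr -s ' ' '\n' | sed -n 's/^\([0-9][0-9]*\)[a-z]*(.*$/\1/p'
}

# Example with a shortened copy of the log line; prints 17748 and 21049, one per line.
printf '%s\n' '/dev/vg_GBCUAT2/lv_ofa_GBCUAT2: 17748o(oracle) 21049o(oracle)' | extract_pids

# On the live node, fuser writes the bare PIDs to stdout, so something like
# this lists the full command line of every process holding the filesystem:
#   for pid in $(fuser -c /ofa_GBCUAT2 2>/dev/null); do ps -fp "$pid"; done
```

The command lines should tell you whether these are instance background processes, leftover client connections, or something else that your halt script does not stop.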


Another thing to check, regarding the disabling:

Is autoswitch enabled?

Check with cmviewcl -v

HTH,
Massimo
Karthik S S
Honored Contributor

Re: Pkg Switch Problem

Hi,

Do a,

cmmodpkg -e pkgname

Also post the pkgname_cntl.log of the other node

Regards,
Karthik S S
For a list of all the ways technology has failed to improve the quality of life, please press three. - Alice Kahn
Bernhard Mueller
Honored Contributor

Re: Pkg Switch Problem

to make this clearer:

depending on what happened in which sequence you could have AUTO_RUN enabled but the node (to which a package is supposed to switch) disabled.

You can see the node switching parameters only with cmviewcl -v
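
To make the two levels concrete, here is a rough sketch. It only prints the commands unless RUN=1 is set; run is a made-up wrapper, and pkgname and nodename are placeholders. cmmodpkg -e on its own enables package switching, while adding -n enables the package to run on one specific node.

```shell
PKG=pkgname      # placeholder package name
NODE=nodename    # placeholder: the node the package should be able to switch to

# Dry run by default: print each command; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

run cmviewcl -v                    # check the package and node switching state
run cmmodpkg -e "$PKG"             # enable package switching (AUTO_RUN)
run cmmodpkg -e -n "$NODE" "$PKG"  # enable the package to run on that node
run cmviewcl -v                    # verify the result
```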

Regards,
Bernhard
Filosofo
Regular Advisor

Re: Pkg Switch Problem

I ran cmviewcl -v and I see that everything is enabled.

Bye

Filo
System engineer expert
Bernhard Mueller
Honored Contributor

Re: Pkg Switch Problem

Then you should be all set to test a forced switch.

When you test always run tail -f on the syslog.log and the pkg.control.log on both nodes.

This way you see which node is telling you what and in which sequence (having to reconstruct later is always more difficult).

Regards,
Bernhard
okcunix
Advisor

Re: Pkg Switch Problem

Had a similar experience recently during an upgrade from Oracle 8i to 9i. Mount points started failing to unmount with a "busy" message. This is a known issue with MCSG and Oracle, and there is a kernel (LVM) patch available. Search the ITRC or open a call with them.

HTH,
Tim