TruCluster script

edi_4
Advisor

Hi! Can anybody help me understand the check section of a cluster action script?

It defines which processes to monitor. What happens when one of those processes fails?
Do I have to spell out in the script exactly what to do (start/stop/relocate)?

Thank you!
6 REPLIES
Johan Brusche
Honored Contributor

Re: TruCluster script


The check section only needs to verify the sanity of the application and return an exit status of "fail" or "success". Upon failure, caad decides what to do based on the settings in the profile (.cap file), in particular FAILURE_THRESHOLD, RESTART_ATTEMPTS and PLACEMENT.

When caad decides that a relocation is required, it calls the stop section of the script on the current member and the start section of the script on the other member.

See manpage caa_profile and examples in
/var/cluster/caa/examples/*
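
To illustrate the point above, a check section can be as small as the sketch below. It is not taken from the examples directory. The PIDFILE path, the start_app/stop_app placeholders and the convention that exit status 0 means success and non-zero means failure are assumptions made for the sketch, so compare it with the shipped examples before relying on it.

```shell
#!/bin/sh
# Sketch of an action script with start/stop/check entry points.
# PIDFILE and the start/stop bodies are hypothetical placeholders.
PIDFILE="${PIDFILE:-/var/run/myapp.pid}"

start_app() {
    # Placeholder: launch the application here and record its PID.
    echo "start requested"
}

stop_app() {
    # Placeholder: shut the application down cleanly here.
    echo "stop requested"
}

check_app() {
    # Report healthy only if the PID file exists and that process is alive.
    [ -r "$PIDFILE" ] || return 1
    pid=`cat "$PIDFILE"`
    [ -n "$pid" ] || return 1
    kill -0 "$pid" 2>/dev/null
}

action() {
    case "$1" in
        start) start_app ;;
        stop)  stop_app ;;
        check) check_app ;;
        *)     echo "usage: $0 {start|stop|check}" >&2; return 2 ;;
    esac
}

# Dispatch only when an argument is given; the script's exit status is then
# the status of the requested action.
[ $# -eq 0 ] || action "$1"
```

Note that the check branch only reports a status. It contains no restart or relocate logic, because that decision is left to caad and the profile.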

Rgds,
___ Johan./

_JB_
Vladimir Fabecic
Honored Contributor

Re: TruCluster script

Johan said what I wanted to say.
You can see some examples in:
http://h30097.www3.hp.com/docs/best_practices/BP_TCR_ORA_SS/TITLE.HTM
In vino veritas, in VMS cluster
edi_4
Advisor

Re: TruCluster script

In our case the Oracle process went offline. After the check section reported the failure, the resource was only stopped; it was not relocated. I think the profiles are OK. Because PLACEMENT=favored is set there, it should relocate to any available member, i.e. m3 in our case.

caa_stat -p resource1

NAME=resource1
TYPE=application
ACTION_SCRIPT=resource1.scr
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=180
DESCRIPTION=ora
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=ganesh
OPTIONAL_RESOURCES=
PLACEMENT=favored
REBALANCE=
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=600
USR_MUTEX=




# cluamgr -s all

Status of Cluster Alias: shiva.
netmask: 0.0.0.0
aliasid: 1
flags: 17
connections rcvd from net: 441
connections forwarded: 109
connections rcvd within cluster: 491
data packets received from network: 43675
data packets forwarded within cluster: 20151
datagrams received from network: 77485
datagrams forwarded within cluster: 5087
datagrams received within cluster: 78581
fragments received from network: 0
fragments forwarded within cluster: 0
fragments received within cluster: 0
Member Attributes:
memberid: 1, selw=3, selp=1, rpri=1 flags=11
memberid: 3, selw=3, selp=1, rpri=1 flags=11


Venkatesh BL
Honored Contributor

Re: TruCluster script

Do you see any related messages in the /var/adm/syslog.dated/current/daemon.log file (look for CAAD entries)?
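
One way to pull out those entries, as a sketch only: the helper name caad_lines is made up for this example, and the log path in the usage comment assumes the file lives under /var/adm with a leading slash.

```shell
#!/bin/sh
# caad_lines (hypothetical helper): print only the CAAD entries from a log
# file, optionally restricted to lines that mention one resource name.
caad_lines() {
    logfile="$1"
    resource="$2"
    if [ -n "$resource" ]; then
        grep 'CAAD\[' "$logfile" | grep -- "$resource"
    else
        grep 'CAAD\[' "$logfile"
    fi
}

# Example: caad_lines /var/adm/syslog.dated/current/daemon.log resource1
```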
Ivan Ferreira
Honored Contributor

Re: TruCluster script

I think you need to list all cluster members in the profile. The current line is:

HOSTING_MEMBERS=ganesh

Change it to something like:

HOSTING_MEMBERS=host1 host2 host3 host4
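
To see how many members a profile currently lists, the output of caa_stat -p can be piped through a small helper. This is a sketch only: hosting_member_count is a hypothetical name, and it assumes the NAME=value layout posted earlier in the thread, with members separated by spaces.

```shell
#!/bin/sh
# hosting_member_count (hypothetical): read profile text in NAME=value form
# on stdin and print how many members the HOSTING_MEMBERS line lists.
hosting_member_count() {
    sed -n 's/^HOSTING_MEMBERS=//p' | head -n 1 | wc -w | tr -d ' '
}

# Example: caa_stat -p resource1 | hosting_member_count
```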
Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
edi_4
Advisor

Re: TruCluster script

It looks like the Oracle process went offline and the relocation did not happen because of a problem with the daemon?

Jan 25 08:55:29 ganesh CAAD[525043]: `ora_hibis` on `ganesh` went OFFLINE unexpectedly
Jan 25 08:55:29 ganesh CAAD[525043]: Attempting to stop `ora_hibis` on member `ganesh`
Jan 25 08:55:31 ganesh CAAD[525043]: Attempting to stop `ora_seb` on member `brahma`
Jan 25 08:56:01 ganesh CAAD[525043]: Could not communicate with daemon on node `brahma`
Jan 25 08:56:01 ganesh CAAD[525043]: Remote stop for `ora_seb` failed on member `brahma`
Jan 25 08:56:02 ganesh CAAD[525043]: `ora_hibis` on member `ganesh` has experienced an unrecoverable failure.
Jan 25 08:56:02 ganesh CAAD[525043]: Action script failure during 'stop' operation. Manual cleanup of resource
Jan 25 08:56:02 ganesh CAAD[525043]: `ora_hibis` experienced a failure on `ganesh`. Stopping dependent resources.
Jan 25 08:56:02 ganesh CAAD[525043]: Attempting to stop `ora_hibis` on member `ganesh`
Jan 25 08:56:02 ganesh CAAD[525043]: Attempting to stop `ora_seb` on member `brahma`
Jan 25 08:57:03 ganesh CAAD[525043]: Stop of `ora_seb` on member `brahma` succeeded.