TruCluster and CAA

Tom Kempster
Occasional Advisor

TruCluster and CAA

I now have a working configuration of Oracle9 and SAP on a 2-node cluster.
I can use the applications without problems outside CAA.
I have written a profile and action script that appear to start Oracle and SAP correctly. Under CAA, however, the applications fail over to the other server as soon as there is any load on the server hosting them. They then oscillate between the 2 servers whenever the current host comes under load.

I have attached the profile that I am using. Any help would be very much appreciated.

Tom Kempster

NAME=sap_and_oracle
TYPE=application
ACTION_SCRIPT=sap_and_oracle.scr
ACTIVE_PLACEMENT=0
AUTO_START=0
CHECK_INTERVAL=60
DESCRIPTION=sap_and_oracle
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=test1 test2
OPTIONAL_RESOURCES=
PLACEMENT=favored
REQUIRED_RESOURCES=
RESTART_ATTEMPTS=1
SCRIPT_TIMEOUT=600
5 REPLIES
Ivan Ferreira
Honored Contributor
Solution

Re: TruCluster and CAA

You should check your sap_and_oracle.scr script. The check portion is probably failing, which is why you are seeing relocations. We had a similar problem and modified the part of the check portion that determines whether the process is running. In our case, tuning the ps command that checks the process status solved the problem.
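One way a ps-based check might be tightened, as a minimal sketch and not necessarily the change described above: take a single snapshot of the process table and match the command name only at the start of each line, so that a stray grep or an editor opened on a similarly named file cannot satisfy the check. The function names and the prefix in the usage comment are hypothetical, and `ps -e -o args` is the POSIX form, so verify that your ps accepts it.

```shell
#!/bin/sh
# Hypothetical helpers for the check portion of an action script.

# Print the command line of every process, one per line (POSIX ps options).
# The header line that ps prints is harmless for the prefix match below.
snapshot_procs() {
    ps -e -o args
}

# has_proc SNAPSHOT PREFIX
# Succeeds (exit 0) only if some line of SNAPSHOT starts with PREFIX.
has_proc() {
    printf '%s\n' "$1" |
        awk -v pat="$2" 'index($0, pat) == 1 { found = 1 }
                         END { exit (found ? 0 : 1) }'
}

# Example use inside the check branch (the prefix is a placeholder):
#   procs=$(snapshot_procs)
#   has_proc "$procs" "my_daemon_name" || exit 1
```

Because the match is anchored at the start of the line, a process started through a full path needs that path as the prefix.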

Por que hacerlo dificil si es posible hacerlo facil? - Why do it the hard way, when you can do it the easy way?
Tom Kempster
Occasional Advisor

Re: TruCluster and CAA

Thanks for the reply.

I have now tried tuning this, without success. I even removed all the checks, but to no avail. It is a shame, because it made sense.

Regards,

Tom
Johan Brusche
Honored Contributor

Re: TruCluster and CAA

Tom,

Please remember that each time you change the .scr file you must run "caa_register -u" to make caad aware of the change.

Your CHECK_INTERVAL=60; are you sure the check section executes within 60 secs?

What you can do to test is:
Set the CAA service sap_and_oracle offline and make sure everything is stopped.
Manually start the Oracle and SAP applications outside CAA, i.e. "./sap_and_oracle.scr start".
How long does this take? Is it shorter than SCRIPT_TIMEOUT? Is everything really started when it reports exit status success?
Next, manually execute "./sap_and_oracle.scr check". Does it exit with status 'success'?
Does it execute within 60 secs, i.e. the CHECK_INTERVAL?
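The manual test above can be wrapped in a small helper that reports both the exit status and the elapsed time against a limit. This is only a sketch: `timed_run` is a hypothetical name, and it assumes a `date` that supports `+%s`, which is not guaranteed by POSIX. If yours does not, running the script under `time` and then printing `$?` gives the same information.

```shell
#!/bin/sh
# timed_run LIMIT_SECONDS COMMAND [ARGS...]
# Runs COMMAND, then prints one line saying whether it exited 0 and
# whether it finished in fewer than LIMIT_SECONDS seconds.
# Assumption: date +%s prints seconds since the epoch.
timed_run() {
    limit=$1
    shift
    start=$(date +%s)
    "$@"
    rc=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$rc" -ne 0 ]; then
        echo "FAIL: exit status $rc after ${elapsed}s"
        return 1
    fi
    if [ "$elapsed" -ge "$limit" ]; then
        echo "SLOW: exit 0 but took ${elapsed}s (limit ${limit}s)"
        return 2
    fi
    echo "OK: exit 0 in ${elapsed}s (limit ${limit}s)"
}

# With the values from the posted profile:
#   timed_run 600 ./sap_and_oracle.scr start    # SCRIPT_TIMEOUT=600
#   timed_run 60  ./sap_and_oracle.scr check    # CHECK_INTERVAL=60
```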

Rgds,
Johan.


_JB_
Venkatesh BL
Honored Contributor

Re: TruCluster and CAA

Look in the /var/adm/syslog.dated/current/daemon.log file (look for the CAAD entries). This should give you a clue about the failure.
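A quick way to pull the most recent CAAD entries out of that file, as a sketch: `caad_entries` is a hypothetical helper name, and the case-insensitive match on "caad" is an assumption about how the daemon tags its lines, so adjust the pattern to what your log actually shows.

```shell
#!/bin/sh
# caad_entries [LOGFILE] [COUNT]
# Prints the last COUNT lines (default 20) of LOGFILE that mention caad.
# LOGFILE defaults to the path given above.
caad_entries() {
    log=${1:-/var/adm/syslog.dated/current/daemon.log}
    count=${2:-20}
    grep -i 'caad' "$log" | tail -n "$count"
}

# Example: show the 50 most recent entries after a relocation.
#   caad_entries /var/adm/syslog.dated/current/daemon.log 50
```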
Tom Kempster
Occasional Advisor

Re: TruCluster and CAA

Thank you all for your responses; they were invaluable. Johan finally nailed it for me: I had changed things and executed "caa_profile update" but had missed "caa_register -u"!
I have made a few changes and now everything seems fine, including changing SCRIPT_TIMEOUT and CHECK_INTERVAL. I had also made changes to the script as Ivan suggested, so I am really unsure which of these cured the problem.