Operating System - Linux

How to provide Fast Failover with RHCS for OracleDB

S.M.Athar
Advisor

Hello All,
I have successfully configured RHCS for Oracle DB 10g clustering. RHCS provides failover and failback of the Oracle service.
My issue is that when I reboot or shut down the active node, the cluster takes 20 to 25 seconds to remove the node from the cluster and another 15 to 20 seconds to start the Oracle service on the passive node, for a total of 40 to 45 seconds. Is this normal, or is there a way to speed up the failover?
I am using HP blades and an HP EVA4400, and the OS is RHEL 5.5.
I have also attached my cluster.conf.
Regards
Steven E. Protter
Exalted Contributor

Shalom,

These times are pretty good.

This is a normal failover.

It takes time to detect the failure. It takes time to STONITH (Shoot The Other Node In The Head). It takes time for the Oracle database to start up on the passive node.
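The costs listed above (detection, fencing, and starting Oracle on the passive node) happen one after another, so the failover window is roughly their sum. A minimal Python sketch of that arithmetic, using the ranges from the original post; the `Phase` type and `failover_window` helper are hypothetical, and detection and fencing are lumped together because the post timed them as a single 20 to 25 second step:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One sequential step of a failover, with best/worst-case duration in seconds."""
    name: str
    low_s: int
    high_s: int


def failover_window(phases):
    """Return (best, worst) total failover time, assuming the phases run back to back."""
    return sum(p.low_s for p in phases), sum(p.high_s for p in phases)


# Ranges reported in the original post (assumed, not re-measured).
# Detection and fencing are combined because they were timed as one step.
PHASES = [
    Phase("detect failure + fence node", 20, 25),
    Phase("start Oracle on passive node", 15, 20),
]

if __name__ == "__main__":
    for p in PHASES:
        print(f"{p.name:<30} {p.low_s}-{p.high_s} s")
    best, worst = failover_window(PHASES)
    print(f"{'total':<30} {best}-{worst} s")
```

Summed this way the two ranges give 35 to 45 seconds, so the only way to shorten failover is to shorten one of the phases: detection and fencing depend on the cluster's timeout and fence settings, while Oracle startup depends on how much recovery work the database has to do.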

Regards,

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Serviceguard for Linux
Honored Contributor

I agree. Those are pretty reasonable times. RHCS has to wait for the iLO to acknowledge that a node is down, which accounts for part of that time.

Be aware that the more activity there is on the Oracle DB, the longer its recovery time will be. That is pretty much independent of the cluster software you use.
S.M.Athar
Advisor

Thanks for your reply.
The time is acceptable, but now I am facing a problem. For fencing we are using HP iLO, and the servers are BL460c G6 blades. The problem is that the resource only starts moving to the passive node when the failed node is powered back on, which seems really strange to me. For example, I shut down the machine, removed it from the chassis, and monitored the clustat output. clustat still showed the resource on node 1, even though node 1 was powered down and removed from the c7000. But when I plugged the failed node back into the c7000 and it powered on, clustat showed the resource moving from the failed node to the passive node.
Please help me.
Regards
Athar
macosta
Trusted Contributor

Don't remove the blade from the chassis just to fail it over. When you unseat the blade, the iLO is no longer available, meaning the blade can't be fenced, which should be a requirement before any service can fail over.
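The rule described here, that a service may only move once the failed node has been fenced, can be modelled in a few lines. This is a simplified Python sketch, not RHCS's actual fenced or rgmanager code; `FenceResult`, `try_fence`, `may_relocate_service`, and the two iLO stand-in functions are hypothetical names. An unseated blade is represented by a fence method that always fails:

```python
from enum import Enum
from typing import Callable, Sequence


class FenceResult(Enum):
    SUCCESS = "success"
    FAILED = "failed"


FenceMethod = Callable[[], FenceResult]


def try_fence(methods: Sequence[FenceMethod]) -> bool:
    """Try each configured fence method in order; stop at the first success."""
    for method in methods:
        if method() is FenceResult.SUCCESS:
            return True
    return False


def may_relocate_service(node_failed: bool, methods: Sequence[FenceMethod]) -> bool:
    """A service may move off a failed node only after that node is confirmed fenced."""
    return node_failed and try_fence(methods)


def ilo_unreachable() -> FenceResult:
    # Hypothetical stand-in: blade pulled from the chassis, iLO cannot answer.
    return FenceResult.FAILED


def ilo_power_off() -> FenceResult:
    # Hypothetical stand-in: blade seated, iLO answers and confirms power-off.
    return FenceResult.SUCCESS


if __name__ == "__main__":
    print(may_relocate_service(True, [ilo_unreachable]))  # False: service stays put
    print(may_relocate_service(True, [ilo_power_off]))    # True: service can move
```

This matches what clustat showed in the blade-removal test: while the iLO was unreachable the fence could not succeed, so the service stayed on node 1, and it only moved once the blade was reseated and the iLO could confirm the fence. In this sketch a failed fence simply returns False; the real fence daemon is expected to keep retrying, which would explain why the relocation happened on its own as soon as the iLO came back.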

If you want to test a node crash, try panicking the node through /proc/sysrq-trigger, or something similar.
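A minimal Python sketch of that crash test, assuming root access on the node being tested: writing `c` to `/proc/sysrq-trigger` makes the kernel crash immediately, which is much closer to a real failure than a clean shutdown. The `crash_node` helper and its `dry_run` guard are hypothetical; with the default `dry_run=True` it only returns the equivalent shell command, so running it by accident does nothing.

```python
from pathlib import Path

SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def crash_node(dry_run: bool = True) -> str:
    """Force a kernel crash through the magic SysRq interface to test cluster failover.

    Writing "c" to /proc/sysrq-trigger crashes the running kernel (root only).
    With dry_run=True nothing is written and the equivalent shell command is returned.
    """
    command = f"echo c > {SYSRQ_TRIGGER}"
    if dry_run:
        return command
    # WARNING: the node goes down here. The surviving node should fence it
    # via iLO and then take over the service.
    SYSRQ_TRIGGER.write_text("c")
    return command  # normally never reached


if __name__ == "__main__":
    print(crash_node())  # prints: echo c > /proc/sysrq-trigger
```

Call it with `dry_run=False` only on the active node you intend to take down, then watch `clustat` on the surviving node. Because the blade stays seated, its iLO remains reachable and the fence can complete.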
S.M.Athar
Advisor

Thanks for your reply.
Please let me know:
If my active node dies due to a hardware failure and the iLO is not responding, how is it fenced? Does the resource still fail over to the passive node in that case?
Regards
Athar
Serviceguard for Linux
Honored Contributor

As long as the blade is plugged in, the iLO will be able to respond to a fence request.
S.M.Athar
Advisor

What happens if the server motherboard fails? Does the iLO still work in that case? Please clarify.
Thanks
Athar