Operating System - HP-UX

IRac failover with HyperFabric

We have two HP-UX 11i servers running Oracle 9i RAC. Cache Fusion runs over a HyperFabric link using UDP. When the link is cut, one of the two instances stays alive and the other dies. The problem is this: when the failure occurs, Oracle takes 5 minutes to detect the loss of the HyperFabric and another 10 minutes to kill one instance. After those 15 minutes TAF works fine, but the 15 minutes themselves are the problem. Does anybody know which parameters must be tuned for this?
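For reference, the TAF behaviour mentioned above is driven on the client side by the FAILOVER_MODE clause; a minimal tnsnames.ora sketch, assuming placeholder host and service names that are not from this installation:

    # tnsnames.ora excerpt -- illustrative only; hostnames and service name are invented
    RACDB =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = racnode1)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = racnode2)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = racdb)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 60)(DELAY = 5))
        )
      )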

Regards
Christian
Brian Crabtree
Honored Contributor

Re: IRac failover with HyperFabric

Christian,

This appears to be a standard response. Oracle appears to rely more on the cluster heartbeat than its own. I would check the cluster heartbeat settings in your cmcluster.ascii file and see if they can be changed instead.

You should consider adding a secondary network as a backup for this, so that a failure of your HyperFabric will not bring the RAC cluster down. Your normal network connections could probably be used for this, so you don't have to buy another set of HyperFabric cards.
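As a rough illustration of what a second heartbeat network could look like in cmcluster.ascii (interface names and addresses are invented; the real file has to be edited and re-applied with cmapplyconf):

    # cmcluster.ascii excerpt -- illustrative only; interfaces and IPs are placeholders
    NODE_NAME node1
      NETWORK_INTERFACE lan0        # existing standard LAN
        HEARTBEAT_IP 192.168.10.1
      NETWORK_INTERFACE lan1        # second LAN used as backup interconnect/heartbeat
        HEARTBEAT_IP 192.168.20.1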

Thanks,

Brian

Re: IRac failover with HyperFabric

Brian, that is not the problem. The only purpose of Serviceguard in this installation is to permit the vgchange -a s, so that the two nodes of the cluster can access the same VG in shared (active) mode.
The package doesn't have Serviceguard failover, because that is an Oracle function.
I cut the link between the two RAC nodes because I must test the behavior of the installation. I know that this is an Oracle issue, but I can't find the parameters that I must change.
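For context, the shared activation mentioned above is done per volume group, roughly like this (the VG name is invented, and the Serviceguard cluster must already be running):

    # Activate the volume group in shared mode on each RAC node
    vgchange -a s /dev/vg_rac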

Regards.
Brian Crabtree
Honored Contributor

Re: IRac failover with HyperFabric

Yes and no. The cmcluster.ascii file is going to have the same parameters as a standard cluster; you just deactivate the failover portion.

Take a look at the following parameters:

HEARTBEAT_IP (one for each node)
HEARTBEAT_INTERVAL
NODE_TIMEOUT

If the cluster is not detecting the failure, then you will have to wait for the database to detect it. Looking through Metalink, I don't see any way to change this.
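A minimal sketch of how those timing values appear in cmcluster.ascii; the numbers below are examples in microseconds, not recommendations for this cluster:

    # cmcluster.ascii excerpt -- example values only; times are in microseconds
    HEARTBEAT_INTERVAL   1000000    # 1 second between heartbeats
    NODE_TIMEOUT         2000000    # declare a node down after 2 seconds without heartbeat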

Brian

Re: IRac failover with HyperFabric

OK, but the Serviceguard heartbeat is not on the HyperFabric link. Also, if I change the SG configuration as you propose, then when the link goes down one of the two nodes will TOC, and I will lose not only one Oracle instance but also the other applications that are running on that server.

What do you think?

Regards
Christian
Brian Crabtree
Honored Contributor

Re: IRac failover with HyperFabric

Unfortunately, that is correct. The network failure that you are looking for would cause the same problem, though. It should recognize that there is a problem on the node and remove the node from the cluster. The other option (I think I said this earlier) is to put a backup network in place to stand in for the HyperFabric in case it goes down. The performance wouldn't be great, but 100 Mbit (preferably 1000 Mbit) is better than nothing.
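If a backup LAN is added, note that on some platforms and releases Oracle can be told which address(es) to use for Cache Fusion with the CLUSTER_INTERCONNECTS init parameter; whether that applies to this particular HP-UX/9i combination should be confirmed with Oracle, so treat this as a hypothetical sketch with invented addresses:

    # init.ora excerpt -- hypothetical; verify platform support before relying on it
    cluster_interconnects = "10.0.0.1"    # HyperFabric address on this node
    # where supported, a second address can be listed, colon-separated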

Thanks,

Brian

Re: IRac failover with HyperFabric

OK, thanks Brian.

Re: IRac failover with HyperFabric

Brian, I have another solution to this issue.
There are three hidden parameters in the init file:

_cgs_send_timeout
_lm_dlm_send_timeout
_imr_splitbrain_res_wait

With these parameters you can control how long the database waits for the operating system, and how long it takes, after taking control, to migrate the sessions.
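In init.ora form that would look roughly like this; only _cgs_send_timeout has a value suggested later in this thread, the others are left commented out until testing gives values, and hidden underscore parameters should only be changed with Oracle Support's agreement:

    # init.ora excerpt -- illustrative; underscore parameters are unsupported
    # unless Oracle Support approves the values
    _cgs_send_timeout          = 5     # value reported later in this thread
    # _lm_dlm_send_timeout     = ...   # value to be determined by testing
    # _imr_splitbrain_res_wait = ...   # value to be determined by testing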

We are testing this right now.

Regards

Christian Pisacane
Stephen Andreassend
Regular Advisor

Re: IRac failover with HyperFabric

We had the exact same problem, and fixed it at an Oracle parameter level by setting:

_cgs_send_timeout = 5

Also, if you have HyperFabric/2, why not use HMP rather than UDP or TCP/IP? It has lower latency and lower CPU usage, and it is also synchronous rather than an asynchronous protocol like UDP. Win-win. You have to relink Oracle to do this.
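If an spfile is in use, a hidden parameter like the one above can also be set from SQL*Plus rather than by editing init.ora by hand; a sketch only, and again something to agree with Oracle Support first (the relink for HMP itself should follow the platform install guide):

    -- SQL*Plus, connected as SYSDBA; takes effect after the instances restart
    ALTER SYSTEM SET "_cgs_send_timeout" = 5 SCOPE = SPFILE SID = '*';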

Re: IRac failover with HyperFabric

The problem with HMP is the driver. The current version for HP-UX 11i v1 is 11.11.03, and that version doesn't work well with HMP transparent local failover; in fact it has a lot of problems. There is another version, a lab version called 11.11.09, but it is not an official HP release. I have this version but I haven't tested it yet.

Regards

Christian