Operating System - HP-UX

Re: TCP Keepalive and Oracle RAC

 
SOLVED
Mark Baker_5
New Member

TCP Keepalive and Oracle RAC

We are having a performance problem when we fail a database in our RAC cluster. Essentially, when both DBs are running normally, an SQL query takes 5 seconds. We kill the DB that the query is on and TAF moves the query to the surviving node. This takes 14 seconds; after that initial failover, the query runs in five seconds again.

When the failed DB is brought back up, this query takes five minutes to run. The DBA is saying we need to adjust the tcp_keepalive_timer to five minutes.

I have seen no concise supporting documents from Oracle specifically identifying keepalive as the culprit.

The keepalive default is 2 hours (7200000 msec). What is the potential impact of changing this to 5 minutes, as requested by the DBA?

Any other ideas are welcome.
mtb
4 REPLIES
rick jones
Honored Contributor
Solution

Re: TCP Keepalive and Oracle RAC

Frankly, if Oracle RAC is relying on the TCP Keepalive mechanism to detect node failure, well, that is, shall we say, sub-optimal. It should instead be running its own keepalive messages at the application level.
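To illustrate what "keepalive messages at the application level" could look like, here is a minimal sketch in Python. It is not how Oracle RAC does it; the class name `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical and exist only for illustration. The idea is that each node records when it last heard from a peer and declares the peer dead once a short, application-chosen timeout passes, rather than waiting on the TCP keepalive timer.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        # timeout_s: seconds of silence after which a peer is considered dead.
        # clock: injectable time source, so the logic can be tested without sleeping.
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}

    def record(self, peer):
        """Call whenever a heartbeat message arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def dead_peers(self):
        """Return, sorted by name, the peers silent for longer than the timeout."""
        now = self.clock()
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

With a timeout of a few seconds, a failed node shows up in `dead_peers()` within seconds, independent of whatever tcp_keepalive_interval is set to on the host.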

Given that the query takes five minutes, and your tcp_keepalive_interval is 2 hours, I have a difficult time understanding how changing tcp_keepalive_interval to five minutes would make things any better. (If your DBA is calling it the tcp_keepalive_timer, he is either confusing it with some other platform or isn't fully "up" on ndd names :) I would be looking at other possibilities as to why a query might take five minutes on the "returned" node. I'm not a DB expert, but recovery comes to mind.

If you make the tcp_keepalive_interval five minutes, then for those TCP connections on which SO_KEEPALIVE is set, that were idle for more than five minutes, but less than two hours, you will have an increase in TCP traffic. This could be significant if you had large numbers of idle connections, and epsilon if you did not.
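A rough way to put numbers on that, sketched in Python. The function names and the figure of 1000 idle connections are made up for illustration, and the estimate assumes each connection stays idle and every probe is answered, so that one probe goes out per interval per connection. The first function just shows what "SO_KEEPALIVE is set" means from the application's side; connections that never set it are unaffected by the interval.

```python
import socket


def enable_keepalive(sock):
    """Turn on SO_KEEPALIVE for one socket and report whether it took effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def probes_per_hour(idle_connections, interval_ms):
    """Estimate keepalive probes sent per hour across idle connections.

    Assumes one probe per interval per connection (each probe is answered
    and the connection then remains idle).
    """
    ms_per_hour = 3_600_000
    return idle_connections * ms_per_hour / interval_ms


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("keepalive enabled:", enable_keepalive(s))
    s.close()
    # Hypothetical load: 1000 idle connections with SO_KEEPALIVE set.
    print(probes_per_hour(1000, 7_200_000))  # default 2-hour interval
    print(probes_per_hour(1000, 300_000))    # proposed 5-minute interval
```

Under those assumptions, 1000 idle connections go from about 500 probes per hour at the default to about 12000 per hour at five minutes, each with a matching ACK from the peer. Noticeable with many idle connections, negligible with few.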
there is no rest for the wicked yet the virtuous have no pillows

Re: TCP Keepalive and Oracle RAC

Mark, can you explain in more detail what happened?
We have iRAC running here and have tested a lot of possible failure scenarios.

Regards.
Christian
Mark Baker_5
New Member

Re: TCP Keepalive and Oracle RAC

Thanks Rick,

Your comments are in lock-step with where my mind is on this. The issue at hand should be resolved via Oracle iRAC error handling, not ndd tweaks. We surely have an issue that is related to the RAC configuration, or possibly a RAC bug. Thanks for the ammo to push back on this request.

mtb
Mark Baker_5
New Member

Re: TCP Keepalive and Oracle RAC

Closing...