Re: TCP Keepalive and Oracle RAC

Mark Baker_5 · ‎08-18-2004

We are having a performance problem when we fail a database in our RAC cluster. When both DBs are running normally, an SQL query takes 5 seconds. We kill the DB that the query is running on, and TAF moves the query to the surviving node. This takes 14 seconds; after the initial failover, the query runs in five seconds again.

When the failed DB is brought back up, this query takes five minutes to run. The DBA is saying we need to adjust the tcp_keepalive_timer to five minutes.

I have seen no concise supporting documents from Oracle specifically identifying keepalive as the culprit.

The keepalive default is 2 hours (7200000 msec). What is the potential impact of changing this to 5 minutes, as requested by the DBA?

Any other ideas are welcome.
mtb

rick jones · ‎08-19-2004

Frankly, if Oracle RAC is relying on the TCP Keepalive mechanism to detect node failure, well, that is, shall we say, sub-optimal. It should instead be running its own keepalive messages at the application level.
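To illustrate what an application-level keepalive might look like, here is a minimal sketch in Python. It is not how Oracle RAC does it; the `HeartbeatMonitor` class, its method names, and the 10-second timeout are all hypothetical. The idea is only that each node records when it last heard from a peer and treats the peer as failed once that silence exceeds a timeout the application chooses, rather than waiting on the transport's keepalive timer.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical helper)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s  # silence longer than this means "failed"
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def record(self, peer):
        """Call whenever a heartbeat message arrives from `peer`."""
        self.last_seen[peer] = self.clock()

    def dead_peers(self):
        """Return the peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            peer for peer, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Example: a 10-second timeout, chosen only for illustration.
monitor = HeartbeatMonitor(timeout_s=10)
monitor.record("node1")
print(monitor.dead_peers())  # nothing has timed out yet
```

With this kind of scheme the failure-detection time is set by the application's own timeout, independent of whatever tcp_keepalive_interval happens to be.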

Given that the query takes five minutes, and your tcp_keepalive_interval (if your DBA is calling it the tcp_keepalive_timer, he is either confusing it with some other platform or isn't fully "up" on ndd names :) is 2 hours, I have a difficult time understanding how changing tcp_keepalive_interval to five minutes would make things any better. I would be looking at other possibilities as to why a query might take five minutes on the "returned" node. I'm not a DB expert, but recovery comes to mind.

If you make the tcp_keepalive_interval five minutes, then for those TCP connections on which SO_KEEPALIVE is set, that were idle for more than five minutes, but less than two hours, you will have an increase in TCP traffic. This could be significant if you had large numbers of idle connections, and epsilon if you did not.
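A rough back-of-the-envelope estimate of that traffic increase can be sketched as below. It assumes every connection has SO_KEEPALIVE set and stays completely idle, so that it sends about one probe per interval, and it ignores the extra probes sent when a peer stops answering. The function name and the figure of 1000 idle connections are made up for illustration.

```python
def keepalive_probes_per_hour(idle_connections, interval_ms):
    """Approximate keepalive probes per hour from fully idle connections.

    Assumes one probe per interval per connection (a simplification).
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    ms_per_hour = 3_600_000
    return idle_connections * ms_per_hour / interval_ms


# Hypothetical load: 1000 idle connections with SO_KEEPALIVE set.
default_rate = keepalive_probes_per_hour(1000, 7_200_000)  # 2-hour default
proposed_rate = keepalive_probes_per_hour(1000, 300_000)   # 5-minute setting

print(default_rate, proposed_rate, proposed_rate / default_rate)
```

Under these assumptions the probe rate scales with the ratio of the two intervals (2 hours / 5 minutes = 24), and each probe normally draws an ACK in reply, so the packet count is about twice the probe count.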

there is no rest for the wicked yet the virtuous have no pillows

Christian Pisacane · ‎08-19-2004

Mark, can you explain in more detail what happened?
We have iRAC running here and we have tested a lot of failure possibilities.

Regards.
Christian

Mark Baker_5 · ‎08-23-2004

Thanks Rick,

Your comments are in lock-step with where my mind is on this. The issue at hand should be resolved via Oracle iRAC error handling, not ndd tweaks. We surely have an issue that is related to the RAC configuration or a possible RAC bug. Thanks for the ammo to push back on this request.

mtb

Mark Baker_5 · ‎08-23-2004

Closing...
