
EZ2007
Advisor

Oracle 10g R4 RAC Performance

We have two RAC cluster nodes, each residing on a partition of an RP8420 running HP-UX 11i v2. Node1 and Node2 each have 16 CPUs and 32 GB of RAM. We have upgraded Oracle from R2 to R4, and here is the situation we are facing.

On both nodes, NICs 2 and 3 are aggregated with APA as lan900, and NICs 4 and 5 as lan901. The configuration is:

Node1:

NETWORK_INTERFACE lan900
HEARTBEAT_IP x.x.x.33
NETWORK_INTERFACE lan6
HEARTBEAT_IP x.x.x.3
NETWORK_INTERFACE lan901
NETWORK_INTERFACE lan7

Node2:

NETWORK_INTERFACE lan900
HEARTBEAT_IP x.x.x.34
NETWORK_INTERFACE lan6
HEARTBEAT_IP x.x.x.4
NETWORK_INTERFACE lan901
NETWORK_INTERFACE lan7

lan6 on Node1 and lan6 on Node2 are connected to a dedicated 1 Gbps Cisco switch, and lan7 on each node is connected to a second 1 Gbps Cisco switch; the two switches are for redundancy.

The primary NICs are connected to a switch, which is in turn connected to a load balancer.

On Node1: The Oracle backend processes are running, and CPU idle is about 10%. When I run a SQL query on Node1, it takes seconds.

On Node2: The frontend users are connected, and CPU idle is about 70%. When I run the same SQL query on Node2, it takes 4 minutes.

I tried to FTP a 300MB file to Node1, it took 12 minutes, and same file to Node2 took 16 minutes.

The problem is that users connected to Node2 are experiencing slow SQL and slow application response.

It's weird that the loaded node executes SQL faster than the non-loaded node.

Please share your expertise with me, since this is becoming a critical issue.

Steven E. Protter
Exalted Contributor

Re: Oracle 10g R4 RAC Performance

Shalom,

You need way more data to reach any conclusions.

See: http://hpux.ws/system.perf.sh

Collect more data.

Run lanadmin -x on all systems and NIC cards.

Look for problems with speed and for connections that are not full duplex.

What is the QPK patch state of these systems? If a QPK has not been installed for over a year, one should be scheduled.

See about relinking oracle after consulting with Oracle support.

Check the oracle install logs carefully.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Bill Hassell
Honored Contributor

Re: Oracle 10g R4 RAC Performance

> I tried to FTP a 300MB file to Node1, it took 12 minutes, and same file to Node2 took 16 minutes.

300 MB in 12 mins (720 seconds) is 0.4MB/sec, and 16 mins is about 0.3MB/sec. So your LAN cards are not running anywhere close to 80MB/sec, the typical maximum for a single 1000Mbit link.
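
The arithmetic above can be reproduced with a short Python sketch. The function name throughput_mb_per_s is made up for illustration, and the 80 MB/sec ceiling is simply the rule-of-thumb figure quoted above, not a measured value.

```python
def throughput_mb_per_s(size_mb: float, minutes: float) -> float:
    """Average transfer rate in MB/sec for size_mb transferred in `minutes`."""
    return size_mb / (minutes * 60)

# Rule-of-thumb ceiling for one 1000Mbit link, taken from the post above.
TYPICAL_GBIT_MAX_MB_PER_S = 80.0

# FTP timings reported in the original question: 300 MB to each node.
for node, minutes in (("Node1", 12), ("Node2", 16)):
    rate = throughput_mb_per_s(300, minutes)
    share = rate / TYPICAL_GBIT_MAX_MB_PER_S
    print(f"{node}: {rate:.2f} MB/sec ({share:.1%} of the typical maximum)")
```

Both transfers come out at well under 1% of what a healthy 1Gbit link typically delivers, which is why the duplex and negotiation checks below are worth doing first.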

Run these commands:

lanadmin -g 2
lanadmin -g 3
lanadmin -g 4
lanadmin -g 5
lanadmin -g 6
lanadmin -g 7

on each host. You should have nothing but zeros in the stats following Index. If you have lots of collisions, your cards have defaulted to half-duplex because of negotiation failures. 1Gbit cards must all be set to autonegotiate, and the Cisco switches must be set the same way. And of course, cabling for 1Gbit must be CAT5e or better, especially if the cables are more than a few feet long.

To fix this, I would stop the applications and Oracle, as these changes will be slightly disruptive to the cards involved. Have your network administrators set the Cisco switches to autonegotiate on all your 1Gbit ports. Then use lanadmin -x (2,3,4,5,6,7 for each card), as Steven points out, to look for an HD (half-duplex) setting, which is wrong. Then use lanadmin -X auto_on (2,3,4,5,6,7 for each card that is wrong) to turn on autonegotiation on the HP-UX side. Finally, run lanadmin -c (2,3,4,5,6,7 for each card) to clear the statistics registers.

Now start your applications and database and monitor the stats with lanadmin -g.


Bill Hassell, sysadmin
EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

Guys, I have attached a log with some of the info you both requested. Can you please check it?
TwoProc
Honored Contributor

Re: Oracle 10g R4 RAC Performance

Actually, the fact that the loaded node is faster with your query is not strange at all.

It (the loaded node), apparently, is caching the objects that you are querying. You see, the results will vary by which node "has" the object(s) cached that you are querying. If you query the node that has cached the objects, you'll get the faster answer there. If you query the other node, then things have to be brought to the node you are querying to respond to your query. And then latches and locks have to be tossed back and forth between the two nodes so they don't step all over each other and foul up data. Takes time. No way around it.

So, tuning a RAC solution becomes quite a step in the deployment of a server and service. You'll somehow need to determine which node will hold which assets, and try to direct queries against the same assets repeatedly to the same node, rather than letting all queries float all over all nodes in the RAC group (née Grid). It takes quite a bit of work actually, especially if the code you're running isn't modifiable by your company (i.e. canned code), which nowadays is largely the case. UNLESS you've bought a canned solution that is already set up with extensions to direct back-end code to preferred nodes in a RAC cluster, and that still provides meta-informative rules allowing each node to be up/down/moved within the cluster while keeping an execution location preference (sort of like processor affinity on NUMA systems, and possibly some UMA systems). Which, to my knowledge... no one does (yet).

We are the people our parents warned us about --Jimmy Buffett
EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

I did an FTP from node1 to node2 using dd, of 10 GB, and it took 300 sec with a transfer rate of around 34 MB/sec. The same FTP from node2 to node1 took around 400 sec with a rate of 24 MB/sec. This is a real problem.
Prasanth V Aravind
Trusted Contributor

Re: Oracle 10g R4 RAC Performance

Oh, so now you are getting 24 MB/sec for FTP.
If you are putting the data on a vg00 file system, I feel this is the maximum speed you can achieve.

As you said, node1 gives 34 MB/s and node2 only 24 MB/s. Maybe the root disk on node2 is very busy.

Can you check the disk status using sar -d?

I would also suggest doing the FTP to a SAN file system.

You should also compare /etc/hosts and /etc/nsswitch.conf on both nodes.

Gudluck
Prasanth

EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

I did an FTP of 23 GB of data on SAN disk from node2 to node1, and here is the result:

226 Transfer complete.
23622328320 bytes sent in 1059.40 seconds (21775.25 Kbytes/s)
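
For scale, the ftp figure can be converted to link utilisation. This is a rough Python sketch that assumes the transfer used a single 1 Gbit path (APA may distribute traffic differently) and that ftp's "Kbytes" means 1024 bytes; the function names are illustrative only.

```python
BYTES_SENT = 23_622_328_320   # from the ftp output above
SECONDS = 1059.40             # from the ftp output above
LINK_MBIT_PER_S = 1000        # assumption: one 1 Gbit path, no APA striping

def rate_kbytes_per_s(nbytes: int, seconds: float) -> float:
    """Transfer rate in Kbytes/s, taking 1 Kbyte = 1024 bytes."""
    return nbytes / 1024 / seconds

def rate_mbit_per_s(nbytes: int, seconds: float) -> float:
    """Transfer rate in megabits per second (1 Mbit = 1,000,000 bits)."""
    return nbytes * 8 / seconds / 1_000_000

kbytes = rate_kbytes_per_s(BYTES_SENT, SECONDS)
mbit = rate_mbit_per_s(BYTES_SENT, SECONDS)
print(f"{kbytes:.0f} Kbytes/s = {mbit:.0f} Mbit/s "
      f"({mbit / LINK_MBIT_PER_S:.0%} of a 1 Gbit link)")
```

Under those assumptions the transfer ran at roughly 178 Mbit/s, or about 18% of one 1 Gbit link.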

I checked /etc/hosts on both nodes. They are the same, except that one node has additional IP entries for other servers.

My /etc/nsswitch.conf on both nodes shows:

passwd: files ldap
group: files
hosts: files dns
ipnodes: files
services: files
networks: files
protocols: files
rpc: files
publickey: files
netgroup: files
automount: files
aliases: files
EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

Guys, any help?
Prasanth V Aravind
Trusted Contributor

Re: Oracle 10g R4 RAC Performance

Is there any difference in speed between the nodes when you use a SAN file system?

GUdluck
Prasanth
EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

As you can see in the previous message, when I used SAN data and FTPed it from node to node, the transfer rate was lower than when I used dd to transfer to /dev/null.
EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

Also, the Oracle DBA has obtained an AWR report, and the report shows a message saying that interconnect latency is high.

Higher than expected latency of the cluster interconnect was responsible for
significant database time on this instance.

RECOMMENDATION 1: Host Configuration, 4.4% benefit (60536 seconds)
ACTION: Investigate cause of high network interconnect latency between
database instances. Oracle's recommended solution is to use a high
speed dedicated network
ACTION: Check the configuration of the cluster interconnect. Check OS
setup like adapter setting, firmware and driver release. Check that
the OS's socket receive buffers are large enough to store an entire
multiblock read. The value of parameter
"db_file_multiblock_read_count" may be decreased as a workaround.
RATIONALE: The instance was consuming 24636 kilo bits per second of
interconnect bandwidth.

SYMPTOMS THAT LED TO THE FINDING:
SYMPTOM: Inter-instance messaging was consuming significant database
time on this instance. (28% impact [387401 seconds])
SYMPTOM: Wait class "Cluster" was consuming significant database
time. (29% impact [405882 seconds])
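
To put the bandwidth figure in the report's RATIONALE line in perspective, here is a small Python sketch. It assumes "kilo bits" means 1000 bits and that the interconnect is a single dedicated 1 Gbit link; both are assumptions, and the names are made up for illustration.

```python
ADDM_KBIT_PER_S = 24_636      # "kilo bits per second" from the report above
LINK_KBIT_PER_S = 1_000_000   # assumption: a dedicated 1 Gbit interconnect

def kbit_to_mbyte_per_s(kbit: float) -> float:
    """Convert kilobits/s to megabytes/s, assuming decimal units throughout."""
    return kbit * 1000 / 8 / 1_000_000

mbytes = kbit_to_mbyte_per_s(ADDM_KBIT_PER_S)
share = ADDM_KBIT_PER_S / LINK_KBIT_PER_S
print(f"Interconnect traffic: {mbytes:.2f} MB/sec ({share:.1%} of 1 Gbit)")
```

If those assumptions hold, the instance was using only about 3 MB/sec of interconnect bandwidth, which would be consistent with the report pointing at latency rather than a saturated link.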
EZ2007
Advisor

Re: Oracle 10g R4 RAC Performance

Here is the output from my switch. Is this correct?

                | Intrusion                           MDI   Flow  Bcast
Port  Type      | Alert     Enabled Status Mode       Mode  Ctrl  Limit
----- --------- + --------- ------- ------ ---------- ----- ----- ------
1     100/1000T | No        Yes     Down   1000FDx    MDIX  off   0
2     100/1000T | No        Yes     Down   1000FDx    MDI   off   0
3     100/1000T | No        Yes     Up     1000FDx    MDI   off   0
4     100/1000T | No        Yes     Up     1000FDx    MDI   off   0
5     100/1000T | No        Yes     Up     1000FDx    MDI   off   0
6     100/1000T | No        Yes     Down   1000FDx    MDI   off   0
7     100/1000T | No        Yes     Up     1000FDx    MDI   off   0
8     100/1000T | No        Yes     Up     1000FDx    MDI   off   0
9     100/1000T | No        Yes     Down   1000FDx    MDIX  off   0
10    100/1000T | No        Yes     Up     1000FDx    MDIX  off   0
11    100/1000T | No        Yes     Up     1000FDx    MDI   off   0
12    100/1000T | No        Yes     Down   1000FDx    MDIX  off   0
13    100/1000T | No        Yes     Up     1000FDx    MDI   off   0