Operating System - HP-UX

Oracle 10G RAC - crashes under load - consumes free mem

 
SOLVED
Robin T. Slotten
Trusted Contributor

Oracle 10G RAC - crashes under load - consumes free mem

Oracle RAC 2 node
(2) ia64 hp server rx4640
16 GB Memory with about 4 GB free under normal load
HP-UX 11.23
CRS and ASM
XP12000 storage array
* NOT USING MCSG

LAN = 1000 Full-Duplex
Interconnect = 100 Full-Duplex

LAN is going to a large Cisco Core switch
Interconnect is an isolated 100MB Cisco (2850?) switch with just these 2 machines.

System has worked for some time in development. We started to load test the system and have had a few crashes that appear to be TOC crashes.

Just before the system or systems crash, I can see the free memory suddenly disappear, going from about 4 GB free to 0 free in less than 10 minutes.

One thing I have noticed is that the logical disk IO seems especially high and keeps increasing the whole time Oracle is running (days and weeks). Most of this traffic appears to be going through the interconnect.

My obvious question: is a 100Mb interconnect an issue? I have never been able to catch it pushing more than 50 MB max. Usually it is down around 10-20 MB.

Has anyone seen this memory consumption issue?

Whatever happens to trigger the event happens so fast that it leaves no dumps and very little information in any logs. Most of my clues have come from Measureware logs.

Rob...


IF you do it more than twice, write a script.
A. Clay Stephenson
Acclaimed Contributor
Solution

Re: Oracle 10G RAC - crashes under load - consumes free mem

This is rather tough to track down given the limited data available. The interconnect should never cause the system to crash and, if your metrics are accurate, the bandwidth is sufficient. I assume that you have tuned the dbc_max_pct value down to a reasonable level (no more than 10%). Since this box is running out of memory, the very first thing that I would do is reduce maxdsiz_64bit and reduce shmmax so that no one process is able to grab all the memory in sight. I would expect the application to then possibly fail with application errors (or warnings) rather than crashing the system. The system then at least has a chance of telling you what is actually happening.

You should also have a look at MetaLink for any available Oracle patches and/or any recommended HP-UX patches.

Whenever I see huge numbers of logical I/O's related to a database, that immediately suggests inefficient SQL because the system is being asked in essence to re-read data that it should already know. That doesn't cause the system to crash but it does indicate that some SQL tuning is probably in order.
If it ain't broke, I can fix that.
Robin T. Slotten
Trusted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

Thanks Clay,
I always try to maintain dbc_max_pct at about 500MB or less. In this case it is 5% or 746MB.

maxdsiz        1073741824
maxdsiz_64bit  4294967296
shmmax         1073741824

Basically Oracle's target parms across the board.

Patches are current as of last Dec.
The load is somewhat artificial as it is being done in a test mode. I suspect a lot of duplicate queries, so that would be in line with your statement. What I do think is strange is that the logical IO seems to continue to grow even after the load testing has dropped off, as if there is a process stuck in a loop somewhere. System CPU use is proportionally high compared with other systems I have worked with, but I attribute that to ASM running under the control of root.

I'll run this by our team tomorrow and see if we can give it a try.

Thanks,
IF you do it more than twice, write a script.
Yogeeraj_1
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

hi rob,

You may also wish to run a STATSPACK report, or use the Enterprise Manager Database Console, to verify the overall database performance. Any bottlenecks will be highlighted there.


hope this helps too!


kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Eric Antunes
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

Hi Robin,

I think your issue must be related to bad application SQL or RAC issues.

About RAC, I think MetaLink is the best place to search for bugs, notes, alerts, etc.

To look for possibly bad application SQL, check the sessions with the following script:

-- Sessions ordered by consistent gets (logical reads), highest first
select substr(s.username,1,20) "User Name",
       s.osuser "OS User",
       s.status "Status",
       s.lockwait "Lock Wait",
       substr(s.program,1,30) "Program",
       substr(s.machine,1,15) "Machine",
       p.program "Process Program",
       si.consistent_gets "Consistent Gets",
       s.process "Process PID",
       p.spid, p.pid, s.serial#, si.sid
from sys.v_$sess_io si, sys.v_$session s, sys.v_$process p
where s.username is not null
  and si.sid(+) = s.sid
  and p.addr(+) = s.paddr
order by si.consistent_gets desc;

If the first rows have much bigger consistent_gets than the others, then it is likely that there is bad SQL.

Best Regards,

Eric
Each and every day is a good day to learn.
Robin T. Slotten
Trusted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

The strange thing about this problem is that we don't see a lot of stress on the system other than the logical IOs. We have been running all types of statistics, and the system and Oracle seem to be fairly happy until the "event" that crashes the machine. Thanks for the SQL, we'll give it a shot.

Thanks for the help folks.
IF you do it more than twice, write a script.
Robin T. Slotten
Trusted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

I was finally able to capture a moment when the interconnect LAN interface was showing 156MB of traffic. We replaced the 100MB hub with a temporary D-Link 1000MB hub. With that problem solved, the application soon consumed memory. Yesterday I installed an additional 32 GB of memory on each machine (the DBA had tracked a separate issue to lack of SGA memory).

We will be load testing again soon.
Rob..
IF you do it more than twice, write a script.
Steven E. Protter
Exalted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

Shalom Rob,

Common Oracle problem.

The two nodes do not have the same OS patches. I'd make sure they have memory leak and consumption patches from HP.

http://www.hpux.ws/system.perf.sh
Might want an idea where all the memory is going.

SEP
Steven E Protter
Owner of ISN Corporation
http://isnamerica.com
http://hpuxconsulting.com
Sponsor: http://hpux.ws
Twitter: http://twitter.com/hpuxlinux
Founder http://newdatacloud.com
Bill Hassell
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

If you're not using raw I/O for Oracle data, 11.23 has major buffer cache enhancements that modify the previous recommendations for dbc_max_pct. For 11.23, you may find significant cache performance benefits by increasing the cache size to several GB. Try 3 GB as a start. I've seen logical I/O rates as high as 125,000 with a 6 GB cache. One great feature of 11.23 is that the cache can be expanded and reduced online, and it takes just a few seconds to take effect.

As far as memory usage, I would use measureware and perhaps a once/minute ps analysis of local data for each process, something like this:

#!/usr/bin/sh
# Log a timestamp, then the 20 processes with the largest virtual size (vsz)
date
UNIX95=1 ps -e -o vsz,pid,ruser,args | sort -rn | head -20

Run this script in cron every minute, appending the output to a log file:

* * * * * /usr/contrib/bin/ramusage.sh >> /var/tmp/ramusage.log

The ps list will show any process that suddenly increases local RAM usage. It won't document shared memory, so ipcs -bmop may need to be run in a loop too.
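
Once the log has some history in it, something like the following could pull out the processes that grew. This is only a sketch: the function name vsz_growth and its threshold argument are made up for illustration, and it assumes every sample line in the log starts with vsz and pid, in the column order of the ps command above (the date lines and the ps header are skipped because their first field is not a number).

```shell
# Print "pid command first_vsz last_vsz growth" for every PID whose vsz
# grew by at least $1 (same units as ps reports) between its first and
# last appearance in the log file $2.
vsz_growth() {
    awk -v min_growth="$1" '
        # Sample lines start with two numeric fields: vsz and pid
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            pid = $2
            if (!(pid in first)) { first[pid] = $1; cmd[pid] = $4 }
            last[pid] = $1
        }
        END {
            for (pid in first) {
                grown = last[pid] - first[pid]
                if (grown >= min_growth)
                    printf "%s %s %d %d %d\n", pid, cmd[pid], first[pid], last[pid], grown
            }
        }
    ' "$2"
}
```

For example, vsz_growth 100000 /var/tmp/ramusage.log would list only the processes that grew by 100000 or more since they first showed up in the top 20. A process that only enters the top 20 at the very end will show little growth, so the raw log is still worth reading around the time of a crash.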


Bill Hassell, sysadmin
Yogeeraj_1
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

hi Robin,

One further step you can take in analysing the performance of your database is to periodically check v$sqlarea to see which SQL statements are not using bind variables.
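
One common way to spot such statements is to group the cached SQL by its leading text and look for prefixes that appear many times. The wrapper below is a rough sketch, not a tuned report: the function name literal_sql_report, the default prefix length of 60 and the default threshold of 20 copies are all assumptions chosen for illustration, and it assumes sqlplus can connect with OS authentication as sysdba.

```shell
# List SQL prefixes that occur many times in v$sqlarea, which often means
# literals are being used where bind variables would allow cursor sharing.
# $1 = prefix length to compare (default 60), $2 = minimum copies (default 20)
literal_sql_report() {
    prefix_len=${1:-60}
    min_copies=${2:-20}
    sqlplus -s "/ as sysdba" <<EOF
set pagesize 100
set linesize 120
select substr(sql_text, 1, ${prefix_len}) sql_prefix,
       count(*) copies
from   v\$sqlarea
group  by substr(sql_text, 1, ${prefix_len})
having count(*) >= ${min_copies}
order  by copies desc;
exit
EOF
}
```

Statements that differ only in their literal values share a prefix and show up with a high copy count. A short prefix will also group unrelated statements together, so treat the output as a list of candidates to inspect.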

kind regards
yogeeraj
No person was ever honoured for what he received. Honour has been the reward for what he gave (Calvin Coolidge)
Robin T. Slotten
Trusted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

We still have an occasional system panic caused by Oracle evicting a node. No log entries anywhere. Currently working with Oracle trying to track it down. Some of the steps we did take were to replace the interconnect switch with a 1000MB switch after I caught the traffic surge over 100MB just before a panic. After replacing the switch, we saw a great improvement and have tracked a lot of interconnect traffic well over 100MB. We also received a document from Oracle about setting the realtime priority for the cssd process. This puts its interconnect traffic among the top-priority work on the system. This also helped quite a bit. Just an update, in case someone else is fighting this problem.

Rob...
IF you do it more than twice, write a script.
Robin T. Slotten
Trusted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

BTW, we also increased the total RAM memory to 48 GB on both nodes. This allowed us to increase the SGA.

Rob...
IF you do it more than twice, write a script.
Hein van den Heuvel
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

Thanks for the update.

It makes sense that you need a Gb interconnect.

I was closely involved with early Oracle RAC work, albeit on Tru64 for Digital/Compaq. We used a dedicated interface called "Memory Channel" (Reflective Memory) for latency measured in microseconds and high bandwidth. Great technical solution, but too expensive, requiring dedicated hardware. At the same time our competition (in those days) at HP using HP-UX were using HyperFabric (or what is that name again) and everyone was considering InfiniBand.

To consider a 100 Mb LAN as a viable alternative seems like a stretch to me, and I am surprised Oracle support/consulting let you go that route.

You see, the RAC interconnect is NOT just a 'I'm alive' heartbeat kind of thing. It is very active, with two flavors of activity:
- Many short lock messages
- Fewer large database page block ships (cache fusion!)

The lock messages would readily saturate a 100 Mb/sec link in packets/sec well before the Mb/sec limit is reached.
The block shipping will push the MB/sec limits.
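
To put rough numbers on those two limits, here is a back-of-the-envelope sketch. The function names are made up for illustration. The 20 bytes added per frame are the standard Ethernet preamble and inter-frame gap, and 64 and 1518 bytes are the smallest and largest standard frames. It ignores IP/UDP headers and any limits in the switch, NIC or host, so real rates will be lower.

```shell
# Convert a link speed in megabits per second to megabytes per second.
mbit_to_mbyte() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit / 8 }'
}

# Theoretical maximum Ethernet frames per second at a given link speed
# ($1, in Mbit/s) and frame size ($2, in bytes). Each frame also occupies
# 20 bytes on the wire for the preamble and inter-frame gap.
max_frames_per_sec() {
    awk -v mbit="$1" -v frame="$2" \
        'BEGIN { printf "%d\n", (mbit * 1000000) / ((frame + 20) * 8) }'
}

# mbit_to_mbyte 100            -> 12.5     (MB/s ceiling on a 100 Mbit link)
# max_frames_per_sec 100 64    -> 148809   (small frames, e.g. short messages)
# max_frames_per_sec 100 1518  -> 8127     (full-size frames, e.g. bulk data)
# max_frames_per_sec 1000 64   -> 1488095
```

So a 100 Mbit link tops out at about 12.5 MB/sec of bulk data, while a stream of small messages hits the packet rate ceiling while carrying far fewer bytes than that.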

In the final days of Tru64 they even considered a hybrid: MC for locks, GB for data.

Hope this helps some,
Hein van den Heuvel (at gmail dot com)
HvdH Performance Consulting
Robin T. Slotten
Trusted Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

The system was configured by a consultant before I got here. I have found a number of things I didn't agree with. It has been quite time consuming to get this cluster tuned and performing well. Some of it Oracle, some HP-UX and of course, bad SQL is bad SQL no matter how fast a machine you run it on. Thanks for the insight.
Rob...
IF you do it more than twice, write a script.
Eric Antunes
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

Hi Rob,

Maybe this consultant was also sure about putting redo logs on RAID5...

Just a thought, :)

Eric Antunes
Each and every day is a good day to learn.
Vladimir Fabecic
Honored Contributor

Re: Oracle 10G RAC - crashes under load - consumes free mem

Robin
I also had problems with Oracle RAC (on a Tru64 cluster with Memory Channel interconnect).
Even though the cluster interconnect was the best type available, it was not the only problem.
After some time I saw that the main problem was the applications. RAC is not good for all types of applications. It is good for many "short connections", not for applications causing a large number of locks.
I spent a lot of time on OS tuning and the DBA spent lots of time on database tuning.
But only application tuning did some good.
I once also had a test Tru64 cluster with RAC, but with a gigabit Ethernet cluster interconnect. Performance was much worse than with Memory Channel (latency problem).
From my experience, gigabit Ethernet is the minimum for a cluster interconnect.
As Hein said, it is not just a heartbeat.
In vino veritas, in VMS cluster