latch contention, cpu is spinning :-(

Henrique Silva_3 · ‎02-08-2006

I know this is a loaded question, but here it is :-)

We have an environment, which was I/O bound before we tuned it. We addresses some issues, and now, we made it cpu bound, as queries, which were performing horribly, are not very fast, but this did not scale to the number of concurrent users hitting the system.

What we did, was to collect stats at 25%, and then, as we identified the bad queries in the system, which is generated by the 3rd party app, and we can not tune, we changed the histograms on the PK for the main tables involved, so that the CBO would find the best path, and thus, give us better results.

One of the tables have 2.5 million rows, another 3.5, and there are a couple of columns, which are used on joins, where the distinct value, or cardinality is very low, so, we got that, and re-analized the index with that number in. The queries are now flying, but latch contention is also very high,and when user load hit the system, ( there is no way to test this in non-production mode ), it brought it to its knees, and we had to back out some of it.

Ironically, we had to introduce some table scans back, so that the CPUs would have enought time to wait.

Production is much better than it was, but we still see many queries that could benefit from individual tuning ( these queries are simply happening all the time, and are very similar ). We changed cursor sharing to SIMILAR, increased the shared pool a bit, pga also, so sorts would happen in memory, and now, need to understand how to tune these queries, without having cpu ( latch contention ).

we tried to remove the stats on the joined indexes, to see if RULE would be better, but still had the contention. We are continue to test by changing DB writers to 8, to try to reduce library cache contention and _DB_BLOCKS_HASH_LATCHES=40000 ( from 8192 ), however, this can not be done in prod, as we need an outage there, and we can not replicate the problem in staging :-(

I am attaching the init.ora for this instance.

DB server has 16 CPUs, 32 GB memory

"to be or not to be, what was the question ???? "

Hein van den Heuvel · ‎02-08-2006

A loaded question it indeed appears to be, well beyond the skills of your average passerby :-), and more in the realm of serious performance consulting.

But still, you never know, someone may well have a good idea in a reply here.

The only thing i miss from your description is which kinda latch. I'm sure you have been doign statspack reports, possibly at level 7 and have some insight as to whether the contention is about buffer pages, or parse areas or specific object.

You probably already tried to get details through tprocedures like:
objects_on_hot_latches.sql
from: http://www.ixora.com.au/scripts/library.htm

The only seemingly low value in the attached init.ora is perhaps the OPEN_CURSOR at 300. That seems low for the kind of dedicated server you describe. The rest of the params seems at least in the ballpark. You did not run out of cursors no?
What does a report from v$resource_limit show? See below.

The 'de-tuning' to get better results sounds a little suspect/hokey. Maye the latch contention, while ugly and high in cpu use, still overal performed better but looked ugly? But again, you seem to be well underway analyzing all that.

Good luck,
Hein.

set pages 9999
column NAME format A25
column INIT format A10
column LIMIT format A10
SELECT RESOURCE_NAME "Name", CURRENT_UTILIZATION "Current",
MAX_UTILIZATION "Max", INITIAL_ALLOCATION "Init", LIMIT_VALUE "Limit"
FROM V$RESOURCE_LIMIT;

Yogeeraj_1 · ‎02-08-2006

hi,

Note that library cache latch contention is typically caused by NOT using bind variables. It is due to excessive parsing of statements.

Also, consider hein's recommendation about increasing open cursors. I would set it to 1000. Note that setting this artificially high does NOT have a negative impact on resource usage.

good luck!

kind regards
yogeeraj

No person was ever honoured for what he received. Honour has been the reward for what he gave (clavin coolidge)

Simon Wickham_6 · ‎02-08-2006

Hi,

A process acquires a latch when working with a structure in the SGA
(System Global Area). It continues to hold the latch for the period
of time it works with the structure. The latch is dropped when the
process is finished with the structure. Each latch protects a different
set of data, identified by the name of the latch.

Latches protect the data structures describing the blocks
in the buffer cache from the list of users trying to access these blocks
in the database. If a process is not able to obtain a latch right away,
it must wait for the latch. This causes a slowdown to occur and results
in additional CPU usage, until a latch is available. The CPU usage is a
consequence of the "spinning" of the process.

The SQL*DBA monitor display is a useful tool for observing
waits, requests and contention for latches. Note that the values
in the columns of the SQL*DBA monitor are system_specific in meaning.

Relevent data dictionary views to query:
----------------------------------------

V$LATCH
V$LATCHHOLDER
V$LATCHNAME

The following queries provide some information by
querying v$latch, v$latchholder and v$latchname:
--------------------------------------------------

/*
** Given a latch address, find out the latch name.
*/
column name format a64 heading 'Name'
select name from v$latchname a, v$latch b
where b.addr = '&addr'
and b.latch#=a.latch#;

/*
** Display System-wide latch statistics.
*/
column name format A32 truncate heading "LATCH NAME"
column pid heading "HOLDER PID"
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+)
and a.latch# = c.latch#
order by a.latch#;

/*
** Display latch statistics by latch name.
*/
column name format a32 heading 'LATCH NAME'
column pid heading 'HOLDER PID'
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%' order by a.latch#;

To reduce contention for internal latches We can reduce contention for these latches and tune them by adjusting
certain init.ora parameters.

Cache buffers chain latch:
..........................

This latch is related to DB_BLOCK_HASH_BUCKETS (undocumented in
Oracle7) which is related to db_block_buffers. It can be tuned by
adjusting DB_BLOCK_BUFFERS.

Cache buffers LRU chain latch:
..............................

Increasing the parameters DB_BLOCK_BUFFERS and DB_BLOCK_WRITE_BATCH
(undocumented in Oracle7) will reduce waits on this kind of latches.

Redo Allocation Latch:
......................

Contention for this latch can be reduced by decreasing the value of
LOG_SMALL_ENTRY_MAX_SIZE on multi-cpu systems to force the use of the
redo copy latch.

Redo copy latch:
................

This latch is waited for on both single and multi-cpu systems.
On multi-cpu systems, contention can be reduced by increasing the
value of LOG_SIMULTANEOUS_COPIES and/or increasing
LOG_ENTRY_PREBUILD_THRESHOLD (undocumented in Oracle7).

Notice that on single-cpu systems increasing the value of
LOG_ENTRY_PREBUILD_THRESHOLD won't have much effect as
CPU_COUNT is zero. Default is 0.

Row cache objects latch:
........................

In order to reduce contention for this latch, we need to
tune the data dictionary cache. In Oracle7 this basically means
increasing the size of the shared pool (SHARED_POOL_SIZE)
as the dictionary cache is a part of the shared pool.

LATCH_SPIN_COUNT (_LATCH_SPIN_COUNT):
====================================

This parameter is undocumented in Oracle7.
It sets the number of times a process attempts to get
a latch before sleeping. If the oracle process tries to get a latch
and finds that it is busy, the process spins and checks back with
latch LATCH_SPIN_COUNT number of times until the latch is not busy.
Once it attains LATCH_SPIN_COUNT, the process sleeps for a set period
of time, then wakes up and tries again.

Performance can be adjusted by changing the value of SPIN_COUNT.
If a high value is used, the latch will be attained sooner than if
you use a low value. However, you may use more CPU time spinning to
get the latch if you use a high value for SPIN_COUNT.

Regards,
Simon

Simon Wickham_6 · ‎02-08-2006

Hi,

Note that You can set SESSION_CACHED_CURSORS to higher value than OPEN_CURSORS, lower than OPEN_CURSORS, or anywhere in between. This parameter has no effect on ora-1000's or on the number of cursors a session will have open.

Regards,
Simon

Eric Antunes · ‎02-09-2006

Hi Henrique,

db_file_multiblock_read_count = 18?

Since most OS I/O size is 512K, I think you should benefit in setting this to a multiple of 8! 32 or 64 should be good values...

Is this a Datawarehouse system (shared_pool_size = 1677721600)?? :-D

Best Regards,

Eric Antunes

Each and every day is a good day to learn.

Categories

Company

Local Language

Forums

Discussions

Forums

Discussions

Discussions

Forums

Discussions

Forums

Discussions

Forums

Forums

Discussions

Forums

Discussions

Forums

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Discussion Boards

Community

Resources

Other HPE Sites

Discussions

Forums

Blogs

latch contention, cpu is spinning :-(

latch contention, cpu is spinning :-(

Re: latch contention, cpu is spinning :-(

Re: latch contention, cpu is spinning :-(

Re: latch contention, cpu is spinning :-(

Re: latch contention, cpu is spinning :-(

Re: latch contention, cpu is spinning :-(