Home > Articles

  • Print
  • + Share This
This chapter is from the book

Measuring Latch and Mutex Contention

As with most contention scenarios, the wait interface and time model provide the best way to determine the extent of any contention that might exist. Time spent in latch or mutex sleeps is recorded in V$SYSTEM_EVENT and similar tables and usually is the primary indication that a problem exists.

However, be aware that the wait interface records only latch sleeps; latch misses do not result in a wait being recorded, even though they might consume CPU (if the session spins on the latch). Therefore, latch misses should be considered to be a lesser but still important aspect of latch contention.

Prior to Oracle Database 10g, a single latch free wait event was recorded for all latch sleeps. From Oracle 10g onward, certain latches now have their own event—such as latch: cache buffers chains. Not all latches have their own event, though, and those that do not continue to be included in the latch free wait.

Mutex waits are represented by waits such as library cache: mutex X, which represents a wait on an exclusive library cache mutex.

To break out mutex and latch waits and compare them to other high-level wait categories, we could issue a query such as that shown in Listing 16.1.

Listing 16.1 Latch Wait Times

SQL> WITH system_event AS
  2    (SELECT CASE WHEN (event LIKE '%latch%'  or event
  3                       LIKE '%mutex%' or event like 'cursor:%')
  4                THEN event  ELSE wait_class
  5                END wait_type, e.*
  6       FROM v$system_event e)
  7  SELECT wait_type,SUM(total_waits) total_waits,
  8         round(SUM(time_waited_micro)/1000000,2) time_waited_seconds,
  9         ROUND(  SUM(time_waited_micro)
 10               * 100
 11               / SUM(SUM(time_waited_micro)) OVER (), 2) pct
 12  FROM (SELECT  wait_type, event, total_waits, time_waited_micro
 13        FROM    system_event e
 14        UNION
 15        SELECT   'CPU', stat_name, NULL, VALUE
 16        FROM v$sys_time_model
 17        WHERE stat_name IN ('background cpu time', 'DB CPU')) l
 18  WHERE wait_type <> 'Idle'
 19  GROUP BY wait_type
 21  /

--------------------------------- ------------ ------------------- ------
CPU                                                       1,494.63  69.26
latch: shared pool                   1,066,478              426.20  19.75
latch free                              93,672              115.66   5.36
wait list latch free                       336               58.91   2.73
User I/O                                 9,380               27.28   1.26
latch: cache buffers chains              2,058                8.74    .40
Other                                       50                7.26    .34
System I/O                               6,166                6.37    .30
cursor: pin S                              235                3.05    .14
Concurrency                                 60                3.11    .14
library cache: mutex X                 257,469                2.52    .12

Of course, this query reports all waits since the database first started. To get a view over a specific period of time, you would need to run the query twice and compare totals. We can also observe the ongoing state of these statistics in the Oracle Cloud Control, in third-party tools such as Toad, or by using Automatic Workload Repository (AWR) or Statspack reports.

Identifying Individual Latches

If we’re lucky, the latch that is responsible for whatever latch contention exists will be identified by its specific wait event—latch: cache buffers chains, for instance. However, this won’t always be the case; some latches are included in the general-purpose latch free event, and some might be recorded against the event wait list latch free.

The wait list latch free event relates to Oracle’s latch wait posting algorithm. Oracle implements a latch wait list that allows sessions sleeping on a latch to be woken when the latch becomes available. When a session sleeps on a latch, it normally places itself on the latch wait list and is woken by the session that releases the latch. If there’s heavy contention on the wait list, then the wait list latch free event may occur.

If the specific latch waits are being obscured by these general-purpose latch free events, then you may need to examine V$LATCH, which includes latch statistics for each specific latch. The V$LATCH view records the number of gets, misses, sleeps, and wait times for each latch. The query in Listing 16.2 interrogates this view to identify the latches with the most sleeps and wait times.

Listing 16.2 Latch Miss Statistics

SQL> WITH latch AS (
  2   SELECT name,
  3          ROUND(gets * 100 / SUM(gets) OVER (), 2) pct_of_gets,
  4          ROUND(misses * 100 / SUM(misses) OVER (), 2) pct_of_misses,
  5          ROUND(sleeps * 100 / SUM(sleeps) OVER (), 2) pct_of_sleeps,
  6          ROUND(wait_time * 100 / SUM(wait_time) OVER (), 2)
  7                   pct_of_wait_time
  8     FROM v$latch)
  9  SELECT *
 10  FROM latch
 11  WHERE pct_of_wait_time > .1 OR pct_of_sleeps > .1
 12  ORDER BY pct_of_wait_time DESC;

                               Pct of Pct of Pct of    Pct of
NAME                             Gets Misses Sleeps Wait Time
------------------------------ ------ ------ ------ ---------
cache buffers chains            99.59  99.91  70.59     89.75
shared pool                       .07    .03  16.69      7.78
session allocation                .18    .05  11.39      1.88
row cache objects                 .07    .00    .78       .24
simulator lru latch               .01    .00    .31       .18
parameter table management        .00    .00    .08       .14
channel operations parent latc    .00    .00    .16       .02

Drilling into Segments and SQLs

Determining the latches associated with contention is usually not enough to identify the root cause. We most likely need to identify the SQLs and segments involved.

If you have an Oracle diagnostic pack license, then you can query the Active Session History (ASH) and/or AWR tables to identify the SQLs and segments associated with particular wait conditions. The query in Listing 16.3 identifies entries in the ASH table associated with latch contention.

Listing 16.3 Finding Latch Contention with ASH

SQL> l
  1  WITH ash_query AS (
  2       SELECT event, program,
  3              h.module, h.action,   object_name,
  4              SUM(time_waited)/1000 reltime, COUNT( * ) waits,
  5              username, sql_text,
  6               RANK() OVER (ORDER BY COUNT(*) DESC) AS wait_rank
  7        FROM  v$active_session_history h
  8        JOIN  dba_users u  USING (user_id)
  9        LEFT OUTER JOIN dba_objects o
 10             ON (o.object_id = h.current_obj#)
 11        LEFT OUTER JOIN v$sql s USING (sql_id)
 12       WHERE (event LIKE '%latch%' or event like '%mutex%')
 13       GROUP BY event,program, h.module, h.action,
 14           object_name,  sql_text, username)
 15  SELECT event,module, username,  object_name, waits,
 16           sql_text
 17  FROM ash_query
 18  WHERE wait_rank < 11
 19* ORDER BY wait_rank
SQL> /

EVENT                     MODULE       USERNAME OBJECT_NAME       WAITS
------------------------- ------------ -------- ------------ ----------
library cache: mutex X    SQL*Plus     OPSG                          13

latch: shared pool        SQL*Plus     OPSG                           8

latch: shared pool        SQL*Plus     OPSG     LT_SALES_PK           3
 begin     latch_test(10000,10000,1000000,10000);  end;

library cache: mutex X    SQL*Plus     OPSG     LT_SALES_PK           2

library cache: mutex X    SQL*Plus     OPSG     LT_SALES              1
 begin     latch_test(10000,1,10000,10000);  end;

latch: shared pool        SQL*Plus     OPSG                           1
SELECT  quantity_sold , amount_sold FROM lt_sales t539564 WH
ERE id BETWEEN 124410 AND 360759

latch: shared pool        SQL*Plus     OPSG                           1
SELECT  quantity_sold , amount_sold FROM lt_sales t539571 WH
ERE id BETWEEN 512313 AND 825315

library cache: mutex X    SQL*Plus     OPSG                           1
SELECT  quantity_sold , amount_sold FROM lt_sales t539563 WH
ERE id BETWEEN 698302 AND 392634

latch: shared pool        SQL*Plus     OPSG     LT_SALES_PK           1
SELECT  quantity_sold , amount_sold FROM lt_sales t539555 WH
ERE id BETWEEN 387009 AND 268338

If you don’t have a Diagnostic Pack license, then you can indirectly identify the SQLs by focusing on those SQLs with the highest concurrency wait times. The concurrency wait class includes most commonly encountered latch and mutex waits, although it also includes some internal locks and buffer waits. However, if you’re encountering high rates of latch contention, it’s a fair bet that the SQLs with the highest concurrency waits are the ones you want to look at.

Listing 16.4 pulls out the SQLs with the highest concurrency waits.

Listing 16.4 Identifying High Contention SQL

SQL> WITH sql_conc_waits AS
  2      (SELECT sql_id, SUBSTR(sql_text, 1, 80) sql_text,
  3              concurrency_wait_time/1000 con_time_ms,
  4              elapsed_time,
  5              ROUND(concurrency_wait_Time * 100 /
  6                  elapsed_time, 2) con_time_pct,
  7              ROUND(concurrency_wait_Time* 100 /
  8                  SUM(concurrency_wait_Time) OVER (), 2) pct_of_con_time,
  9              RANK() OVER (ORDER BY concurrency_wait_Time DESC) ranking
 10         FROM v$sql
 11        WHERE elapsed_time > 0)
 12  SELECT sql_text, con_time_ms, con_time_pct,
 13         pct_of_con_time
 14  FROM sql_conc_waits
 15  WHERE ranking <= 10
 16  ORDER BY ranking  ;

                                                       SQL Conc    % Tot
SQL Text                                 Conc Time(ms)    Time% ConcTime
---------------------------------------- ------------- -------- --------
DECLARE job BINARY_INTEGER := :job; next           899    18.41    44.21
_date DATE := :mydate;  broken BOOLEAN :

select max(data) from log_data where id<           472      .01    23.18

begin   query_loops ( run_seconds=>120 ,           464      .01    22.80
 hi_val =>1000 ,                    use_

update sys.aud$ set action#=:2, returnco           143    75.46     7.02
de=:3, logoff$time=cast(SYS_EXTRACT_UTC

As expected the SQL that generated the latch waits is found (the second and third entries are from a job that generated the latch waits). However, other SQLs—associated with waits for certain internal Oracle locks—are also shown. You’ll need to exercise judgment to determine which SQLs are most likely associated with your latch waits.

  • + Share This
  • 🔖 Save To Your Account