Identifying the Cause of Blocking
After blocking has been discovered, the next step is to identify the cause. The root cause is often specific to a particular wait type, but several shared concepts and operations help determine the root cause.
Current Statements and Plans
Although the name of the wait type and the wait description can help identify the component and resource in which blocking is occurring, the current statements for both the blocked and blocking tasks are useful to have for gathering more information. The current statement for both sessions is available from the sys.dm_exec_requests DMV. This works only for sessions that are currently executing. It is not possible, using DMVs, to find the text of the last batch for a session, although you can use DBCC INPUTBUFFER in many circumstances. sys.dm_exec_sessions does, however, contain the time the last batch was issued. You can use this to determine a minimum duration since the earliest possible start of blocking. Some blocking, especially lock-based blocking, is often affected by the query plans chosen. The current query plan is also available from sys.dm_exec_requests. The amalgam.current_statements_and_plans view included on the CD accompanying this book hides some of the messy details.
For monitoring purposes, it is useful to note that the source of the current statement text, sys.dm_exec_requests.sql_handle, is a durable handle within an instance of SQL Server. This means that the same sql_handle value is always used for the same batch on a particular instance of SQL Server. Note, however, that to retrieve the actual text corresponding to the sql_handle, the batch must be in cache. For dynamic SQL, the sql_handle can even be compared across SQL Server instances. For objects, such as stored procedures and functions, the sql_handle is derived from the database and object IDs, so it varies across instances.
A word of caution is in order regarding the current statement and query plan for the blocking tasks. A blocking task’s current statement may not be the actual cause of the blocking. Given that some locks are held longer than a single statement, this is particularly the case with locks where the lock that is causing the blocking could have been acquired by any statement in the blocking task’s transaction, not just the current statement.
The output of sys.dm_os_waiting_tasks may exhibit several blocking patterns. These are relatively easy to identity, and their classification can be useful in determining which blocking issues to investigate first.
The first pattern is that of a single long wait. This is characterized by one task (or a small number of tasks) waiting on a unique wait type and resource for a long time. This type of blocking might not have a significant effect on the server’s throughput, but it can dramatically increase the response time for the blocked query. Externally the response time delay could have more-significant effects. The following query lists waits that have lasted longer than ten seconds. The threshold level is relative and should be adjusted to the tolerances of each individual system. You can further adjust it to specific levels based on the wait type or some other qualifier:
select * from amalgam.dm_os_waiting_tasks_filtered -- adjust the following threshold as appropriate where wait_duration_ms > 10000 order by wait_duration_ms desc
The next pattern is a large number of direct waits for a single resource. There is no extended blocking chain in this case—each task is blocked by the same single task. This blocking may start having an effect on the server throughput, because fewer workers are available to process the remainder of the workload. Response time is also affected for all queries. This pattern is often a hot spot, and because all contention is on a single resource, this type of blocking is generally relatively easy to resolve.
The third pattern is a large number of single waits on different resources. These can be much harder to handle than the previous categories, because there is no clear target to investigate. One option is to do short investigations of each of these waits. This might uncover a common trait between them. When this category is coupled with short wait times, it may be better to use statistical approaches to determining the cause of the blocking.
The final pattern is a blocking chain with multiple levels of blocking. Each level may have different types of waits as well as categories of blocking. This pattern is effectively a combination of the preceding three categories, and, as such, each level of blocking can be investigated separately.
As mentioned earlier, not all blocking is created equal. Some forms of blocking affect throughput or response times more than others. One measure of this is the number of other tasks blocked by a given blocking task. Some of these tasks are directly blocked by the head blockers; others are blocked indirectly. Indirect blocking is blocking where task T2 is blocked by T1 and T3 is blocked by T2, so T3 is indirectly blocked by T1 because it cannot proceed until T2 can proceed, which cannot occur until T1 unblocks T2.
Finding head blockers is deceptively simple: Just find all blocking tasks that are not blocked themselves. The task is made slightly more complex by the fact that not all waits have blocking information. Thus, if task T2 is blocked by T1, which itself is waiting but has no identified blocker, who should be marked as the head blocker? In order not to lose sight of this type of blocking chain, it is useful to consider tasks that are blocking others but do not have an identified blocker themselves as head blockers. This is especially the case for voluntary waits such as WAITFOR queries where there truly is no blocker:
create view amalgam.head_blockers as select blocking_task_address as head_blocker_task_address, blocking_session_id as head_blocker_session_id from sys.dm_os_waiting_tasks where blocking_task_address is not null OR blocking_session_id is not null except select waiting_task_address, session_id from sys.dm_os_waiting_tasks where blocking_task_address is not null OR blocking_session_id is not null go
Note two important aspects about this query. First, it qualifies a task/session as a head blocker if it is blocking some task and it itself does not have an identified blocker—in fact, it may not even exist in the DMV as a waiter. Second, not all waiting tasks without an identified blocker are head blockers. This is the case for wait types that do not provide the ability to determine the blocker, in which case the head blocker may be waiting for a resource, but the identity of the blocker cannot be determined.
You can use the list of head blockers to calculate the number of direct and indirect blockers. This measure is one factor to consider when deciding which blocking chains to tackle first. You can use the amalgam.blocking_chain view to calculate the direct blocking counts for any blocker, and indirect blocking counts for head blockers. The view uses a recursive common table expression (CTE). The indirect count relies on the fact that at every level of the blocking chain, the head blocker information is maintained. Because of the possibility that any given snapshot of sys.dm_os_waiting_tasks may contain blocking chain cycles, the maxrecursion option should always be specified. This option could not be included as part of the view because option clauses are not allowed in views. Blocking chain cycles may exist due to three reasons:
- A deadlock exists, but the deadlock monitor has not yet detected it.
- A deadlock exists, but it cannot be detected, because it involves resources that do not participate in deadlock detection but populate the blocking information. This is the case, for example, with latches.
- No deadlock exists, but it appears as if one does exist (due to timing conditions when sys.dm_os_waiting_tasks was materialized).
-- Count of directly blocked tasks per blocker -- select blocking_task_address, blocking_session_id, count(*) as directly_blocked_tasks from amalgam.blocking_chain group by blocking_task_address, blocking_session_id option (maxrecursion 128) -- Count of indirectly blocked tasks for each -- head blocker -- select head_blocker_task_address, head_blocker_session_id, count(*) as indirectly_blocked_tasks from amalgam.blocking_chain group by head_blocker_task_address, head_blocker_session_id option (maxrecursion 128)