Home > Articles

  • Print
  • + Share This
This chapter is from the book

Monitoring Blocking

The preceding sections have focused first on detecting that blocking is occurring and second on identifying the cause and possible resolutions. They have been geared more toward interactive investigations. However, it is generally not possible to dedicate a database operator to continuous active monitoring of a system. It would also not be efficient. This calls for a way to monitor blocking where alerts can be raised when blocking is encountered or the proper information gets collected automatically. To achieve this, many of the scripts from earlier sections, and some additional ones, can be rolled into a collection that can be run via SQLDiag to monitor blocking and collect the appropriate data. I have included here several script snippets and explanations as to why I would include them in a monitoring script. These can be used to build monitoring stored procedures such as sp_blocker_pssNNN used by SQLDiag and also available from the Microsoft website. As mentioned previously, SQLDiag is now included as part of SQL Server. This tool can collect many of the data points included here out of the box and can be extended to include custom scripts. The level of monitoring can be customized based on specific needs and the availability of CPU cycles to execute the scripts. This is an important concern because some of these script snippets can be somewhat expensive to run or might produce a lot of output that must then be analyzed.

Wait Statistics

It’s always useful to collect wait statistics. These are low-impact queries. Filtering out the innocuous wait types and any zero statistics greatly reduces the output and makes it easier to review. Here’s a sample query:

select *
from amalgam.dm_os_wait_stats_filtered
select *
from sys.dm_os_latch_stats
where waiting_requests_count <> 0

Current Wait Information

Various queries can be run against sys.dm_os_waiting_tasks to collect information on current waiters. The cheapest option is to just include the entire contents of sys.dm_os_waiting_tasks, or amalgam.dm_os_waiting_tasks_filtered, like this:

select * from sys.dm_os_waiting_tasks
select * from amalgam.dm_os_waiting_tasks_filtered
select * from amalgam.dm_os_waiting_tasks_filtered2

A slightly enhanced version includes the current statements and plans for the waiting tasks:

select
     amalgam.current_statement (
          st.dbid, st.objectid, st.encrypted,
st.text,
          er.statement_start_offset,
          er.statement_end_offset)
          as current_statement,
          qp.query_plan,
     wt.*
from amalgam.dm_os_waiting_tasks_filtered wt
     left join sys.dm_exec_requests er
          on wt.waiting_task_address = er.task_address
     outer apply
sys.dm_exec_sql_text (er.sql_handle) st
           outer apply
sys.dm_exec_query_plan (er.plan_handle) qp

And a further enhancement includes the blocking task’s current statement and plan. Remember: Locks might have been acquired by a statement other than the current statement:

select
     amalgam.current_statement (
          st.dbid, st.objectid, st.encrypted,
st.text,
          er.statement_start_offset,
          er.statement_end_offset)
     as waiters_current_statement,
          qp.query_plan,
     amalgam.current_statement (
          stb.dbid, stb.objectid, stb.encrypted,
          stb.text,
          erb.statement_start_offset,
          erb.statement_end_offset)
     as blockers_current_statement,
          qp.query_plan,
     wt.*
from amalgam.dm_os_waiting_tasks_filtered wt
     left join sys.dm_exec_requests er
          on wt.waiting_task_address = er.task_address
     outer apply
sys.dm_exec_sql_text (er.sql_handle) st
           outer apply
sys.dm_exec_query_plan (er.plan_handle) qp
     left join sys.dm_exec_requests erb
          on wt.blocking_task_address =
               erb.task_address
     outer apply
sys.dm_exec_sql_text (erb.sql_handle) stb
           outer apply
sys.dm_exec_query_plan (erb.plan_handle) qpb

These, however, require manual analysis of potentially verbose output. This can be eased by analyzing some of the information at the time of collection.

Often, it is useful to find the hottest resources or wait types. The following queries find all resources and wait types with at least five waiters:

select resource_description,
     additional_resource_description,
     count(*)
from amalgam.dm_os_waiting_tasks_filtered2
where resource_description is not null
group by resource_description, additional_resource_description
having count (*) > 5
select wait_type, count(*)
from amalgam.dm_os_waiting_tasks_filtered
group by wait_type
having count (*) > 5

Long waiters are generally of more concern than short-duration waiters, so it might be useful to call them out; let’s see all waiters that have been waiting more than 10 seconds:

select *
from amalgam.dm_os_waiting_tasks_filtered
where wait_duration_ms > 10000
order by wait_duration_ms desc

The blocking chain also has some interesting information available (for example, head blockers that are blocking a large number of other tasks, and the chains themselves):

select head_blocker_task_address,
     head_blocker_session_id,
     count(*)
from amalgam.blocking_chain
group by head_blocker_task_address,
     head_blocker_session_id
having count(*) > 10
order by count(*) desc
option (maxrecursion 128)
select *
from amalgam.blocking_chain
option (maxrecursion 128)

The index operational statistics are also useful to have when looking for tables with high latch or lock waits:

select top 20 *
from sys.dm_db_index_operational_stats (
null, null, null, null)
order by page_latch_wait_count +
     page_io_latch_wait_count desc
select top 20 *
from sys.dm_exec_query_stats
order by
     (total_physical_reads +
total_logical_reads +
total_logical_writes) /
     execution_count desc

These are a sampling of queries that can prove useful in monitoring and then investigating blocking. Again, much of this can be easily collected using SQL Server’s SQLDiag tool. Obviously, the more information that is available, the easier it is to investigate, but the costlier it is to monitor. The balance depends largely on the extra load that the system can handle without adversely affecting throughput and response times of actual application work.

  • + Share This
  • 🔖 Save To Your Account