Home > Articles > Data > SQL

  • Print
  • + Share This
From the author of

Using EXPLAIN to Optimize Databases

The EXPLAIN command is one of the most useful tools in a DBA's belt for uncovering and resolving performance issues. It describes the execution plan for the supplied query and returns the following information:

  • Starting cost. This is an estimation of how much time elapsed before an output scan began. Typically this number is greater than 0 if it was waiting on another query to complete before it could begin—as in the case with subselects and joins.

  • Total cost. This is an estimation of how much time would be spent if all rows were returned. This occurs, regardless of whether any other factors—such as a LIMIT statement—would have prevented all rows from being returned.

  • Output rows. This is an estimation of the number of rows returned.

  • Estimated average width. This is an estimation of the width, in bytes, of an output row.

The time units mentioned in the preceding bulleted list are not related to an objective amount of time, but are an indication of how many disk-page fetches would be needed to complete the request.

The following code shows the basic use of the EXPLAIN command on the table authors:

EXPLAIN SELECT * FROM authors;

NOTICE: QUERY PLAN
Seq Scan on authors (cost=0.00..92.10 rows=5510 width=20)

In the preceding code, the EXPLAIN statement lists the following estimations:

  • Use a sequential scan (as opposed to an index)

  • No delay on start time

  • A cost of 92.10 for delivering the entire query

  • An estimated 5,510 rows returned

  • Average width of row is 20 bytes

Modifying the query produces different results:

EXPLAIN SELECT * FROM authors WHERE age<10000;

NOTICE: QUERY PLAN
Seq Scan on authors (cost=0.00..102.50 rows=5510 width=20)

In this code, you can see that the total cost increased slightly. It is interesting to note that although there is an index on age on this table, the query planner is still using a sequential scan. This is because our search criterion is so broad that an index scan would not be of any benefit. (Obviously, all values in the age column are less than 10,000.)

If we constrain our search criterion more, we can see even more changes:

EXPLAIN SELECT * FROM authors WHERE age<30;

NOTICE: QUERY PLAN
Index Scan using age_idx on authors 
        (cost=0.00..32.20 rows=991 width=20)

A number of things are interesting about this result. First, we have constrained our criteria enough to force the query planner to make use of the age_idx index. Secondly, both the total cost and the number of returned rows are dramatically reduced.

Using EXPLAIN on more complex queries can sometimes illuminate potential problems with the underlying database structure. For instance, consider the following join query:

EXPLAIN SELECT * FROM authors, payroll WHERE 
   authors.name=payroll.name;

NOTICE: QUERY PLAN

Merge Join (cost=69.83..425.08 rows=85134 width=36)
      ->Index Scan using name_idx on authors 
           (cost=0.00..273.80 rows=5510 width=20)
      ->Sort (cost=69.83..69.83 rows=1000 width=16)
         ->Seq scan on payroll 
              (cost=0.00..20.00 rows=1000, width=16)

This output produces some interesting facts about our underlying database structure. Obviously, the authors table has an index on name, but the payroll table appears to be resorting to using sequential scans and sorts to match fields.

After investigating, it is determined that the payroll table does not have an appropriate index for this join. After an index is created, we get the following results:

CREATE INDEX pr_name_idx ON payroll(name);

EXPLAIN SELECT * FROM authors, payroll WHERE 
   authors.name=payroll.name;

NOTICE: QUERY PLAN

Merge Join (cost=0.00..350.08 rows=44134 width=36)
   ->Index Scan using name_idx on authors 
        (cost=0.00..273.86 rows=5510 width=20)
   ->Index Scan using pr_name_idx on payroll 
        (cost=0.00..29.50 rows=500 width=16)

By including an index on the payroll table, we have now achieved a 25-percent increase in query execution.

Running EXPLAIN on your queries is a good way to uncover hidden bottlenecks that are negatively impacting system performance. In fact, if a performance problem is not hardware related, the EXPLAIN command is probably the best tool that a DBA can use to solve it. EXPLAIN can provide the data necessary to optimize your queries and indexes for greater system performance.

  • + Share This
  • 🔖 Save To Your Account