Home > Articles > Data > SQL

  • Print
  • + Share This

Defining the Basic Problem

Proper indexing can have a huge effect on database efficiency. Just like an index in a book, an index in a database makes it easy to find data. Instead of paging through the book, hoping to find information about Emu Sisters Productions, you can go directly to the location the index gives you. Now think about what happens when you remove Chapter 3 from a book or add 20 pages to Chapter 7. The book index is invalidated from the point of change on and needs to be rebuilt.

In much the same way, adding, removing, and changing rows in a database leads to index maintenance tasks. Even a relatively small edit, such as changing Emu Sisters to E-Gals, is likely to cause some index differences, as the shorter size of the new entry affects the page splits. If you are changing book text or database rows all the time and you have very detailed indexes, you'll need to get the text pages and index back in synch. So what's good for one kind of work (finding information/querying/decision support) is bad for another (modifying information/updating, inserting, and deleting/transaction processing).

It's not just indexes that show this dichotomy. Database design, storage allocations, memory, and many other choices reflect the work a database is supposed to do. These decisions are beyond the scope of this book, and you're best off looking at information specifically for the system you are using.

However, returning to the question of indexes, there are some general rules the SQL writer can follow to improve performance (at least with sorted or B-tree indexes, common in many systems). First, find out what indexes you have and put together WHERE clauses that can use the indexes rather than WHERE clauses that can't use the indexes. Often, the difference is minor. For example, in just about every system, the queries in Figure 6–1 will give different performance, assuming an index on prodnum and a nontrivial number of rows.

Figure 6-1Figure 6–1. WHERE and Performance

Why? IN processes (a form of OR) are often slow, requiring multiple reads of the table. BETWEEN is easier for most SQL engines to handle. After all, it's just a range. With an index, it's a matter of finding the low boundary and following the index until encountering the high value.

Once you have general guidelines on efficient SQL, you can look for places where you can substitute faster syntax for slower. Sometimes you won't have the option—often IN (or some other "slow" choice) is the best answer. However, as you review code you've written yourself or inherited, you'll find places where you can improve performance with relatively minor changes.

  • + Share This
  • 🔖 Save To Your Account

Related Resources

There are currently no related titles. Please check back later.