Query Optimization
Now that you have query syntax in hand, it's time to create the query. This is the point at which you should think about optimization. It's often tempting to "just get the query out there" and then come back later to make it faster. The trouble is that far too often you'll forget to come back and correct the query. Even worse, a poorly constructed query can cause related performance issues that are difficult to detect.
So how do you identify an inefficient query? First, you need to understand the route the query takes to produce its results. The route includes the various methods the SQL Server engine uses to locate the rows of data, whether that means reading a table from beginning to end or using one or more indexes. This route is called a "query execution plan." Microsoft SQL Server provides three tools to evaluate a query and the query plan. Two are available inside Query Analyzer and the other is in the SQL Server Profiler tool.
The two methods inside the Query Analyzer tool are a graphical query plan and a text display of that plan. I'll start with the graphical tool, then examine the text you can receive from Query Analyzer, and finally explain how to use SQL Server Profiler to see the execution plan as well.
I'll begin by opening Query Analyzer and connecting to the pubs database. Once there, I type a simple query:
SELECT * FROM authors
GO
Before I run the query, however, I'll select Show Execution Plan from the Query menu, as shown in Figure 7.1.
One thing before I actually run the query—you can also choose to display an estimated execution plan. This means the system will attempt to predict what might happen as far as the path goes. I don't use this option very often, since there's rarely a reason to do so. Some might argue that a particular query might take too long to run. My pushback to that is that a developer should be working on a development system—preferably a virtual system anyway. So who cares if it takes a while? The other reason I don't use this option is that if your query creates temporary tables it won't display the plan; since the query doesn't really run, it doesn't create the temp tables, and there you are.
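To make the temp-table limitation concrete, here is a minimal sketch of the kind of batch that falls into that category. The #ca_authors table name and the state filter are made up for illustration; only the pubs authors table comes from the example. The second statement refers to a table that doesn't exist until the first one has actually run, which is why an estimated plan has nothing to work with.

```sql
-- Hypothetical batch: #ca_authors and the state filter are illustrative only.
-- The temp table exists only once the first statement has really executed.
SELECT au_id, au_lname
INTO #ca_authors
FROM authors
WHERE state = 'CA'

-- This statement depends on the temp table created above.
SELECT au_lname
FROM #ca_authors
ORDER BY au_lname

DROP TABLE #ca_authors
GO
```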
Getting back to my example, I press F5 to run the query, and I get the results. I also get an additional tab in the results pane, where you can see the graphical representation of the query plan (see Figure 7.2).
There's quite a bit of information, even in this small display. You read the plan from right to left. If the query is complex, you'll see a lot of information. In that case, I recommend you split the larger query into smaller ones until you know what each is doing, and then put them back together a little at a time.
You can see two icons which display the operations the engine used to get the data. I've clicked on one of them and that graphic then displays more detail about that operation.
Notice also the thin arrow pointing to the left. It's thin because the operation wasn't that time consuming. You'll notice on larger queries that this arrow will be much thicker. Small clues like that make this a powerful tool.
For instance, if the icon turns red it indicates that the operation could benefit from statistics. Just right-click that red icon and you can create the statistics on the fly! Also—try moving your cursor over the direction arrows. You'll get the number of records that were transferred in that step.
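The right-click option has a Transact-SQL counterpart if you'd rather script it. The sketch below assumes the pubs authors table and uses a made-up statistics name (stat_authors_state). Which column actually needs statistics depends on what the plan flags on your system.

```sql
-- Create statistics on a single column.
-- The name stat_authors_state is hypothetical; pick your own.
CREATE STATISTICS stat_authors_state ON authors (state)
GO

-- Refresh the statistics that already exist on the table.
UPDATE STATISTICS authors
GO
```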
You'll also see within the information box the CPU and I/O costs. Don't put too much stock in the numbers themselves—they just help the query optimizer create a total cost for the query, and don't reflect real world CPU ticks.
Look for the operations that have the highest cost. Those are the steps you want to attack first.
This particular query used a clustered index scan, which means that the query processor satisfied the query by reading all the rows in order from the index, which is really the table. In the case of a clustered index, the data is physically stored in the order of the index, so reading that index turned out to be 100% of the cost of the query. The icon showing a tree of computers with a blue arrow indicates the clustered index scan.
An index scan isn't a great thing—I can do better (see Figure 7.3).
That is better—look closely at the graphics. This time I got an Index seek instead of an Index scan. That's an important distinction—a scan means that the system had to read through all the rows to get the data it was looking for. A seek means that the query processor was able to find what it needed directly from the pointers in the index.
It's similar to having to look through the whole house to find your keys versus knowing to look on the dresser. The reason I'm doing better now is that I've added a WHERE clause. Getting just the data I need makes use of the index properly. As a matter of fact, without an index, you're more often than not going to receive a table scan, which is usually quite bad.
I say usually, because in some cases it's faster for SQL Server to read an entire table than it is to use an index. This is normally the case for any small table, say below a few hundred rows or so.
I've still got an issue, though, because now I have a Bookmark Lookup icon as well. As a matter of fact, it's half the cost of the entire query. A Bookmark Lookup means the system found the rows quickly through the index, but then had to go back to the table to fetch columns the index doesn't contain. The reason this happened with my query is that I used a SELECT * statement (which you should never do in production). This means all columns, and some aren't covered by the index, so it had to get those columns from the table rather than the index. While this brings back all the columns without my having to bother with figuring out which ones I want, it's very inefficient.
I'll go back to the program to see what's really needed, and I find that I only really want the last name of the author (see Figure 7.4).
That's better. In fact, it doesn't get any better than this. A 100% index seek is exactly what you're after—if you can get it.
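The exact statements are in the figures. As a rough sketch of the progression, assuming the stock pubs schema (where authors has a nonclustered index on au_lname, au_fname) and a made-up search value of 'White', the three steps might look something like this. The plans you actually get depend on your data and indexes.

```sql
-- Step 1: no WHERE clause, so every row is read (clustered index scan).
SELECT * FROM authors
GO

-- Step 2: the WHERE clause lets the engine seek on the last-name index,
-- but SELECT * still needs columns the index does not hold,
-- which is what brings in the bookmark lookup.
SELECT * FROM authors WHERE au_lname = 'White'
GO

-- Step 3: only the indexed column is requested, so the index alone
-- can answer the query.
SELECT au_lname FROM authors WHERE au_lname = 'White'
GO
```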
You can use Books Online to find the other query plan symbols and what they mean—just look up the topic Graphically Displaying the Execution Plan Using SQL Query Analyzer. Here are the ones to watch out for:
- Index or table scans. As I mentioned, if you're getting a scan, the system has to read the entire table to find the data. Look for an existing index that the query could use, or consider creating one.
- Sort. A sort happens when you use an ORDER BY on the query. If you need the data in that order, fine. If it's not necessary, however, consider leaving it out.
- Bookmark lookup. As I mentioned earlier, these are often caused by using a SELECT * statement. There is almost never a reason to do this in production systems.
- Filter. This one is a bit trickier. You'll often see these when you use a function, and a function is sometimes the best way to get the data. Again, see if you can restructure the query to use an index, or create one if possible.
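As a sketch of what restructuring a query can mean in practice, the first two statements below ask for the same rows in two ways. They assume the pubs authors table and its last-name index. The second form is the kind of rewrite that gives the optimizer a chance to use that index, though whether it does depends on your data. The index name in the last statement is hypothetical.

```sql
-- Wrapping the column in a function hides it from the index,
-- so the engine typically has to examine every row.
SELECT au_lname FROM authors WHERE LEFT(au_lname, 1) = 'W'
GO

-- The same rows expressed as a range on the bare column.
SELECT au_lname FROM authors WHERE au_lname LIKE 'W%'
GO

-- If no suitable index exists for a column you filter on, consider adding one.
-- ix_authors_state is a made-up name for illustration.
CREATE INDEX ix_authors_state ON authors (state)
GO
```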
There's a textual representation of this kind of data available as well. Just type:
SET SHOWPLAN_TEXT ON
Then run your query. While this option is on, SQL Server doesn't execute the query; it returns the plan as text instead of the results. The information is largely the same, but to be honest, the graphical method is best. I'm normally very biased towards command-line operations, but in this case, the graphical plan really does show you more information quickly.
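Here is a minimal sketch of a full session, assuming the pubs authors table and a made-up search value. SET SHOWPLAN_TEXT has to be the only statement in its batch, which is why each step is followed by GO. Remember to turn the option off again when you want queries to return rows.

```sql
-- Turn on the text plan; this must be alone in its batch.
SET SHOWPLAN_TEXT ON
GO

-- The plan for this statement is returned instead of its rows.
SELECT au_lname FROM authors WHERE au_lname = 'White'
GO

-- Go back to normal execution.
SET SHOWPLAN_TEXT OFF
GO
```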
Finally, there's another method to see the query plans. You can use the SQL Server Profiler tool—just capture these events:
- Performance: Execution Plan
- Performance: Show Plan All
- Performance: Show Plan Statistics
- Performance: Show Plan Text
And then pick these data columns:
- Start Time
- Duration
- Text data
You might want to filter on duration so that only the longer-running queries are captured; otherwise you'll be inundated with data.