SQL Server Reference Guide

General Transact-SQL (T-SQL) Performance Tuning, Part 1

Last updated Mar 28, 2003.

Tuning is an end-to-end exercise, involving everything from the client application code in programming languages like C# or Visual Basic, through the network layer and the hardware, to the settings on the server itself. In this section of the SQL Server Guide I’ve explained a strategy you can follow to map the components along this path and find out where your performance bottleneck lies.

But at some point you end up in the database, and more specifically, what you’re asking the database to do. That’s what this tutorial is about — the methods and tools you can use to tune your queries.

By the way, most data professionals refer to any statement that is run on a SQL Server as a “query,” even when it doesn’t return any data. It’s just a convenient way to say “Transact-SQL statements.”

Knowledge is Power

As you read through this series of articles, you’ll learn a lot about how queries work in Transact-SQL in SQL Server 2005 and higher, and how you can change them to run faster. This process is called “refactoring,” and you should learn to do it during the development process. It’s much harder to change code once it’s functioning in production, so the best time to do this work is before the code ships. Rather than focus on specific query problems, I’ll explain some basic ideas about how the query process works, and then you can extrapolate them to your environment. The general ideas hold for SQL Server 2000 and earlier, but those versions don’t have the tools I’ll show you here.

As always, you should run everything you see here on your test or development workstation. The commands I’ll show you, especially for the testing part of the exercises, will actually slow down your queries in production, so don’t do any of this work there. You should also download and install the sample databases for SQL Server, as I’ve explained in previous tutorials. That will allow you to work through the examples I have here, and then once you’re comfortable you can test out the concepts on your production code.

So how does a query actually work? What happens between the time you send a statement to the server and when you get the results back? Quite a bit.

I’ll assume that you’re on a system separate from your server. While the process is largely the same even if you’re running the queries on the server itself, adding in the client and network layers does have an impact. The point is that your application formats the query using the SQL Server “library” on the client system. All Microsoft operating systems include libraries to communicate with a SQL Server system, but unless newer libraries have been added by an installation or an administrator, the communication won’t be as optimized as it could be. So the first step in tuning is to understand what the client environment looks like.

This also holds true for the network library. Once the code formats the request for the SQL Server library, it is bundled onto the network layer. Based on the default protocol the user has in place, and whether or not the developer chooses another, you’ll see different behavior in the application. This has to do with the size of the network transmission (often called a “packet”) and the error-handling for the network.
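
If you’re curious which protocol and packet size your own connection negotiated, SQL Server 2005 and higher expose that through a dynamic management view. This is just a quick check of the current session:

```sql
/* Show the network protocol and packet size for the current connection */
SELECT net_transport,    -- e.g. TCP, Shared memory, Named pipe
       net_packet_size   -- negotiated packet size in bytes (default is 4096)
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;
```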

This process continues on the server end of the transaction. The network library there receives the packet and then sends it on to the SQL Server Engine for processing. A series of Engine components dissect the query and send it on to the right place.

It might surprise you to learn that SQL Server does not use the Transact-SQL statements that you send in the same order that you send them. In fact, the T-SQL is turned into something else entirely. And it is processed in this order:

  1. FROM
  2. ON
  3. OUTER JOIN
  4. WHERE
  5. GROUP BY
  6. CUBE and ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP
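
To make that order concrete, here’s a simple query against the AdventureWorks sample database with each clause labeled by the step at which the engine logically processes it (the numbers match the list above):

```sql
/* The numbers match the logical processing steps above */
SELECT LastName,                 -- 8. SELECT
       COUNT(*) AS ContactCount
FROM Person.Contact              -- 1. FROM
WHERE EmailPromotion = 1         -- 4. WHERE
GROUP BY LastName                -- 5. GROUP BY
HAVING COUNT(*) > 1              -- 7. HAVING
ORDER BY ContactCount DESC;      -- 10. ORDER BY
```

Notice that the SELECT list is processed near the end, which is why you can’t reference a column alias from the SELECT list in the WHERE clause.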

When you send a query request, a set of “Virtual Tables” is created to hold this information. The Query Processor uses the order above to see if the data is already available in a series of “caches,” or memory locations. If the data, or the way to get to the data, is there, it sends the data back or uses the “Query Plan” to quickly locate it. If the data isn’t in the cache, the engine reads it from disk (an I/O operation). If the Query Plan (just called the “Plan”) isn’t in the memory cache, the engine creates one, pushing the least-used Plan out of memory.

So you can see that memory makes a difference, as does the frequency of the data access. If you have the right amount of memory, then the queries get faster as the users run them!
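
You can watch this reuse happening with a quick peek at the plan cache. This is just a glimpse (cache inspection deserves its own discussion), and it assumes SQL Server 2005 or higher:

```sql
/* How many times has each cached plan been reused? */
SELECT usecounts,     -- number of times the plan has been reused
       cacheobjtype,  -- e.g. Compiled Plan
       objtype        -- e.g. Adhoc, Prepared, Proc
FROM sys.dm_exec_cached_plans
ORDER BY usecounts DESC;
```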

As you can also see from the order of operations above, the data you’re retrieving matters as well. It’s best to make everything as “selective” as possible - that is, written to narrow the results down to as few rows as possible, as early as possible.
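
One rough way to gauge selectivity is to compare the count of distinct values in a column you filter on against the total row count; the closer the ratio is to 1.0, the more selective the column. Here’s a quick sketch against the AdventureWorks Person.Contact table:

```sql
/* Ratio of distinct values to total rows - closer to 1.0 = more selective */
SELECT COUNT(DISTINCT LastName) * 1.0 / COUNT(*)       AS LastNameSelectivity,
       COUNT(DISTINCT EmailPromotion) * 1.0 / COUNT(*) AS EmailPromotionSelectivity
FROM Person.Contact;
```

A highly selective column like LastName is a much better candidate for filtering (and indexing) than a column with only a handful of distinct values.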

There are lots of ways to look at the cache to see what is available, how the memory is being used and so on, but I’ll leave that for another tutorial. For now, let’s set up an environment so that you can test a query or two to see how things are working. Again, don’t do this on production! You’ll trash your performance there, which is exactly what you’re trying NOT to do.

The Performance Testing Process

If you want to follow along, open a query tool (I’m using SQL Server Management Studio here) and run the statements as I show them. The first thing I do is to force SQL Server to commit all transactions that are “in flight” on my system. I use the CHECKPOINT command for that. The next thing I do is to clear out the cache and all of the memory buffers to make sure I start with a “blank slate.” I use the DBCC commands of FREEPROCCACHE and DROPCLEANBUFFERS to do that. Here are the commands I run when I start my tuning tests:

/* Performance Tuning.sql
Purpose: Demonstrates Performance Tuning for Queries
Author: Buck Woody
Last Edited: 11/15/2008
Notes:
I’ll use the AdventureWorks database for the tests, 
since it has a lot of data and some good indexes.
 */
USE AdventureWorks;
GO
/* First, I need to clear everything out to have a "clean" testing environment */
CHECKPOINT;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO

At this point, I want to start testing my queries. The tables I’m working with might already have Indexes or not, but I’ll leave that discussion alone for a moment - I’ll come back to it later. Right now I just want to show you how you can find out what your queries are doing.

The first thing you normally care about is how fast the queries run. In SQL Server Management Studio (SSMS) you can just look at the bottom of the screen - everything is right there. But you can also have SQL Server report timing and I/O details for each query by using a couple of SET statements. These responses will come back in the “Messages” tab in SSMS:

/* Get info on the query */
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO
-- For example:
SELECT *
FROM Person.Contact;
GO
/* Remember to turn all of that off */
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

I run a query this way, then make my changes (perhaps to the order of the SELECT, a different JOIN or ORDER BY clause, and so on), and then run it again.

If you want to time multiple query variations in one batch, you can “wrap” the statements with a couple of variables that store the start and stop times. You declare the “start” variable and the “stop” variable, then set the “start” variable at the top of your code block. You set the “stop” variable at the end of the run and then use a simple date comparison to show the elapsed time.

The convenient thing about this method is that you can make several variables to check different sections of your code. You can also store these values in another table for comparison later.
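
If you’d like to keep those values around for later comparison, one approach is a small logging table. The table and column names here are purely illustrative, not anything built into SQL Server:

```sql
/* A simple timing log - names are hypothetical, adjust to suit */
CREATE TABLE dbo.QueryTimings
(
    TestName   varchar(100),
    RunDate    datetime DEFAULT GETDATE(),
    DurationMS int
);

DECLARE @BlockStart datetime, @BlockEnd datetime;
SET @BlockStart = GETDATE();
SELECT FirstName, LastName FROM Person.Contact;
SET @BlockEnd = GETDATE();

INSERT INTO dbo.QueryTimings (TestName, DurationMS)
VALUES ('Narrow SELECT', DATEDIFF(ms, @BlockStart, @BlockEnd));
```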

Here’s all of that put into a single script:

/* If you're not in SSMS, this query will measure the time */
DECLARE @BlockStart datetime 
DECLARE @BlockEnd datetime 
-- At the beginning: 
SET @BlockStart = GETDATE()
-- First Code here
SELECT *
FROM Person.Contact
-- At the end:
SET @BlockEnd = GETDATE()
-- Show the difference in Milliseconds
SELECT DATEDIFF(ms, @BlockStart, @BlockEnd) AS 'First Run:'

-- At the beginning: 
SET @BlockStart = GETDATE()
-- Changed code:
SELECT FirstName, LastName
FROM Person.Contact
-- At the end:
SET @BlockEnd = GETDATE()
-- Show the difference in Milliseconds
SELECT DATEDIFF(ms, @BlockStart, @BlockEnd) AS 'Second Run:' 

On my system, this simple test tells me that by not selecting everything (the * part) I saved around 600 milliseconds. While that might not seem like very much, it really adds up when you have thousands of queries a minute running against your data.

In the next article, I’ll explain how you can look at the Query Plan to see the details of what SQL Server is doing to process the query, and how you can use that information to tune the query. In the tutorials that follow I’ll show you the major “problem areas” in most queries so that you can find and fix your performance problems.

InformIT Articles and Sample Chapters

If you’ve come to this tutorial without reading the earlier installments, you should read the complete series on performance tuning here in the InformIT SQL Server Reference Guide.

Books and eBooks

You can’t do better than SQL Performance Tuning by Peter Gulutzan and Trudy Pelzer. (Read in Safari Books Online)

Online Resources

Here’s the best SQL Server Performance Tuning site on the planet.