Forming Queries Part 3: Query Optimization
Last updated Mar 28, 2003.
We're in our final article in the "Forming Queries" series.
Now that you have the syntax in hand, it's time to create the query. This is the point at which you should think about optimization. It's often tempting to "just get the query out there" and re-evaluate it later to make it faster. Far too often, you'll forget to come back to correct the query. Worse, if you create the query incorrectly, you might cause related performance issues that are difficult to detect.
So how do you identify an inefficient query? First, you need to understand the route the query takes to satisfy the results. This route, called a query execution plan, includes the various methods the SQL Server engine uses to locate the rows of data, whether that means reading a table from beginning to end or using one or more indexes.
Query Analyzer offers two ways to view the plan: a graphical query plan and a text display of that plan. We'll start with the graphical tool, then examine the text you can receive from Query Analyzer, and then I'll explain how to use SQL Server Profiler to see the execution plan as well.
I'll begin by opening Query Analyzer and connecting to the pubs database. Once there, I type a simple query:
SELECT * FROM authors
GO
Before I run the query, however, I'll select "Show Execution Plan" from the "Query" menu:
One thing before I actually run the query: you can also choose to display an estimated execution plan. This means the system will attempt to predict what might happen as far as the path goes. I don't use this option very often, since there's rarely a reason to do so.
Some might argue that a particular query might take too long to run. My pushback is that a developer should be working on a development system, preferably a virtual system, anyway. So who cares if it takes a while? The other reason I don't use this option is that if your query creates temporary tables, it won't display the plan; since the query doesn't really run, it doesn't create the temp tables, and there you are.
Getting back to our example, I press F5 to run the query, and I get the results. I also get an additional tab in the results pane, where I can see the graphical representation of the query plan:
There's quite a bit of information, even in this small display. You read the plan from right to left. If the query is complex, you'll see a lot of information. I recommend that you split the larger query into smaller ones until you know what they are doing, and then put them back together a little at a time.
You can see two icons which display the operations the engine used to get the data. I've clicked on one of them; that graphic then displays more detail about that operation.
Notice the small arrow pointing to the left. It's a small arrow because the operation wasn't that time consuming. On larger queries, this arrow will be much thicker.
Such small clues make this a powerful tool. For instance, if the icon turns red, it indicates that the operation could benefit from statistics. Right-click on the red icon, and you can create the statistics on the fly! Also, try moving your cursor over the direction arrows. You'll get the number of records that were transferred in that step.
You'll also see the CPU and I/O costs within the information box. Don't put too much stock in the numbers themselves; they just help the query optimizer create a total cost for the query and don't reflect real-world CPU ticks.
Look for the operations with the highest cost. Those are the steps to attack first.
This particular query used a clustered index scan, which means that the query processor satisfied the query by reading all the rows in order from the index, which is really the table. In the case of a clustered index, the data is physically stored in the order of the index, so reading that index turned out to be 100% of the cost of the query. The icon showing a tree of computers with a blue arrow indicates the clustered index scan.
An index scan isn't a great thing. We can do better.
That is better - look closely at the graphics. This time, we got an Index Seek instead of an Index Scan. That's an important distinction. A scan means that the system had to read through all the rows to get the data it was looking for. A seek means that the query processor found what it needed directly from the pointers in the index.
It's similar to having to look through the whole house to find your keys, versus knowing to look on the dresser. The reason we're doing better now is that I've added a WHERE clause. Getting just the data we need makes use of the index properly. As a matter of fact, without an index, you're more often than not going to receive a table scan, which is usually quite bad.
I say usually, because in some cases it's faster for SQL Server to read an entire table than it is to use an index. This is normally the case for any small table, say under a few hundred rows.
We've still got an issue, though, because now we have a Bookmark Lookup icon as well. As a matter of fact, it's half the cost of the entire query. A Bookmark Lookup means the system found the rows quickly through the index, but then had to go back to the table to fetch the columns the index doesn't contain. The reason this happened with my query is that I used a SELECT * statement (which you should never do in production). This means all columns, and some aren't covered by the index, so it had to get those columns from the table rather than the index. While SELECT * brings back all the columns without my having to bother with figuring out which ones I want, it's very inefficient.
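The query behind this plan isn't reproduced in the text, so here is a minimal sketch of the kind of statement that tends to produce an Index Seek followed by a Bookmark Lookup. It assumes the standard pubs schema, in which the authors table carries a nonclustered index on (au_lname, au_fname); the last name 'White' is only an illustrative value, not necessarily the one used in the original example.

```sql
-- Assumption: pubs.authors has a nonclustered index on (au_lname, au_fname).
-- The WHERE clause lets the engine seek on that index, but SELECT * asks for
-- columns the index doesn't hold, so each matching row needs a lookup
-- back into the table.
USE pubs
GO
SELECT *
FROM authors
WHERE au_lname = 'White'  -- illustrative value only
GO
```

On a table this small the optimizer may still pick a different plan, so treat the output on your own system as the authority.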
I'll go back to the program to see what's really needed, and I find that I only really want the last name of the author.
That's better. In fact, it doesn't get any better than this. A 100% index seek is exactly what you're after - if you can get it.
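A sketch of what such a narrowed query might look like, assuming the standard pubs schema with a nonclustered index on authors (au_lname, au_fname) and using 'White' purely as an illustrative value:

```sql
-- Assumption: pubs.authors has a nonclustered index on (au_lname, au_fname).
-- Only au_lname is requested, and that column lives in the index itself,
-- so the index alone can answer the query with no trip back to the table.
USE pubs
GO
SELECT au_lname
FROM authors
WHERE au_lname = 'White'  -- illustrative value only
GO
```

When every column a query touches is present in one index, that index is said to cover the query, which is why the lookup step disappears.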
You can use Books Online to find other query plan symbols and what they mean; look up the topic "Graphically Displaying the Execution Plan Using SQL Query Analyzer." Here are some issues to watch out for:
Index or Table Scans
If you're getting a scan, the system has to read the entire table or index to find the data. Look for an existing index that the query could use, or consider creating one.
A sort happens when you use an ORDER BY on the query. If you need the data in that order, fine. If it's not necessary, however, consider leaving it out.
As I mentioned earlier, Bookmark Lookups are often caused by using a SELECT * statement. There is almost never a reason to do this in production systems.
This one is a bit trickier. You'll often see these when you use functions, which are sometimes the best way to get the data. Again, see if you can reconstruct the query to use an index, or create one if possible.
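The advice about functions can be sketched with a pair of queries. This is an illustration rather than the article's own example: it assumes the standard pubs schema with a nonclustered index on authors (au_lname, au_fname), and the search letter is arbitrary.

```sql
USE pubs
GO
-- Wrapping the indexed column in a function means the engine has to
-- evaluate LEFT() for every row, which typically rules out an index seek.
SELECT au_lname
FROM authors
WHERE LEFT(au_lname, 1) = 'W'
GO
-- The same filter written against the bare column. A leading-prefix LIKE
-- can be turned into a range, so an index seek becomes possible.
SELECT au_lname
FROM authors
WHERE au_lname LIKE 'W%'
GO
```

Comparing the two execution plans side by side is a quick way to check whether a rewrite like this actually helps on your data.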
A textual representation of this kind of data is available as well. Type:
SET SHOWPLAN_TEXT ON
The information is largely the same, but the graphical method is better. I'm normally biased towards command-line operations, but in this case, the graphical plan really does show you more information quickly.
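For reference, a minimal sketch of a complete session using the text plan. The query shown is an illustrative one against pubs, not taken from the article. Two details are worth noting: SET SHOWPLAN_TEXT has to be the only statement in its batch, and while it is ON the statements that follow are not executed; SQL Server returns only the plan.

```sql
USE pubs
GO
SET SHOWPLAN_TEXT ON   -- must be alone in its batch
GO
-- This statement is not run; only its plan text comes back.
SELECT au_lname
FROM authors
WHERE au_lname = 'White'  -- illustrative value only
GO
SET SHOWPLAN_TEXT OFF  -- turn it off again so later queries run normally
GO
```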
Finally, there's another method to see the query plans. You can use the SQL Server Profiler tool; just capture these events:
Performance: Execution Plan
Performance: Show Plan All
Performance: Show Plan Statistics
Performance: Show Plan Text
And then pick these data columns:
You might want to filter on duration so that only the longer-running queries are captured; otherwise you'll get inundated with data.
The most awesome site for database and query optimization is http://www.sql-server-performance.com.
InformIT Tutorials and Sample Chapters
Using views with your tables? Check out the article by Andy Baron and Mary Chipman called Creating and Optimizing Views in SQL Server.