Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Performance Tuning: Introduction to Indexes

Last updated Mar 28, 2003.

The Japanese art of Bonsai (pronounced bone-sigh) involves growing very small trees in a minimal amount of soil. The trees they produce are stunningly beautiful, and getting them to look this way and thrive in a very unnatural environment involves a lot of skill. The skill you need to successfully raise a bonsai has many parts, but the one that is the most important is something you might not think of — watering. Because the tree is a small size, and the soil is very light, it's quite easy to kill the tree - or make it thrive - using only the watering can.

It's a similar concept to Indexes in SQL Server. An Index, which I'll explain more about in a moment, is one of the primary tools you have for increasing performance in your database system. An improper Index, on the other hand, can slow your system to a crawl — I've seen this happen myself many times.

I'll start by explaining what an Index is, then how it is used by the system, and then I'll explain how you can make your own. From there, it's important to monitor and maintain the Index — it's not a "set it and forget it" kind of thing.

An Index on a SQL Server database (or more accurately, on a table) is used much as an Index is used in a book. Imagine that I've asked you to locate all of the times the word "SQL" is used in a chapter of a book. Now imagine that you have to do that for the entire book. "Simple," you think, "I'll just look it up in the Index, and then write down the page numbers." That would be the right thing to do. An Index in a book is simply a list of certain terms used in the book, and the page numbers where those words are used. This saves you time in trying to find the words, by at least getting you to the right page.

SQL Server indexes aren't too far removed from that concept. The data in SQL Server is stored in unit of space on the hard disk called a page. A page is a block of space that is defined by a unique number. If SQL Server is asked to look up data, it reads across these pages to find all the times the data you're looking for is stored. It then accesses those page numbers to display the data. This is called a scan, and unless the table is small (say, under 1,000 rows), it's pretty slow. It's a lot like finding the words in a book by reading each and every page, every single time you're asked to find a word.

When you place an Index on a table, the server scans all the pages and records the reference number of the page where those groups of items are. For instance, if the data being indexed is alphabetical, then it might store the page numbers where all the "A through Cs" are, and so forth. It can then use the Index on the table to quickly locate the page the data is on, just as you would in a book. Indexes effectively limit the data being searched, reducing the time spent on the search.

You can place many indexes on a table: 249 in SQL Server 2000, and 1,000 in SQL Server 2008, and of course that number will just increase over time. The Index "covers" or includes one or more columns, the ones you think might be used to look up data most of the time. In other words, you choose the data that will be indexed in a table, by creating an Index that stores certain columns. Choosing those columns is where the real challenge comes in. I'll explain how to do that in a moment.

Normally, Primary Keys are always covered by an Index. Other columns might be candidates for an Index as well. For instance, in a table that stores names, the "LastName" field would be a good Index candidate, because in many cases data about people is queried by their last name. The art of Index design is the tradeoff between an Index that covers as many columns as possible — up to a point.

That point involves the tradeoff between reads and writes. My "book" analogy kind of falls apart here, when compared with a database, since once a book is printed it doesn't change — at least that particular copy. In a database, however, the contents (data) changes all the time.

As database changes are made, the Index must be updated, or it becomes useless. If a table's column is read from a great deal, then indexes are great, because there aren't many changes — so the activity to update the index isn't as frequent.

If, however, the indexed column in the table is written to a great deal, Indexes aren't as great, because now the system has to update both the table and the Index. The problem arises in that most tables are both heavily read and heavily written.

There are two classes of indexes on a table: Clustered and Non-Clustered. (That kind of reminds me of that statement that says "There are two categories of people: Those who put people in two categories and those who don't.") Anyway, the Clustered Index puts data into a physical order. The non-clustered doesn't.

Let's take an example. Here is some data:

2
4
3
5
6
1

A clustered Index on this data conceptually rearranges the table itself, and stores the data physically this way — in fact, in a Clustered Index the Table itself is also the Index — because it's arranged this way on disk: (again, see the links above for a more accurate representation here):

1
2
3
4
5
6

The Non-Clustered Index leaves the data order in the table alone, such that it's still stored like this:

2
4
3
5
6
1

And then it makes another table that stores the "pointers" to the indexed data. This is where the database Index is most like a book. So a Non-Clustered Index is actually another table, pointing to a set of page numbers of the first.

As you can probably guess, there's only one Clustered Index on a table, since the table can only be stored once. This type of Index is pretty fast for data retrieval, assuming that you're looking for something that the Index covers. Once you leave the safety of the cover, the access can slow down again. A disadvantage to the clustered Index is that, when data is added, the table must be physically rearranged to put the data in the right spot. That can become costly if the table is written to very often.

So the decisions involved in designing an Index boil down to these areas:

  • How big is the data?
  • How often is the data updated?
  • What fields are normally used in queries?
  • How is the data shaped?

I'll devote the rest of this article to helping you tie out basic Index design to the answers to these questions. I'll cover a lot more about indexes in other tutorials.

How Big Is The Data?

If the data isn't very large, SQL Server will always scan the table anyway. If SQL Server is bound and determined to scan the data, why create the overhead of an Index?

What is that size limit? Well, there are few caveats, but by and large if the table is composed of fewer than 1,000 rows SQL Server will opt to scan it rather than use an Index. It just doesn't make sense to do several I/O (Input/Output) operations to get at the data through an Index when the whole table can be read quickly in one pass.

You can Query Analyzer (SQL Server 2000) or SQL Server Management Studio to find out if you're doing a table scan even when you've created an Index. Type in the query in question, and then select the Query menu. Select the Show Execution Plan item, and you'll get a tab to see the path your query takes. You can hover over an icon to see more info about it. You can read those icons to see if the Index is being used or not. If it isn't, then you should remove it. I have more on that tool here.

How Often Is The Data Updated?

Indexes benefit read operations, but not write operations. If your data is written more often than its read, you'll benefit less from an Index. The reason is that when you write new data, the Index must be updated to show where that data is.

If the Index is clustered, then the entire table has to move around to accommodate the new data, unless it is written at the end of the table. That's why it's best to save the clustered Index for something like the Primary Key or some other always-increasing, sequential value.

Even if the Index is non-clustered, and is another structure entirely, the Index has to be written to so that the new data is tracked. This takes I/O cycles, and slows down your system.

That's a bit of a generalization, because even a write can use an Index. That's possible because some writes are done through stored procedures — and these may look up data before they write others. These are the kinds of issues you must think through before you create an Index.

But sometimes even those write operation costs are outweighed if there are more reads against that data. There are various tools that can help you determine how well your indexes are used, and if they are helping or hurting. I'll cover these concepts in the Performance Tuning section of this guide.

What Fields Are Normally Used in Your Queries?

This is one of the most important questions to ask, and finding the answer will involve the DBA, the development team, and the users. You can use the SQL Profiler, Query Analyzer, and the Index Wizard to see what indexes might benefit your queries. I have tutorials on all of those in the Performance Tuning section.

If you're the one that defined your database design, then you'll probably know this information. If another developer created it, then you'll need to work with them, even if they are a vendor, to determine the best columns for your indexes.

How Is The Data Shaped?

What this question is asking is the type of data, the ranges of data, and the size of the data that the table stores. If, for instance, the data is updated frequently, you'll want to create your indexes with a fill-factor that leaves room on each page to add more data, without having to create more pages. If the data is read more often, the fill-factor can be left at a more "full" level.

Creating an Index

Once you've determined the columns to Index, there are several ways to create and modify your indexes. The methods fall into two camps: Automated and Manual processes.

If you're just starting out, you can certainly benefit from the automated methods of Index creation. The first of these is the Create Index Wizard (in SQL Server 2000) and the Database Tuning Advisor (in SQL Server 2005 and higher). These tools can either watch the activity on the server or create one to determine which tables need indexes. It will also create them for you. It's often a good place to start — assuming that you'll come back in later and make sure that the indexes that were created are still valid for production. In other words, you should never just run the tools and use the suggestions blindly. Check out the articles on those to find out where they help — and where they cannot.

You can also find graphical helps in the Query Analyzer tool in SQL Server 2000. To use this tool, open Query Analyzer, then type the query in question, and then select the Query menu. Select the Show Execution Plan item, and you'll get a tab to see the path your query takes. Once you've run the query, right-click in the Estimated Execution Tab of Query Analyzer, and then select Manage Indexes. For SQL Server 2005 and higher, just use the menu at the top of the Query Window to find these options before you run the query, and then you'll get an extra tab with those results. Once again, see the previous article I have on the Execution Plan for much more information on that process.

In SQL Server 2008 and higher, the Activity Monitor can also help you with finding missing Indexes, but not directly. Follow the information I show here for more about that process.

There are also a set of Dynamic Management Views you can use in SQL Server 2005 and higher that have more information on the Indexes your system has or needs. See this article for more. Once again, these tools should be used as starting points, not blind application. You need to ensure that you understand

The manual methods of creating indexes aren't used for designing indexes; they're used to implement designs. Of course, the main manual method is the CREATE INDEX T-SQL statement. The format looks like this:

CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX Index_name 
 ON { table | view } ( column [ ASC | DESC ] [ ,...n ] ) 
[ WITH < Index_option > [ ,...n] ] 
[ ON filegroup ]
< Index_option > :: = 
 { PAD_INDEX | 
  FILLFACTOR = fillfactor | 
  IGNORE_DUP_KEY | 
  DROP_EXISTING | 
 STATISTICS_NORECOMPUTE | 
 SORT_IN_TEMPDB 
}

Check Books Online for more information on this command. I cover it again in my tutorials on creating Indexes. Here's a simple example that creates a Non-Clustered Index on a table called "Person" in a database I have. I want to cover the PersonName and PersonType Columns, since those are the ones queried most often:

CREATE NONCLUSTERED INDEX IX_PersonInfo
    ON Person (PersonName, PersonType);
GO

The actual use of the Index is regulated by a component of SQL Server called the Query Optimizer. The Query Optimizer decides things like whether to use a particular Index or just scan the table, which processors can handle the query and the like. You don't normally tune the Query Optimizer, but you can tell it to favor one factor or another with a "hint" in your Transact-SQL code. I'll cover that in another tutorial as well.

Because the indexes change so rapidly over time, they can become fragmented, much like the blocks on an operating system's hard drive. There are various maintenance commands you can run to either move those entries around, or to rebuild them entirely. The point is that there is indeed maintenance that you will have to do on your indexes. I've covered the basics in my section on Maintenance and The Maintenance Wizard on this site.

Primarily the Index Maintenance involves re-ordering the physical pages where the Indexes are on disk, and how full to make those pages. Let me explain those briefly.

Recall that unlike a book, a SQL Server Index is updated as data changes constantly. Rather than taking the time to arrange all those pages on disk in the most optimal way, the system chooses to lay down the data for both the table and the Index in the quickest way possible. Over time, the pages of Index data (remember that the Page is the storage level of the Index data) end up being separated far away from each other, which means the disk subsystem has to go find it all, taking time.

You have two means of dealing with this, called an Index reorganization or an Index rebuild operation. A reorganization takes less time but is less thorough, and an rebuild takes more time (and possibly means the database needs to be locked for a bit) but is very thorough. Again, these are gross generalizations, so see my maintenance articles for more, such as how to test for the level of fragmentation and when to perform either a rebuild or a re-org.

The final concept I'll cover in this brief overview is the idea of the fill-factor for an Index. This takes a bit of explaining, but the general concept is that the fillfactor for an Index (set when you create it or when you rebuild it) sets how much space the Index should use before it makes another page structure for the Index.

You can think about this in terms of a car. Assume you're going to take a bunch of folks to the movies. You count out the number of people who can drive and have a car. If everyone you need to take is in the house with you, you simply fill each car up as much as you can, taking every seat. This assumes nothing will change on the way to the movie. That would be a "Fillfactor" of 100%.

Now assume that you need to drop a couple of folks off on the way to the movie, and pick even more people up along the way. In other words, the number of people in the car is changing as you go.  In that case, you might leave some room in the cars, taking more of them. If you left half the car seats empty to pick up more folks, that would be a "Fillfactor" of 50%, and so on.

So in this analogy, you leave room as you see fit, but it's more expensive in petrol and so on to do so. In a database, you can leave room on those data pages in the same way, but when you do it causes the database to "split" the data across multiple pages for the Index. However, if the Index changes often and grows, you'll see a gain in performance as it has to do that less often as the data is changed. So in essence you want to set a lower number for the fillfactor on the Index if you think the Index data will change and grow often, and a higher number where you think it won't. Once again those Dynamic Management Views I referenced earlier will help you monitor and manage that information.

This just introduces Indexes - there is a lot to learn here. I've used a few analogies that aren't completely technically accurate, which of course is the danger with analogies. But they should serve to help you understand the general concepts — which I trust you'll refine as you read this and other resources.