SQL Server Reference Guide


Partitioning

Last updated Mar 28, 2003.

Partitioning is the act of dividing data across different hardware or logical locations. Most database systems, even older manual file- and table-based versions, partition data in some way; indeed, splitting data into multiple tables is at the heart of any relational database system.

But given that a relational system is already partitioned, there are some steps you can take to break out your data even further. Before we talk about that, we need to talk about why you would need (or want) to.

The first reason you might want to place data on different hardware is performance. While SQL Server doesn't have a problem with large tables as such, putting a lot of data on a single physical device can slow down the I/O subsystem.

Another reason to partition data is maintenance. When you rebuild an index, especially a large one that lives on the same physical device as a table your users are working in, you can again create a bottleneck on the I/O subsystem.

One of the most popular reasons to partition data is to distribute it for availability. This means that you can create a kind of "archive" of the data onto another device. I've done this for reporting systems and other kinds of archival.

There are several manual methods you can use to partition data. The simplest to implement is a combination of processes and code. Let's look at a concrete example of this approach. Assume you have an order processing system which takes several thousand orders a day. As time goes on, the orders table can grow quite large. But you can't just remove the data, because from time to time you need to report on it.

Using a bit of code, you can select from the main table the records you won't need for processing the next day's orders and copy them to another table, then delete those rows from the main table. Of course, eventually the second table gets overly large, so you'll have to write more code to check its size and create a third table, and so on.
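With hypothetical names (Orders, OrderArchive, and an OrderDate column to define the cutoff are my own inventions, not from any particular system), that copy-then-delete step might be sketched like this:

```sql
/* A sketch of the move described above. Orders, OrderArchive, and
   OrderDate are hypothetical names; a 30-day cutoff is assumed. */
BEGIN TRANSACTION;

/* Copy the older rows into the archive table */
INSERT INTO OrderArchive (OrderNumber, OrderDate, OrderDetail)
SELECT OrderNumber, OrderDate, OrderDetail
FROM Orders
WHERE OrderDate < DATEADD(day, -30, GETDATE());

/* Then remove them from the main table */
DELETE FROM Orders
WHERE OrderDate < DATEADD(day, -30, GETDATE());

COMMIT TRANSACTION;
```

Wrapping both statements in a single transaction ensures the copy and the delete succeed or fail together.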

You can automate this first part with a job and a stored procedure, but you'll then need to code the application to find the data as it migrates from table to table. There are other coding and manual-process strategies like this one for moving your data from a large table to smaller ones, recombining the data when you need it with carefully crafted UNION statements. Here's an example of that:

/* Create a database */
CREATE DATABASE Test;
GO
USE Test;
GO
/* Create a table for current orders, fill it with some data */
CREATE TABLE CurrentOrders 
( OrderNumber int
, OrderDetail varchar (255));
GO
INSERT INTO CurrentOrders
VALUES (3, 'Third Item')
GO
/* Create a table for archived orders, fill it with some data */
CREATE TABLE ArchivedOrders 
( OrderNumber int
, OrderDetail varchar (255));
GO
INSERT INTO ArchivedOrders
VALUES (1, 'First Item')
GO
INSERT INTO ArchivedOrders
VALUES (2, 'Second Item')
GO
/* Now select the data from both tables as if it were one */
SELECT OrderNumber, OrderDetail 
FROM CurrentOrders
UNION ALL
SELECT OrderNumber, OrderDetail 
FROM ArchivedOrders
ORDER BY OrderNumber

/* Clean up */
USE master;
GO
DROP DATABASE Test;
GO

This works just fine, but it can be a nightmare to maintain. What happens when you change a column or its format? What about when you want to change the "window" on the data you're moving, say from size-based to date- or region-based?

Worse, this method hasn't actually solved the initial problem: the data still resides on the same physical I/O subsystem, so you can still see a lag in performance. And the rows you remove from the original table are removed with a DELETE statement. Production databases are normally set to a logged recovery model, which means every delete operation goes through the transaction log, and that can take a long time for a large number of deletes.
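One way to soften that logging impact is to delete in small batches, so each transaction (and its footprint in the log) stays small. This sketch assumes the same hypothetical Orders table; note that DELETE TOP (n) requires SQL Server 2005, while on SQL Server 2000 you would use SET ROWCOUNT instead:

```sql
/* Delete in batches of 5,000 rows so no single transaction holds
   a large number of log records. Orders and OrderDate are
   hypothetical names. */
DECLARE @rows int;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (5000) FROM Orders
    WHERE OrderDate < DATEADD(day, -30, GETDATE());

    SET @rows = @@ROWCOUNT;
END
```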

To solve the first problem, you can use SQL Server's Files and Filegroups. The idea is that you create Filegroups (which contain files) on different drives (not just different drive letters, mind you, but separate physical drives), and place different tables on those Filegroups. Now when you work with the tables, you're using separate I/O devices. Once you set up this layout, you can follow the same code and manual-process approach.
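As a sketch (the drive letters and file paths here are hypothetical; substitute locations on separate physical drives on your system), the layout might look like this:

```sql
/* Hypothetical layout: each Filegroup's file lives on a different
   physical drive, and each table is placed with the ON clause. */
CREATE DATABASE OrderSystem
ON PRIMARY
  ( NAME = OrderSystem_Data,    FILENAME = 'D:\Data\OrderSystem.mdf' ),
FILEGROUP CurrentFG
  ( NAME = OrderSystem_Current, FILENAME = 'E:\Data\OrderSystem_Current.ndf' ),
FILEGROUP ArchiveFG
  ( NAME = OrderSystem_Archive, FILENAME = 'F:\Data\OrderSystem_Archive.ndf' )
LOG ON
  ( NAME = OrderSystem_Log,     FILENAME = 'G:\Logs\OrderSystem.ldf' );
GO
USE OrderSystem;
GO
/* Place each table on its own Filegroup */
CREATE TABLE CurrentOrders
( OrderNumber int
, OrderDetail varchar (255)) ON CurrentFG;

CREATE TABLE ArchivedOrders
( OrderNumber int
, OrderDetail varchar (255)) ON ArchiveFG;
GO
```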

Note that what I've just described, placing different rows in different tables, is an example of horizontal partitioning. When you divide the data into different sets of columns, that's called vertical partitioning.

Partitioning in SQL Server 2000

In SQL Server 2000, you can use two primary strategies to extend the programmatic approach I've described: Federating and Distributed Partitioned Views.

Federated Servers are groups of servers used for a single application's data storage. In this scenario you separate not only the I/O but all parts of the hardware. This process requires disciplined data planning, a middle tier for the application, and good coordination between your developers and DBAs. You can learn more about what you need to do to implement this architecture at the end of this tutorial.

The second tool you have in SQL Server 2000 to implement partitioning is the Distributed Partitioned View. Of course, you can use a regular view with the UNION statement as I mentioned earlier. Here's that example changed to use a view instead of a simple SELECT statement:

/* Create a database */
CREATE DATABASE Test;
GO
USE Test;
GO
/* Create a table for current orders, fill it with some data */
CREATE TABLE CurrentOrders 
( OrderNumber int
, OrderDetail varchar (255));
GO
INSERT INTO CurrentOrders
VALUES (3, 'Third Item')
GO
/* Create a table for archived orders, fill it with some data */
CREATE TABLE ArchivedOrders 
( OrderNumber int
, OrderDetail varchar (255));
GO
INSERT INTO ArchivedOrders
VALUES (1, 'First Item')
GO
INSERT INTO ArchivedOrders
VALUES (2, 'Second Item')
GO
/* Now create a view that selects the data from both tables as if it were one */
CREATE VIEW TestView AS
SELECT OrderNumber, OrderDetail 
FROM CurrentOrders
UNION ALL
SELECT OrderNumber, OrderDetail 
FROM ArchivedOrders;
GO

SELECT * FROM TestView
ORDER BY OrderNumber

/* Clean up */
USE master;
GO
DROP DATABASE Test;
GO

But to take this to the next level, you can distribute the view across servers. The process is fairly simple:

  1. Create a linked server.
  2. Create the view with a four-part name (Server.Database.Owner.Object) for the tables in the view on each of the servers involved.
  3. Run a query against the view.

I can't show an example of this, since it would involve knowing the name of your servers. But the process is very straightforward. You can even create these kinds of views that allow you to update data, in limited circumstances.
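That said, with stand-in names the shape of such a view is easy to show. In this sketch, ServerA and ServerB are hypothetical linked servers, each holding a SalesDB database whose Orders table covers its own range of order numbers:

```sql
/* A sketch of a Distributed Partitioned View. ServerA, ServerB,
   SalesDB, and Orders are all hypothetical names. */
CREATE VIEW AllOrders AS
SELECT OrderNumber, OrderDetail
FROM ServerA.SalesDB.dbo.Orders
UNION ALL
SELECT OrderNumber, OrderDetail
FROM ServerB.SalesDB.dbo.Orders;
```

For the view to be updatable, each member table needs a CHECK constraint on the partitioning column so the Query Processor can tell which server owns which range of values.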

There are, however, some fairly hefty restrictions on these kinds of views, and using distributed data can be slow. This is not only due to the network issues, but also because it is difficult for the Query Processor to figure out what to do. You can read much more about the restrictions and the process for creating the views here.

Partitioning in SQL Server 2005

In SQL Server 2005, the ability to partition data is greatly enhanced. You have a lot more control over when and how the data is separated. In previous versions, you had to manually "switch" the data over, and make lots of decisions about when to move it. In SQL Server 2005, you get two constructs that do a lot of the work for you: a Partition Function and a Partition Scheme. Microsoft's documentation presents them in this order, and that's the order you need to create them in.

The Partition Function tells the data how to break apart. You can choose a date range, a list of included values, and more. The Partition Scheme tells the data where to go. The way you use them is to create the function and tie it to a scheme. Then the function will intercept the data and use the scheme to store the data properly, so you have to do less work.

But I think it makes more sense to create the scheme first, and then create the function to use it. This is probably because I came from the hardware side of the house lo these many years ago, and Microsoft is full of developers. I guess to a developer, functions come first!

Let's take a look at the basics of the process. Once again, it's hard to demonstrate exactly, since your drives will be different from mine. The resources at the bottom of this tutorial go through a more detailed example using the AdventureWorks sample database.

Create the Partition Function

I normally start with the decision that I need to partition, as I described earlier. I try to do this during the design phase of the system, assuming I know it will either be very large or grow large. Sometimes, however, you can't do that. Things change, and your system needs to change with them.

After you make that decision on the data side, you need to set up your Filegroups. I cover more information on that here.

Next, you create the Partition Function. Unfortunately, this is not a graphical procedure, so you'll need to run a query to do it. You use the CREATE PARTITION FUNCTION statement, followed by how you want to break out the data. Here's the full statement syntax:

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )

AS RANGE LEFT

or

AS RANGE RIGHT

FOR VALUES ( values)

Let's break that down a little. Following the CREATE PARTITION FUNCTION statement is the name of the function, which you'll need in a moment to tie to the scheme. Following that are parentheses containing the data type you're going to separate the data on. Let's assume you want to break the data down by a field that happens to be an integer. In that case, it would look something like this:

CREATE PARTITION FUNCTION MyNewPartitionFunction (INT)

Now you have two choices: RANGE LEFT or RANGE RIGHT. If you think of a number line, then the break will happen to the left or right of the numbers (or dates or characters) that you pick in the next line. I'll show you an example in a minute, but let's keep building the statement so far, using the LEFT choice:

CREATE PARTITION FUNCTION MyNewPartitionFunction (INT)

AS RANGE LEFT

Now the final part: the values you want to break on. This is where you really take your time in the design. You're going to pick one or more values, in the type you defined, to make the break (left or right) on. Suppose, for instance, that part numbers 1-1000 are for parts made in Canada, and part numbers 1001-2000 are for parts made in the United States; the boundary between them is a natural place to break.

The number(s) you pick here divide the data — so if you pick one number (as I will in this example) then you will have two partitions — one on one side of the number and the other on the other side. So here's the completed function:

CREATE PARTITION FUNCTION MyNewPartitionFunction (INT)

AS RANGE LEFT

FOR VALUES (1000)

Now all numbers less than (or equal to) 1000 go on the first partition (which we'll assign in the next step) and the numbers higher than 1000 will go on the second one. If we had a three way split, say up to 1000, up to 2000, and above that, we would change the VALUES statement this way:

FOR VALUES (1000, 2000)

You can see that the number of boundary values is always one less than the number of partitions you get. Now let's turn to the Partition Scheme.
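Once the function exists, you can check where a given value will land using the $PARTITION function, which returns the partition number for a value. Against the two-way MyNewPartitionFunction above:

```sql
/* With RANGE LEFT and a boundary of 1000, the boundary value itself
   belongs to the first (left) partition. */
SELECT $PARTITION.MyNewPartitionFunction(1000);  /* returns 1 */
SELECT $PARTITION.MyNewPartitionFunction(1001);  /* returns 2 */
```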

Create the Partition Scheme

To create the Partition Scheme, you just use the CREATE PARTITION SCHEME command. Here's the format:

CREATE PARTITION SCHEME partition_scheme_name

AS PARTITION partition_function_name

ALL TO ( single_filegroup_name )

or

ALL TO ( [PRIMARY] )

or

TO ( filegroup_names )

Let's break that down again. Following the CREATE PARTITION SCHEME command, you need to give it a name. Here's our example:

CREATE PARTITION SCHEME MyNewPartitionScheme

Next we tie the partition scheme to the partition function with the AS PARTITION command:

AS PARTITION MyNewPartitionFunction

Now you tie the scheme and the function to actual Filegroups. You have three choices. If you use the optional ALL qualifier, every partition is mapped to a single Filegroup, either the DEFAULT Filegroup or one you name. I normally never do this, since it defeats the purpose in most cases.

Instead, I use the third option. All you have to do is make sure you lay out the Scheme in the same order as the breaks you defined; this time the number of Filegroups matches the actual number of partitions. So with two Filegroups on our test system called FileGroup1 and FileGroup2, we'll put the values up to 1000 on FileGroup1 and the values above 1000 on FileGroup2:

CREATE PARTITION SCHEME MyNewPartitionScheme

AS PARTITION MyNewPartitionFunction

TO (FileGroup1, FileGroup2)

Create Tables on the Partition Scheme

We're at the final step: creating the table that actually stores the data. You'll create it as normal, but with a special twist: using the ON qualifier, you point the table at the Partition Scheme you made. Here's how that would look for our example:

CREATE TABLE 
MyNewTable 
( PartNumber INT
, PartDescription varchar (255))
ON MyNewPartitionScheme (PartNumber);
GO

Notice that in the ON qualifier you specify the column that ties to the function, which in turn ties to the scheme.
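Putting the pieces together, the whole sequence looks like this (the database name, file names, and paths are hypothetical; point the files at separate physical drives):

```sql
/* End-to-end sketch: Filegroups, function, scheme, table. */
CREATE DATABASE PartitionTest;
GO
ALTER DATABASE PartitionTest ADD FILEGROUP FileGroup1;
ALTER DATABASE PartitionTest ADD FILEGROUP FileGroup2;
GO
ALTER DATABASE PartitionTest
ADD FILE ( NAME = PartFile1, FILENAME = 'E:\Data\PartFile1.ndf' )
TO FILEGROUP FileGroup1;
ALTER DATABASE PartitionTest
ADD FILE ( NAME = PartFile2, FILENAME = 'F:\Data\PartFile2.ndf' )
TO FILEGROUP FileGroup2;
GO
USE PartitionTest;
GO
/* Values up to and including 1000 go to the first partition */
CREATE PARTITION FUNCTION MyNewPartitionFunction (INT)
AS RANGE LEFT
FOR VALUES (1000);
GO
/* Map the two partitions to the two Filegroups */
CREATE PARTITION SCHEME MyNewPartitionScheme
AS PARTITION MyNewPartitionFunction
TO (FileGroup1, FileGroup2);
GO
/* Rows are placed according to the value of PartNumber */
CREATE TABLE MyNewTable
( PartNumber INT
, PartDescription varchar (255))
ON MyNewPartitionScheme (PartNumber);
GO
INSERT INTO MyNewTable VALUES (500, 'Canadian part');  /* FileGroup1 */
INSERT INTO MyNewTable VALUES (1500, 'US part');       /* FileGroup2 */
GO
```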

And you're done! Of course there is a lot more to this, but this quick tutorial should get you started testing. There's more in the references section that follows.

InformIT Articles and Sample Chapters

Eric Brown has more information on working with Partitions and other large database strategies here.

Online Resources

To learn more about implementing a SQL Server 2000 Federated Server, click here.

There's more on the SQL Server 2005 Partitioning strategies here.