Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Files and Filegroups

Last updated Mar 28, 2003.

If you’re looking for more up-to-date information on this topic, please visit our SQL Server article, podcast, and store pages.

It’s ironic that the Relational Database Management System (RDBMS) was invented to implement data structures in something other than files. The irony is that even though the RDBMS doesn't work with independent text files, files are still at the heart of the engine. Of course, the distinction is that SQL Server uses binary files, not simply textual ones, and it’s the internal representation of data that makes an RDBMS special. But it is files all the way down.

A SQL Server database makes use of two types of files. Data is first stored in a transaction log, which is one type of file, and then subsequently written to the second type, the database file. When a user enters a transaction, the data she inserts, updates, or deletes is sent along to the RDBMS engine, which records that line of data in the transaction log. The log is stored in a file with the default extension of LDF (Log Database Format or Log Database File). Once the server has a spare moment, a process spins up to write that line onto the database, which stores the data in a file with the default extension MDF (Microsoft Database Format or Microsoft Database File). But in fact, SQL Server doesn’t even work directly with these filenames for the purpose of accessing data like this. I’ll explain more about that in a moment.

You can assign more than one file to a database, log, or both. When you create a database, the system automatically creates the first two files, one for data and another for logs. But you can (and should) create other files for the database to use.

SQL Server works with files by assigning them to a Filegroup. Even if you have only one file in the database, it will still live in a Filegroup. Let’s take a look at how this works. I’ll create a simple database on a test system, not specifying anything except the name:

CREATE DATABASE FileTest;
GO
Now, I’ll show the Filegroup that automatically showed up for the database:
SELECT * 
FROM sysfilegroups;
GO
----------------------
groupid allocpolicy status groupname
1 0 16 PRIMARY

Sure enough, I have a Filegroup, called “PRIMARY”. SQL Server created that for me. By default, any tables or other objects I have are stored here — which I can change to another Filegroup later. I’ll show you that in a moment.

Within that Filegroup, I have at least one. I’ll use the sysfiles system table to show the actual physical files that are stored in the PRIMARY Filegroup:

SELECT * 
FROM sysfiles a
INNER JOIN sysfilegroups b
ON a.groupid = b.groupid;
GO
-----------------------
fileid groupid size maxsize growth status perf name filename groupid allocpolicy status groupname
1 1 160 -1 128 2 0 FileTest C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2K8\MSSQL\DATA\FileTest.mdf 1 0 16 PRIMARY

Hmmm. I seem to be missing the log file. Let me take a look at the files in the database, without joining them to the Filegroup:

SELECT * 
FROM sysfiles;
GO
--------------
fileid groupid size maxsize growth status perf name filename
1 1 160 -1 128 2 0 FileTest C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2K8\MSSQL\DATA\FileTest.mdf
2 0 63 268435456 10 1048642 0 FileTest_log C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2K8\MSSQL\DATA\FileTest_log.LDF

There it is. You can see that the groupid field for the log file doesn’t have a Filegroup associated with it — which means that while database files are contained within a Filegroup, log files are not. Hold on to that thought a moment.

By the way, I’m using the old style of querying tables for SQL Server 2000. There are new system files and catalogs that you can use in SQL Server 2005 and higher that shows far more information — I’ll switch to those in a moment. For my purposes here, this will work fine.

You can create another file for the database, and place it within any Filegroups you have. Since I have only one Filegroup so far, here’s the way I’ll add another file to it:

ALTER DATABASE FileTest
ADD FILE 
 (NAME = FileTest2,
  FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL10.SQL2K8\MSSQL\DATA\FileTest2.mdf'
 )
TO FILEGROUP PRIMARY;
GO

Using the earlier queries, I can show the new file that I just added.

As I begin to create objects in this new database, SQL Server handles the files in an interesting way. For any kind of data-bearing object (mostly tables and indexes), SQL Server fills up one transaction log before it uses the next one, if there are indeed more than one. The log process then writes to ALL of the data files in the Filegroup, sort of like filling up multiple glasses of water from the bottom up all at once. That keeps everything very balanced.

So why is this all important?

Ah — therein lies the beauty of this approach. Let’s assume you have a few hard drives in your system, and you want to balance the reads and writes from the database across all of them. By adding a file from another drive to the Filegroup, SQL Server will automatically allow the drives to spin their fastest, making your system faster. Also, if you have less space than you need for an entire database on one drive, you can use multiple drives.

So large tables and indexes can now take advantage of multiple drives. But wait, I mentioned that SQL Server automatically balances out the reads and writes across all of the files in a Filegroup. What if you want more control? What if you would like to put the tables on one file (which is on a certain drive) and the indexes on another file (which is on another drive) so that the updates to the table and the index can happen at the same time?

You can create another Filegroup. Like this:

ALTER DATABASE FileTest
ADD FILEGROUP Secondary;
GO

Now you can add files to that Filegroup, just like I did earlier, by changing the ON FILEGROUP part of the ALTER DATABASE statement.

Using Filegroups

Simply adding a Filegroup to a database, and then adding files to that, doesn’t place any data there. Just having the Filegroup doesn't allow the data to "spill over" into that space automatically. You have to specify where you’d like tables or indexes to go. You can do that when you create a table or index graphically (under the Properties menu item in SQL Server Management Studio or Enterprise Manager), or by using SQL Data Manipulation Commands (DML) like this:

CREATE TABLE [dbo].[tbl_Test] (
 [Test_ID] [int] IDENTITY (1, 1) NOT NULL ,
 [Test_Name] [char] (10) 
) ON [SECONDARY]
GO

The ON [SECONDARY] part of this statement is where you set a table to use a particular Filegroup.

Using Filegroups (and by extension, files), you can selectively place data on your storage subsystem. Performance is enhanced when you place a heavily-read table on a different physical device than another table which is written to at the same time. Using this logic, placing indexes and the tables they support on separate files is often a great way to increase performance.

Of course, if you’ve made two drive letters on the same physical device, say in a SAN environment or on a RAID system, creating separate files and filegroups doesn’t really help performance at all. When I mention drive letters, I'm not just talking about a single physical drive with multiple letters. Those are still on the same set of spindles on the drive. I'm talking about a separate physical drive, channel and I/O processor. Within a SAN, this becomes more problematic to figure out, since the Logical Unit Number (LUN) presented to you may be carved out of several physical drives anyway. In these cases, work with your storage engineer to come up with a viable strategy for drive layouts.

A good strategy to determine your Filegroup layouts is to check whether a table has heavy read or write operations. Then look for other heavy read or write tables and try to separate those from each other.

As I mentioned earlier, it’s also helpful to place indexes on devices opposite the tables they support, so that indexing activities don’t interfere with heavy inserts, such as during Extract, Transform, and Load (ETL) operations.

It’s also best to separate the transaction log files from the database files, so that data being written to the log doesn’t slow down write or read operations from the database. Again, this only helps if the devices on which you place your Filegroups are physically separated from each other. Also, for the best performance gain, it’s best to make sure the drives have their own controllers or channels. On IDE controllers (if you’re forced to use them in a database environment), the second channel is often slower than the primary one.

The tempdb database, which SQL Server uses for sorts and other operations, should also have its own drive or channel. Since some queries will sort the data at the same time that other queries are reading or writing to the database, it’s best to keep them apart.

In fact, any activity which would be useful to have operating at the same time as another activity, such as reading from a table and writing to another, or writing to a table and updating its index at the same time, or writing data to the log file and then having it write out to the database, are candidates for keeping the physical devices where they live separate.

Of course, that’s not really possible. Having a different physical device for every activity would cause you to have dozens, or hundreds of drives. That can be a bit much. But there are some things you should keep separate from each other where possible:

  • Tables
  • Indexes
  • TempDB
  • Logs
  • Backup files
  • Operating system files
  • Operating system cache
  • Full Text catalogs and indexes
  • Replication shares

I know, that looks like a lot of physical drives. And it’s not going to be the end of the world to put everything on a well-performing SAN system. But if you can separate these functions onto different physical devices, you should.

Another use of files and Filegroups has to do with disaster recovery. If you keep a running backup of your database and separate the transaction log to another device, then — should the database drive fail — you have the means to restore the database from tape, and apply the logs from the separate device. Since the log contains the transactions since the last backup (assuming you’re running in the Full recovery model), all the data is safe.

You can also backup and restore both files and Filegroups. In fact, the only way to back up a single table in SQL Server is to place it on its own file. You might do this because there isn’t enough time to back up the whole database, or perhaps you’ve got a tape capacity issue.

To perform a file or Filegroup backups, your database has to be in either the Full or Bulk-Logged recovery model. The reason for this is that you have to apply the transaction log as part of the restore so that the database "knows" where it is. Here’s the syntax for a file backup:

USE master
GO
BACKUP DATABASE FilesArticle
 FILE = ’FilesArticle_Two_Data’,
 FILEGROUP = ’SECONDARY’,
 TO FilesArticle_Backup_DeviceName
GO 

This syntax makes use of a Backup Device called FilesArticle_Backup_DeviceName. You could also use a file name or tape name here.

To restore this backup, as I mentioned, you’ll have to apply the transaction log. Here’s that syntax:

USE master
GO
-- Restore the files and Filesgroups for FilesArticle
RESTORE DATABASE FilesArticle
 FILE = ’FilesArticle_Two_Data’,
 FILEGROUP = ’SECONDARY’
 FROM FilesArticle_Backup_DeviceName
 WITH NORECOVERY
GO
-- Apply the first transaction log backup
RESTORE LOG FilesArticle
 FROM FilesArticle_log1
 WITH NORECOVERY
GO
-- Run through the same process for all logs, then
-- Apply the last transaction log backup
RESTORE LOG FilesArticle
 FROM FilesArticle_log2
 WITH RECOVERY
GO

Notice that I restore the database and then apply all logs forward, using the WITH RECOVERY only at the end.

You can also perform file and Filegroup backups using Enterprise Manager or SQL Server Management Studio if you’re more graphically inclined.

This has been a short introduction to implementing files and Filegroups in your databases. Make sure you check out the references at the end of the article to learn more about optimizing your system this way.

InformIT Articles and Sample Chapters

I have another article on SQL Server I/O, appropriately titled Storage — SQL Server I/O.

Books and eBooks

Ken Henderson has an excellent book on troubleshooting that also covers files, SQL Server 2005 Practical Troubleshooting: The Database Engine (also available as a downloadable eBook and in Safari Books Online).

Online Resources

Paul Randal is quite possibly “the” reference for SQL Server files. He worked on the Microsoft team that created that part of the product. Check out his blog here.