Last updated Mar 28, 2003.
It is estimated that most of the data stored in an enterprise isn't contained in structured databases like SQL Server or Oracle. It's true that most of the day-to-day data entry shared by multiple members of the organization goes into Customer Relationship Management (CRM) systems or Enterprise Resource Management (ERM) systems that use a database as the storage medium, but it is also true that much of an employee's time isn't spent in these systems. As with IT professionals, a lot of the work employees do throughout the day ends up in Microsoft Word documents, Excel spreadsheets, or even e-mails.
There are a lot of arguments for making all of this information structured, that is, stored in rows and columns in a database. Databases are designed for critical data, they have extended search and location mechanisms, they are secure, and maintenance on the data is scheduled and logged.
But there's also an argument for allowing binary documents to carry a lot of the organization's data. If you think of the data content as needing some kind of front end, then Word and Excel fit that need. Microsoft Office applications are in almost every company, and nearly every employee who performs data tasks knows how to use them. They are simple and easy to understand.
But they are unstructured. What that means is that the users don't have a standard way of recording the data, they don't use the same terms within the documents, and they don't even name them or store them the same way. When data is locked away in these files you can't easily find them, or search the data they hold. Another big issue is that this data isn't always secured, because the users might save the files on a network share that isn't backed up, or on their local drives.
Since the users will often use the tools they know best, and since the data they enter is important to the organization, you need to find a way to compromise. You need to allow unstructured data in your company, but you also need to be able to find it, search within it, and secure it.
SQL Server has a few mechanisms that allow you to store unstructured data in a structured database table. You need to plan for using them, and you need to understand the tools the database provides to store, search and index these objects to take full advantage of these features. But before I explain the Full-Text Search service in SQL Server, you need to understand that it isn't the only way to deal with this problem.
Also, there are some tradeoffs with using SQL Server to store binary documents. Not all document types are searchable. Also, storing large Word or Excel files can make your database quite large, which complicates maintenance and recovery.
But after you've studied the problem and decided that storing the binary files in the database is the right route for you, then you need to plan for a procedural change in your organization as well as a program to insert and retrieve the data from the database. Let's take a look at each of these planning points.
Although Full-Text Search in SQL Server lets you search binary documents stored in the database, it doesn't put them there for you. The procedural change you'll need to plan out is which documents will be stored in the database and how you will do that.
You don't need to store every document a user creates in a database. A majority of user documents are temporary or single-use only. A document is worth storing when it will be used by multiple people, contains information that will be needed later, needs to be kept safe, or some combination of these.
The other part of the planning decision is the programming required to store and retrieve the binary documents from the database. While SQL Server Full-Text Search provides simple statements to search the documents, you don't store them with typical INSERT or UPDATE statements. For binary documents, you'll need to "stream" the data in by opening a channel, sending the documents, and then closing the channel. I'll explain that process in a tutorial, but for now we'll concentrate on where to do that more than how.
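To give a rough feel for what "streaming" can look like on the server side, here is a minimal sketch that appends a document to a row one chunk at a time with the .WRITE clause of UPDATE. It assumes SQL Server 2005 or later and a hypothetical dbo.Documents table with DocumentId, FileName, FileExtension and a non-NULL varbinary(max) Content column; none of these names come from a real system, and a tutorial on the subject may well use a client-side API instead.

```sql
-- Assumption: dbo.Documents(DocumentId int IDENTITY, FileName, FileExtension,
-- Content varbinary(max) NOT NULL) is a hypothetical table used for illustration.
DECLARE @DocumentId int;
DECLARE @Chunk varbinary(max);

-- Open the "channel": create the row with an empty, non-NULL value.
INSERT INTO dbo.Documents (FileName, FileExtension, Content)
VALUES (N'forecast.doc', N'.doc', 0x);
SET @DocumentId = SCOPE_IDENTITY();

-- Send one chunk. The application would repeat this step for each block
-- of bytes it reads from the file; the bytes below are placeholders.
SET @Chunk = 0x504B0304;
UPDATE dbo.Documents
SET Content.WRITE(@Chunk, NULL, 0)   -- a NULL offset appends to the end
WHERE DocumentId = @DocumentId;
```

Appending in chunks keeps any single statement small, which matters more as the documents get larger.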
The most logical place to create a program hook to send binary documents to SQL Server is inside the Enterprise Management System used by your organization. This program, which normally follows the course of your business, is the primary place that generates the content for a lot of the e-mails, Word documents, and Excel spreadsheets. You should code an "attachments" tab, button or other control to allow users to upload the document into the system. It isn't enough to allow this storage, however. You must also think about allowing others to "check out" or work on the document, whether that should be done sequentially or in parallel, how you handle changes, whether you should store versions of the documents and so forth.
Now that you've thought about storing the documents, you need to plan for using them in your system. You can use the Full-Text Search on char, varchar, nvarchar, varbinary(max) and image data types. The process involves storing the documents (or text data), setting up the Full-Text Service, enabling and indexing the columns in the database, and then using various statements in your searches.
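As a minimal sketch of a table that is ready for this, the following definition keeps the binary document in a varbinary(max) column next to a column holding its file extension, which Full-Text Search uses to decide how to read the binary content. The table and column names are hypothetical, and varbinary(max) assumes SQL Server 2005 or later.

```sql
-- Hypothetical table for documents attached through an application.
CREATE TABLE dbo.Documents
(
    DocumentId    int IDENTITY(1,1) NOT NULL,
    FileName      nvarchar(260)     NOT NULL,
    FileExtension nvarchar(8)       NOT NULL,  -- for example '.doc' or '.xls'
    Summary       nvarchar(1000)    NULL,      -- plain text, indexable as-is
    Content       varbinary(max)    NOT NULL,  -- the binary document itself
    -- A full-text index needs a unique, single-column, non-nullable key.
    CONSTRAINT PK_Documents PRIMARY KEY CLUSTERED (DocumentId)
);
```

The primary key matters here because the full-text index uses it to tie each indexed word back to a row.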
I've already explained a little about storing the documents, so next we need to examine the Full-Text Service. This is a separate program that runs as a service on the SQL Server system and handles the full-text process. The service account does not need to be an administrator, but it does need to be an account that can see the hard drive on the server. If this service is not running, the feature won't work properly.
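One quick check you can make from a query window is whether the full-text component is installed on the instance at all. This sketch only tells you about installation; whether the service is actually running is something you would still confirm in the services list on the server.

```sql
-- Returns 1 when the full-text component is installed on this instance,
-- and 0 when it is not.
SELECT FULLTEXTSERVICEPROPERTY('IsFullTextInstalled') AS IsFullTextInstalled;
```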
The next part of the equation is the database. A database needs to be enabled to allow Full-Text indexes. This is the default setting, but you need to check it by running the following command:
USE DatabaseName;
GO
SELECT DATABASEPROPERTY('DatabaseName', 'IsFullTextEnabled');
GO
If the database is not enabled, it's easiest to use the graphical management tools to enable it. The setting is in the Database Properties dialog, which you open by right-clicking the database name.
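If you prefer a script, the sp_fulltext_database system stored procedure can make the same change. This is a sketch that reuses the DatabaseName placeholder from above. Check the documentation for your version before running it against a database that already has full-text catalogs, since the procedure's effect on existing catalogs has varied between releases.

```sql
USE DatabaseName;
GO
-- Enable full-text indexing for the current database.
EXEC sp_fulltext_database 'enable';
GO
-- Confirm the change; a result of 1 means the database is enabled.
SELECT DATABASEPROPERTY('DatabaseName', 'IsFullTextEnabled');
GO
```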
Once you've determined that the database is ready, you create the Full-Text index on the columns that hold the binary files. I'll give you an example of that in another tutorial. Once the index is created, you need to fill it, which is called a "population." This process crawls through the documents and creates a word map for the index.
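Until you get to that tutorial, here is a rough sketch of the two steps using the SQL Server 2005 and later syntax. It assumes a hypothetical dbo.Documents table with a varbinary(max) Content column, a FileExtension column, and a primary key named PK_Documents; the catalog name DocumentCatalog is also made up for illustration.

```sql
-- Assumption: dbo.Documents(DocumentId, FileExtension, Content) exists
-- and PK_Documents is its single-column primary key.
CREATE FULLTEXT CATALOG DocumentCatalog;
GO

-- Create the index but defer filling it.
CREATE FULLTEXT INDEX ON dbo.Documents
(
    Content TYPE COLUMN FileExtension   -- extension tells the service how to read the binary data
)
KEY INDEX PK_Documents
ON DocumentCatalog
WITH CHANGE_TRACKING OFF, NO POPULATION;
GO

-- Fill the index: this is the "population" step.
ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
GO
```

Creating the index with NO POPULATION and starting the population separately lets you choose when the crawl runs, which is useful on a busy server.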
There are a couple of special things to note about the Full-Text Index. Unlike other indexes in SQL Server, a Full-Text Index is actually another structure, stored outside of the database file locations. That means that you have to back it up separately from the database. The other consideration is that the Full-Text Index is not updated by the other automatic maintenance you have set up. Full-Text Index population is done either manually or on a separate schedule.
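Because population runs on its own, it helps to be able to check on it. The following sketch reads the population status and item count for a catalog, and asks whether a table has an active full-text index. The catalog name DocumentCatalog and the table dbo.Documents are hypothetical.

```sql
-- A PopulateStatus of 0 means the catalog is idle; other values indicate
-- a population or related activity in progress.
SELECT FULLTEXTCATALOGPROPERTY('DocumentCatalog', 'PopulateStatus') AS PopulateStatus,
       FULLTEXTCATALOGPROPERTY('DocumentCatalog', 'ItemCount')      AS ItemCount;

-- Returns 1 when the table has an active full-text index.
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.Documents'), 'TableHasActiveFulltextIndex')
       AS HasActiveFullTextIndex;
```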
There's one more important consideration for Full-Text Indexes. You can only have one per table. That really isn't a big restriction, since you'll normally only store one column of data this way.
Now that you have the documents stored, the indexes created and populated and backed up, you need to work with the searches. In other tutorials we'll explore the following types of query statements:
- CONTAINS (In the WHERE clause of a SELECT statement)
- FREETEXT (In the WHERE clause of a SELECT statement)
- CONTAINSTABLE (In the FROM clause of a SELECT statement)
- FREETEXTTABLE (In the FROM clause of a SELECT statement)
For now, you need to know that you can use Full-Text Search to look not only for a specific word or phrase, but also similar words, inflections, and even thesaurus-type lookups.
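To make those four statements a little more concrete before the tutorials, here is a short sketch of each kind of search. It assumes a hypothetical dbo.Documents table with a full-text indexed Content column and SQL Server 2005 or later; the search terms are invented, and the thesaurus form only finds synonyms if a thesaurus file has been configured for the language.

```sql
-- CONTAINS: an exact phrase.
SELECT DocumentId, FileName
FROM dbo.Documents
WHERE CONTAINS(Content, '"quarterly forecast"');

-- CONTAINS: inflections of a word, such as shipped or shipping.
SELECT DocumentId, FileName
FROM dbo.Documents
WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, ship)');

-- CONTAINS: thesaurus-type lookup.
SELECT DocumentId, FileName
FROM dbo.Documents
WHERE CONTAINS(Content, 'FORMSOF(THESAURUS, invoice)');

-- FREETEXT: matches on the meaning of the words rather than exact wording.
SELECT DocumentId, FileName
FROM dbo.Documents
WHERE FREETEXT(Content, 'late customer shipments');

-- CONTAINSTABLE: returns keys and relevance ranks that you join back to the table.
SELECT d.DocumentId, d.FileName, ft.[RANK]
FROM dbo.Documents AS d
JOIN CONTAINSTABLE(dbo.Documents, Content, 'forecast NEAR budget') AS ft
    ON d.DocumentId = ft.[KEY]
ORDER BY ft.[RANK] DESC;

-- FREETEXTTABLE: the same idea, with FREETEXT-style matching.
SELECT d.DocumentId, d.FileName, ft.[RANK]
FROM dbo.Documents AS d
JOIN FREETEXTTABLE(dbo.Documents, Content, 'late customer shipments') AS ft
    ON d.DocumentId = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

The two table-valued forms are the ones to reach for when you want to sort results by relevance, since they expose the rank that the WHERE-clause forms hide.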
Although there is a great deal of planning when using Full-Text Search, it is a feature that can greatly benefit your organization. It allows the best of two worlds: easy creation of familiar documents, and the security, accessibility and maintenance of critical data.
Informit Articles and Sample Chapters
If you're ready to try out indexing a Full-Text column, you can find the tutorial on that here.
I have a blog entry here which might be useful for the administrators among us.
SQL Server 2005 has even more enhancements to Full-Text Search. You can read about those here.