Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

English Query - Part 1

Last updated Mar 28, 2003.

In a previous tutorial I began a thread on using Natural Language Processing (NLP) technology in databases. Microsoft SQL Server 2000 includes a product called English Query that allows your users to type in questions as they would normally frame them in English, and receive query results from a database. In this tutorial I'll start a demonstration of that process that you can follow along with on your test system.

Before you can start, you'll need to install the product. English Query is similar to Analysis Services (in SQL Server 2000, at least) in that it is installed in a separate step from the database engine. Which brings up an important question: where should you put the installation?

Just as in the case of Analysis Services, you can place English Query on the same system that houses your production database. Just because you can, however, doesn't mean that you should. In any system of appreciable size or activity, say more than 20 users or several hundred transactions a minute, you should always separate your data store from processing activities. That means you should have separate servers for Analysis Services, Reporting Services, and English Query from the database server.

To install English Query, insert the SQL Server 2000 CD in the drive, and select SQL Server Components from the menu that appears. From there, select Install English Query. Select the defaults to get the engine, documentation and samples.

According to the previous tutorial on Natural Language Processing, we need to start by developing our questions. For this example we'll use the pubs database since it is small and easily understood. The pubs database is included with SQL Server 2000 and contains data regarding books, authors, sales and stores, and publishers. I'll focus this tutorial on some simple questions from that database regarding author information. What we're after here is an understanding of English Query, not the database itself.

Following the process I've previously outlined, we've asked our users what kind of information they would like to know from the system. They filled out the questionnaires we sent out and from that information we've selected these questions to develop:

  • Who are the authors?
  • How many books has author (Name) written?
  • What books have they written?
  • Who sold the most books?

In production you'll have far more questions, of course. As you'll see by the time we finish this series you'll often be able to answer multiple questions with a single construct.

The next step is to place the questions into categories. The first category (Who are the authors, what books have they written) has to do with general information. The next category (How many books has author (Name) written) deals with numerical operations. The final category deals with action (who sold the most books). These questions form the basic categories that you'll work with even in the most demanding application.

Once again following the outline from the previous tutorial, we now need to ensure we have the information in the database that answers the questions. In the case of this example, I developed the questions from the database, so of course it meets this requirement. In real-world environments this isn't always the case. You may find that the information you are after isn't currently captured in your source database or Analysis Services cubes. In that case you either have to gather or create the information or report back to the users that they can't get that from the system as it currently stands.

We need to understand not only the questions the users might ask of the system, but also the database model within the system. This information is called the logical model of the database, or simply the metadata. You can get this information by examining the tables and relationships manually or by looking at an Entity Relationship Diagram of the database. I have an entire series of tutorials on that here. In any case you'll need to understand what physical constructs support the questions the users want answered. Once you understand the physical tables and relationships, you can query the database to find the answers to the questions.

In our example we examine the ERD of the pubs database and run the following queries to get the answers to each question:

Who are the authors?

USE pubs
SELECT au_fname + ’ ’ + au_lname AS ’Author’
FROM authors
ORDER BY au_lname
How many books has author (Name) written?
SELECT au_fname + ’ ’ + au_lname AS ’Author’
, count(title_id) AS ’Books’
FROM authors
LEFT OUTER JOIN titleauthor 
ON titleauthor.au_id = authors.au_id
GROUP BY au_fname + ’ ’ + au_lname 
What books have they written?
SELECT au_fname + ’ ’ + au_lname AS ’Author’
, title
FROM authors
INNER JOIN titleauthor
ON titleauthor.au_id = authors.au_id
INNER JOIN titles 
ON titleauthor.title_id = titles.title_id
ORDER BY au_lname

Who sold the most books? This question is answered by the previous queries.

So now we know that the data exists to answer the questions. With the questions defined and categorized, we have enough information to create the first part of the structure we need for the application. This first structure is called the semantic model. We'll flesh that out further as we move along. The second set of information consisting of the queries we created a moment ago satisfies the requirements for the physical model. With these two models complete, we can begin to create a matrix of the logical to the physical arrangement. From there we'll tie out the various objects to each other, and then it's a simple matter of laying out the structure in English Query. We'll do that in the next few tutorials.

Informit Articles and Sample Chapters

If you happen to still use SQL Server 7, you can check out this free chapter on English Query for that version.

Online Resources

Here's a great outline of an MIT course on Natural Language Processing. Also check out Microsoft's site on the subject here.