Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Natural Language Processing (NLP)

Last updated Mar 28, 2003.

If you're like me, then you normally like to dive right in to a topic and start doing things. I learn by example. While that approach works great for Word and Excel, it's not a great idea to learn, oh, say, flying! Just as I did in the tutorials on Microsoft's Analysis Services and Replication, I think that the information you need on English Query deserves a little theory first. Also like those articles, I'll break down the process for you to follow to create your application. Don't worry, after you read this tutorial on English Query I'll write another article that will take you step-by-step and let you get that whole "learning by example" thing out of your system.

First of all, what is English Query? This service belongs to a branch of information processing called Natural Language Processing. It's the idea of being able to input instructions to a computer program in standard human language, rather than the structured commands a particular program wants. In other words: "What's the budget for my project?" rather than "SELECT budget_amount FROM tbl_Budget_Main a INNER JOIN tbl_managers b ON a.mgr_no = b.mgr_no WHERE b.mgr_name = 'Buck Woody'". See the advantage?

Microsoft has implemented Natural Language Processing in SQL Server. Note that you can use Oracle as a back-end as well – but that's another tutorial! English Query is the engine that forms this translation layer between humans speaking English and the computer language of Structured Query Language (SQL).

You'll need three forms of knowledge going into this process – You'll need to know your database structure, you'll need to know what your users' needs are, and you'll need to know the best way to deliver the data. Let's dissect those a little further.

First you'll need to know your data. You're going to tie out your data to a semantic model, which is how English Query learns the relationships it needs to develop queries. The important thing to know about as far as the structure goes is that the data is normalized. There are several formal rules for normalization (which I cover in another tutorial), and your data needs to be in at least 3rd normal form. If it isn't, you may have to make other objects such as views or tables that contain this normalization.

Next you'll need to know your users' needs. You may think you know what they want, but there's simply no substitute here for a questionnaire. Ask the users really open questions like "What kind of data do you need from the database?" They'll respond with something like "I need to know how my budget is for my project for a given date." This questionnaire forms the questions and consequently your semantic model for your English Query application. As a matter of fact, psychologists will tell you it's a good idea to ask the same question of different users in different ways, so that you get the broadest spectrum of responses. I believe that this step is the single most important part of developing an English Query application, since the model thrives on the richness of questions you tie together at the start of the application development.

Finally you need to know how to deploy the application. You have only one major choice, in that you'll need a program of some sort to get at the English Query Engine. You have several choices among languages, though. Microsoft provides an object model which you can use with programs like Visual Basic or c#.NET or other compiled Microsoft languages. The other route you can go is to develop the application using Active Sever Pages (ASP) with Internet Information Server (IIS). As a matter of fact, if you have Visual Studio installed (highly recommended for this process) then there's a wizard that will create the ASP pages for you!

Now that you have those pieces of knowledge in hand, I'll tell you the process to follow from start to get your English Query application out the door.

The first thing you need to do is plan. You'll need the information I described earlier, and then you put that together with the objective of the project. Write out the objective first.

List as many questions as you can using the questionnaires you retrieved from the user community. A good rule of thumb is to think of two question formats for each request the user wrote down. It may seem like a lot, but trust me; if you don't have a big enough question pool the application will respond "I don't have enough information to answer that question" way too often. User's get bored with that pretty quickly and your app will be a flop.

The last part of your plan is an Entity Relationship document for your database, or at least the parts that answer the questions. Focus here on the relationships and the normalization levels. This information will define what Microsoft calls an English Query project, which is the grouping of the objects that follow into one package.

The next step is to define the nouns in your database. Yes, that's right; you'll need to break out the English textbooks! No, it's not that bad. Microsoft calls these nouns entities. The easiest way to define an entity (or a noun) in your own mind is to think of things or people that act or are acted on. Let's take an example. If your database tracks shoe sales then a salesperson table will probably have a few nouns you care about, like salesperson_last_name. Another table might contain the inventory of shoes with a noun or two like shoe_model or shoe_style. You've got the idea, salesperson_manager or customer_last_name are good candidates. During this step, you might also want to make a column in your planning spreadsheet (if that's how you're doing your planning) called synonyms. You'll probably notice from your questionnaires that while one user calls a customer a "guest", another calls them a "client". If you cover your bases well, your users won't get the dreaded "I don't have enough information to answer that question" response.

Now you've got the entities, you need to tie them together. These relationships are called phrasings, and there are a few types. The first type of phrasing is called Name/ID. This one's pretty easy; it just says that "Client Names are the Names of Clients." OK, seems obvious to you, but you're smarter than a computer. This phrasing allows the computer to answer questions like "What are the names of the clients?"

The next phrasing type is the Trait. This type of relationship says that "Clients have Addresses" or some other property an entity has. This lets the computer answer questions like "What is the Client's Address?"

The Preposition phrasing type links entities with other parts of a sentence. Check that handy English book to see some prepositions, but a few good ones are at, after, before, into, out, up, down, and so forth. These questions are the toughest to write, an example might be "Which clients bought red shoes after they bought blue ones?" You may need to experiment a bit to get these right. While they are the most difficult, they are also often the most amazing to your users.

The Subset type of phrasing allows the users to ask questions like "Which of the clients are female?" This question can be answered if your tables track this type of data, of course.

The Adjectives type of phrasing is the one you'll use the most. Adjectives describe or modify a noun. Things like "smart" worker or "big" bag are adjectives. Of course, "expensive" shoes or "good" clients are subjective terms, so you'll want to make sure your users define what these mean. "Good" clients might mean clients that spend 30.00 a month on shoes (30 a month on shoes? What's up with that?).

Verbs are the workhorses of phrasings. You'll spend a lot of time here writing phrases like "a salesperson sells shoes". A verb shows action.

One final type of phrasing is the command. This type of phrasing lets your users update or add data – caution is advised here!

Now that you have your entities, synonyms and relationships defined, you create the project. As I mentioned earlier, I'll go through a step-by-step on this process in another article.

Next develop your questions, which will be used to test your model. Once that's done, categorize them into measurement (how many, how much), general information (who, does, where), and verbs (well, verbs!).

Now you'll implement the physical design of your database to the model of your project. You normally do this in Visual Studio.

Next you implement the client program, using one of the methods I described earlier.

Wait! You're not done – now you refine your model. You can trace the results of the queries to make sure that the users are getting the data they asked for, just the data they asked for and the right data that they asked for. This is called regression testing and is also used after you make changes to your program.

OK – scared yet? Don't be – it may take a little work, but English Query is really a cool tool. Now you can dive in!