SQL Server Query Design
- Query Basics
- Query Optimization
The basics of programming against databases require a firm understanding of the language used to create objects such as tables and views, to read, update, and remove data, and to remove objects. You might see these operations referred to as CRUD: Create, Read, Update, and Delete.
Many new developers rush into this phase of programming as their first step. Instead of jumping right into syntax, however, it's important to understand all of the processes involved in the development cycle. You can find out more about these processes in the series of tutorials that begin with Database Design: Requirements, Entities, and Attributes.
Once you fully understand those concepts, you can begin to work on the heart of the programming task: forming queries. Before I get to the syntax, however, there are a few formalities to deal with.
Most of the dozens of Transact-SQL commands fall into two broad categories, or "languages": Data Definition Language (DDL) and Data Manipulation Language (DML). DDL statements create, alter, and drop (remove) database objects; those statements are covered in other tutorials, beginning with Database Objects: Databases. The statements I'll focus on in the next few tutorials are DML.
In this series of tutorials I'll cover the INSERT, UPDATE, DELETE, and SELECT statements. Along the way, I'll map these to the "CRUD" matrix.
There are two ways to run T-SQL statements against SQL Server. The first is to use dynamic SQL, which involves sending SQL statements directly to the database from a client program. If you're using Query Analyzer and typing one of these four commands, that's dynamic SQL. In Visual Basic or C#, you can create a connection object, build a T-SQL string, and send it to the database server. This is often the most effective way to send commands, especially if your higher-level code builds the query "on the fly." Because the database has no advance knowledge of what the requests will be, you can't plan ahead very well to run them any other way.
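In T-SQL itself, the closest equivalent of building a statement string on the fly is to assemble it in a variable and run it with the system stored procedure sp_executesql. This sketch assumes the pubs sample database is installed, and the value 'CA' stands in for whatever the client program would supply. Keeping the value as a parameter, rather than pasting it into the string, lets SQL Server reuse the compiled plan across different values and keeps user input from being executed as SQL.

```sql
-- Sketch: build a statement string at run time and execute it
-- Assumes the pubs sample database is installed
USE pubs;
GO

DECLARE @sql   NVARCHAR(500);
DECLARE @state CHAR(2);

SET @state = 'CA';  -- illustrative value; a client program would supply this

-- The filter value stays a parameter (@st) instead of being concatenated into the string
SET @sql = N'SELECT au_id, au_lname, au_fname
             FROM dbo.authors
             WHERE state = @st
             ORDER BY au_lname;';

EXEC sp_executesql @sql, N'@st CHAR(2)', @st = @state;
GO
```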
The second method is to use server-side programming. This involves using stored procedures, user-defined functions, and views in the statements sent to the server. The process in the higher-level language stays the same, except that the string you send is the name of the stored procedure along with any parameters it requires. The statements in the stored procedures, functions, or views are stored on the server, rather than being sent from the client program each time.
The second method has many advantages. For instance, the first time a stored procedure runs, SQL Server compiles an execution plan for it and stores that plan in an area of memory called the procedure cache. The next time the stored procedure is called, SQL Server can reuse the cached plan instead of compiling the statements again. Designing frequently used queries as stored procedures can provide a real performance boost.
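As a minimal sketch of the server-side approach, the script below creates a stored procedure in pubs and calls it twice. The procedure name dbo.usp_AuthorsByState is hypothetical, and the script assumes the pubs sample database is installed. The final query reads the plan cache through sys.dm_exec_cached_plans and sys.dm_exec_sql_text, which exist only in SQL Server 2005 and later and require VIEW SERVER STATE permission; if the plan is still cached, usecounts should show that the second call reused it.

```sql
-- Sketch: a hypothetical stored procedure in pubs
USE pubs;
GO

IF OBJECT_ID('dbo.usp_AuthorsByState', 'P') IS NOT NULL
    DROP PROCEDURE dbo.usp_AuthorsByState;
GO

CREATE PROCEDURE dbo.usp_AuthorsByState
    @state CHAR(2)
AS
    SET NOCOUNT ON;
    SELECT au_id, au_lname, au_fname, city
    FROM dbo.authors
    WHERE state = @state
    ORDER BY au_lname;
GO

-- This is all the client has to send: the procedure name plus its parameter
EXEC dbo.usp_AuthorsByState @state = 'CA';
EXEC dbo.usp_AuthorsByState @state = 'UT';
GO

-- SQL Server 2005 and later: look for the procedure's cached plan and its use count
SELECT cp.usecounts, cp.objtype
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.dbid = DB_ID()
  AND st.objectid = OBJECT_ID('dbo.usp_AuthorsByState');
GO
```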
Which method is best? There's an old saying—"When all you have is a hammer, everything looks like a nail." I've seen fairly heated arguments between developers on whether to use a server-side versus a dynamic-SQL approach in a particular situation. In practice you'll often see a mixture of both, especially in larger, more complex programs.
With that in mind, you can determine the best method to use by taking a holistic view of the program, drawing on your previous experience, and testing. That's also the right order to follow.
Once you've created the requirements and the outline for the program, stop. Step back, and view the program from end to end. Evaluate the program flow diagrams and see if you can determine any patterns. Follow this step before you write any code.
Patterns point to the reusable objects you should create, and those objects are prime candidates for stored procedures. If there's a common "engine" that evaluates a condition, determine the parameters you'll need to pass to make the stored procedure as useful as possible in multiple situations.
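One way to build that kind of flexibility into a stored procedure is to give it optional parameters that default to NULL, so each caller passes only the filters it needs. This is a sketch that assumes the pubs sample database; dbo.usp_FindAuthors and its parameter names are hypothetical. One trade-off: a single cached plan has to serve every combination of filters, so this style may need attention when you tune queries later.

```sql
USE pubs;
GO

IF OBJECT_ID('dbo.usp_FindAuthors', 'P') IS NOT NULL
    DROP PROCEDURE dbo.usp_FindAuthors;
GO

-- Hypothetical reusable "engine": every filter is optional
CREATE PROCEDURE dbo.usp_FindAuthors
    @state    CHAR(2)     = NULL,  -- NULL means "any state"
    @lname    VARCHAR(40) = NULL,  -- NULL means "any last name"
    @contract BIT         = NULL   -- NULL means "with or without a contract"
AS
    SET NOCOUNT ON;
    SELECT au_id, au_lname, au_fname, state, contract
    FROM dbo.authors
    WHERE (@state    IS NULL OR state    = @state)
      AND (@lname    IS NULL OR au_lname = @lname)
      AND (@contract IS NULL OR contract = @contract);
GO

-- The same procedure serves different parts of the program
EXEC dbo.usp_FindAuthors @state = 'CA';
EXEC dbo.usp_FindAuthors @lname = 'Ringer';
EXEC dbo.usp_FindAuthors @state = 'CA', @contract = 0;
GO
```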
Taking a holistic view also includes larger program elements, such as scope, error handling, security, optimization and even data archival.
The original requirements document should contain a good definition of the program's scope. One of the most troubling issues in the modern development world is a disconnect between what the user thinks the program will do and what the developer codes. It's here, more often than not, that the primary stress factors of the programming effort lie. Make sure you have a firm understanding of the program's bounds. If you're not sure, check.
Determine how you'll handle errors early in the process. Each component will have its own unit-level error handling for the program errors that arise in it, but you'll also need to decide how to handle errors across larger operations. For example: "If the program hits an error partway through process X, should everything roll back to the beginning, or are partial data inserts acceptable?" Most often it's best to prevent the user from entering "bad" data in the first place (a pessimistic design), but if the data depends on states that aren't known when it's entered, that might not be possible (an optimistic design). These decisions determine the transactions you create and can even shape the database's original design.
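The "roll all the way back" choice usually translates into an explicit transaction. The sketch below uses TRY...CATCH, which requires SQL Server 2005 or later; the #order_lines table and its CHECK constraint are hypothetical stand-ins for your own data. The second INSERT violates the constraint, so the CATCH block rolls back the first INSERT as well, and no partial data remains.

```sql
-- Hypothetical table: quantities must be positive
CREATE TABLE #order_lines
(
    line_id INT NOT NULL PRIMARY KEY,
    qty     INT NOT NULL CHECK (qty > 0)
);

BEGIN TRY
    BEGIN TRANSACTION;
        INSERT INTO #order_lines (line_id, qty) VALUES (1, 5);   -- succeeds
        INSERT INTO #order_lines (line_id, qty) VALUES (2, -1);  -- violates the CHECK constraint
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- All or nothing: undo everything done since BEGIN TRANSACTION
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    PRINT 'Rolled back: ' + ERROR_MESSAGE();
END CATCH;

-- Returns 0 because the first row was rolled back along with the second
SELECT COUNT(*) AS rows_kept FROM #order_lines;
```

If partial inserts are acceptable, you could leave out the explicit transaction; each INSERT would then commit on its own, and row 1 would survive the failure of row 2.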
Security is of paramount concern right from the beginning. Think of the design as a building, and map out every possible entrance and exit. This step requires a thorough knowledge of how access to SQL Server is controlled. The task becomes more difficult if the system exports or imports data; if it does, make sure you extend your security boundaries to include the source and destination systems.
Most of a system's performance depends not on the hardware or the size of the database, but on the proper design and use of indexes and queries. The important thing to remember during the initial design is that it's far harder to optimize the design later than to get it right at the start. I'll explain how to optimize individual queries in Part 2 of this series, but optimizing a program encompasses proper design, index creation and use, and effective queries.
Data archival involves a strategy for dealing with the data as time passes. This means that you should determine the effective "life" for each business data element that your program will store. How long will the data be used? Is it governed by any legal or ethical requirements? How often should it be rolled up? Is this the "system of record," where the data is created?
Once you've completed these steps, you're ready to code.
Earlier I covered the holistic view of creating Transact-SQL statements and mentioned the "CRUD" matrix of Create, Read, Update, and Delete operations. In this tutorial I'll continue that process with the syntax that forms the basis for all queries.
I'll cover the statements in "CRUD" order. Most tutorials begin with the SELECT statement, but you need to put data into the database before you can select it, so I'll start with the statement that creates data. I'm assuming that you've already designed your database, tables, and other objects. For this tutorial, I'll use the pubs sample database.
The "Create" part of the CRUD matrix corresponds to the T-SQL INSERT statement. The INSERT statement has the following syntax:
INSERT [INTO] table_name [WITH (table_hints)] [(column_list)]
VALUES ({value | DEFAULT | NULL} [, ...n])
The INTO keyword is actually optional, but I'm old school, so I still use it. After it comes the table name, and then the list of columns you want to put data into. You don't have to list the columns if you're supplying a value for every column, in the order they're defined in the table, or if you use another syntax, like this:
INSERT INTO table_name DEFAULT VALUES
GO
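The caveat for this kind of insert is that every column must be able to supply its own value: it needs a default value, must allow NULLs, or must generate its value automatically, as an IDENTITY column does.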
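Here's a small sketch of DEFAULT VALUES, using a hypothetical temporary table whose columns each have a default, allow NULLs, or are an IDENTITY column:

```sql
-- Hypothetical log table: every column can supply its own value
CREATE TABLE #audit_log
(
    log_id    INT IDENTITY(1,1) PRIMARY KEY,  -- generated automatically
    logged_at DATETIME    NOT NULL DEFAULT GETDATE(),
    note      VARCHAR(50) NULL                -- nullable, so it receives NULL
);

INSERT INTO #audit_log DEFAULT VALUES;
INSERT INTO #audit_log DEFAULT VALUES;

-- Two rows: log_id 1 and 2, the current date and time, and a NULL note
SELECT log_id, logged_at, note FROM #audit_log;
```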
When you insert data into a column, the value must match the column's data type (or be implicitly convertible to it). Enclose character strings and dates in single quotes, and don't quote numeric values.
Here's an example of putting a new author in the pubs database:
INSERT INTO pubs.dbo.authors
    ( au_id
    , au_lname
    , au_fname
    , contract)
VALUES
    ( '123-45-6789'
    , 'Woody'
    , 'Buck'
    , 1)
GO
Note that I'm following the style I use throughout these tutorials, with the commas at the front of each item in the column and value lists. SQL Server ignores the extra whitespace anyway; I do this just to make sure I don't miss anything.
One other note: it's usually a good idea to qualify the table name with its database and owner, as I have here. SQL Server then doesn't have to work out which object you mean, which can save a microsecond or two when the statement is compiled. You can also enclose column names in square brackets ([ ]), which is required if a name contains spaces or is a reserved word. (You didn't put spaces in names when you created your database, did you?)
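A quick illustration with a hypothetical temporary table: the square brackets let SQL Server accept a column name that contains a space and one that is a reserved word.

```sql
-- Hypothetical table with awkward column names
CREATE TABLE #bracket_demo
(
    [order id] INT         NOT NULL,  -- contains a space
    [select]   VARCHAR(10) NOT NULL   -- reserved word
);

INSERT INTO #bracket_demo ([order id], [select]) VALUES (1, 'ok');

SELECT [order id], [select] FROM #bracket_demo;
```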
Notice also that I've filled in only four fields. The others either have defaults or allow NULLs, so they aren't required to have a value.
One thing to note about the INSERT statement: if the table has an IDENTITY column, SQL Server generates a new value for it automatically when you run an INSERT statement, as long as you don't try to supply one yourself. If you do want to supply an explicit value, you first need to turn on the IDENTITY_INSERT option for that table with SET IDENTITY_INSERT table_name ON. You can find out more about that in Books Online.
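A short sketch of IDENTITY_INSERT with a hypothetical temporary table. While the option is ON you must name the columns explicitly, and only one table per session can have it turned on at a time. If the explicit value is higher than the current identity value, SQL Server continues numbering from it.

```sql
-- Hypothetical table with an IDENTITY column
CREATE TABLE #widgets
(
    widget_id INT IDENTITY(1,1) PRIMARY KEY,
    name      VARCHAR(50) NOT NULL
);

INSERT INTO #widgets (name) VALUES ('generated');                 -- SQL Server assigns widget_id 1

SET IDENTITY_INSERT #widgets ON;
INSERT INTO #widgets (widget_id, name) VALUES (100, 'explicit');  -- column list is required here
SET IDENTITY_INSERT #widgets OFF;

INSERT INTO #widgets (name) VALUES ('generated again');           -- continues from the highest value: 101

SELECT widget_id, name FROM #widgets ORDER BY widget_id;          -- 1, 100, 101
```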
You can also insert data into a table based on the result of a SELECT statement. Let's assume that you've got a duplicate of the authors table called authors2:
INSERT INTO authors2
SELECT * FROM authors
GO
This statement works because both tables have the same number of columns, in the same order, with compatible data types. SQL Server matches the columns by position, not by name.
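Because the match is by position, a safer habit is to name the columns on both sides. Here's a sketch with two hypothetical temporary tables whose columns are deliberately defined in different orders:

```sql
CREATE TABLE #src  (au_id VARCHAR(11), au_lname VARCHAR(40), state CHAR(2));
CREATE TABLE #dest (state CHAR(2), au_lname VARCHAR(40), au_id VARCHAR(11));  -- different column order

INSERT INTO #src (au_id, au_lname, state) VALUES ('123-45-6789', 'Woody', 'FL');

-- Naming the columns on both sides maps each value to the right column,
-- regardless of the order in which the columns were defined
INSERT INTO #dest (au_id, au_lname, state)
SELECT au_id, au_lname, state
FROM #src
WHERE state = 'FL';

SELECT au_id, au_lname, state FROM #dest;
```

With SELECT * and no column list, SQL Server would try to put each au_id value into the two-character state column, and the statement would fail.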
You can also use the results from a stored procedure this way. I'll show you that process a little later on.
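The general shape is INSERT ... EXEC. In this sketch, #usp_DemoStates is a hypothetical temporary (session-scoped) procedure created only for the illustration; the target table's columns have to line up with the procedure's result set.

```sql
-- Temporary procedure, used here only for illustration
CREATE PROCEDURE #usp_DemoStates
AS
    SET NOCOUNT ON;
    SELECT 'CA' AS state, 'California' AS state_name
    UNION ALL
    SELECT 'FL', 'Florida';
GO

CREATE TABLE #states (state CHAR(2), state_name VARCHAR(30));

-- The procedure's result set becomes the rows to insert
INSERT INTO #states (state, state_name)
EXEC #usp_DemoStates;

SELECT state, state_name FROM #states;  -- two rows: CA and FL
GO
```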
Now that I've covered the "C" in "CRUD," let's change some data. You use the UPDATE statement, the "U," to change data that's already in a table. Here's the syntax:
UPDATE table_name
SET column_name = 'Value'
WHERE column_name = 'ColumnValue'
GO
You can update as many columns as you want in one statement, and the WHERE clause is optional. Be careful, though: if you leave out the WHERE clause, SQL Server updates every row in the table, setting the listed columns to the same values in each one.
Let's use the UPDATE syntax to give Buck a home and a phone number:
UPDATE pubs.dbo.authors
SET   phone   = '123 123-1234'
    , address = '1313 Mockingbird Lane'
    , city    = 'Tampa'
    , state   = 'FL'
    , zip     = '12345'
WHERE au_fname = 'Buck'
GO
The full join syntax is available in this command (through a FROM clause), and you can also use subqueries in the WHERE clause to decide which rows to update.
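Here's a sketch of both forms, using hypothetical temporary tables rather than pubs so that it runs anywhere. The first UPDATE uses SQL Server's UPDATE ... FROM extension to join to a lookup table; the second uses a subquery in the WHERE clause.

```sql
CREATE TABLE #authors    (au_id VARCHAR(11) PRIMARY KEY, zip CHAR(5), city VARCHAR(20), contract BIT);
CREATE TABLE #zip_lookup (zip CHAR(5) PRIMARY KEY, city VARCHAR(20));
CREATE TABLE #signed     (au_id VARCHAR(11) PRIMARY KEY);

INSERT INTO #authors    VALUES ('123-45-6789', '33601', NULL, 0);
INSERT INTO #authors    VALUES ('987-65-4321', '94025', NULL, 0);
INSERT INTO #zip_lookup VALUES ('33601', 'Tampa');
INSERT INTO #zip_lookup VALUES ('94025', 'Menlo Park');
INSERT INTO #signed     VALUES ('123-45-6789');

-- Join form: fill in each author's city from the lookup table
UPDATE a
SET    city = z.city
FROM   #authors AS a
INNER JOIN #zip_lookup AS z ON z.zip = a.zip;

-- Subquery form: flag only the authors who appear in #signed
UPDATE #authors
SET    contract = 1
WHERE  au_id IN (SELECT au_id FROM #signed);

SELECT au_id, zip, city, contract FROM #authors;
```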
Next, I can get rid of data: the "D" in "CRUD." There are a few things you need to know about this command even before you learn the syntax. First, a delete is a delete. Once a DELETE is committed, you can't get the rows back short of restoring from a backup, so make sure that's really what you want before you run the statement.
Second, make sure DELETE is the command you want. If you want to get rid of an entire object, such as a table or view, you're looking for the DROP command. DELETE always removes rows, never the object itself.
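Finally, DELETE records every row it removes in the transaction log. If you need to remove all the rows from a table, it's usually faster to use the TRUNCATE TABLE command, which logs only the deallocation of the table's data pages. (Keep in mind that TRUNCATE TABLE also resets any IDENTITY counter, and you can't use it on a table that's referenced by a FOREIGN KEY constraint.) It looks like this: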
TRUNCATE TABLE test1
GO
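One visible difference between the two commands is what happens to an IDENTITY counter. In this sketch with a hypothetical temporary table, DELETE leaves the counter where it was, while TRUNCATE TABLE resets it to the seed value.

```sql
CREATE TABLE #test1 (id INT IDENTITY(1,1) PRIMARY KEY, val CHAR(1));

INSERT INTO #test1 (val) VALUES ('a');
INSERT INTO #test1 (val) VALUES ('b');

DELETE FROM #test1;                     -- removes the rows, logging each one
INSERT INTO #test1 (val) VALUES ('c');  -- gets id 3: the counter kept going

TRUNCATE TABLE #test1;                  -- deallocates the pages and resets the counter
INSERT INTO #test1 (val) VALUES ('d');  -- gets id 1 again

SELECT id, val FROM #test1;
```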
If you want to delete rows based on a condition, however, you do want the DELETE command. Here's the syntax:
DELETE [FROM] table_name
WHERE condition
The FROM is optional. As with UPDATE, the WHERE clause can use the full selection logic available in the other statements, including joins and subqueries.
Let's get rid of Buck's entry:
DELETE FROM authors
WHERE au_fname = 'Buck'
  AND au_lname = 'Woody'
GO
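Because a committed DELETE can't be undone, one cautious habit is to run it inside an explicit transaction and commit only when the number of affected rows matches what you expect. This sketch assumes the pubs database and that you expect exactly one matching row; the row-count check is a convention of your own code, not a built-in SQL Server safeguard.

```sql
USE pubs;
GO

DECLARE @rows INT;

BEGIN TRANSACTION;

DELETE FROM dbo.authors
WHERE au_fname = 'Buck'
  AND au_lname = 'Woody';

SET @rows = @@ROWCOUNT;   -- capture immediately; the next statement resets it

IF @rows = 1
    COMMIT TRANSACTION;
ELSE
BEGIN
    ROLLBACK TRANSACTION;  -- nothing is removed if zero or several rows matched
    PRINT 'Rolled back: ' + CAST(@rows AS VARCHAR(10)) + ' rows matched.';
END
GO
```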
Now that I've created, updated and deleted data, I can use the SELECT command to see the data. This is the most commonly used command in T-SQL, and I've covered it before in the tutorial titled Getting Started with Transact-SQL. The basic syntax looks like this:
SELECT fields
FROM tables
WHERE conditions
ORDER BY sort
Only the SELECT clause is required. Here's an example:
SELECT 'Buck'
GO
I won't cover the basics again here, but let's look at a couple of simple tricks you can do with the SELECT command. You've seen one already: you can select a constant with a single statement. You can do the same thing with a variable.
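For example, a minimal sketch (the variable names are arbitrary):

```sql
DECLARE @first_name VARCHAR(20);
DECLARE @last_name  VARCHAR(40);

SET @first_name = 'Buck';     -- SET assigns one variable
SELECT @last_name = 'Woody';  -- SELECT can assign too (and can assign several at once)

-- Return the variables like any other columns
SELECT @first_name AS first_name, @last_name AS last_name;
```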
To format the results of a SELECT command, you can separate the items with commas, which returns them as separate columns, or join them with a plus sign (+), which concatenates them into a single string. A common request is to format results like a mailing label, with each part of the address on its own line and no extra padding between the values. Here's a query that does just that:
SELECT au_fname + ' ' + au_lname + CHAR(13) +
       address + CHAR(13) +
       city + ' ' + state + ' ' + zip + CHAR(13) + CHAR(13)
FROM authors
GO
Here's how that happens. The plus signs concatenate the values into a single string instead of returning them as separate, space-padded columns, the way commas do. Because of that, you have to add the spaces yourself, either with + ' ' + as I've done here or with + SPACE(1) + instead.
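At the end of each "row" of the address, I include the CHAR(13) function, which returns a carriage return. (A Windows line ending is a carriage return followed by a line feed, CHAR(13) + CHAR(10), so some clients need both to start a new line.) Notice that I end each row with a plus sign rather than beginning the next one with a comma; that's for the formatting again. In essence, this is one long string. At the end I add a couple of extra returns to separate one address from the next.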
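One refinement, offered as a sketch: with the default CONCAT_NULL_YIELDS_NULL setting, concatenating a NULL produces NULL, so an author with no address on file would get an empty label. Wrapping each nullable column in ISNULL avoids that, RTRIM strips any padding from fixed-length CHAR columns, and CHAR(13) + CHAR(10) supplies a full carriage return and line feed. The temporary table and the second author are hypothetical, shaped like the relevant pubs authors columns.

```sql
CREATE TABLE #label_demo
(
    au_fname VARCHAR(20),
    au_lname VARCHAR(40),
    address  VARCHAR(40) NULL,
    city     VARCHAR(20) NULL,
    state    CHAR(2)     NULL,
    zip      CHAR(5)     NULL
);

INSERT INTO #label_demo VALUES ('Buck', 'Woody', '1313 Mockingbird Lane', 'Tampa', 'FL', '12345');
INSERT INTO #label_demo VALUES ('Ann', 'Nobody', NULL, NULL, NULL, NULL);  -- no address on file

DECLARE @crlf CHAR(2);
SET @crlf = CHAR(13) + CHAR(10);  -- carriage return + line feed

-- ISNULL keeps one missing value from turning the whole label into NULL
SELECT au_fname + ' ' + au_lname + @crlf +
       ISNULL(address, '') + @crlf +
       ISNULL(city, '') + ' ' + ISNULL(RTRIM(state), '') + ' ' + ISNULL(RTRIM(zip), '') + @crlf + @crlf
       AS mailing_label
FROM #label_demo;
```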