Creating Databases
It is very easy to create a database using the Server Explorer. Hover the mouse over the Server Explorer. Expand the node for the server on which you wish to create the database. Expand SQL Servers node. Right-click on the node. Select New Database from the context menu. A Create Database dialog opens (Figure 3.12). Enter the name of the database you wish to create. Keep Use Windows NT Integrated Security selected, unless the server you are connecting to requires SQL Server Security. If so, select it and enter a valid user name and password that allows database creation on this server. If you are using the MSDE, use Windows Integrated Security. Next click OK. This will create the new database.
Figure 3.12. The Create Database dialog.
NOTE
There is no way to drop (delete) the database throught the server Manager.
Enter TestDB in the database name and click OK. The new database will be created. Next we will add tables.
Creating and Managing Tables
Tables are the basic building blocks of a database and hold the data that the database stores. Tables are structures that consist of rows and columns, similar to a spreadsheet. Once you have created your database, the next step is to create tables.
To create a table, right-click on the Tables node under the database you just created. You may have to expand it first. Select New Table from the context menu. A table designer window will open in the main workspace. At the top is a grid for entering the column names and base data types. On the bottom are additional attributes for the column. These change depending on the data type of the column in question. If you ever used Microsoft Access to create an Access table, this designer should be very familiar to you. Let's fill in the designer as is indicated in Figure 3.13.
Figure 3.13. The table designer.
Adding Columns
When the designer opens, it is blank. You must now add columns to the table. The grid at the top allows you to add columns quickly, and the details appear at the bottom. I usually get the columns in first, then go back and change any defaults in the bottom portion of the screen. You can change the defaults it uses for data types in the Tools/Options dialog, under Database Designers. There are separate entries for SQL Server and Oracle (see Figure 3.14).
Figure 3.14. The Options dialog with Database Designer defaults open.
The column name can be anything you want. It can even include spaces. If you include spaces in a column name, you must surround it with square brackets when referencing it in a Transact-SQL statement.
TIP
Avoid using spaces in column names. Not all database systems support this and you must rember the square brackets. If you wish to create an SQL Statement that displays meaningful headings in grids and on reports, you can always alias the column name to one that contains spaces, again using the square bracket syntax. Keep column names as short as possible, while retaining the meaning of the data they contain. Avoid long names like AllowUserToLogInMultipleTimes. Use something like MultiLogin instead. If you don't, you will be kicking yourself later on when you have to refrence this column in code. Also don't use column names that omit vowels. Other programmers will never figure out what your pattern is and will not be able to understand your schema. If possible, establish a naming standard and stick with it.
SQL Server Data Types
The data type dropdown lists all the possible data types for SQL Server. I will list the basic ones here. The others are either variants of these or rarely used.
NVarChar/VarChar— A variable length string. VarChars can be up to 255 characters, nVarChars can be up to 64 kilobyte (KB) characters in length. We typically use nVarChar. VarChar is there for backward compatibility. The great thing about nVarChar is that it only uses the space needed to store the data. A null column does not take any space.
Char/nChar— These are fixed-length text fields. If the text length is less than the length of the field, it is padded with nulls (ASCII Zero).
Int/BigInt/SmallInt/TinyInt— Non-scaled integers. Can hold whole numbers only. The variations set different sizes. Int is 4 bytes, BigInt is 8 bytes, SmallInt is 2 bytes, and TinyInt is 1 byte.
Bit— Bit fields are true/false only fields. A bit can only have a value of 1 or 0. Bit columns cannot be used in indexes. If you need to index a true/ false value, use a Char(1) and define it as 1 or 0.
Decimal/Numeric— Scaled Integer. Both types mean the same thing. A scaled integer is a whole number plus a fixed number of decimal places. This differs from a floating-point value that is a real number and has no integral value. A scaled integer is an integer with the decimal point arbitrarily moved to the left a set number of places.
Money— A numeric with the decimal point set at position 4.
Float/Real— True fractions or real numbers. Used mainly for scientific applications. Do not use reals to store monetary amounts or quantities. Use int or numeric for this. It is possible for two reals with the same apparent value to fail equality tests because they are not really the same number.
DateTime— And its variations. Holds a date/time value. The variations limit the range of dates accepted.
Text/nText— A pointer to a large amount of textual data. The size of the text is limited only by disk space.
Image/nImage/Binary— A pointer to a large amount of binary data, such as a picture or sound image. Similar to text except will accept any binary data. Anything you can imagine can be stored in an Image field. As with text, the size is limited only by disk space.
TimeStamp— A special field used by the system to record sequential time data.
UniqueIdentifier— A special value that holds a GUID (globally unique identifier). Used to guarantee row uniqueness.
The length column is available for character and numeric values. It determines the maximum size of the data. For numerics it also determines the scaling factor. The Allow Nulls column determines whether or not null values are allowed in the column.
The lower part of the screen contains attributes of each column beyond the basic data type:
Description— You may enter a description of the column. It is advisable to use this attribute because it will help when documenting the schema. Put descriptions in when you create the columns; if you have to go back and fill them in, you may find the task daunting.
Default value— The "default" default value of any column is Null. If you have defined the column as allowing nulls, this is okay. If you have defined the column as not allowing nulls, it is always a good idea to define a default value so you can insert rows without remembering which columns don't allow nulls. You can also use a system-stored procedure or function as the default. There is one situation where you MUST define a default value. This is when you append a new column to an existing table and define it as Not Null. Since SQL Server must be able to store some value in the column other than null, you must define a default. It then loads the new column with the default.
Precision and scale— These two attributes are active only for the numeric or decimal data types. Precision is the maximum number of digits allowed. Scale is the position starting from the right of the decimal point. A scale of zero means no decimal point is allowed.
Identity, identity seed, and increment— These attributes are active only for integer data types. An identity column is used to uniquely identify the row within the table. By definition, it cannot have any meaning beyond this. You can define identity columns that automatically assign the next incremental value when a new row is inserted into the table. The Identity attribute tells SQL Server that you wish this column to be an identity column. Identity seed is the number you wish SQL Server to start numbering with. Increment is how much you wish SQL Server to add to the previous value when inserting a row. You may only have one identity column per table.
IsRowGUID— This attribute is active on columns that are defined as Unique-Identifiers. If set to true, this tells SQL Server that this column is globally unique for this row. No other row in any database in the world will have this GUID. When set to true, the function call to NewID is automatically placed in the default value attribute. This function returns a GUID. This property is used mostly for database replication, but you can also use it along with the time stamp data type to perform your own data synchronization between databases.
Formula— This allows you to specify that a column is the result of a calculation. In this sense, the column is really a pseudocolumn. This is similar to a formula column in a spreadsheet. The calculation may be row-level only. That is, it must reference other columns within the same table and row. No aggregates may be used.
Collation— On SQL Server 2000 and above only. You can now specify column-level collation. Collation is another word for sorting sequence. This attribute is active on character columns only. If you open the dialog by clicking the ellipsis, you will see that this is primarily used for multilanguage support. You can use it to turn off case-sensitivity on certain columns if you wish. My advice is to leave it alone. Too much tampering here can cause confusion later on.
Indexes and Keys Tab
Indexes are data structures built on top of database columns that are used to provide fast access to the data. Indexes may be unique, where every key value must be different for the row it represents, or nonunique, where multiple key values can have the same value for two or more different rows. Every table should have at least one unique index called a primary key. To set up the indexes for a table, click the Indexes button on the toolbar, or select Property Pages from the View menu. Once the Property Pages open, select the Indexes/Keys tab (Figure 3.15).
Figure 3.15. The Indexes/Keys tab of the Table Property page.
To create an index, click the New button. The system will create a default index name for you. Next select the first column for the index. You may select as many columns as you need but be aware that the larger the index, the longer it will take to update the table. Index usage should be planned carefully so it provides the most benefit without adding too much overhead.
If you have more than one file group defined, you can choose to place your index in a different data file than the table. In some circumstances this can be beneficial, especially if the data files are on separate physical volumes. Database administrators frequently choose to do this as it can help improve performance. With smaller desktop databases there will only be one file group. Next you must decide if your index will be unique. A unique index can have at most one row with a specific value in the index. Your primary key will be a unique index by default. You can use a nonunique index to speed access to a table using a nonunique entity, such as a last name or city. If all you need to do is make sure that only one row has any particular value, you can use a unique constraint. A constraint is like an index, except it uses hash values instead of the actual key values. By using a hash value, it makes checking for uniqueness very fast. The drawback is that you cannot use a constraint to speed up access to data.
The Ignore Duplicate Key check box tells SQL Server not to abort an insert statement if one of the statements would create a duplicate key. This does not mean that the row will be inserted; it just means that SQL Server will not abort the transaction, but issue a warning. Any other statements in the Transact-SQL script will be executed. If you uncheck the check box, SQL Server will issue an error message and abort and roll back the transaction. My advice is to leave it unchecked and handle the error in your program. Otherwise you will never know that a table was not updated properly.
The last item is Create Clustered. This selection can be applied to only one index per table at a time. This causes SQL Server to sort the actual rows according to the sort order of the index. This speeds up sequential access to the data dramatically, but slows down inserting and updating because SQL Server must insert or move the row into the correct position instead of just appending it to the end of the table. The last item, Do Not Automatically Recompute Statistics, should be left unchecked. This has to do with how the Query Optimizer decides how to best access a table when running a query. We will look at this later on, but for now leave it alone. I have never had to touch this selection.
Relationships Tab
As long as we have the Property Pages open, let's look at the Relationships tab. Since we are using a relational database it stands to reason that relationships are what set this type of database apart from others. On this tab we can declare relationships between tables and also enable declarative referential integrity (DRI). It is important to note the difference between these two related concepts.
Relations describe links between tables using foreign keys.
Referential integrity tells SQL Server what to do when a related column is updated or inserted.
Let's look at these in more detail.
Relations and Foreign Keys
Before we can use any DRI we must define the relationships between tables. Two or more tables can be related if they share common attributes. For example, you could have a table that describes a work location and a table that describes employees. In real life, many employees can share the same work location. Suppose we assign a value to identify each work location. We can call it the Location ID. Then if we store the Location ID on each employee record we have a way of linking the employee record to the location record. If we define the Location ID as a unique key on the work location record, we can then define the Location ID on the employee record as a foreign key (see Figure 3.16). This establishes a formal relationship between the two tables.
Figure 3.16. Relation defined between Location and Employee tables.
Note how the foreign key on tblEmp is linked to the primary key on tblLocation. This relationship defines a one-to-many relation. For every location record, there can be many employee records. In such a relation, the key on the one side of the relation must be unique. You may have a compound index, as long as the index is unique. It does not have to be the primary key, as long as it is unique. On the many side of the relationship, the foreign key is defined on the Location ID column. If no index exists on the foreign key, SQL Server defines a nonunique index for the purpose of creating the link.
Types of Relations
There are basically three types of relations:
One-to-one — There is no more than one record in Table A for every record in Table B. Depending on referential integrity settings, there may be no records in Table A for a record in Table B or no records in Table B for a record in Table A.
One-to-many— For every record in Table A, there are zero or more records in Table B with the same foreign key.
Many-to-many— For every record in Table A, there are zero or more records in Table B with the same foreign key, and for every record in Table B, there can be zero or more records in Table A.
Whether there can be zero records on the many side of a one-to-many or many-to-many relation depends on how the referential integrity settings are defined.
Referential Integrity
Now that we've seen how to define a relationship between tables, let's look at the DRI. We defined referential integrity as the rules used by SQL Server when inserting or updating rows in tables with defined relationships. The settings for DRI are at the lower portion of the Property Pages as in Figure 3.17.
Figure 3.17. Referential integrity settings.
Check existing data on creation— This option tells SQL Server to validate all the DRI rules when the new relation is saved initially. Be careful with this because on very large tables this could take some time.
Enforce relationship for replication— This tells SQL Server to enforce the relation when replicating data. Unless you will be using replication, this is of no concern to you. If you are using replication, you may want to turn this off if you define a relationship after a table that is in. With this off, SQL Server will not check DRI rules when synchronizing databases.
Enforce relationship for INSERTs and UPDATEs— This tells SQL Server to check the relationship when inserting or updating rows. If a foreign key is missing or does not match a value in the foreign table, the insert or update will fail and SQL Server will throw an error. If this is unchecked, SQL Server will allow the insert or update with invalid data in the foreign key. If this item is checked, the following two items are enabled.
Cascade update related fields— If a related field is changed on the one side of a one-to-many relation, all of the related columns on the many side of the relation are automatically updated with the new value. Be careful here because large tables can cause noticeable performance degradation. If the item is unchecked, SQL Server will throw an error if you change the column and the change would break referential integrity.
Cascade delete related records— If a row is deleted on the one side of a one-to-many relation, all rows in the related table with the same foreign key value will also be deleted. If the item is left unchecked, SQL Server will throw an error if you try to delete a row with related rows on the many side of the relation.
Confusion About Nulls
I have met many experienced programmers who struggle with the mathematical concept of null. Simply put, null means "undefined." It does not mean zero, it does not mean a null string, it does not mean ASCII zero. Zero is a valid mathematical value and does not mean undefined, it means zero. A null string is also called a zero-length string and it does not mean undefined, it means a string with no characters. The ASCII value Null means an ASCII value of zero, not undefined, as in a Null-terminated string or padding with nulls.
What do we mean by undefined? We mean the value is unknown. We do not mean it has no value. We mean we do not know what the value is. Therefore, any database column that is null has an unknown value.
Therefore, given this definition, the result of any operation involving a null is null.
1 + null is null.
2X + 3Y * 12 / null is null
"My Name is " & null is null
Another way of saying this is one plus an undefined (or unknown) value is also unknown.
Also note that we do not say "equals null," we say "is null." That is because any equality test between null and any other value will fail by definition. The statement "If X = Null" will always return false. As a matter of fact, the statement "If Null = Null" will also always return false. In SQL syntax, if we want to test for null we say "Where ColumnX Is Null" or "Where ColumnX Is Not Null." In VB you can use the "IsNull" property of the row's item property: "If X.IsNull Then…"
Check Constraints
Check constraints (see Figure 3.18) are rules that are applied to columns before they can be written to the database. For example, you might specify that a column value must be less than 100 or that a text field cannot contain all spaces.
Figure 3.18. The Check Constraints tab.
The constraint expression must return a true or false value and cannot contain aggregate expressions. The constraint in Figure 3.18 causes SQL Server to reject any value that is all spaces in the Contact Name column. The three check boxes have a similar function to the Relations tab.