
SQL Server Reference Guide


UniversalDB: The Demo and Testing Database, Part 4

Last updated Mar 28, 2003.

In Part 1 of this series, I explained the rationale behind the need for a single database that would be able to work on multiple platforms, for multiple industries. Since I do a lot of teaching, demonstrations and testing, I would like something that is simple to understand, and quick to implement and customize.

In Part 2 of this series, I covered the “base” tables that I think meet most of my requirements. I designed the following tables:

  1. Person
  2. Organization
  3. Material
  4. Accounting
  5. Activity

Part 3 covered the final joins needed to make the schema work, and also included the script to make the database and its tables. This week I’ll finalize the whole project and show you how I loaded the tables with data, and a few queries I’ve created to show the data.

Preparing a Database for the Data

The general script I explained last week is something that I use to explain the schema of the industry I’m demonstrating. I run that script in front of my audience as I explain each entity and how it will be used. Most folks, especially those in classes, want this kind of information.

But I don’t always create the database in front of the audience. In some cases I’m demonstrating a feature on the platform (like SQL Server Resource Governor, or the Management Data Warehouse) and so the structure really isn’t that important. In those “demo” cases I’m just looking to have valid data that the audience can relate to. And in some cases, I need a LOT of data, and I don’t want the audience to have to wait while I load it. Or perhaps I’m doing testing, where there is no audience at all. In all of these cases, I build the database and load it ahead of time.

As I mentioned in the last tutorial, most of the data types are VARCHAR(), or variable character. This allows a lot of flexibility, but of course is not optimal for proper data validation or performance. If I know I’m keeping that database around for a particular industry or testing, I alter the types before I load any data.

For instance, I mentioned that I have an “assigned-to” field in many of the tables which I use to do self-joins. Those fields are VARCHAR() types, and the key they join to is usually a BIGINT. That doesn’t bode well for performance, so I alter those fields to match the key’s type if I’m using them that way.
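
Here’s a minimal sketch of that alteration, using the Person table from the script shown later (and assuming, as described above, that the key column is a BIGINT):

/* Align the self-join column's type with the key it references */
ALTER TABLE [Base].[Person]
    ALTER COLUMN [AssignedTo] BIGINT NULL;
GO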

Does this change the “spirit” of what I’m trying to do here? Aren’t I trying to make a single database that can be used for multiple purposes and industries? Well, no, and yes. No, changing the data types or adding indexes on a table does not change the queries that are used on them, which is the ultimate point. So I have no problem with making these minor alterations.

I’ll also add any other “ancillary” factors like the Resource Governor and so on, based on what I’m teaching, demonstrating or testing. As long as they don’t materially affect the structure, I’m comfortable with this approach.
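
When Resource Governor is the topic, for instance, that ancillary add-on might be as small as this sketch (the pool and group names here are mine, not part of the schema):

/* A minimal Resource Governor pool and workload group for demos */
CREATE RESOURCE POOL DemoPool WITH (MAX_CPU_PERCENT = 50);
CREATE WORKLOAD GROUP DemoGroup USING DemoPool;
ALTER RESOURCE GOVERNOR RECONFIGURE;
GO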

One other change: I rename the database from UniversalDB to something like MedicalDB or PointOfSaleDB to indicate to the audience what I’m working on.
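
If I’m renaming a copy in place rather than re-running the creation script, standard T-SQL handles it (the names here are just examples):

/* Rename the demo copy to match the audience */
ALTER DATABASE UniversalDB MODIFY NAME = PointOfSaleDB;
GO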

Loading the Database

With the database created, the next step is to determine its purpose and audience. I’ll then examine that industry or group to come up with valid (or at least representative) data, and then choose a method to load it.

The simplest method is just to write INSERT statements, or create some stored procedures that will insert data into the database. For instance, assume I create a ManufacturingDB database, and I want to load the people that a manufacturing firm deals with as my first step. I want to ensure that I can show who is working for whom in the structure. In that case, I want the AssignedTo field to point back to the PersonPK field of the Person table.

So assume that I’ve created the ManufacturingDB database from the UniversalDB script, and I run the following query to load it with some data:

USE ManufacturingDB;
GO
/* Set up the people */
INSERT INTO [ManufacturingDB].[Base].[Person]
([PersonPK],[PersonStatus],[PersonID],[PersonType],[Title],[Fname],[MName],[Lname]
,[AdressLine],[CityOrMunicipaility],[StateOrRegion],[PostalIdentification],[Country]
,[AssignedTo],[Phones],[EContact]
,[Demographics]
,[Initiation],[Updated])
VALUES
(1, 'Active', '22237', 'Corporate Manager', 'Mrs.', 'Victoria', 'Terrance', 'Lynch'
, '705 East Green New Avenue', 'Dallas', 'Texas', '35146', 'U.S.A.'
, '0', '1231231234', 'Victoria.Lynch@vzcb.tepqkx.org'
,'<xml><string>Level 65</string></xml>'
,'1978-05-21 08:47:33.810', '2007-02-12 07:46:03.330')
,
(2, 'Active', '11795', 'Plant Manager', 'Mr.', 'Robert', 'James', 'Elroy'
, '225 NE 2nd Avenue', 'Fort Worth', 'South Dakota', '85146', 'U.S.A.'
, '1', '7684342079', 'Robert.Elroy@vzcb.tepqkx.org'
,'<xml><string>Level 63</string></xml>'
,'1979-05-27 08:47:33.810', '2009-04-16 07:46:03.330')
,
(3, 'Active', '73075', 'Plant Employee', 'Mr.', 'Greg', 'Robert', 'Elron'
, '25 South Sound Way', 'Fort Worth', 'South Dakota', '85146', 'U.S.A.'
, '2', '7681239897', 'Greg.Elron@vzcb.tepqkx.org'
,'<xml><string>Level 57</string></xml>'
,'1984-07-27 08:47:33.810', '2009-04-16 07:46:03.330')
,
(4, 'Active', '324134', 'Corporate Employee', 'Ms.', 'Dianna', 'Janice', 'Wilson'
, '123123 Civica Court', 'Dallas', 'Texas', '35146', 'U.S.A.'
, '1', '1233421234', 'Dianna.Wilson@vzcb.tepqkx.org'
,'<xml><string>Level 57</string></xml>'
,'1985-03-28 08:47:33.810', '2009-05-15 07:46:03.330')
,
(5, 'Active', '657465', 'Vendor Employee', 'Mr.', 'Don', 'James', 'Alonzo'
, '34215 Tampa Center Drive', 'Tampa', 'Florida', '32935', 'U.S.A.'
, '1', '6576543542', 'Don.Alonzo@vzcb.1234.org'
,'<xml><string>Preferred Vendor</string></xml>'
,'1999-03-28 08:47:33.810', '2009-05-15 07:46:03.330')
, 
(6, 'Active', '98075', 'Buyer', 'Ms.', 'Marjorie', 'Kaye', 'Christianson'
, '23 Center Route', 'Seattle', 'Washington', '98042', 'U.S.A.'
, '1', '5653427654', 'Marjorie.Christianson@telcon.org'
,'<xml><string>Standard Buyer</string></xml>'
,'2001-04-19 08:47:33.810', '2009-05-15 07:46:03.330')

GO

Now I can do self-joins because I was careful to include the number of the person assigned to another person in the same table. I’ve repeated that process for each of the other tables.
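
As an example, here’s the kind of self-join this enables. Note that this is a sketch that assumes AssignedTo holds the supervising person’s PersonPK, as the sample rows above suggest (person 3 is assigned to '2', and so on):

/* Who is assigned to whom - assumes AssignedTo holds the
   supervising person's PersonPK, per the sample data above */
SELECT emp.Fname + ' ' + emp.Lname AS Employee
     , boss.Fname + ' ' + boss.Lname AS AssignedToPerson
FROM [Base].[Person] AS emp
LEFT JOIN [Base].[Person] AS boss
    ON CAST(emp.AssignedTo AS BIGINT) = boss.PersonPK;
GO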

This is a manual process, suitable for a few rows at a time. It does, however, allow me to teach about inserting data, grabbing the next available primary key and so on. But anything larger than a few dozen rows becomes a bit more complicated to store and run.
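
The “next available primary key” part of that lesson is a one-liner, shown here as a demo-only sketch (an IDENTITY column or a sequence is the right tool when several sessions insert at once):

/* Grab the next surrogate key for a manual insert - demo only,
   not safe under concurrent inserts */
SELECT MAX(PersonPK) + 1 AS NextPersonPK
FROM [Base].[Person];
GO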

Another method I’ve used to load the data is to set up a “test harness” program, which involves various stored procedures set up to do the INSERT operations. The program then uses a set of text files that contain sample first and last names, another with states and countries and even product names and other information. The test program reads a random line from one file and then another line from a second, which randomizes things like first and last names. The test harness program then figures out things like Primary and Foreign Keys, and I use this for testing load speeds and so on. This also can create a large data set that I can use for demonstrations to a specific industry.
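
Here’s a minimal T-SQL sketch of that idea, assuming the name files have been bulk-loaded into two hypothetical staging tables (the file paths are illustrative):

/* Stage the sample-name files */
CREATE TABLE #FirstNames (FName VARCHAR(50));
CREATE TABLE #LastNames  (LName VARCHAR(50));
-- BULK INSERT #FirstNames FROM 'C:\demo\FirstNames.txt';
-- BULK INSERT #LastNames  FROM 'C:\demo\LastNames.txt';

/* Pair random first and last names to invent believable people */
SELECT TOP (100) f.FName, l.LName
FROM #FirstNames AS f
CROSS JOIN #LastNames AS l
ORDER BY NEWID();
GO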

But I’ve now come across another method to load the data that I think is even better. There are several programs that have figured all this out already. I’ve mentioned “Visual Studio for Database Professionals” before, which most folks just call “Data Dude”. Another is the “SQL Data Generator” from Redgate Software. I don’t endorse either of these choices, but they have worked for me. Both have similar features, and I created and loaded a Point-of-Sale (POSDB) database with the Redgate tool recently with great results.

These programs have features that allow you to do all of the work I had put into stored procedures, and they make it easier to understand what I’m doing along the way. The feature I liked the most is that I could use a Regular Expression (regex) function to generate a very believable set of data, and a lot of it. This program also has features that allow me to use a list file for any field, or a set of built-in expressions for things like XML blocks or addresses. And the most useful feature I found was that it could use one table to look up values for another; that’s how I handled the joins between tables. I used it to load several thousand rows into my POSDB database.

Querying the Database

Although each database shows different things to different audiences, one of the powerful concepts within the UniversalDB structure is that the queries largely fit the use-cases of many industries. In the following example, I’ve used the POSDB to show the main queries a retail outlet would be interested in seeing, based on the Cash Register activity. But the same data is equally useful to hospitals to show activities on various floors, by nurse or patient and more. And it also works in a manufacturing plant to show floor operations.

I’ll end with this: a series of simple-to-understand queries that I use on a daily basis. I trust you’ve found this series useful, if only as a thought-exercise about how you would approach this issue. Take the queries below and morph them into something useful for yourself:

/*		POSDB Queries.sql
Purpose:		Queries for a Point of Sale (POS) Universal Database
Author:		Buck Woody
Last Edited:	10/23/2009
Instructions:	Use with a POS database created from a UniversalDB.
References:
*/
USE POSDB;
GO

/* Breakdown of Customers in system 
Change customer to whatever fits the right industry */
SELECT Base.Person.PersonType, COUNT(*)
FROM Base.Person
WHERE PersonType LIKE '%customer%'
GROUP BY Base.Person.PersonType
ORDER BY 2 DESC;
GO

/* Breakdown of Active Transactions, all accounting 
activities - in the POSDB case, a register */
SELECT Base.Accounting.Fullname, COUNT(*)
FROM Base.Accounting
WHERE Base.Accounting.AccountingStatus = 'Active'
GROUP BY Base.Accounting.Fullname
ORDER BY 2;
GO

/* Which location (store in this case) 
has the most exchanges. Replace 'Exchange'
for other industries */
SELECT Base.Activity.Location, COUNT(*)
FROM Base.Activity
WHERE Base.Activity.ActivityType = 'Exchange'
GROUP BY Base.Activity.Location
ORDER BY 2 DESC;
GO

/* Current items on order
Change Ordered for other uses */
SELECT Base.Material.ShortName
, Base.Material.Updated 
FROM Base.Material
WHERE Base.Material.MaterialStatus = 'Ordered'
ORDER BY Base.Material.Updated ASC;
GO

/* Active Accounting Items by Material */
SELECT  Base.Accounting.AccountingStatus
      , Base.Accounting.AccountingType
      , Relationships.TableToTable.Category
      , Base.Material.MaterialID
      , Base.Material.MaterialType
FROM    Base.Accounting
        INNER JOIN Relationships.TableToTable
        ON Base.Accounting.AccountingPK = Relationships.TableToTable.AccountingPK
        INNER JOIN Base.Material
        ON Relationships.TableToTable.MaterialPK = Base.Material.MaterialPK
WHERE Base.Accounting.AccountingStatus = 'Active'
ORDER BY Base.Material.MaterialType;
GO

/* End POSDB Queries.sql */

InformIT Articles and Sample Chapters

To see “proper” design, rather than this training-and-demo example, check out the series of SQL Server Reference Guide entries starting here.

Books and eBooks

Another great book on design is Designing Effective Database Systems, by Rebecca M. Riordan.

Online Resources

I’ll violate most of these top ten design mistakes (on purpose) in this design. But you should still check it out for production databases.