Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
- Microsoft SQL Server Programming
- An Outline for Development
- Database Services
- Database Objects: Databases
- Database Objects: Tables
- Database Objects: Table Relationships
- Database Objects: Keys
- Database Objects: Constraints
- Database Objects: Data Types
- Database Objects: Views
- Database Objects: Stored Procedures
- Database Objects: Indexes
- Database Objects: User Defined Functions
- Database Objects: Triggers
- Database Design: Requirements, Entities, and Attributes
- Business Process Model Notation (BPMN) and the Data Professional
- Business Questions for Database Design, Part One
- Business Questions for Database Design, Part Two
- Database Design: Finalizing Requirements and Defining Relationships
- Database Design: Creating an Entity Relationship Diagram
- Database Design: The Logical ERD
- Database Design: Adjusting The Model
- Database Design: Normalizing the Model
- Creating The Physical Model
- Database Design: Changing Attributes to Columns
- Database Design: Creating The Physical Database
- Database Design Example: Curriculum Vitae
- The SQL Server Sample Databases
- The SQL Server Sample Databases: pubs
- The SQL Server Sample Databases: NorthWind
- The SQL Server Sample Databases: AdventureWorks
- The SQL Server Sample Databases: Adventureworks Derivatives
- UniversalDB: The Demo and Testing Database, Part 1
- UniversalDB: The Demo and Testing Database, Part 2
- UniversalDB: The Demo and Testing Database, Part 3
- UniversalDB: The Demo and Testing Database, Part 4
- Getting Started with Transact-SQL
- Transact-SQL: Data Definition Language (DDL) Basics
- Transact-SQL: Limiting Results
- Transact-SQL: More Operators
- Transact-SQL: Ordering and Aggregating Data
- Transact-SQL: Subqueries
- Transact-SQL: Joins
- Transact-SQL: Complex Joins - Building a View with Multiple JOINs
- Transact-SQL: Inserts, Updates, and Deletes
- An Introduction to the CLR in SQL Server 2005
- Design Elements Part 1: Programming Flow Overview, Code Format and Commenting your Code
- Design Elements Part 2: Controlling SQL's Scope
- Design Elements Part 3: Error Handling
- Design Elements Part 4: Variables
- Design Elements Part 5: Where Does The Code Live?
- Design Elements Part 6: Math Operators and Functions
- Design Elements Part 7: Statistical Functions
- Design Elements Part 8: Summarization Statistical Algorithms
- Design Elements Part 9: Representing Data with Statistical Algorithms
- Design Elements Part 10: Interpreting the Data—Regression
- Design Elements Part 11: String Manipulation
- Design Elements Part 12: Loops
- Design Elements Part 13: Recursion
- Design Elements Part 14: Arrays
- Design Elements Part 15: Event-Driven Programming Vs. Scheduled Processes
- Design Elements Part 16: Event-Driven Programming
- Design Elements Part 17: Program Flow
- Forming Queries Part 1: Design
- Forming Queries Part 2: Query Basics
- Forming Queries Part 3: Query Optimization
- Forming Queries Part 4: SET Options
- Forming Queries Part 5: Table Optimization Hints
- Using SQL Server Templates
- Transact-SQL Unit Testing
- Index Tuning Wizard
- Unicode and SQL Server
- SQL Server Development Tools
- The SQL Server Transact-SQL Debugger
- The Transact-SQL Debugger, Part 2
- Basic Troubleshooting for Transact-SQL Code
- An Introduction to Spatial Data in SQL Server 2008
- Performance Tuning
- Practical Applications
- Professional Development
- Application Architecture Assessments
- Business Intelligence
- Tips and Troubleshooting
- Additional Resources
UniversalDB: The Demo and Testing Database, Part 4
Last updated Mar 28, 2003.
In Part 1 of this series, I explained the rationale behind the need for a single database that would be able to work on multiple platforms, for multiple industries. Since I do a lot of teaching, demonstrations and testing, I would like something that is simple to understand, and quick to implement and customize.
In Part 2 of this series, I covered the “base” tables that I believe meet most of my requirements, and laid out the design for each of them.
Part 3 covered the final joins needed to make the schema work, and also included the script to make the database and its tables. This week I’ll finalize the whole project and show you how I loaded the tables with data, and a few queries I’ve created to show the data.
Preparing a Database for the Data
The general script I explained last week is something that I use to explain the schema of the industry I’m demonstrating. I run that script in front of my audience as I explain each entity and how it will be used. Most folks, especially those in classes, want this kind of information.
But I don’t always create the database in front of the audience. In some cases I’m demonstrating a feature on the platform (like SQL Server Resource Governor, or the Management Data Warehouse) and so the structure really isn’t that important. In those “demo” cases I’m just looking to have valid data that the audience can relate to. And in some cases, I need a LOT of data, and I don’t want the audience to have to wait while I load it. Or perhaps I’m doing testing, where there is no audience at all. In all of these cases, I build the database and load it ahead of time.
As I mentioned in the last tutorial, most of the data types are VARCHAR(), or variable character. This allows a lot of flexibility, but of course is not optimal for proper data validation or performance. If I know I’m keeping that database around for a particular industry or testing, I alter the types before I load any data.
For instance, I mentioned that I have an “assigned-to” field in many of the tables which I use to do self-joins. Those fields are VARCHAR() types, and the key they join to is usually a BIGINT. That doesn’t bode well for performance, so I alter those fields to be the same if I’m using them that way.
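As a sketch, that type change might look like the following. The table and column names come from the Person example in this series, but the exact generated schema (nullability, any constraints on the column) is an assumption, so adjust to match your copy:

```sql
-- Sketch only: align the VARCHAR() AssignedTo column with the BIGINT key
-- it self-joins to. Run this before loading any data; if a constraint or
-- index already references the column, drop and re-create it around this.
USE ManufacturingDB;
GO
ALTER TABLE [Base].[Person]
    ALTER COLUMN [AssignedTo] BIGINT NULL;
GO
```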
Does this change the “spirit” of what I’m trying to do here? Aren’t I trying to make a single database that can be used for multiple purposes and industries? Well, no, and yes. No, changing the data types or adding indexes on a table does not change the queries that are used on them, which is the ultimate point. So I have no problem with making these minor alterations.
I’ll also add any other “ancillary” factors like the Resource Governor and so on, based on what I’m teaching, demonstrating or testing. As long as they don’t materially affect the structure, I’m comfortable with this approach.
One other change: I rename the database from UniversalDB to something like MedicalDB or PointOfSaleDB to indicate to the audience what I’m working on.
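A minimal way to do that rename, assuming you have exclusive access to the database (no other connections) and the target name is free:

```sql
-- Sketch: rename the freshly created copy to match the industry being shown.
-- Requires that no other sessions are using the database.
ALTER DATABASE UniversalDB MODIFY NAME = PointOfSaleDB;
```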
Loading the Database
With the database created, the next step is to determine the purpose and audience for the database. I’ll then examine that industry or group to come up with valid (or at least representative) data, and then choose a method to load it.
The simplest method is just to write INSERT statements, or create some stored procedures that will insert data into the database. For instance, assume I create a ManufacturingDB database, and I want to load the people that a manufacturing firm deals with as my first step. I want to ensure that I can show who is working for whom in the structure. In that case, I want the AssignedTo field to point back to the PersonID field of the Person table.
So assume that I’ve created the ManufacturingDB database from the UniversalDB script, and I run the following query to load it with some data:
USE ManufacturingDB;
GO

/* Set up the people */
INSERT INTO [ManufacturingDB].[Base].[Person]
    ([PersonPK],[PersonStatus],[PersonID],[PersonType],[Title],[Fname],[MName],[Lname]
    ,[AdressLine],[CityOrMunicipaility],[StateOrRegion],[PostalIdentification],[Country]
    ,[AssignedTo],[Phones],[EContact]
    ,[Demographics]
    ,[Initiation],[Updated])
VALUES
  (1, 'Active', '22237', 'Corporate Manager', 'Mrs.', 'Victoria', 'Terrance', 'Lynch'
  , '705 East Green New Avenue', 'Dallas', 'Texas', '35146', 'U.S.A.'
  , '0', '1231231234', 'Victoria.Lynch@vzcb.tepqkx.org'
  , '<xml><string>Level 65</string></xml>'
  , '1978-05-21 08:47:33.810', '2007-02-12 07:46:03.330')
, (2, 'Active', '11795', 'Plant Manager', 'Mr.', 'Robert', 'James', 'Elroy'
  , '225 NE 2nd Avenue', 'Fort Worth', 'South Dakota', '85146', 'U.S.A.'
  , '1', '7684342079', 'Robert.Elroy@vzcb.tepqkx.org'
  , '<xml><string>Level 63</string></xml>'
  , '1979-05-27 08:47:33.810', '2009-04-16 07:46:03.330')
, (3, 'Active', '73075', 'Plant Employee', 'Mr.', 'Greg', 'Robert', 'Elron'
  , '25 South Sound Way', 'Fort Worth', 'South Dakota', '85146', 'U.S.A.'
  , '2', '7681239897', 'Greg.Elron@vzcb.tepqkx.org'
  , '<xml><string>Level 57</string></xml>'
  , '1984-07-27 08:47:33.810', '2009-04-16 07:46:03.330')
, (4, 'Active', '324134', 'Corporate Employee', 'Ms.', 'Dianna', 'Janice', 'Wilson'
  , '123123 Civica Court', 'Dallas', 'Texas', '35146', 'U.S.A.'
  , '1', '1233421234', 'Dianna.Wilson@vzcb.tepqkx.org'
  , '<xml><string>Level 57</string></xml>'
  , '1985-03-28 08:47:33.810', '2009-05-15 07:46:03.330')
, (5, 'Active', '657465', 'Vendor Employee', 'Mr.', 'Don', 'James', 'Alonzo'
  , '34215 Tampa Center Drive', 'Tampa', 'Florida', '32935', 'U.S.A.'
  , '1', '6576543542', 'Don.Alonzo@vzcb.1234.org'
  , '<xml><string>Preferred Vendor</string></xml>'
  , '1999-03-28 08:47:33.810', '2009-05-15 07:46:03.330')
, (6, 'Active', '98075', 'Buyer', 'Ms.', 'Marjorie', 'Kaye', 'Christianson'
  , '23 Center Route', 'Seattle', 'Washington', '98042', 'U.S.A.'
  , '1', '5653427654', 'Marjorie.Christianson@telcon.org'
  , '<xml><string>Standard Buyer</string></xml>'
  , '2001-04-19 08:47:33.810', '2009-05-15 07:46:03.330');
GO
Now I can do self-joins because I was careful to include the number of the person assigned to another person in the same table. I’ve repeated that process for each of the other tables.
This is a manual process suitable for a few rows at a time. It does, however, allow me to teach about inserting data, grabbing the next available primary key and so on. But anything larger than a few dozen rows becomes a bit more complicated to store and run.
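For teaching that “next available primary key” step, a minimal pattern looks something like this. This is my sketch, not the series’ own procedure: it assumes the Base.Person table, that the omitted columns allow NULLs, and a single-user demo environment, since MAX() + 1 is not safe under concurrent inserts:

```sql
-- Demo-only pattern: compute the next key, then insert with it.
-- Not concurrency-safe; fine for a single-user teaching database.
DECLARE @NextPK BIGINT;

SELECT @NextPK = ISNULL(MAX(PersonPK), 0) + 1
FROM Base.Person;

INSERT INTO Base.Person (PersonPK, PersonStatus, PersonID, PersonType)
VALUES (@NextPK, 'Active', '55555', 'Contractor');
```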
Another method I’ve used to load the data is to set up a “test harness” program, which involves various stored procedures set up to do the INSERT operations. The program then uses a set of text files that contain sample first and last names, another with states and countries and even product names and other information. The test program reads a random line from one file and then another line from a second, which randomizes things like first and last names. The test harness program then figures out things like Primary and Foreign Keys, and I use this for testing load speeds and so on. This also can create a large data set that I can use for demonstrations to a specific industry.
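A rough T-SQL-only alternative to that external harness (my sketch, not the harness itself) is to bulk-load the name files into staging lookup tables, then pick random rows with ORDER BY NEWID(). The Stage.FirstNames and Stage.LastNames tables here are hypothetical names for those staged lists:

```sql
-- Sketch: pick one random first name and one random last name from
-- staged lookup tables (assumed loaded from the sample text files).
SELECT
    (SELECT TOP (1) FirstName FROM Stage.FirstNames ORDER BY NEWID()) AS FName
  , (SELECT TOP (1) LastName  FROM Stage.LastNames  ORDER BY NEWID()) AS LName;
```

Wrap that in a loop or a multi-row INSERT...SELECT and you can generate as many randomized people as a demonstration needs, all inside the database.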
But I’ve now come across another method to load the data that I think is even better. Several programs have figured all of this out already. I’ve mentioned “Visual Studio for Database Professionals” before, which most folks just call “Data Dude”. Another is “SQL Data Generator” from Red Gate Software. I don’t endorse either of these tools, but both have worked for me. They have similar features, and I recently created and loaded a Point-of-Sale (POSDB) database with the Red Gate tool with great results.
These programs have features that allow you to do all of the work I had put into stored procedures, and they make it easier to understand what I’m doing along the way. The feature I liked the most is that I could use a Regular Expression (regex) function to generate a very believable set of data, and a lot of it. This program also has features that allow me to use a list file for any field, or a set of built-in expressions for things like XML blocks or addresses. And the most useful feature I found was that it could use one table to look up values for another; that’s how I handled the multiple joins. I used it to load several thousand rows into my POSDB database.
Querying the Database
Although each database shows different things to different audiences, one of the powerful concepts within the UniversalDB structure is that the queries largely fit the use cases of many industries. In the following example, I’ve used the POSDB to show the main queries a retail outlet would be interested in seeing based on the Cash Register activity. But this data is equally useful to a hospital to show activities on various floors, by nurse or patient and more. And it also works in a manufacturing plant to show floor operations.
I’ll end with this: a series of simple-to-understand queries that I use on a daily basis. I trust you’ve found this series useful, if only as a thought-exercise about how you would approach this issue. Take the queries below and morph them into something useful for yourself:
/* POSDB Queries.sql
   Purpose: Queries for a Point of Sale (POS) Universal Database
   Author: Buck Woody
   Last Edited: 10/23/2009
   Instructions: Use with a POS database created from a UniversalDB.
   References:
*/
USE POSDB;
GO

/* Breakdown of Customers in system.
   Change customer to whatever fits the right industry */
SELECT POSDB.Base.Person.PersonType
, COUNT(*)
FROM POSDB.Base.Person
WHERE PersonType LIKE '%customer%'
GROUP BY POSDB.Base.Person.PersonType
ORDER BY 2 DESC;
GO

/* Breakdown of Active Transactions, all accounting activities -
   in the POSDB case, a register */
SELECT Base.Accounting.Fullname
, COUNT(*)
FROM Base.Accounting
WHERE Base.Accounting.AccountingStatus = 'Active'
GROUP BY Base.Accounting.Fullname
ORDER BY 2;
GO

/* Which location (store in this case) has the highest exchanges.
   Replace Exchange for other industries */
SELECT Base.Activity.Location
, COUNT(*)
FROM Base.Activity
WHERE Base.Activity.ActivityType = 'Exchange'
GROUP BY Base.Activity.Location
ORDER BY 2 ASC;
GO

/* Current items on order.
   Change Ordered for other uses */
SELECT Base.Material.ShortName
, Base.Material.Updated
FROM Base.Material
WHERE Base.Material.MaterialStatus = 'Ordered'
ORDER BY Base.Material.Updated ASC;
GO

/* Active Accounting Items by Material */
SELECT Base.Accounting.AccountingStatus
, Base.Accounting.AccountingType
, Relationships.TableToTable.Category
, Base.Material.MaterialID
, Base.Material.MaterialType
FROM Base.Accounting
INNER JOIN Relationships.TableToTable
    ON Base.Accounting.AccountingPK = Relationships.TableToTable.AccountingPK
INNER JOIN Base.Material
    ON Relationships.TableToTable.OrganizationPK = Base.Material.MaterialPK
WHERE Base.Accounting.AccountingStatus = 'Active'
ORDER BY Base.Material.MaterialType;
GO

/* End POSDB Queries.sql */
InformIT Articles and Sample Chapters
To do “proper” design instead of this example for training and demos, check out the series of SQL Server Reference Guide entries starting here.
Books and eBooks
Another great book on design is Designing Effective Database Systems, by Rebecca M. Riordan.
I’ll violate most of these top ten design mistakes on purpose in this design. But you should still check it out for production databases.