Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

UniversalDB: The Demo and Testing Database, Part 1

Last updated Mar 28, 2003.

In my many years as a data professional, I’ve worn a lot of hats. I’ve been a database administrator, a database developer, a data architect, I’ve worked with business intelligence systems, reporting systems, and I’ve been a manager of all of those roles.

I’ve also been a consultant, a vendor, and a teacher. I’ve taught courses for non-profit organizations, small and large companies, and I still teach a college course at the University of Washington. I’ve taught not only on SQL Server but Oracle and MySQL as well.

In almost all of those situations, I had one requirement in common. I needed a small, understandable, configurable, portable database to use. Sure, almost every platform comes with a set of sample databases. I think Microsoft takes the prize with not one example but several, and I’ve covered those in a series that starts here. Those databases are excellent uses of the platform, each once showing off more features than the last. And Oracle and even MySQL comes with a demonstration database as well, even if they aren’t as developed as the Microsoft offerings.

But all of these left me…lacking. When I was a vendor, the databases weren’t applicable to the software I represented. When I was a consultant, and even to this day, the clients I work with don’t want to see a manufacturing database when they are a medical or financial firm. Also, what works in one platform doesn’t work for another.

So, with these things in mind, I’ve decided to do what any self-respecting DBA would do: I’ll build my own. And I’ll document the process here.

Like any software development process, I’ll start with a specification of what I need. I won’t do a full, formal specification here since I’m my only customer, but I will detail here what I need the system to do for me.

The UniversalDB Database Requirements

Why create yet another example database? The same reason that we have lots of software products — because they fill a certain need. The needs I have are as follows.

A Simple Schema

The first requirement I have is a very simple, understandable schema. This means that having thousands of tables with dozens of joins, views, stored procedures and other extras are not as important, as I’ll explain in the Constraints section that follows.

One of the reasons I need this simplicity is because I would like to do demonstrations and training with this database. I have to be able to explain the situation I’m about to demonstrate or teach. The point is that I’m trying to show a feature, explain a process, or teach a function, and not cloud that discussion with the database structure. I’ve seen folks take half an hour to explain a database to show off a single feature. I want to focus on the point of the class or demo rather than the database. For instance, inserting data should focus on the INSERT statement or graphical tool, and not the structure.

Another reason I need the simplicity is that I want to load-test various systems and ensure that I have the best possible configuration for my hardware and software. For that, I want to focus on the activity in those areas, and not spend time working through the joins — at least in some cases. Sometimes I want that complexity, and for that I use the standard samples that come with the various platforms.

Yet another reason the simplicity is useful is that sometimes I’m not demonstrating, testing or teaching a database concept, but a middle-tier or front-end program. Keeping the schema simple allows me to focus on that object instead of having to explain the schema once again.

It all boils down to understandability — by keeping the schema simple, I can focus on the task at hand. And this simplicity actually becomes a feeder to the next set of requirements…

Portability

The next requirement I have is that this database is portable. That has a couple of meanings for me — one is that it is small enough to carry from one system to another, and the other is that the schema can be implemented on multiple systems.

The last statement there is a really big deal. It means that I will make some sacrifices along the way that would not be acceptable in any kind of a production system. The fact that the schema needs to be run on multiple environments means that I have to think about how I will create the database. I might decide to create a script for each platform and try to keep it up to date, or it might mean that I will create a single script that is generic enough to be used on multiple systems – but since this is the requirements phase, I don’t want to make that decision here. I just need to lay out what I want, and I want to have a small, re-usable script.

Another reason this is important is because I might want to test a back-end platform to see if it performs as well as another. By keeping the schema simple and portable, I can write a single testing program and measure the time it takes those platforms to complete the tasks. More on this later.

I do have to decide what systems I want to port the database to, however. It’s not like the same script would work with any data store, such as XML or text files. Sure, I can create the same schema’s there, but the systems I’m interested in are Oracle, SQL Server, IBM’s DB/2, and any other ANSI SQL-99 compliant system.

Works for Multiple Industries

The next requirement is that the database is able to show medical data, manufacturing movements, financial transactions, things like that. In fact, I’m not even certain which industries I want to model.

Why is this important? It’s very similar to the need for a simple schema — you want to focus on the task at hand, rather than the database. By using medical data when I talk to hospitals, and financial data when I’m teaching DBA’s at a bank, I give the audience something to latch onto that they understand, and they can focus on my demo or training. Since they recognize the data already, they focus on the task at hand.

Also, demonstrations are far more compelling when the data is what the audience is familiar with.

Works for Demos, Testing and Training

As I’ve mentioned, the primary point of this database is to be able to teach a class, show a demonstration, or test simple insert, update and delete operations (Sometimes called “CRUD”, for Create, Read, Update and Delete). It is not a goal to run a company, be the best possible N-Tier application platform, or even work well for any production system whatsoever.

This is important to keep in mind, since some of the design decisions I’ll take would never make sense in production. I’ll explore this further in the Constraints section which follows.

Constraints in Design

I’ll hold the requirements there. It’s important to mention here that based on these requirements, that there are some fairly interesting restrictions that happen. Let’s look at a few of these constraints.

A Simple Schema

The fact that the schema needs to be simple means that I can’t have a lot of tables. I’ll risk losing some detail here to make a point: the “right” way to design a database system is to take the nouns and verbs and spread them out until that spread doesn’t make sense. Again, that’s an oversimplification, but it will do for this tutorial.

Because I want only a few objects that are easily understood, I’ll use vague terms for column names, and the tables will only show a few levels of activity and data within an organization. But again, I’m jumping to the design — I’ll stay at the constraints level for the moment.

Portability

This requirement is probably the biggest reason the design for the database won’t follow “standard” practices. I need to be able to use the scripts on multiple platforms, and most of those platforms don’t follow the ANSI SQL-99 standard perfectly, which is completely understandable. That means I’ll have to pick data types, structures and so on that are transportable between all those systems.

It also means that I won’t include things that you would normally see in a database, such as views, stored procedures, functions and so on. Sure, some of those things are transportable, but I felt that if I needed them I could create them as part of the demonstration or class. In fact, I use that creation to explain the schema to my audience.

I have a couple of interesting choices on this portability, such as creating a script per platform so that the SELECT statements would all work, or having a single script that would work with all of them. I’ll explain my decision in the next article — for now, this is the constraint for this requirement.

Works for Multiple Industries

To represent multiple industries, I’ll have to limit the data and activities I store. The reason is fairly straightforward — a bank doesn’t operate on patients, and hospitals don’t store money in a vault on-premises or make loans.

What this means is that I’ll have to limit the “slice” of business I can store in a single schema. I started out originally by having a lot of schemas to represent multiple kinds of industries, thinking that hospitals and pharmacies, for instance, would have similar needs. It turned out to be far too large a set of possibilities to do even that, so I backed off and thought about what I really needed to show.

This does have an underlying effect, however. I don’t store the same data for every industry. In my implementation phase later you’ll see that I kept the schema generation as one task, and the insertion of data for a particular industry separate.

Works for Demos, Testing and Training

This constraint isn’t too bad. Most data that I played with worked well for all these areas, with one exception: for the “description” fields, I can have “junk” or nonsense statements to show for training or testing. For demonstrations, however, it’s best if those descriptions are as “real-world” as possible, although you have to think that through carefully.

OK, I have my base requirements, and my constraints on those requirements. The next step is to create a logical design that meets those requirements, implement the design, load data, and test it all. I’ll do that in the next tutorial in this series.

InformIT Articles and Sample Chapters

To do “proper” design instead of this example for training and demos, check out the article series starting here.

Books and eBooks

Another great book on design is Designing Effective Database Systems, by Rebecca M. Riordan.

Online Resources

I’ll violate most of these top ten design mistakes — on purpose — in this design. But you should still check it out for production databases.