Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
- Microsoft SQL Server Programming
- Performance Tuning
- Choosing the Back End
- The DBA's Toolbox, Part 1
- The DBA's Toolbox, Part 2
- Scripting Solutions for SQL Server
- Building a SQL Server Lab
- Using Graphics Files with SQL Server
- Enterprise Resource Planning
- Customer Relationship Management (CRM)
- Building a Reporting Data Server
- Building a Database Documenter, Part 1
- Building a Database Documenter, Part 2
- Data Management Objects
- Data Management Objects: The Server Object
- Data Management Objects: Server Object Methods
- Data Management Objects: Collections and the Database Object
- Data Management Objects: Database Information
- Data Management Objects: Database Control
- Data Management Objects: Database Maintenance
- Data Management Objects: Logging the Process
- Data Management Objects: Running SQL Statements
- Data Management Objects: Multiple Row Returns
- Data Management Objects: Other Database Objects
- Data Management Objects: Security
- Data Management Objects: Scripting
- Powershell and SQL Server - Overview
- PowerShell and SQL Server - Objects and Providers
- Powershell and SQL Server - A Script Framework
- Powershell and SQL Server - Logging the Process
- Powershell and SQL Server - Reading a Control File
- Powershell and SQL Server - SQL Server Access
- Powershell and SQL Server - Web Pages from a SQL Query
- Powershell and SQL Server - Scrubbing the Event Logs
- SQL Server 2008 PowerShell Provider
- SQL Server I/O: Importing and Exporting Data
- SQL Server I/O: XML in Database Terms
- SQL Server I/O: Creating XML Output
- SQL Server I/O: Reading XML Documents
- SQL Server I/O: Using XML Control Mechanisms
- SQL Server I/O: Creating Hierarchies
- SQL Server I/O: Using HTTP with SQL Server XML
- SQL Server I/O: Using HTTP with SQL Server XML Templates
- SQL Server I/O: Remote Queries
- SQL Server I/O: Working with Text Files
- Using Microsoft SQL Server on Handheld Devices
- Front-Ends 101: Microsoft Access
- Comparing Two SQL Server Databases
- English Query - Part 1
- English Query - Part 2
- English Query - Part 3
- English Query - Part 4
- English Query - Part 5
- RSS Feeds from SQL Server
- Using SQL Server Agent to Monitor Backups
- Reporting Services - Creating a Maintenance Report
- SQL Server Chargeback Strategies, Part 1
- SQL Server Chargeback Strategies, Part 2
- SQL Server Replication Example
- Creating a Master Agent and Alert Server
- The SQL Server Central Management System: Definition
- The SQL Server Central Management System: Base Tables
- The SQL Server Central Management System: Execution of Server Information (Part 1)
- The SQL Server Central Management System: Execution of Server Information (Part 2)
- The SQL Server Central Management System: Collecting Performance Metrics
- The SQL Server Central Management System: Centralizing Agent Jobs, Events and Scripts
- The SQL Server Central Management System: Reporting the Data and Project Summary
- Time Tracking for SQL Server Operations
- Migrating Departmental Data Stores to SQL Server
- Migrating Departmental Data Stores to SQL Server: Model the System
- Migrating Departmental Data Stores to SQL Server: Model the System, Continued
- Migrating Departmental Data Stores to SQL Server: Decide on the Destination
- Migrating Departmental Data Stores to SQL Server: Design the ETL
- Migrating Departmental Data Stores to SQL Server: Design the ETL, Continued
- Migrating Departmental Data Stores to SQL Server: Attach the Front End, Test, and Monitor
- Tracking SQL Server Timed Events, Part 1
- Tracking SQL Server Timed Events, Part 2
- Patterns and Practices for the Data Professional
- Managing Vendor Databases
- Consolidation Options
- Connecting to a SQL Azure Database from Microsoft Access
- SharePoint 2007 and SQL Server, Part One
- SharePoint 2007 and SQL Server, Part Two
- SharePoint 2007 and SQL Server, Part Three
- Querying Multiple Data Sources from a Single Location (Distributed Queries)
- Importing and Exporting Data for SQL Azure
- Working on Distributed Teams
- Professional Development
- Application Architecture Assessments
- Business Intelligence
- Tips and Troubleshooting
- Additional Resources
Migrating Departmental Data Stores to SQL Server: Decide on the Destination
Last updated Mar 28, 2003.
This is the third article in a series of a formal process you can follow to migrate data stored in “departmental data stores” (such as Excel, text files, XML documents and so on) into a Relational Database Management System (RDBMS) like SQL Server. The first article in this series is here, and the second is here, and the third is here.
In the first article, I explained what these data stores are, what they mean to your organization, and when they should be considered for migration. Some data doesn’t need to be stored in an RDBMS, and other data does. I also explained a few methods you can use to locate that data. That’s the first step.
In the second installment I explained how to take that data and model it so that you can tease out the requirements from the discovery you’ve done and how to model that data so that everyone agrees on its final format. I wasn’t able to complete the entire part of that process there, so I’ll finish that step in this article.
In the third tutorial I covered a Business Requirements document in more depth and I explained how to normalize the model into tables and columns.
Here’s the outline of the process and where we are so far:
- Locate the data
- Model the system
- Decide on the Destination
- Design the ETL and Migrate the Data
- Attach the front-ends
- Test and Monitor
The model I created had some issues, and wasn’t quite complete. So in this tutorial I’ll fix that and then start deciding on where the data will land.
Refining the Model
As I created the model from the Business Requirements and the source data (in this case, and Excel spreadsheet) I noticed some relationships had developed that didn’t make sense based on the requirements. I went back to the business team and asked a few more questions on how they view two things: references and activities.
The response was what I expected [md] they view everything through the lens of a contact. They never, for instance, call a “Vendor” but a contact at that vendor. While the original source system for their data was very vendor-centric, they were contact-centric.
Because of this focus, I changed the “Last Contacted” field to its own table [md] and added a column to describe the activity. I made that a one-to-many table and now there can be multiple activities for a contact, who in turn belongs to a vendor. Much cleaner. Not only does the system now tell the user the last time the vendor was contacted, but there’s now a place for them to store information about why they contacted them and so on. I also added a column I had neglected [md] a bit that represents whether a contact is “primary” or not.
Notice that this information is tied to a Contact, not a Vendor. That’s because the business told me this is how they view the data. It’s this kind of process that shows the great value of using a design document and working well with the business on what they need. It gives you a tool to have a discussion, and everyone gets what they need [md] the business can store and grow the data they need, the DBA ends up with a sustainable design, and can assist the developers in writing efficient code.
The design is (for now) pretty final, so here is what I ended up with as the database diagram:
If you compare this one with the previous article you’ll see the differences. After the next step, I’ll show you the Transact-SQL Data Definition Language (DDL) statements I created from this model.
I take that model and marry it up with the documents from the business folks and the Unified Markup Language (UML) document the assigned developer is using. We have a final meeting to sign off on the design, and I’ll next begin evaluating the destination for the data.
Decide on the Destination
With the design in hand, you might think it’s time to create the database. But the next step is to decide where the data will land. The reason you do this part before you create the statements is that the platform you choose might change the design a bit. For instance, in the “higher” editions of SQL Server there are more datatypes to choose from, and you have features such as compression and encryption that change with each version and edition.
This design is simple enough to span from SQL Server Express Edition all the way to the DataCenter edition in SQL Server 2008R2 and later. So the data types and features I’ve chosen in the design still stand. But I have multiple SQL Servers in my shop, and each has a little room on it. I also have the option of installing an entirely new server, or even using a Virtual Server to handle the load. It really comes down to just a few decision points to narrow down the selection.
The first question to ask is “which system has the capacity for this data?” That includes not only drive space but network load and CPU averages. After all, that server might not be busy right now, but when end of month closing hits...
And you need to watch here for not only the current capacity, but how large you think the data will grow. A single spreadsheet will certainly fit inside any database, but years from now, will this still be the right Instance of SQL Server?
It goes without saying that you need to make sure you have licenses to cover anything you install (other than Express, of course). Well, I guess it doesn’t go without saying, since I just mentioned it.
It’s not just the SQL Server licenses you need to be concerned about. If there are third-party products in your shop that backup and maintain the servers, you may have licensing implications there as well. It’s important to check those out as well.
This question has not only to do with how much uptime the system requires (High Availability or HA), the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) and so on, but also how close, network-wise, the system should be to the users. If you take a local application and move it to a server on the other side of the world which has a poor response rate, you aren’t going to make any of the users happy. It’s important to keep in mind that you’re trying to make the system more robust, not less, so pick as server that has a good availability to the users.
And I don’t mean to minimize the RPO, RTO and HA requirements in the system. You should carefully evaluate each of those as well.
Management and Administration
Another choice for where the data will go is who will do the administration of the system. If you’re going to allow another group to administer the database system, you have to think about the security boundary you need. You also need to think about the storage space and backups, index maintenance and all of the things an administrator does.
For this small system, I was able to put the database on a current server and just use database roles to allow access. I evaluated the entire thing and the server I used could handle all of the load, and it wasn’t an increased burden on me to manage it.
Implementing the Code
With all of those decisions made, I can now implement the Data Definition Language (DDL) statements.
Keep in mind this is the design I came up with [md] no doubt you might think of other things to add in, change or even take out. For instance, I don’t have a primary key on every table. Will that be a problem? I’ll leave that for you to decide, and to edit this script as necessary.
If you don’t have an Entity Relationship Diagram (ERD) tool, you can create this database (on a test system, of course) and use the “Database Diagram” feature in SQL Server Management Studio (SSMS) to create one. I’ve been evaluating Quest’s TOAD Database Designer for these graphics. Here’s the code:
/* Created: 2/19/2010, Buck Woody Modified: 2/24/2010, Buck Woody Database Platform: MS SQL Server 2008 Setup for testing: USE master; GO DROP DATABASE Vendors; GO */ USE master; GO CREATE DATABASE Vendors; GO USE Vendors ; GO -- Table Vendor CREATE TABLE [Vendor] ( [VendorCode] VARCHAR(5) NOT NULL , [VendorName] VARCHAR(150) NOT NULL , [Description] VARCHAR(255) NULL , [Address1] VARCHAR(55) NOT NULL , [Address2] VARCHAR(55) NULL , [City] VARCHAR(75) NOT NULL , [State] CHAR(2) NOT NULL , [PostalCode] CHAR(9) NOT NULL ) GO -- Add keys for table Vendor ALTER TABLE [Vendor] ADD CONSTRAINT [VednorCodePK] PRIMARY KEY ([VendorCode]) GO -- Table VendorType CREATE TABLE [VendorType] ([VendorTypeID] INT IDENTITY NOT NULL ,[Description] VARCHAR(100) NOT NULL ) GO -- Add keys for table VendorType ALTER TABLE [VendorType] ADD CONSTRAINT [VendorTypePK] PRIMARY KEY ([VendorTypeID]) GO -- Table VendorToVendorType CREATE TABLE [VendorToVendorType] ([VendorTypeID] INT NOT NULL ,[VendorCode] VARCHAR(5) NOT NULL ) GO -- Add keys for table VendorToVendorType ALTER TABLE [VendorToVendorType] ADD CONSTRAINT [VendorToVendorPK] PRIMARY KEY ([VendorTypeID], [VendorCode]) GO -- Table Contact CREATE TABLE [Contact] ( [ContactID] INT NOT NULL , [VendorCode] VARCHAR(5) NOT NULL , [ContactFullName] VARCHAR(150) NOT NULL , [LastContacted] DATETIME NOT NULL , [IsPrimary] BIT DEFAULT 0 NOT NULL ) GO -- Add keys for table Contact ALTER TABLE [Contact] ADD CONSTRAINT [ContactPK] PRIMARY KEY ([ContactID]) GO -- Table Reference CREATE TABLE [Reference] ( [ContactID] INT NOT NULL , [ReferenceType] VARCHAR(50) NOT NULL , [ReferenceDescription] VARCHAR(255) NOT NULL ) GO -- Table Activity CREATE TABLE [Activity] ( [ContactID] INT NOT NULL , [ActivityDate] DATETIME DEFAULT GETDATE() NOT NULL , [RecordedBy] VARCHAR(150) NOT NULL , [ActivityDescription] VARCHAR(255) NOT NULL ) GO -- Relationships ------------------------------------------------- ALTER TABLE [VendorToVendorType] ADD CONSTRAINT [VendorFK] FOREIGN KEY ([VendorCode]) REFERENCES [Vendor] ([VendorCode]) GO ALTER TABLE [VendorToVendorType] ADD CONSTRAINT [VendorTypeFK] FOREIGN KEY ([VendorTypeID]) REFERENCES [VendorType] ([VendorTypeID]) GO ALTER TABLE [Contact] ADD CONSTRAINT [VendorContactFK] FOREIGN KEY ([VendorCode]) REFERENCES [Vendor] ([VendorCode]) GO ALTER TABLE [Activity] ADD CONSTRAINT [ContactActivityFK] FOREIGN KEY ([ContactID]) REFERENCES [Contact] ([ContactID]) GO ALTER TABLE [Reference] ADD CONSTRAINT [ContactReferenceFK] FOREIGN KEY ([ContactID]) REFERENCES [Contact] ([ContactID]) GO /* End Script */
I’ll follow up in the next in this series with designing my Extract, Transform and Load (ETL) process and I’ll migrate the data.
InformIT Articles and Sample Chapters
My article series on database design starts here.
Books and eBooks
Database Design for Mere Mortals is a great book on database design.