Migrating Departmental Data Stores to SQL Server: Model the System
Last updated Feb 19, 2010.
This is the second article in a series on a formal process you can follow to migrate data stored in “departmental data stores” (such as Excel, text files, XML documents, and so on) into a Relational Database Management System (RDBMS) like SQL Server. The first article in this series is here.
In the first article, I explained what these data stores are, what they mean to your organization, and when they should be considered for migration. Some data doesn’t need to be stored in an RDBMS, and other data does. I also explained a few methods you can use to locate that data. That’s the first step.
In this article I’ll explain how to tease the requirements out of the discovery you’ve done, and how to model that data so that everyone agrees on its final format.
Document Business Requirements
The first step in the transition (once you’ve decided you need one) is to define the business requirements. You’ll recognize this as the first step in any good database design. That’s on purpose: you’re doing a few things by completing this process. The first is ensuring that the application actually stores the data everyone needs. Parts of the data may be stored because of a poor design, because of a link to the “source” system or systems, because they serve multiple needs, and so on.
In fact, this might have been the reason that you were called in to begin with. Perhaps a set of users needs access to the spreadsheet but some parts of it weren’t meant for them to see, or perhaps they need more data than the spreadsheet holds. Again, I’m using a “spreadsheet” here just as a placeholder so that you can see the design process.
I’ll start a fictitious example here to show you one possibility. Using the processes I explained in the last tutorial, I’ll show you what I “find” and how I deal with it.
Notes from meetings
In this example, I speak with the department heads and show them the advantages of storing the data in a more secure, accessible and protected architecture. I give them a list of “Go-Do’s” to take back to their departments, along with the message that I’ll help if they think they have an application that meets the criteria.
In about two weeks, I get a call from one of the managers. She tells me there is a spreadsheet that they are using to track vendor information. Would I be willing to take a look? I follow up with a meeting to go over what the data is, where they are with it, and how they use it. I take a lot of notes, already teasing out some of the requirements.
I’ll work through the next two steps first, and then follow up with another round of meetings to discuss my findings. I’ll mention my findings in a moment.
Check the applications
While I’m waiting for calls from managers, I coordinate with my server admins, and let them know what I’m looking for. They set up some monitoring to look for two things: people that have Microsoft Office applications, and files that are being accessed by more than one user on a “frequent” basis.
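The second check, files touched by more than one user on a “frequent” basis, can be sketched in a few lines. This is only an illustration under stated assumptions: the `(user, path)` record format, the sample names and paths, and the thresholds are all hypothetical, since the actual monitoring depends on whatever tooling your server admins use.

```python
from collections import defaultdict

# Hypothetical audit records: (user, file_path). The real format depends on
# the monitoring the server admins set up.
ACCESS_EVENTS = [
    ("alice", "shares/purchasing/vendors.xls"),
    ("bob", "shares/purchasing/vendors.xls"),
    ("alice", "shares/purchasing/vendors.xls"),
    ("carol", "shares/hr/notes.doc"),
]


def find_shared_files(events, min_users=2, min_accesses=3):
    """Return paths opened by at least min_users distinct users and
    at least min_accesses times in total."""
    users_by_file = defaultdict(set)
    access_count = defaultdict(int)
    for user, path in events:
        users_by_file[path].add(user)
        access_count[path] += 1
    return sorted(
        path
        for path in users_by_file
        if len(users_by_file[path]) >= min_users
        and access_count[path] >= min_accesses
    )


if __name__ == "__main__":
    for path in find_shared_files(ACCESS_EVENTS):
        print(path)
```

What counts as “frequent” is a judgment call, so both thresholds are parameters rather than fixed values.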
Examine the files, question the owners
Once I get the data back from my meetings and other investigations, I end up with a few files to examine. I say “files,” even though the application might be multi-user, like Visual Basic or Microsoft Access. Those programs all end up using a file, and that’s what I look for.
As I mentioned earlier, in the example I’m using, a combination of meetings and investigation turns up a single Microsoft Excel spreadsheet. The users tell me that they are using an extract from a larger system to get the names of vendors the organization uses and pays. They’ve added some columns that they use to create reports for different departments.
To see the spreadsheet you can download it here.
They’ve turned on filtering, sorting and so on for the Excel spreadsheet users. Other users are taking the spreadsheet and basing a mail-merge on it.
So my challenge is to break down the elements and find out what they are doing with the data. I’ll do that with a set of business requirements statements. It’s tempting to look at the data they already have and start building, but that’s not proper design. You can certainly use it as a data point, and as a validation tool later.
So I begin by asking a few standard design questions:
- What are you trying to do?
- What data is available already?
- What data is still needed?
- Who enters it?
- Who can see it?
- How do they see it?
Of course there are many more questions that go into a “real” design exercise, but I’ll keep it simple for this example, and show you what I’ve gotten back as an answer.
Starting with the questions and the spreadsheet in hand, I poll the manager and the team, and get the responses to my questions:
Q: What are you trying to do?
A: There are two groups of folks. One needs to be able to enter “ratings” for vendors that we use, and a rating system to find out if they are “on-time” or not. The other group needs to know which vendors we use to send them periodic notifications. Some vendors get e-mails, others phone calls, some we enter data on their web site and others get regular mail, depending on what we’re doing.
Q: What data is available already?
A: Our vendors and their addresses and phones are in another system. We export that and then update what we know about already. There’s lots of other data in that system, but we only care about the current vendors.
Q: What data is still needed?
A: We have to enter the web addresses, e-mails and the contact names.
We also track the date we contact them to make sure that they don’t get two notifications.
Sometimes we enter notes, and we also track when the vendor is “on-time,” using an X out of 10 rating. If the vendor is on-time and we think the prices are good, we mark them as “Preferred.”
If we don’t use the vendor any more, we just mark it as “Closed.”
Q: Who enters it?
A: We have three people who deal with vendors and enter that data. All of them do roughly the same type of job. The people that run reports set the “Last Contacted” date.
Q: Who can see it?
A: We have 5 people beyond the 3 that enter data that run reports. All of them just use the name, addresses and contact information.
Q: How do they see it?
A: Most of them are using Microsoft Word in a mail-merge. Even when they call or enter the data on a web form, they just pull it into Word and then do something with it.
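Before writing any requirements, it can help to list what the answers above actually mention. The sketch below is one way to do that, not the data model itself, which comes later in the process. The field labels are working names taken from the interview, the role labels (`"vendor staff"`, `"report runners"`, `"source system"`) are hypothetical shorthand, and I am assuming that the notes, rating and status values are entered by the three people who deal with vendors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldNote:
    name: str           # working label from the interview, not a column name
    source: str         # "export" (other system) or "entered" (by the team)
    maintained_by: str  # hypothetical role label for who sets the value


FIELD_INVENTORY = [
    FieldNote("vendor name", "export", "source system"),
    FieldNote("address", "export", "source system"),
    FieldNote("phone", "export", "source system"),
    FieldNote("web address", "entered", "vendor staff"),
    FieldNote("e-mail", "entered", "vendor staff"),
    FieldNote("contact name", "entered", "vendor staff"),
    FieldNote("notes", "entered", "vendor staff"),
    FieldNote("on-time rating (X out of 10)", "entered", "vendor staff"),
    FieldNote("preferred flag", "entered", "vendor staff"),
    FieldNote("closed flag", "entered", "vendor staff"),
    FieldNote("last contacted date", "entered", "report runners"),
]


def fields_for(role, inventory=FIELD_INVENTORY):
    """Return the working field labels maintained by the given role."""
    return [note.name for note in inventory if note.maintained_by == role]


if __name__ == "__main__":
    for role in ("source system", "vendor staff", "report runners"):
        print(role, "->", ", ".join(fields_for(role)))
```

A list like this is useful mainly as a checklist for the follow-up meetings: each entry is something the users can confirm or correct.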
In a “real” data modeling exercise, the business requirements would be built from the answers to these questions, but I want to be able to show a small example quickly. Standard business requirements normally state something like “The system shall track names and addresses for organizations termed as ‘vendor’ to our organization.” Each requirement is spelled out like that, and then everyone agrees to what the system will do. For now, I’ll stick with just those questions and answers; that’s good enough for this example.
Now that I’ve finished that portion of the process, the next tutorial will diagram the process I follow for turning that information into a data model. Again, I’ll use the spreadsheet, but only as a guide. Getting this information into a usable format takes more work, but a clear understanding of the business (or organizational) needs is the first and most important step.
As I mentioned at the top of this tutorial, I’ll “circle back” on the meetings with the users to ensure that the requirements fit their needs. The users will normally be willing to tell you that everything “looks OK,” so it’s important to explain the detail carefully, and to explain that the decisions taken here really matter. They’ll see this further as you begin to normalize the model, which is the next step and the subject of the next tutorial.
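If you do write formal statements, one lightweight way to keep track of them and of who has agreed to each is sketched below. Treat it as an assumption-laden illustration: the `Requirement` structure, the IDs, the stakeholder labels and the wording of REQ-002 are all mine, and only the first statement comes from the example sentence above.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    req_id: str           # hypothetical numbering scheme
    statement: str        # the "The system shall ..." sentence
    source_question: str  # interview question the statement traces back to
    approved_by: set = field(default_factory=set)


# Hypothetical stakeholder labels for this example.
STAKEHOLDERS = {"manager", "vendor staff", "report runners"}

REQUIREMENTS = [
    Requirement(
        "REQ-001",
        "The system shall track names and addresses for organizations "
        "termed as 'vendor' to our organization.",
        "What data is available already?",
        approved_by={"manager", "vendor staff", "report runners"},
    ),
    Requirement(
        "REQ-002",  # illustrative wording, not from the original interview
        "The system shall record the date each vendor was last contacted.",
        "What data is still needed?",
        approved_by={"manager"},
    ),
]


def pending(requirements, stakeholders):
    """Return IDs of requirements not yet approved by every stakeholder."""
    return [
        r.req_id for r in requirements if not stakeholders <= r.approved_by
    ]


if __name__ == "__main__":
    print("Still awaiting sign-off:", pending(REQUIREMENTS, STAKEHOLDERS))
```

Tying each statement back to the question that produced it makes the later review meetings easier, because users can see where a requirement came from.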
InformIT Articles and Sample Chapters
I have a section elsewhere in this Reference Guide that details the use of the MAPS tool. It’s the first place you should start.
Books and eBooks
As I explain how to migrate the data from one source to another, you’ll most definitely need to know about SQL Server Integration Services, or SSIS. Microsoft SQL Server 2008 Integration Services Unleashed can help.
In some cases, you may decide to leave the data where it is and simply link to it. In that case, you might want to research your programming options for data here.