
SQL Server Reference Guide


Migrating Departmental Data Stores to SQL Server: Model the System

Last updated Mar 28, 2003.

This is the second article in a series describing a formal process you can follow to migrate data stored in “departmental data stores” (such as Excel spreadsheets, text files, XML documents and so on) into a Relational Database Management System (RDBMS) like SQL Server. The first article in this series is here.

In the first article, I explained what these data stores are, what they mean to your organization, and when they should be considered for migration. Some data doesn’t need to be stored in an RDBMS, and other data does. I also explained a few methods you can use to locate that data. That’s the first step.

In this article I’ll explain how to take that data, tease the requirements out of the discovery you’ve done, and model the data so that everyone agrees on its final format.

Document Business Requirements

The first step in the transition (once you’ve decided you need one) is to define the business requirements. You’ll recognize this as the first step in any good database design. That’s on purpose: you’re accomplishing a few things by completing this process. The first is ensuring that the application actually stores the data everyone needs. Some of the data may be stored only because of a poor design, because of a link to one or more “source” systems, because it serves multiple needs, and so on.

In fact, this might have been the reason you were called in to begin with: perhaps a set of users needs access to the spreadsheet but shouldn’t see some parts of it, or perhaps they need more data than the spreadsheet holds. Again, I’m using a “spreadsheet” here just as a placeholder so that you can see the design process.

I’ll start a fictitious example here to show you one possibility. Using the processes I explained in the last tutorial, I’ll show you what I “find” and how I deal with it.

Notes from meetings

In this example, I speak with the department heads and show them the advantages of storing the data in a more secure, accessible and protected architecture. I give them a list of “Go-Do’s” to take back to their departments, letting their people know that I’ll help if they think they have an application that meets the criteria.

In about two weeks, I get a call from one of the managers. She tells me there is a spreadsheet her team uses to track vendor information. Would I be willing to take a look? I follow up with a meeting to go over what the data is, where they are with it and how they use it. I take a lot of notes, already teasing out some of the requirements.

I then complete the next two steps, and after those I meet with the users again to discuss my findings. I’ll describe those findings in a moment.

Check the applications

While I’m waiting for calls from managers, I coordinate with my server admins and let them know what I’m looking for. They set up some monitoring to look for two things: people who have Microsoft Office applications, and files that more than one user accesses on a “frequent” basis.
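What the raw monitoring data looks like depends entirely on your admins’ tools. As a minimal Python sketch, assume the file-access audit can be exported to a CSV file with hypothetical `user` and `path` columns, one row per access. The thresholds for “more than one user” and “frequent” are assumptions you would agree on with the admins.

```python
import csv
import io
from collections import defaultdict

def find_shared_files(log_file, min_users=2, min_accesses=5):
    """Return paths opened by at least min_users distinct users and at least
    min_accesses times in total, from a hypothetical CSV access export."""
    users_by_path = defaultdict(set)
    hits_by_path = defaultdict(int)
    for row in csv.DictReader(log_file):
        # Normalize user names so "ANN" and "ann" count as one person.
        users_by_path[row["path"]].add(row["user"].lower())
        hits_by_path[row["path"]] += 1
    return sorted(
        path
        for path, users in users_by_path.items()
        if len(users) >= min_users and hits_by_path[path] >= min_accesses
    )

# Example with an in-memory log instead of a real export file.
log = io.StringIO(
    "user,path\n"
    "ann,purchasing/vendors.xls\n"
    "bob,purchasing/vendors.xls\n"
    "ann,purchasing/vendors.xls\n"
    "cal,finance/budget.xls\n"
)
print(find_shared_files(log, min_users=2, min_accesses=3))  # ['purchasing/vendors.xls']
```

The output is only a list of candidates to ask about; it tells you nothing about whether a file belongs in an RDBMS.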

Examine the files, question the owners

Once I get the data back from my meetings and other investigations, I end up with a few files to examine. I say “files,” even though the application might be multi-user, like a Visual Basic or Microsoft Access program. Those programs all end up using a file, and that’s what I look for.
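If you want to check a share yourself rather than wait on monitoring, a simple file walk is one option. This Python sketch is not a discovery tool: the extension list is an assumption covering the file types mentioned in this series (Excel, Access, text and XML), and you would point it at a real share path rather than the temporary folder used here.

```python
import tempfile
from pathlib import Path

# Assumed starting list: Excel, Access, delimited text and XML files.
DATA_STORE_EXTENSIONS = {".xls", ".xlsx", ".mdb", ".accdb", ".csv", ".txt", ".xml"}

def find_candidate_files(root):
    """Return paths (relative to root) of files that look like data stores."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_STORE_EXTENSIONS
    )

# Demonstration against a throwaway folder tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ("Vendors.XLS", "memo.docx", "sub/orders.mdb"):
        (Path(tmp) / name).write_text("")
    print(find_candidate_files(tmp))  # ['Vendors.XLS', 'sub/orders.mdb']
```

Extension matching is case-insensitive because desktop users save files with all sorts of capitalization.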

As I mentioned earlier, in this example the combination of meetings and investigation turns up a single Microsoft Excel spreadsheet. The users tell me that they use an extract from a larger system to get the names of vendors the organization uses and pays. They’ve added some columns that they use to create reports for different departments.

To see the spreadsheet you can download it here.

They’ve turned on filtering, sorting and so on for the Excel users. Other users are taking the spreadsheet and basing a mail merge on it.

So my challenge is to break down the elements and find out what they are doing with the data. I’ll do that with a set of business requirements statements. It’s tempting to look at the data they already have and start building, but that’s not proper design. You can certainly use it as a data point, and later as a validation tool.

So I begin by asking a few standard design questions:

  • What are you trying to do?
  • What data is available already?
  • What data is still needed?
  • Who enters it?
  • Who can see it?
  • How do they see it?

Of course there are many more questions that go into a “real” design exercise, but I’ll keep it simple for this example, and show you what I’ve gotten back as an answer.

Starting with the questions and the spreadsheet in hand, I poll the manager and the team, and get the responses to my questions:

Q: What are you trying to do?

A: There are two groups of folks. One needs to be able to enter “ratings” for vendors that we use, and a rating system to find out if they are “on-time” or not. The other group needs to know which vendors we use to send them periodic notifications. Some vendors get e-mails, others phone calls, some we enter data on their web site and others get regular mail, depending on what we’re doing.

Q: What data is available already?

A: Our vendors and their addresses and phones are in another system. We export that and then update what we know about already. There’s lots of other data in that system, but we only care about the current vendors.
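This answer describes a repeating cycle: export from the upstream system, then update the locally maintained columns. Whatever the final design, the migration has to keep that cycle from overwriting the department’s own data. As a minimal Python sketch, assume the extract carries a stable vendor identifier (a hypothetical `vendor_id`; whether one really exists is a question for the upstream system’s owners) and that the field names below are placeholders.

```python
# Fields the upstream system owns; everything else is maintained locally.
SOURCE_FIELDS = ("name", "address", "phone")
LOCAL_FIELDS = ("email", "web_site", "contact_name", "on_time_rating",
                "preferred", "closed", "notes", "last_contacted")

def merge_export(local, export):
    """Refresh upstream fields from a new export while keeping local fields.

    local:  dict mapping a (hypothetical) vendor_id to that vendor's fields
    export: list of dicts from the upstream extract, each with a vendor_id
    Returns (merged, missing): missing lists local vendors absent from the
    export, so a person can decide whether to mark them Closed.
    """
    merged = {}
    for row in export:
        vid = row["vendor_id"]
        # Existing vendors keep their local fields; new ones start empty.
        record = dict(local.get(vid, dict.fromkeys(LOCAL_FIELDS)))
        record.update({f: row[f] for f in SOURCE_FIELDS})
        merged[vid] = record
    missing = sorted(vid for vid in local if vid not in merged)
    for vid in missing:
        merged[vid] = dict(local[vid])  # keep the history rather than drop it
    return merged, missing
```

The sketch deliberately does not delete vendors that vanish from the extract; the users said they mark old vendors “Closed,” which is a human decision.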

Q: What data is still needed?

A: We have to enter the web addresses, e-mails and the contact names.

We also track the date we contact them to make sure that they don’t get two notifications.

Sometimes we enter notes, and we also track when the vendor is “on-time,” using an X out of 10 rating. If the vendor is on-time and we think the prices are good, we mark them as “Preferred.”

If we don’t use the vendor any more, we just mark it as “Closed.”
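This answer already contains business rules: an on-time rating out of 10, a “Preferred” mark the users set by judgment, a “Closed” flag, and a last-contacted date that prevents duplicate notifications. As a minimal Python sketch, those rules might be captured like this before they become columns and constraints. The field names, the 0 to 10 range and the idea of a notification “cycle start” date are all assumptions; the users didn’t specify them.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Vendor:
    name: str
    on_time_rating: Optional[int] = None   # "X out of 10"; None until rated
    preferred: bool = False                # set by the users, not calculated
    closed: bool = False                   # "we don't use the vendor any more"
    last_contacted: Optional[date] = None  # set by the people who run reports
    notes: str = ""

    def __post_init__(self):
        # Assumption: ratings run from 0 to 10 inclusive.
        if self.on_time_rating is not None and not 0 <= self.on_time_rating <= 10:
            raise ValueError("on_time_rating must be between 0 and 10")

def vendors_to_notify(vendors: List[Vendor], cycle_start: date) -> List[str]:
    """Names of open vendors not yet contacted since cycle_start."""
    return [
        v.name
        for v in vendors
        if not v.closed
        and (v.last_contacted is None or v.last_contacted < cycle_start)
    ]

vendors = [
    Vendor("Acme", on_time_rating=9, preferred=True, last_contacted=date(2003, 3, 3)),
    Vendor("Bolt"),
    Vendor("Cogs", closed=True),
    Vendor("Dyno", on_time_rating=4, last_contacted=date(2003, 2, 14)),
]
print(vendors_to_notify(vendors, cycle_start=date(2003, 3, 1)))  # ['Bolt', 'Dyno']
```

Writing the rules down this way surfaces questions for the next meeting, such as whether a Closed vendor can still be Preferred.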

Q: Who enters it?

A: We have three people who deal with vendors and enter that data. All of them do roughly the same type of job. The people that run reports set the “Last Contacted” date.

Q: Who can see it?

A: Beyond the 3 people who enter data, we have 5 people who run reports. All of them just use the names, addresses and contact information.
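The answers to “Who enters it?” and “Who can see it?” describe two roles with different read and write rights. As a minimal Python sketch with hypothetical role and field names, that split could be written down as a permission table the users can review. In SQL Server itself you would more likely express it with database roles and permissions, which can be granted on specific columns.

```python
# Hypothetical role and field names, taken from the interview answers:
# three people maintain vendor details; five more run reports, use only
# name, address and contact data, and set the last-contacted date.
CONTACT_FIELDS = {"name", "address", "phone", "email", "web_site", "contact_name"}
ENTRY_FIELDS = {"email", "web_site", "contact_name", "on_time_rating",
                "preferred", "closed", "notes"}

PERMISSIONS = {
    "vendor_entry": {
        "read": CONTACT_FIELDS | ENTRY_FIELDS | {"last_contacted"},
        "write": ENTRY_FIELDS,
    },
    "reporting": {
        "read": CONTACT_FIELDS | {"last_contacted"},
        "write": {"last_contacted"},
    },
}

def allowed(role, action, field):
    """True if the role may perform action ('read' or 'write') on field."""
    return field in PERMISSIONS.get(role, {}).get(action, set())
```

Unknown roles get no access at all, which is the safer default to present to the users.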

Q: How do they see it?

A: Most of them are using Microsoft Word in a mail-merge. Even when they call or enter the data on a web form, they just pull it into Word and then do something with it.
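Because the reports are Word mail merges, one likely deliverable is a data source limited to the columns the report users said they need. Word can use a delimited text file as a mail-merge data source, so as a minimal Python sketch (hypothetical field names; open vendors only) the export might look like this. Later, the same column selection could just as well be a view in SQL Server.

```python
import csv
import io

# Only the columns the report users said they use; names are hypothetical.
MERGE_FIELDS = ["name", "address", "phone", "email", "web_site", "contact_name"]

def write_merge_source(vendors, out):
    """Write open (not Closed) vendors to CSV with only the merge columns."""
    writer = csv.DictWriter(out, fieldnames=MERGE_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for vendor in vendors:
        if not vendor.get("closed"):
            # Extra keys such as ratings and notes are ignored; missing
            # merge columns are written as empty strings.
            writer.writerow(vendor)

buf = io.StringIO()
write_merge_source(
    [
        {"name": "Acme", "address": "1 Main St", "phone": "555-0100", "on_time_rating": 8},
        {"name": "Cogs", "closed": True},
    ],
    buf,
)
print(buf.getvalue())
```

When writing to a real file rather than a string buffer, open it with `newline=""`, as the `csv` module documentation recommends.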

In a “real” data modeling exercise, the business requirements would be built from the answers to these questions, but I want to be able to show a small example quickly. Standard business requirements normally state something like “The system shall track names and addresses for organizations termed as ‘vendor’ to our organization.” Each requirement is written out like that, and then everyone agrees to what the system will do. For now, I’ll stick with just those questions and answers; that’s good enough for this example.
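If you do go on to write formal requirement statements, it helps to track who has agreed to each one. As a minimal Python sketch, with hypothetical stakeholder groups and requirement IDs (only the first statement comes from the text above; the other two are paraphrased from the interview answers), that could look like this:

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    req_id: str
    statement: str
    approved_by: set = field(default_factory=set)

def pending(requirements, stakeholders):
    """IDs of requirements that not every stakeholder has agreed to yet."""
    return [r.req_id for r in requirements if not set(stakeholders) <= r.approved_by]

# Hypothetical stakeholder groups and requirement IDs.
stakeholders = {"manager", "vendor team", "reporting team"}
requirements = [
    Requirement("REQ-001", "The system shall track names and addresses for "
                "organizations termed as 'vendor' to our organization.",
                {"manager", "vendor team", "reporting team"}),
    Requirement("REQ-002", "The system shall record the date each vendor was "
                "last contacted.", {"manager", "reporting team"}),
    Requirement("REQ-003", "The system shall record an on-time rating out of 10 "
                "for each vendor."),
]
for r in requirements:
    print(f"{r.req_id}: {r.statement}")
print(pending(requirements, stakeholders))  # ['REQ-002', 'REQ-003']
```

A spreadsheet or document works just as well for this; the point is that every statement has an ID and an explicit sign-off.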

Now that I’ve finished that portion of the process, in the next tutorial I’ll diagram the process I follow for turning that information into a data model. Again, I’ll use the spreadsheet, but only as a guide. Getting this information into a usable format takes more work, but a clear understanding of the business (or organizational) needs is the first and most important step.

As I mentioned earlier in this tutorial, I’ll “circle back” on the meetings with the users to ensure that the requirements fit their needs. The users will normally be willing to tell you that everything “looks OK,” so it’s important to explain the detail carefully and to make clear that the decisions made here really matter. They’ll see this further as you begin to normalize the model, which is the next step and the next tutorial.

InformIT Articles and Sample Chapters

I have a section elsewhere in this Reference Guide that details the use of the MAPS tool. It’s the first place you should start.

Books and eBooks

As I explain how to migrate the data from one source to another, you’ll most definitely need to know about SQL Server Integration Services, or SSIS. Microsoft SQL Server 2008 Integration Services Unleashed can help.

Online Resources

In some cases, you may decide to leave the data where it is and simply link to it. In that case, you might want to research your programming options for data here.