SQL Server Reference Guide

Migrating Departmental Data Stores to SQL Server

Last updated Mar 28, 2003.

Every organization stores and processes data, whether it is for-profit or non-profit, small or large.

When Data Professionals or Database Administrators hear the word “data,” we think “database.” But in fact, most of the data in an organization isn’t in a database. Measured by importance, and in some cases even by volume, most of it is scattered across the organization in word-processing documents, spreadsheets, text files, pictures and of course e-mails and their attachments.

For the most part, Data Professionals don’t concern themselves with this data. We not only don’t control it, we’re not always sure where it is. But in some cases, we should.

Often the data stored all over the organization has a use beyond the person that created or saved it. For instance, many organizations use a lot of spreadsheets to track events, actions or things that are important to a particular department. These spreadsheets are stored on a user’s network share location, and the user maintains the data. At some point, another user wants access to that same data. The first user grants access to the other, either by copying the data or just placing the file on a share where others can get to it.

In another case, a larger database holds data for a program the entire organization uses. If it’s a “closed” system, the users might want to extract a report that only a single department needs. The users ask IT (or in some cases, they don’t) to extract a subset of data into a text file or spreadsheet. The department then uses that data to create reports not found in the larger system.

Perhaps the users are a bit more sophisticated than just spreadsheets. They use another data program such as Microsoft Access or perhaps even something open-source like MySQL with some sort of front-end.

Over time, lots of people start using that data — not just reporting on it, but adding their own data there. And that’s where the issue really starts.

Why Move the Data?

When data in a spreadsheet or a smaller database product like Microsoft Access is accessed by multiple users, it’s normally not a problem for the Data Professional. The data is created, controlled and accessed by a small group of people, and it neither affects nor is affected by anyone else.

But in some cases it does. I can’t tell you how many times I’ve “inherited” a data system or departmental application. This usually comes about through one of two vectors. The first is the loss of the department’s application (and subsequently its data). Perhaps someone deletes a file, unintentionally or otherwise, or perhaps it gets corrupted in some way.

The second vector that brings the Data Professional into a departmental data store is when the application “hits a wall.” What I mean by that is that either the design of the data or the application that stores and processes it just can’t handle the load or the breadth of access. In some cases the users have the data, but they can’t make it display or report the way they want, because the design evolved from something simple to something a bit more (or a lot more) complicated.

So you should consider moving that data store when two primary conditions are met:

  1. The application is “mission critical,” so losing it would hurt the organization
  2. The data needs a formal level of security and access control

Notice that it doesn’t matter how many people need access to the data, even though that is a requirement often cited for moving to SQL Server. If the data is that important, and if it has security ramifications, then the Data Professional should follow proper protocol to protect the organization.

Consider that you may not have to migrate the data at all, only integrate it. In that case, you can query data stores such as text files, Excel and Access in place, for example with the OPENQUERY function against a linked server or with OPENROWSET, as well as other methods. If you follow this route, you fall outside of what I am discussing in this series of articles.

Following the Process

There are two sides to moving a departmental data store to SQL Server. The first is technical, and that’s what I’ll deal with in this series of tutorials. The second, possibly more difficult side of the process is political. Many departments created the application (and subsequently the data store) because they didn’t want to wait on the IT department, or felt the application wasn’t important enough for IT to worry about.

And of course now you’re going to take that control away from them. Almost no one I know likes that, so you’ll need to work with management to impress on the group that you are there to help, and not to hurt. In fact, if you’re careful, you can actually help them keep their application, and explain that you’ll just handle keeping the data safe, protected and performing well. Once you’ve developed that trust, you can move forward.

Before you can start the process of bringing departmental data into SQL Server, however, you have to find it. That’s the first step.

Locate the Data

As I mentioned, many times the application data owners will come to you and tell you about the data issues they face. But you may want to take the lead and locate potential data sources first.

Locating data sources also gives you a sense of how widespread these silos of data are. So how do you locate the data?

There is no foolproof method of finding data that a user really doesn’t want found, but there are ways, both technical and social, of locating the major applications and their data.

Meetings

The first way to locate department data is simply to ask. I have had far more success with this method than you might think.

The process I follow is to schedule meetings with the department heads to explain the reasoning above. I take no more than 15-20 minutes to tell them what the criteria are for identifying departmental applications, and when one should be considered for migration.

After I brief the executives, I ask them to brief their own people. I ask them to emphasize that it isn’t about wresting control; it’s all about protecting the data. I also ask them to explain that I’ll work hard to keep the system looking the way it does now, as much as possible. I ask for follow-on meetings with the department when they aren’t sure whether something should be in spreadsheets, small databases or SQL Server.

If the department trusts me, they bring me in earlier to talk with them about their applications, and in some cases I’ve even managed to hold a few “lunch and learns” where I explain the basics of data design. Over time this makes it easier for everyone. I keep a channel of communication open, so I can intercept new projects and host them properly to begin with. The users end up with more reliable and better performing data, and if I do ever have to bring the data in, it’s in a much better format for me to deal with.

Tools

If the users or managers don’t want to communicate, you may have to resort to a little detective work. The basic process is pretty simple — you just interrogate file locations for certain patterns and check the software installed on the workstations for potential “targets.” You then monitor those targets to see who (or what) changes them.

There are, of course, security constraints to consider, so you will want to involve your system administrators in this endeavor. In fact, they may already have the information you need, so be sure to talk with them first.

MAPS

The first tool in your arsenal is the Microsoft Assessment and Planning Solution Accelerator, or much more simply, MAPS.

The MAPS tool is a free download from Microsoft that can work across your domain to locate all kinds of data. You can use it to find SQL Server Instances, assess capacity limits and even get consolidation advice, but where I think it is most useful in this context is in locating the various versions of Microsoft Office.

You can configure the tool to use a network range, a list of machine names and more, so you have a lot of control for the discovery. I have a pointer at the end of this article that will show you how to use this tool.

Something to keep in mind is that this tool only finds Microsoft Office products on the systems you interrogate. Just finding Office doesn’t indicate that the users have a data store you want to migrate — it just limits the targets.

Also keep in mind that the users might have installed something other than Microsoft Office. If they are using another program you’ll have to rely on whatever methods that vendor uses to discover their products.

PowerShell

If you don’t want to (or can’t) use MAPS to locate the Microsoft Office installations, you can use PowerShell to ask Windows what is installed on a computer. Once again, you’ll need rights to do that, and the method is more manual. This command uses Windows Management Instrumentation (WMI) to “ask” a system what software is installed:

Get-WmiObject -Class Win32_Product | Format-List -Property Name, Vendor, Version
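One caution with that query: Win32_Product lists only software installed through Windows Installer, and enumerating it can be slow because the provider runs a consistency check on each package. As an alternative, here is a minimal sketch that reads the registry's Uninstall keys instead. It assumes you are running on the workstation itself with rights to read HKLM; the function name Get-InstalledSoftware and the Office filter are illustrative, not part of any Microsoft tool.

```powershell
# Sketch: list installed software from the registry Uninstall keys (Windows only).
# Get-InstalledSoftware is a hypothetical helper name, not a built-in cmdlet.
function Get-InstalledSoftware {
    param(
        # Optional wildcard pattern to narrow the list, e.g. '*Office*'
        [string]$NamePattern = '*'
    )

    # Per-machine uninstall entries for 64-bit and 32-bit (WOW6432Node) software
    $uninstallKeys = @(
        'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*',
        'HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*'
    )

    Get-ItemProperty -Path $uninstallKeys -ErrorAction SilentlyContinue |
        Where-Object { $_.DisplayName -and $_.DisplayName -like $NamePattern } |
        Select-Object @{ Name = 'Name';    Expression = { $_.DisplayName } },
                      @{ Name = 'Vendor';  Expression = { $_.Publisher } },
                      @{ Name = 'Version'; Expression = { $_.DisplayVersion } } |
        Sort-Object -Property Name
}

# Example: show only entries whose display name mentions Office
# Get-InstalledSoftware -NamePattern '*Office*' | Format-List
```

To check another workstation, you could wrap the call in Invoke-Command, assuming PowerShell remoting is enabled and your system administrators agree.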

With that list developed, you can now audit three kinds of files that are often used as data stores: spreadsheets (like Excel), database files (like Access or FileMaker Pro) and XML. The XML documents will get a lot of hits, so I tend not to focus on these unless the names stand out.

I normally only focus on shared locations. I especially suspect shares on a user’s workstation — that’s often an indication that something is shared out for a department. It’s easy enough to find a share on Windows — just type this at a command-prompt or in PowerShell:

NET SHARE

I also look for shares that have full department access. Once I find the shares, I list the files with this command at a command prompt (the /S switch includes subdirectories):

DIR *.XLS /S

And of course I repeat the search with other extensions. I look at the “last modified date” column, and if a file has changed within the last day, I check it again, then once more in a week, and then each week for a month. If I find the file is changing a lot, I try to see who is doing that. More than one or two people? Time for a few questions. For a product like Access, FileMaker Pro or MySQL, I just assume that I need to have the discussion.
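The same check can be sketched in PowerShell, so that several extensions are covered in one pass and the results are sorted by last-modified date. This is only an illustration: the function name Find-CandidateDataFile, the extension list and the seven-day window are my assumptions, and the share path in the example is a placeholder.

```powershell
# Sketch: list candidate data-store files under a share, newest first.
# Find-CandidateDataFile and its defaults are illustrative assumptions.
function Find-CandidateDataFile {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Path,

        # Spreadsheet and desktop-database extensions to look for
        [string[]]$Extension = @('.xls', '.xlsx', '.mdb', '.accdb'),

        # Only report files modified within this many days
        [int]$ModifiedWithinDays = 7
    )

    $cutoff = (Get-Date).AddDays(-$ModifiedWithinDays)

    Get-ChildItem -Path $Path -Recurse -File -ErrorAction SilentlyContinue |
        Where-Object { $Extension -contains $_.Extension.ToLower() -and $_.LastWriteTime -ge $cutoff } |
        Sort-Object -Property LastWriteTime -Descending |
        Select-Object -Property FullName, LastWriteTime, Length
}

# Example call; '\\SERVER01\Finance' is a placeholder share name
# Find-CandidateDataFile -Path '\\SERVER01\Finance' |
#     Export-Csv -Path .\finance-scan.csv -NoTypeInformation
```

Saving each run to a file makes the weekly comparison easier. Keep in mind that the last-modified date tells you a file changed, not who changed it. To see who has a file open you would still need something like NET FILE on the file server, or file auditing set up by your system administrators.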

From there I follow the same process as the meeting approach. I just explain what I’ve seen and ask if the data store meets the criteria for the migration, and then offer to help. Most of the time this approach works pretty well.

PowerShell has other uses as well, like interrogating running services (to find things like MySQL) and locating other potential targets. The general approach is to find the files, software and services that could act as the engine for a data store, and then watch the files to see if they are being accessed frequently and by other folks.
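As a rough illustration of the service check, the sketch below filters a list of services for names that suggest a database engine. The function name Select-DatabaseService and the name patterns are assumptions of mine; adjust the patterns for the products you expect to find.

```powershell
# Sketch: pick out services whose names suggest a database engine.
# Select-DatabaseService and the default patterns are illustrative assumptions.
function Select-DatabaseService {
    param(
        # Service objects, typically piped in from Get-Service
        [Parameter(ValueFromPipeline = $true)]
        $Service,

        # Case-insensitive name fragments to look for
        [string[]]$Pattern = @('mysql', 'postgres', 'mssql')
    )

    begin {
        # Combine the fragments into one regular expression
        $regex = $Pattern -join '|'
    }

    process {
        if ($Service.Name -match $regex -or $Service.DisplayName -match $regex) {
            $Service | Select-Object -Property Name, DisplayName, Status
        }
    }
}

# Example (on the Windows machine you are checking):
# Get-Service | Select-DatabaseService
```

A match only tells you an engine is installed on that machine. You still need to ask what data it holds and who depends on it.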

In the next few articles, I’ll explain what to do now that you’ve found the potential files for movement.

InformIT Articles and Sample Chapters

I have an entry elsewhere in this Reference Guide that details the use of the MAPS tool, Microsoft Assessment and Planning Solution Accelerator. It’s the first place you should start.

Books and eBooks

As I explain how to migrate the data from one source to another, you’ll most definitely need to know about SQL Server Integration Services, or SSIS. Microsoft SQL Server 2008 Integration Services Unleashed, by Kirk Haselden, can help.

Online Resources

In some cases, you may decide to leave the data where it is and simply link to it. In that case, you might want to research your programming options for data here.