Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
- Microsoft SQL Server Programming
- Performance Tuning
- Practical Applications
- Professional Development
- Application Architecture Assessments
- BI Explained
- Developing a Data Dictionary
- BI Security
- Gathering BI Requirements
- Source System Extracts and Transforms
- ETL Mechanisms
- Business Intelligence Landscapes
- Business Intelligence Layouts and the Build or Buy Decision
- A Single Version of the Truth
- The Operational Data Store (ODS)
- Data Marts – Combining and Transforming Data
- Designing Data Elements
- The Enterprise Data Warehouse — Aggregations and the Star Schema
- On-Line Analytical Processing (OLAP)
- Data Mining
- Key Performance Indicators
- BI Presentation - Client Tools
- BI Presentation - Portals
- Implementing ETL - Introduction to SQL Server 2005 Integration Services
- Building a Business Intelligence Solution, Part 1
- Building a Business Intelligence Solution, Part 2
- Building a Business Intelligence Solution, Part 3
- Tips and Troubleshooting
- Additional Resources
Implementing ETL - Introduction to SQL Server 2005 Integration Services
Last updated Mar 28, 2003.
I've been covering Business Intelligence using general topics in a series of tutorials. Within that series I covered the Extract, Transform and Load concepts that you need to keep in mind as you transfer the data from the Source Systems to the first part of the Analytical System. In this tutorial I'll put those concepts to work, using SQL Server 2005's Integration Services (SSIS). This software is available only for SQL Server 2005; in SQL Server 2000, you can use Data Transformation Services to do much the same thing. I'll cover that tool in another tutorial, in case you're using that version of the software.
SSIS deserves to be treated as a complete software package of its own. It has its own set of design tools, and can be used not only for the Business Intelligence process but any data movement or cleansing needs you have. In this tutorial I'll explain the basics behind the package and the primary tools you use to design and manage, and in future tutorials I'll explain much more about how you can use SSIS to perform all of the Extract, Transform and Load procedures you need to stock your Business Intelligence landscape with the data it needs.
Microsoft SQL Server Integration Services Concepts
Let's start with a few concepts. Most other Extract, Transform and Load (ETL) tools work by transferring data from the Source Systems to a set of staging tables. These tables are "wide open", in that they don't enforce any kind of integrity of the Source Data. They are more interested in moving the data from the Source System quickly and efficiently. This is the "Extract" part of the process.
Once the data arrives in the staging tables the system applies various transforms to the data. In subsequent steps the data is once again transformed, aggregated or split into smaller parts based on what the analysis needs are, moving it into a final format, at least for that step. This is the "Transform" part of the process.
Finally the data is moved from the cleansed tables into the final location in that part of the landscape. That's the "Load" part of the process. Because of this arrangement, other tools usually focus their interface in these three areas. You'll see a data path for the Extract process, some code or widgets to do the Transform, and the data path continues through to the end result, which is the "Load" part of the process. It's usually diagramed something like this:
Microsoft takes a different approach. Although SSIS allows for the use of a set of staging tables, you no longer have to use them. Microsoft has the idea of a "pipeline", which is a view of the data moving from the source files to the destination in one movement. It involves setting up one connection flow with multiple events in the line from start to finish. Looked at this way, the process comes out something like this:
Once again, you can still use a Staging Table arrangement; you just no longer need to. That brings up the two parts of the conceptual framework for SSIS: The Control Flow and the Data Flow. Both of these make up a Package, and multiple Packages make up a Project. Multiple Projects make up Solutions. You'll see all that come together when we explore the Business Intelligence Development Studio in a moment.
The Control Flow part of SSIS involves any steps you want to take prior or subsequent to connecting to the data, and then any Data Flow elements, which I'll describe next.
You can think of Control Flow objects as all of those things you want to do in addition to moving the data. Perhaps you need to download a file from an FTP location, send a trigger script, notify an operator or start a logging file. All of those items are placed in an outer container, which also includes one or more Data Flow objects. You bracket the data movement with tasks that you need to perform prior to and after the data movement.
You may have only one object inside your Control Flow container, a Data Flow object. If you're doing a simple data transfer from one database to another or from the database out, that might be all that you have. In more complex situations like your Business Information strategy you'll often see multiple
The Data Flow container has objects to connect to various data sources and destinations. Using those connections you add the sources and destinations, and define the inputs and outputs for each. You connect the output of the source to the input of the destination objects, making a basic data path.
You can also make the path more complex by adding various transformation objects in between the sources and destinations. These transformations can combine and aggregate, or split apart and separate data. They can perform numerical calculations; create new (derived) fields, and much more. You'll spend the bulk of your time working on these transformations.
You're not limited to working with a single source and destination. You can route the outputs of several sources to the inputs of a single destination, or a single source can feed multiple destinations. You can route the outputs of one source into the inputs of a data object that in turn outputs to another, each having a different transformation.
In times past, we used to draw all of these movements out on paper or using some electronic tool, which we then coded into the system using data statements or specialized tools. SQL Server Integration Services includes a special environment, called the Business Intelligence Development Studio that contains all of the elements and tools you need to design and implement the solution. Everything is a graphical component, but you can always get to code if you need to do more granular work.
The BI Development Studio isn't the only way that you can work with SSIS. SQL Server Integration Services is also completely programmable. The object model is exposed so that you can create and control the entire system with code.
In the next tutorial, we'll open up the Business Intelligence Development Studio and explore the interface. After that we'll spend some time creating various scenarios to load the data into the ODS.
Informit Articles and Sample Chapters
Although you might get data directly from your Source Systems, in many cases the data may come through a Business to Business (B2B) interface like Microsoft's Biztalk. Staying in the "family" allows direct feeds from Biztalk to Integration Services. You can read more in the book BizTalk Unleashed by Susie Adams et al, which you can find here.
There are a lot of resources that you can search on the web regarding Integration Services. The place you should start is on Microsoft's site. There are actually a few entry points into the product since it does so many things, but this is probably the best resource to begin with.