Table of Contents
- Microsoft SQL Server Defined
- Microsoft SQL Server Features
- Microsoft SQL Server Administration
- Microsoft SQL Server Programming
- Performance Tuning
- Practical Applications
- Professional Development
- Application Architecture Assessments
- BI Explained
- Developing a Data Dictionary
- BI Security
- Gathering BI Requirements
- Source System Extracts and Transforms
- ETL Mechanisms
- Business Intelligence Landscapes
- Business Intelligence Layouts and the Build or Buy Decision
- A Single Version of the Truth
- The Operational Data Store (ODS)
- Data Marts – Combining and Transforming Data
- Designing Data Elements
- The Enterprise Data Warehouse — Aggregations and the Star Schema
- On-Line Analytical Processing (OLAP)
- Data Mining
- Key Performance Indicators
- BI Presentation - Client Tools
- BI Presentation - Portals
- Implementing ETL - Introduction to SQL Server 2005 Integration Services
- Building a Business Intelligence Solution, Part 1
- Building a Business Intelligence Solution, Part 2
- Building a Business Intelligence Solution, Part 3
- Tips and Troubleshooting
- Additional Resources
Designing Data Elements
Last updated Mar 28, 2003.
It might seem a bit strange to begin talking about data elements this far along in my series on creating a Business Intelligence landscape. I've already covered the concepts of the sources where the data originates to the Operational Data Store (ODS), using the already explained the Extract, Transform and Load process, and also how to think about your Data Marts. All of those systems contain tables with multiple columns. Aren't they full of data elements already?
The answer is that these storage areas do have data in them, and you should have created their structures using the design information from the requirements gathering phase. But the data elements I'm talking about now are those that you will use in a Data Warehouse, or the collection of Data Warehouses that comprise an Enterprise Data Warehouse. The processes you use to design these elements are different than those used in a transactional reporting system.
It may be helpful to describe exactly what the Data Warehouse should store. There are two schools of thought on the purpose for the Data Warehouse. The first is that it should be exactly what it sounds like – a single place to house all of the data a company has.
The issue with this view is what to do with the system when you're done. In the last tutorial I explained that you will need to combine data from various systems into the Data Warehouse to be relevant. In a system that tracks clothing sales, your data might have the value "Jacket, Men's" in one source and "Men's Coats" in another. Often it's not a simple matter of making the values equal, because in certain cases they aren't. The end result of storing both descriptions is that your users would have to manually tally each set to come up with a grand total, assuming they knew the two were the same. That's just not possible, even though you've met the goal of storing all the company's data in one place.
So in the first system, you have to begin the arduous task of mapping or transforming the detailed data to come up with a single set of data that is relevant to the users. Even then, storing so much detail makes the system very large and slow to process queries. Not only that, this is a redundant use of data. You already have all that detail out in the Data Marts and the Operational Data Store. If you need a detailed report, you can always get it from there.
The second definition of a Data Warehouse is exemplified in the goal for a Business Intelligence landscape that I've been repeating in the previous tutorials. What we're trying to create is a set of consolidated, aggregated, strategic data presented in an analytical format to upper management. The Data Warehouse part of that system provides the aggregated, strategic and analytical parts of the definition. The sources, ODS and Data Marts are how we get the data staged in a meaningful way.
So that is how we'll proceed with the data element design process for the Data Warehouse. The outline for the process is to find out what aggregated strategic data we need, and the best way to format it. From our definition, we also have to think like upper-level management. What do they really want to see? Do they need to know how many paperclips someone in a regional office bought, or whether that office is adhering to the business supplies budget? Do they care about what supplier we used for parts, or that we are getting a 10 per cent reduction in costs for the same quality? And how are we measuring quality?
What this means is that we are really after only two types of data: Things that we want to measure, and the measurements. I'll explain the star schema that uses these concepts in the terms dimensions (for the things we want to measure) and facts (for the numbers showing the measurements) in the next tutorial, but for now, we need to concentrate on getting this data from our Data Marts.
So to begin our design we first ask the managers the question, what do we want to measure? I've given you a couple of examples already, and you can probably think of more for your own organization. For a pharmaceutical firm you might find dimensions such as chemicals, trials, marketing, efficacy, lifespan, sales and so forth. For a marketing company dimensions might include region, time, advertising method and the like. In all cases you care about time, since strategic analysis is rarely useful without it.
For the data involving the facts, examples include the number of items sold, moved, changed or produced. Other numbers might be customers contacted, amount donated, total fees charged and so forth. Any numerical aggregate that applies to the dimensions I mentioned a moment ago are applicable to the facts data.
Now that you know the type of data you're after, how do you find out what it is at your organization? You can use the same process as you did when you perform any requirements analysis, with a slight twist. In most requirements gathering exercise, the form is quite rigid. You ask the users what kind of reports they want, how they want them formatted and so on. You then decompose those requirements into tables and views. This is the process you followed for the ODS and perhaps even the Data Mart.
In the Data Warehouse, however, you're going to provide a tool to upper management that is more freeform in nature. You may still produce reports, but the main power of this type of system is in the ability to ask questions and flip them around, on the fly. Interview your upper-level management and ask them what they want to know. What are the reports that they ask their staff to produce every month? Once they get those reports, what do they do with them? You'll find that most of the time they are culling through the reports trying to ferret out one or two answers, so that they can base their strategy on sound numbers.
Rather than manually collating this data, wouldn't it be better for managers to ask the question they are really after, and be able to look at the number right away? That's what you need to design as the data element.
Once you have those dimensions and facts, you'll need to determine where they are buried in all the data sources you have. If you designed your ODS and Data Marts properly, you should be able to get the measurements from there, after summing and grouping the data.
But what if the elements aren't easily discerned from the data you already have? What if the manager is asking for something that you don't store directly? In the last tutorial, I mentioned that when you begin to combine and transform data, you are assigning meaning. That meaning should be defined by the business, not the technical staff. This is the crux of the design effort. You'll need to involve managers from the level just below the top and then work your way down to the line managers. Along the way, each will give you a more detailed description of the measures the top managers are looking for. From there, you can build your model.
While this sounds simple, there are a few problems you'll encounter along the way. The first is that aggregation and combination problem we discussed earlier. In reality, the primary issue there is one of definition, and you can overcome that by asking the managers to set a business owner for the element you're designing. Only that manager can state what the element means, which will give you the description of how to aggregate it.
The second problem comes when top management isn't aware that regions, locations or plants are doing things differently than other places. As the "data detective" you'll uncover all manner of unseemliness, and the only thing you can do in that case is have your Business Analyst diagram the business processes used at each location to vet the information from a business perspective. Let the managers sort that out; that's what they get paid for. Your job is to report the data, set up the elements in a cohesive fashion. Present the business process analysis to the managers, and explain the difficulty in combining disparate processes into a single meaning. Ask them how to proceed, get a Business Owner to sign off, and design the element with that definition.
In the next tutorial I'll show you a little more about dimensions and facts, and how you can design your Data Warehouse.
InformIT Articles and Sample Chapters
Jill Dyché has a great book in the bookstore called e-Data: Turning Data Into Information With Data Warehousing. It has a very clear description of Business Intelligence and how various firms can apply the data they've collected.
Microsoft has a great resource for Data Warehousing over at MSDN. You can read more here.