Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Data Marts – Combining and Transforming Data

Last updated Mar 28, 2003.

I want to reiterate the purpose of Business Intelligence (BI) that I've been covering in this series on creating a Business Intelligence landscape. I'll hammer this home in most of the tutorials, since the definitions often vary throughout software vendors on the topic. My definition of Business Intelligence is a set of consolidated, aggregated, strategic data presented in an analytical format to upper management. One additional note – in all parts of the systems I'll describe in this series other than the source systems, users don't enter new data. I've seen the pros and cons of this process, and I stick with this tenant. That doesn't include, of course, meta-data involved with gathering the metrics or user selections for reports. What I'm talking about here is not allowing users to create new numbers within the system or alter the ones that are there. BI, by my definition, is an extended reporting system. Don't worry, we'll cover changes later.

Embedded within that BI definition are the core concepts of what we are learning to implement throughout this series. I've explained the sources where the data originates, and in the last tutorial I explained the Operational Data Store (ODS) which is next in line. At this point the system doesn't really fit my earlier definition. In a large enterprise there are far too many source systems for data to analyze using spreadsheets or in your head. Not only that, one site or plant might call "excess" one thing and another might figure that number a different way. The ODS will only contain data that is strategic to what its sources are, not for the entire enterprise. Although the reporting from an ODS might include some analytical elements, it is largely used for tactical, line-item reporting. And finally, the audience for an ODS isn't upper management, but management associated with its source data.

This isn't to say that you can't stop there and still have some level of Business Intelligence. Managers might pull data from the ODS or even the source systems to create their own analytical reports. The issues you run into when this happens is that since the managers aren't trained in data aggregation or data analytics, they can misinterpret the data. Not only that, upper management might not receive these reports, and so the interpretation is less strategic than it could be.

So we need another construct in the business to provide strategic, analytical data, but still have that data contain a bit more detail than is needed by the entire organization. For instance, does the CEO need to know which box a particular part was shipped in, or does the lead business owner need to know which chemical a particular scientist ordered on a certain day? Most often they don't. But if you omit this data, the region, area or district management might not get the level of detail they want. It seems that two separate systems are required for this need.

The first of these systems is the Data Mart. Again, we need to set a definition for a Data Mart since multiple vendors implement this reporting level differently. A Data Mart is a system that collects data from the ODS at a regional or functional level for strategic analysis.

The next level above that, which I'll cover in the next tutorial, is the Data Warehouse or the Enterprise Data Warehouse. While the Data Mart and the Data Warehouse are similar, the primary factors for determining when to split the data onto these systems is to answer the questions the business asked in the requirements gathering phase, and determine who the audience for the reports is. You'll find that different levels of detail are required for each of these systems.

So if it's just a matter of storing a little more detail, why not just use a single system? You certainly can do that, but the primary determination is the size of your organization. If you have a large organization, for performance reasons it's best to have the data between the two systems separated, and use two teams to manage and control them. Also, the farther out you get from the source the less detail you need to store and the less frequently you need to update the data.

Think about it this way: How often do you really make a strategic decision? Most decisions in our lives as well as our business are tactical. What to eat and what to wear on a daily basis is tactics. Whether you are a vegetarian and whether you purchase a formal or informal wardrobe is strategic. You don't buy a new wardrobe every day or decide at each meal whether you are a vegetarian or not.

The same holds true in business. How many parts to order or what supplier to use are tactical decisions, but decisions about what the firm produces as services or goods are strategic. You don't make the strategic decisions on data involving the details from today.

How much detail and what data refresh interval are other deciding factors on whether you're designing a Data Mart for a region or a Data Warehouse comprising them all.

We'll assume for this tutorial that you need a regional or functional break for the BI landscape.

Combining Data

The first part of creating the Data Mart is figuring out how to combine the data. You'll need two people for this exercise: the data architect and the business analyst. You're the data architect – you know database technology, the limits of the hardware, and how you can string together all the protocols, hard drives, software and the like to store and manage the data. The Business Analyst is there to help you find out, from the business, what the data means.

One of the greatest pitfalls I've seen in BI implementations is that technical people try to answer business questions with technology. You can't do that. The reason is that you don't know every part of your company's business if it is of any appreciable size. Your discipline is technology, not business. For you to combine data, you have to know what it means. Let's go back to that earlier example I mentioned regarding "excess". Excess is normally defined as the materials obtained for a process but not used. For instance, I order fifty pounds of rubber to create a tire, but I only use thirty pounds of it. Most business processes define the leftover 20 pounds as excess, if I don't have it destined for another tire.

But some businesses define excess as only those materials ordered, received, and paid for as excess. Still others include that the material is paid for by the company. Some manufacturing companies will contract other firms to create parts of the finished product. To save money, the larger firm will also pay for the parts the smaller firm uses. In this case, either firm might expense the excess, all depending on various choices.

You can see that the data architect won't have this kind of information. That's where you need to bring in the Business Analyst, who interviews the business to derive the process, and tell you what the data means.

But it goes farther than that. You should have a "Business Owner" for every data element you will store. Think of yourself as the bank. Your bank stores your money, moves it around when you tell them to, and tracks it for you. They will tell you how much you have in dollars, pesos and euros. What they won't do is tell you what that means. If you call the bank and ask "How much money do I have?" they'll answer you. But if you then ask them "Is that good?" they won't answer that. You should be the same way. Once your organization tells you what to store, you should understand everything there is to know about how that data got there, but you should refrain from making a call on what it means. Trust me on this one.

So how do you combine the data? I'll explain how to design data elements in the next tutorial, but the short answer is to begin with a series of business definitions. Create a glossary starting at the source system and work your way up to the top, all the time referring back to the requirements you gathered in the first phase. Along the way you'll notice that the users will begin to say words like "I want to know all of the ..." and "We need the complete view of..." or even "Everywhere I have X recorded..." The words All, Complete, and Everywhere are the keys to letting you know that you need to combine something.

Don't combine anything without an explanation from the Business Owner you appointed earlier. Record that information and the Business Owner's name and the date for your meta-data later.

Along the way your glossary will run into an issue where a data element (such as excess) will have the same name in two or more places but refer to different concepts. When that happens, go back to the Business Owners and ask them to select another name for the element that steers the furthest away from the accepted business term. Now you can combine at the proper level. The simple way of putting this is to call apples, apples; oranges, oranges; and the combination of them fruit. Expect a little friction here.

If you can't come to a meeting of the minds in this area, make separate table structures to store the data from various ODS systems, and use views to display them as one. At this point you aren't concerned with the physical design of the Data Mart – we'll come to that soon enough – but instead you are trying to determine what you are going to store and how it will be used. This is by far the bigger battle. Focus here on getting all of the same types of data together in one place.

The difference between combining data and transforming data is that in a transform you change the data. Combining doesn't change data; it just puts elements from disparate systems together.

Transforming Data

I've already explained the Extract, Transform and Load process a little in a previous tutorial, and I'll explain the mechanics of how you are going to bring the data in and combine and transform it in another tutorial. For now let's focus on the concepts.

Transforming the data is a similar process to combining it. Once again you'll need the Business Analyst to get the Business Owners to define the proper transformation. You'll find this quite frequently in currency, since it can be recorded in multiple ways.

It's at this point that you begin to change data from the source. If you are recording sales, for instance, then you'll have to come up with a new number in either pounds, euros, yen or dollars that wasn't there before. Make sure that you get that Business Owner's name and explanation before you start. Record this information in your glossary and meta-data. Every transform should be documented.

Note that a transform changes the data. You had dollars, you now have euros. You have one part number in system A, and when transformed to show how many parts there are called "X" you changed the number to be in line with system B. Some software vendors confuse combining data (where no changes are made to the data) with transforming the data (where changes are made). If you don't have the original value anymore, you have a transformation.

For numerical data, this is a straightforward mechanical process. Pick a conversion rate, and multiply it. Pick a markup, and add it. For part numbers or other mixed data, there are two basic methods to perform the transform.

The first is using mapping values. In this method, you leave the source data pure in one table, and then use joins to a tertiary entity to map the values. That allows a great deal of flexibility when you need to remap, but since we're talking about strategic reports, that's often a bad idea. It's confusing to see the headings on a report change month after month.

The other method is to pick a constant transform and apply a rule, transforming the data on the way in. This is a cleaner method, but isn't as flexible if the meaning changes later. In either case you need to document the method for each element in the glossary and the meta-data.

Earlier I mentioned that you may require a Data Warehouse in addition to a set of Data Marts. The physical storage concepts for these systems are similar, so in the next tutorial I'll explain how to design the data elements for the ODS, Data Marts and Data Warehouse.

After you learn about the data elements, I'll explain aggregation and the Star Schema you'll use for the data in the Data Warehouse tutorial.

Informit Articles and Sample Chapters

Luis Garcia did an article some time ago on Understanding Microsoft SQL Server 2000 Analysis Services, and in it he covers some Data Mart information.

Online Resources

Although there's absolutely nothing regarding Data Mart design on this site, it's a great example of a Data Mart in use.