Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

Business Intelligence Landscapes

Last updated Mar 28, 2003.

I've been explaining some of the base concepts for setting up a Business Intelligence (BI) landscape, and as part of the original tutorials I explained what BI is. I've also explained some of the preliminary work and planning you'll need to do before you begin your implementation, and in the last tutorial I talked about extracting data from the source systems where your operational data lives and the process to begin bringing that over to your BI system. But just where does the data go after you extract it?

If you've read any of the industry whitepapers on Business Intelligence, you might be a little confused as to how your system should be arranged. That's because most whitepapers are based on a particular product, and not the solution matrix I showed you how to create in the early part of this series. What I'll do in this tutorial is explain what the different parts of the system are, and in the next I'll show you some common arrangements for those components. Once you have detailed your organization's needs and figured out which piece fits where, you can evaluate the various products on the market to create the best landscape for you.

In this tutorial I'll just briefly mention the various parts – I'll cover each of them in depth in future articles. We'll keep this high-level so that you can evaluate the overall system before I explain the details. I'll present this overview in a "back to front" method, starting with the source system and moving to the highest storage component – the Enterprise Data Warehouse.

Source Systems

The source systems are your original "sources of record" for your organization's data. In some industries this has legal ramifications, since many automated systems replace a paper trail for audit purposes. You should resist the effort to make your BI system fit this purpose, as it increases the backup and recovery needs significantly.

Source systems may include spreadsheets and other semi-structured data. If this is the case, the ETL process needs to take that into account. In any case, the source system is where the data is generated, and has one other key factor: this is the only place the data should be edited by the users. If users need to edit consolidated data from the BI system, feed it back into a source system from the BI servers.

Operational Data Store

The operational data store (ODS) is the place where the source system data is available for "line-item" reporting. This is the system that contains the lowest level of detail, and normally this system stores the least amount of historical data – usually anywhere from a day to a month's worth. The ODS normally contains data from only a few source systems, and it's often in a one-for-one table arrangement. An ODS system helps offload the reporting burden from a transactional system.

If you are using an ODS (some applications of BI don't need them), you should architect staging tables for the ETL process. These tables should duplicate the ETL source, should be a little wider in each column, and have no restrictions, constraints or triggers. This is to facilitate "catching" the data as easily as possible. Make sure each column type is varchar or nvarchar to avoid type conflicts from the ETL source. Deal with those in the next step.

After the data is imported into the staging tables, you can then run a process that transforms the data into the reporting tables. The transforms clean up any type issues such as dates, currency translations or number formats and then moves the cleansed data into reporting tables.

Data Mart

If you're using an ODS, you'll pull the data from that into the next system, called a data mart. If you're not using an ODS, you'll get the data from the ETL process, and the same staging and transformation process is used in the data mart instead.

The data mart is a set of data from one or more operational systems, usually limited to a region, department or other logical partition of use. Data marts are used when you have a global or geographically separated system so that the primary users of the data don't have to transfer large volumes of data across the WAN. They provide consolidated, aggregated, historical data.

The information in a data mart is only consolidated at the regional or other partition level. For instance, if your data mart stores data about book sales, the Asian region is only going to care about the combined numbers from Asia, and how their store relates to that number. They are less interested in the European store sales. You might also partition the data in a data mart by department or organization function.

Data Warehouse

Data warehouses are rolled up from the data marts. The primary difference here is that several regions or functions are combined and even less detail is stored.

A data warehouse is also where the transformations and combinations become a bit more difficult to manage. In this situation you may have to consider multiple currencies, languages, and the most difficult difference of all – meaning.

To illustrate this difficulty, consider a global manufacturing company. A simple concept such as "excess inventory" becomes a thorny issue. What one plant considers excess might be the raw materials purchased but not used. Another plant might consider the same material excess only if the item is purchased and paid for but not used. Still another might consider anything not shipped as excess due to tax laws in that country. So when you're asked to create a report in the BI system showing the "excess" position of the firm, what do you do?

Only the business can tell you how to proceed. As I've mentioned in the last few articles, IT cannot and should not try to assign a business meaning to the data. This is highly exacerbated in a scientific or medical environment, as meanings there might be of a life-or-death matter.

Enterprise Data Warehouse

The Enterprise Data Warehouse is really an extension of the data warehouse concept. I break it out because the issues I mentioned in the data warehouse are really intensified in an enterprise, business or otherwise. Building an enterprise data warehouse is a task that must involve a team of professionals, and cross-matrixed to the business. Without deep experience and full organizational buy in, the effort is doomed to a very expensive and traumatic failure.

Your Business Intelligence landscape is made up of some or all of these parts. If you build the system in a hierarchical fashion your system will be able to grow and scale, and you can take advantage of the software advances as you implement the next level. The key to this is to keep the source data as native as possible, for as long as possible. Don't worry about storing data redundantly. A data warehouse is a historical copy of data, so the redundancy is acceptable.

With all of this information in mind, you can now focus on the products you want to implement for the solution. You have two basic choices: build or buy. In the next tutorial I'll explain some common layouts for a landscape and the advantages and disadvantages of each.

Informit Articles and Sample Chapters

If you're getting serious about a Business Intelligence system, check out some real-world experiences in Impossible Data Warehouse Situations: Solutions from the Expertsby Sid Adelman, et. al.

Online Resources

InfoGoal has a good rundown on Data Marts and more.