Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

The Operational Data Store (ODS)

Last updated Mar 28, 2003.

In this series on creating a Business Intelligence landscape I've explained various concepts for routing your data from its source to a full analysis system for your organization. Business Intelligence, I believe, is at its core a set of consolidated, aggregated, strategic data presented in an analytical format to upper management. Notice the words I chose to emphasize here.

I've also explained various layouts that you might use to meet that goal. Remember, you start off with the questions that need to be answered to get your analysis requirements.

Your analysis needs might be very simple, and so all you really need is a near-real-time storage location for reporting data – something called and Operational Data Store, or ODS. After you create this near-storage, you might add the next set of layers in the chain, such as a Data Mart and then on to a Data Warehouse and so on. But in all of the architectures I've presented, I've included the ODS.

You should be aware that there are some people who don't hold this position. The belief here is that the reporting needs can be satisfied by the next few layers without involving the overhead of an ODS. You certainly can implement a BI solution this way, and there are many books and web sites you can visit that will explain how to do that.

I still believe in the ODS, however, and I'll explain why. When you boil it all down, Business Intelligence is really just an extreme form of reporting. As you'll see when I cover the OLAP sections in the future there are methods and techniques of storing and manipulating data that are not available without dimensional analysis, but when you're done it still ends up in some kind of reporting tool. Because of that, I tend to view BI as a reporting function across all systems, even down to the source data. In some paradigms, the source data is a necessary evil, but certainly not part of the landscape.

I depart from that view because there are types of reports that show very detailed, real-time operations and still provide a certain level of strategic relevance. How many items I'm turning out in a manufacturing line, how many customers are at the register, or how many isotopes have passed through the measuring device all have strategic as well as tactical implications.

So certain data must be reported on real-time. There's simply no getting away from that. However, because viewing data blocks or at least slows down writing of data, there's a need to move the reporting infrastructure to another set of databases or systems. These systems are often called "Reporting Servers" by the applications that use them. The ODS is this kind of data, but it's a bit more than that.

An ODS might also contain data from another system, or even have new data created in it. You'll read later that I don't advocate users changing data within your BI landscape, other than any meta-data they might need such as last report run, favorites, or things like that. The data that might be created in an ODS isn't (or shouldn't be) created by user activity.

Recall that BI systems created consolidated, aggregated, strategic data. The first line in that process is the ODS. The ODS answers questions regarding operations. I've seen ODS's that are no such thing – they contain reporting tables directly mimicking the source system. What you're after here is a data structure that will support analysis by a local manager at a site, plant, or operation.

The criteria for creating the ODS structure is twofold. Start with the question of "how much detail should I gather in regards to what is downstream" and "what strategic questions should this data answer?"

The first question is one that you'll answer based on the requirements that you gathered in the steps I've explained earlier in this series. From the questions the business wants answered, you have a good idea of what you need to cull from the source systems. The second question as well as the first should be answered by another requirements gathering session with the local consumers of the results.

I haven't spoken a lot about this yet, but now that you're planning to gather data, it's important to understand who the consumer of the BI data will be. These systems aren't like the OLTP systems your company uses. In those systems almost everyone in the organization has a login, at least on the larger applications. The BI system is for those people whose job it is to perform analysis. When you think about that, you've probably got a very short list of people in mind. That audience is what you're playing to in this building exercise.

Question the users and determine what they need to see, locate the data, and decide on how it should be presented, as I've explained before. The rest of this tutorial will explain how to build an ODS and move the data in.

I've already explained the Extract, Transform and Load process, and I'll come back to it in some detail again. What I'll focus on here is what happens when you start the ETL process.

Based on the data you need, determine how it should be stored. There are two basic formats you'll use for this. One is the relational model that forms the basis for any OLTP application. That means following the rules of normalization that I've explained in other tutorials. This type of data stores keys within the tables pointing to other tables, which might also have keys and so forth. This design is very useful for data that changes often and ad-hoc queries. It still has a place for some analysis.

However, in most cases the BI structure isn't like that at all. Data in this system is historical – it shows what happens in the past, and includes the changes as well. In an OLTP system if an order changes from 1 item to 2 items, you only record the last set of data. If someone's name changes from Smith to Jones, you only record Jones. In a purely relational, normalized data set you have several skinny, small tables. In addition, normalized OLTP storage you have few indexes, since they cost more to keep up to date during frequent writes. Storage is also a concern, since the larger the table the longer the maintenance.

But in an analytical system, it's vital that you see changes over time. You need to know that the order had 1 item and then was changed the next day to 2 items, so you keep both transactions. That's the strategic view. In the case of analytical reporting you don't want to pay the penalty for all those joins that a normalized model brings. Indexes are far more plentiful in an analytical system, since you're interested in the speed of reads, and the only updates are normally inserts.

So in the case of an ODS, while you may have normalized tables, you may also de-normalize the structure, storing the same data elements over and over on the same line as the detail. If you think in terms of what a SELECT statement would produce, then you have the idea. Storing the data this way makes the tables much wider and deeper, and there are fewer of them.

And that's the key. Find out what questions you need answered, and then find out the best storage method to answer that question. You'll probably find that your ODS database will have a little of both.

Of course, keeping all the history might make the storage requirements far too high in an ODS system. However, it doesn't have to be all the data, forever. Normally ODS systems contain only a few or even one day's worth of data. You might think of it as a staging area for less-detailed data than you store in the source system. You will keep longer periods of data higher in the chain, but they will be rolled up into aggregated data sets making them smaller.

That brings up the final point on the ODS storage. Remember that the data we're storing in a BI system is consolidated and aggregated. At the ODS level, you're doing some consolidation (perhaps from a couple of source systems) and a little aggregation. The aggregation level will depend on the level of detail you need to store, both for the local reporting requirements and what you need to support the downstream systems.

In the next tutorial we'll talk about the next step in line, the Data Mart. Understanding what you store there will help you understand more about what you need to store here in the ODS.

InformIT Articles and Sample Chapters

Lou Agosta has a great text in the bookstore called The Essential Guide to Data Warehousing.

Online Resources

DMReview has a portal on ODS systems here.