Home > Articles > Data > SQL Server

SQL Server Reference Guide

Hosted by

On-Line Analytical Processing (OLAP)

Last updated Mar 28, 2003.

In the last article in this series on Business Intelligence, I described the star schema, which lays out data in a different way than the relational database I've explained in other tutorials. The star schema is linked closely with a form of reporting called On Line Analytical Processing, or OLAP.

The primary need for a new storage paradigm involves speed. When you look at data from a strategic perspective, wading through a huge amount of detail is too slow. On the other hand, analytics are normally ad hoc, or on-the-fly. As the system's designer, you are never sure of all of the questions the users are attempting to ask of the BI landscape. That's OK, in fact, it's expected.

That's the goal of OLAP – to provide a system that can quickly display data in more than one dimension. I used that word to describe one of the star schema elements, and we need to understand its practical use before we move on.

Picture a typical spreadsheet in your mind involving store sales. Across the top you can imagine columns headings of store locations. Along the side you might imagine product categories. Both of these are dimensions of the questions you might ask. The numbers in the cell where the columns and rows intersect are the facts.

Using this spreadsheet, you could answer questions like "which store sold the most items?" and "Which categories sold the best at a given store?" Using a simple sort, you can rank the stores or categories. But what if you want to see not only the stores and the categories, but the months when the products sold? That's pretty difficult to model in a two-dimensional spreadsheet, especially in your mind! It can be easily done inside a query on a two-dimensional table, however.

Now add in the dimension of salespeople. Your users would like to know the categories that sold the most in a certain store by salesperson. We're definitely out of spreadsheet territory now, but still within the capabilities of good T-SQL kung-fu. To make things more difficult, change the order of what you would like to see, such as sales by person by category at all stores. If the database is large, the queries at this point begin to slow down. Not only that, when the user changes their mind about what they want to see they either need to learn T-SQL or come get you. This gets much worse, the more dimensions and facts you create.

That's where OLAP comes in. Given a proper star schema containing the dimensions the users care about and the facts that show the numbers, the OLAP system provides pre-defined result sets (called cubes) that quickly show the data in multiple dimensions. These systems come with either a front end for viewing the results, or a set of interfaces that allow front ends to interact with the system. Microsoft's version, called Analysis Services in SQL Server 2000, also include a query language called MDX that you can use to get results from the server.

The main elements of OLAP are the star schema and the processing engine, as well as any presentation tools that it includes. The star schema also contains a few more elements. Within a dimension, you may want to include subdivisions. In our example, we might want to see not only product categories but also the brands we have on sale. For the time dimension the users may wish to see years, quarters, months, days and so forth. Most OLAP systems provide for levels, which show the category first, and then a breakdown of the lower levels within it.

Another change to the star schema involves extensions. A normal star schema is made up of a set of dimension tables surrounding a fact table. The facts are linked to dimensions by use of a key, just like in a regular relational table. In an extended star or snowflake schema, the facts are linked to a middle set of tables before they find their ultimate dimensions. There are a lot of uses for this, but a simple example will help you understand what these are used for.

Using our store example again, it's easy to imagine the dimensions of time, salespeople, categories, stores and so forth surrounding a fact table of sales figures. But what if the categories were different for different regions? What if the categories were even in different languages? One solution might be to build multiple schemas with the same facts. But an easier method is to borrow a concept from relational databases. Whenever you have many things that might relate to many other things in a relational database, you create a third table (called a relational entity) to bring them together. So that's just what we do in the multi-dimensional environment. If a category is called A in one place and B in another, we create a dimension table that contains the A and another that contains B. The facts, however, don't need to change. We just need a key to a "middle" table from the fact table to the two dimension tables. Here's how that looks:

Dimension One

 

KeyOne

ValueOne

35

To-May-Toe

Dimension Two

 

KeyTwo

ValueTwo

42

To-Mah-Toe

Middle Dimension

 

 

MiddleKey

KeyOne

KeyTwo

1

35

42

Facts

 

MiddleKey

AmountSold

1

100

You can now find the Amount Sold whether you call it "to-may-toe" or "to-mah-toe", by connecting the keys from the facts, to the middle dimension, to the dimensions.

The general process within an OLAP system is to present a star (or extended star) schema of data, which is read by the processing engine and stored in various methods on the drive. Users query the data using a reporting tool or query language-driven application.

In newer implementations of OLAP systems, you can "drill through" the aggregations the cube by clicking an aggregate cell. The OLAP system opens the source location and displays the details associated with the aggregate.

The OLAP system is one part of your Business Intelligence landscape. If you combine OLAP engines with de-normalized reporting systems, you provide a complete analytical system for your executives. The key is a coherent presentation strategy, which we'll examine shortly.

We're nearing the end of the conceptual tutorials. In the next few we'll round out our discussions on Business Intelligence and then begin some practical exercises using mostly Microsoft tools.

Informit Articles and Sample Chapters

If you want a practical exercise now, there's a great set of articles called Integrating Microsoft SQL Server 2000 OLAP and Microsoft Office, Part 1: Creating an Excel PivotTable Report with an OLAP Cube by William E. Pearson, III. You can find that here.

Online Resources

You'll find no surplus of humility in the OLAP world. Even so, this site by Nigel Pendse does a fairly decent job of explaining some of the components of an OLAP system. I'm not sure I'd take all of his comments as gospel; but the points he makes in regards to FASMI are useful.