SQL Server Reference Guide

Analysis Services

Last updated Mar 28, 2003.

In SQL Server 2000, Microsoft offered a powerful Business Intelligence tool called Analysis Services. This feature was greatly enhanced in SQL Server 2005. Both versions, starting with the Standard Edition, contain this software as part of the standard licensing — there isn't anything else to buy to use it. You will, of course, have to pay for each user or processor, but if a user accesses both SQL Server and Analysis Services, you won't have to get a separate license for each.

In this article I'll provide an overview of Business Intelligence and Analysis Services in the two versions. I'll also cover some of the vocabulary that you need to navigate Microsoft's documentation, as well as that of other industry vendors. I have an entire section on creating your own Business Intelligence system that you can read here.

Let's start out with a definition of Business Intelligence software, as opposed to the database application software you are already familiar with. In database applications, a front-end program of some sort (even if it is a web page) presents fields for a user to read or fill in. The application either connects directly to a database on one or more servers, or a "middle tier" intercepts the call and accesses, enters or changes the data on the application's behalf. This is usually in short bursts, no more than a few rows of specific data at a time. The data is stored in relational tables, which are highly normalized. Normalized in this case means that the data is broken out into many tables, related by a "key," which is just a number or text value that represents a record in another table. You can read more about that here.

The following data represents some items from a car parts store sales table. It’s normalized, to some degree, which means that data is pointed to rather than stored over and over — the "key" points to another table, so you can update one record in that other table and every screen that displays it instantly shows the current data.

Sales Table:

Sales Key   Item Key   Store Key   Salesperson Key   Date Sold
1           2          1           2                 01/01/03
2           4          7           4                 03/15/02

Item Table:

Item Key   Item Name                Catalog Number
1          BuckSpeed High Impacts   17593
2          BuckSpeed Low Profiles   72342

This is what the user will see when the application reads the data from the different tables, linking the first record (Sales Key number 1) with the other tables. In this case it reads the Item Name from the Item table, where the Item Key is the same in both the Sales table and the Item table. That way the text is shown to the users, but only the numbers have to be stored.

Record 1 shown to the user:

Sale Number   Item                     Store       Salesperson   Date Sold
1             BuckSpeed Low Profiles   Tampa, FL   John Smith    01/01/03
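
As a rough sketch of how an application might assemble that record, here is a T-SQL query joining the tables above. The Store and Salesperson tables aren't shown in this article, so their names and columns (along with the space-free column names) are assumptions for illustration only.

-- Assemble the display record by joining the normalized tables.
-- The Store and Salesperson table definitions are assumed here.
SELECT  s.SalesKey          AS [Sale Number],
        i.ItemName          AS [Item],
        st.StoreLocation    AS [Store],
        sp.SalespersonName  AS [Salesperson],
        s.DateSold          AS [Date Sold]
FROM    Sales AS s
JOIN    Item AS i           ON i.ItemKey = s.ItemKey
JOIN    Store AS st         ON st.StoreKey = s.StoreKey
JOIN    Salesperson AS sp   ON sp.SalespersonKey = s.SalespersonKey
WHERE   s.SalesKey = 1;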

If you would like to rename item number 2 to something like "BuckSpeed Super-Low Profiles," you can just change the Item Name entry in the Item table for record number 2. Since the tables "fly together" when the application reads them, all of the "old" sales records will instantly show the new name.
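
In T-SQL, that rename is a single UPDATE against the Item table (using the same assumed column names):

-- Rename item number 2; every sale that points to Item Key 2 now displays the new name.
UPDATE Item
SET    ItemName = 'BuckSpeed Super-Low Profiles'
WHERE  ItemKey = 2;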

This arrangement of data is very efficient for applications that work with small bits of data very often. These applications are called On-Line Transaction Processing, or OLTP for short.

If you examine this table you’ll see that it’s fairly two-dimensional. You can only look at the data in rows or columns. It isn’t inherently calculated in any way. It’s kind of wide, in that it stores many attributes of the data in one set of columns.

A Business Intelligence application works slightly differently. Although there is an interface, the user doesn't enter data, other than to specify what they want to see. The data that the user works with isn't very normalized. It has a different shape and storage method altogether. Even the levels of storage are different.

Most often in Business Intelligence (BI) applications you are working with historical data. So we won't change the name of the item we sold; we'll just record that on a certain date a certain tire was sold to a certain customer. So the record for that sale looks more like the record shown to the user:

Item                     Store       Salesperson   Date Sold
BuckSpeed Low Profiles   Tampa, FL   John Smith    01/01/03

And many times the data is further modified to show groups of data, rather than an individual record. You may not care that a particular tire sold on that day, but you might want to know how many were sold on that day across all stores, or perhaps by which salespeople. The data begins to shape itself into a pattern of not very many columns but many rows (skinny and deep), with the numbers for the combinations you want to group by in the middle, joined once again by keys. This is often called a "star" shape. If you combine many of these tables together, they can be called a "snowflake" arrangement.
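
Here is a rough T-SQL sketch of that star shape: a skinny, deep fact table surrounded by dimension tables. The table and column names are illustrative only, not from any product sample.

-- Dimension tables hold the descriptive values; the fact table holds the numbers
-- and the keys that point back to the dimensions. Names are illustrative only.
CREATE TABLE DimSalesperson (
    SalespersonKey   int PRIMARY KEY,
    SalespersonName  varchar(100)
);

CREATE TABLE DimItem (
    ItemKey   int PRIMARY KEY,
    ItemName  varchar(100)
);

CREATE TABLE FactSales (
    SalespersonKey  int REFERENCES DimSalesperson (SalespersonKey),
    ItemKey         int REFERENCES DimItem (ItemKey),
    DateKey         int,   -- would point to a date dimension table
    QuantitySold    int
);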

When these shapes of data (stored in a regular database) are combined with an engine that can perform the calculations and present them to the user in various views, called dimensions, this structure is often called a "cube." Think of turning a Rubik's cube puzzle and you'll get the idea of how you can manipulate the data to show sales by salesperson, by region, by date and so on. This is called On-Line Analytical Processing, or OLAP for short.

And that's the difference between the two applications — OLTP applications show two-dimensional data, and OLAP applications show data in multiple dimensions, or views.

Microsoft provides a great tutorial for Analysis Services to get you started with creating and assembling cubes, but it assumes you have the base data in the shape it needs to be before you start. This is a big assumption! As a matter of fact, getting the original data into the right format is probably one of the hardest things to learn about cubes.

While most of us are used to working with the data in the OLTP format, business people don’t often think like this. They want to see things repositioned, by various friendly criteria, on the fly. To do that, this data format lacks two things: more than two dimensions, and summary functions to bring the data to life. That’s where cubes come in.

Cube engines (such as the one in Microsoft Analysis Services) have the ability to process data by dimensions — which are just particular views of the data. For instance, "Time" is a dimension, as are "Item" and "Salesperson." "Show me all sales by salesperson," you might ask the application. In that case the dimension is "Salesperson." All of the different salespeople's names are stored as a dimension.

The real power comes in the numbers that combine the salespeople with the number of things they sold. These numbers, with a key back to the salesperson table, are called facts and sometimes measures. Here is the salesperson dimension table:

Salesperson Key   Salesperson Name
1                 Buck Woody
2                 John Smith
3                 Jane Doe
4                 Keisha Wilson

And here is a small fact or measure table:

Salesperson Key   Quantity Sold
1                 14
2                 13
3                 12
4                 20

You can see that Buck Woody (record 1) sold 14 tires. This is a simple example of course, because you would also want a Date dimension, so you could tell when he sold them.
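
To see how the dimension and the facts combine, here is a T-SQL query against the illustrative star-schema tables sketched earlier:

-- Join the salesperson dimension to the fact table and total the quantities sold.
SELECT  d.SalespersonName,
        SUM(f.QuantitySold) AS TotalSold
FROM    FactSales AS f
JOIN    DimSalesperson AS d ON d.SalespersonKey = f.SalespersonKey
GROUP BY d.SalespersonName
ORDER BY TotalSold DESC;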

You can also have levels within your dimensions, which are further break-downs of a dimension. Think of the dimension called "Time." Within Time you might have levels of "year," "month" and "day." If you add a Time dimension table to our example above, you'll need to change the facts or measures to have keys that point to the dates. Also, you'll need more facts to break down the levels. You can see it gets complicated quite quickly.

So let’s sum all this up. A cube is a database of de-normalized tables. Some of the tables are the dimensions (the things you want to slice and measure the data by) and others are the measures (counts, sums and so forth of the items in those other tables). Using this fairly simple concept it’s now possible to answer questions like, "Which salesperson sold the most parts, and in which month?" and so forth. Quite powerful indeed.
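
Against the relational star schema sketched earlier, and adding a hypothetical DimDate table with one row per day, that question might look something like this in T-SQL. A cube engine answers the same kind of question from aggregations it has already calculated.

-- "Which salesperson sold the most parts, and in which month?"
-- DimDate is a hypothetical date dimension keyed the same way as FactSales.DateKey.
SELECT TOP (1)
        d.SalespersonName,
        dd.CalendarYear,
        dd.CalendarMonth,
        SUM(f.QuantitySold) AS TotalSold
FROM    FactSales AS f
JOIN    DimSalesperson AS d ON d.SalespersonKey = f.SalespersonKey
JOIN    DimDate AS dd       ON dd.DateKey = f.DateKey
GROUP BY d.SalespersonName, dd.CalendarYear, dd.CalendarMonth
ORDER BY TotalSold DESC;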

So how do you get from what you’ve just learned to the tutorials Microsoft bundles with Analysis Services? And what is the roadmap for the entire process?

Here’s what you need to do to get your cube from concept to completion:

Determine the data that your users need to see

This is probably the most important step of all. You need to adequately plan for what the goal of your cube system is and where the data is currently located. Don’t worry about whether it’s in Access databases, SQL Server tables or Oracle sources — just define it all out.

Transform and load the data to the data warehouse

Next you’ll need to transport the data to a central location using batch files, programs and/or SQL Server 2000’s Data Transformation Services (DTS) or SQL Server 2005's SQL Server Integration Services (SSIS). In this step you also change the data from its native multiple table formats to those skinny, deep tables I mentioned earlier. The data will end up in one of two kinds of tables — dimension tables (which you’ve already seen) and fact tables (which are just the measures you learned about earlier). The two kinds of tables will be joined by a key field between them. This step is often the most laborious part of the job.
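
Whatever tool moves the data, the reshaping itself is usually plain SQL. Here is a minimal T-SQL sketch of that step, assuming the OLTP Sales table from the beginning of the article and the illustrative FactSales table above:

-- Denormalize the OLTP sales rows into the skinny, deep fact table.
-- Style 112 converts the date to a yyyymmdd integer key, e.g. 20030101.
INSERT INTO FactSales (SalespersonKey, ItemKey, DateKey, QuantitySold)
SELECT  s.SalespersonKey,
        s.ItemKey,
        CONVERT(int, CONVERT(char(8), s.DateSold, 112)) AS DateKey,
        COUNT(*) AS QuantitySold
FROM    Sales AS s
GROUP BY s.SalespersonKey, s.ItemKey,
         CONVERT(int, CONVERT(char(8), s.DateSold, 112));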

Create the cube

Now that your data is in the data warehouse, you can build the cube with Analysis Services. This is where the Microsoft tutorial picks up. Along the way you’ll pick the storage type for the cube, such as MOLAP, ROLAP, and HOLAP. For further information on these storage options, see the references section at the end of this article.

Process the cube

During this step the aggregations designed for the cube are calculated and loaded. The engine reads the dimension tables and the fact table, calculates the aggregations and then stores the results in the cube.

Distribute the results

Users can look at the cube data by using the Cube Browser in the Analysis Manager, by using Microsoft Excel, or by using other applications like the Microsoft Data Analyzer. Most cubes I’ve worked with use either the Microsoft Excel route or a custom-designed program. Microsoft also has a special SQL-like query language called MDX, which allows you to write queries to show the data stored in a cube.
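
As a small sketch of what an MDX query can look like, here is one that shows total quantity sold broken out by salesperson. The cube, dimension, and measure names are assumptions, not names from the Microsoft tutorial.

-- Total quantity sold by salesperson, from a hypothetical Sales cube.
SELECT
    { [Measures].[Quantity Sold] }           ON COLUMNS,
    [Salesperson].[Salesperson Name].Members ON ROWS
FROM [Sales]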

In other articles on this site you’ll learn how to perform each of these steps.

InformIT Articles and Sample Chapters

Sometimes you don't really need a full BI application; you just want a reporting system. You can find out more here.

Online Resources

The SQL Server 2000 reference documentation for Analysis Services is here. For SQL Server 2005 it is here, and for SQL Server 2008 it is here.