InformIT

Working with Analysis Services Cubes in SQL Server

Date: Oct 1, 2004

Return to the article

Data warehousing is not for the faint of heart; it takes much diligence and dedication to learn the various tools of the trade. In this article, Baya Pavliashvili details some features available with Microsoft Analysis Services, whose Analysis Manager allows you to create and modify cubes to suit your end users' needs.

After you have populated your dimensional database (as discussed in the previous article in this series), you're ready to build Analysis Services cubes (Analysis Services is free when you purchase SQL Server 2000). The tool that lets you build and manage cubes with Analysis Services is called Analysis Manager, which is similar to Enterprise Manager. The language used to query Analysis Services cubes is called MDX (Multi Dimensional eXpressions). Although simple SELECT statements in MDX resemble SQL SELECT statements, don't expect many similarities between these two languages. Although you can navigate data within the cube directly from Analysis Manager, if you build custom applications based on Analysis Services you might wish to test your MDX queries through a tool called MDX Sample Application.

Creating cubes with Analysis Manager is very simple; by clicking through a few wizard screens, you can build a decent foundation. Fortunately, there are numerous articles and tutorials that show you how to build cubes with Analysis Manager. Instead of writing yet another tutorial on building cubes, I want to show you some of the advanced functionality available with Analysis Manager; to someone who hasn't spent many hours working with this tool, such functionality might not be immediately apparent. I will show you how to pivot cube data based on a different set of dimensions and measures. In addition, I'll show you how to write MDX queries to accomplish the same functionality within the MDX sample application.

I will also discuss the calculated members and member properties; both of these features can be used very effectively for performing calculations on cube data, as well as making the output of MDX queries more meaningful.

Terminology

Before you dive into the details of Analysis Manager, you need to be familiar with the terms used within this tool. Throughout this article, I'll be using examples from the Foodmart2000 sample database that comes with Analysis Services 2000.

Analysis Services dimensions are based on dimension tables in the SQL Server database and describes the fact in some way. For example, the sales cube includes product, time, gender, promotion media, and other dimensions that let you slice the sales data accordingly. Within Analysis Services, you can have either shared or private dimensions—shared dimensions can be used within multiple cubes, whereas private dimensions are specific to a particular cube.

Dimension levels are columns of the dimension table that are used for breaking down report data. For example, the product dimension within the sales cube has the product family, product department, product category, product subcategory, brand name and product name levels. Similarly the time dimension has levels of year, quarter and month.

Dimension members are individual values within a dimension level. For example, the time dimension has members 1997 and 1998 in the year level. Product family level has drink, food, and nonconsumable members.

Measure is a (typically) numeric fact presented on the report or analyzed through an OLAP tool. Sales cube has the following measures: unit sales, store cost, store sales, sales count, and store sales net.

A calculated member is a special type of measure, derived from other existing measures, which can potentially involve calculations within the cube.

A member property is an attribute of a dimension member that you expose within your cube to provide additional information to your users. Store name level of the store dimension contains three member properties: store manager, store square feet, and store type.

Aggregations are precalculated summary values that are generated each time a cube is processed. Aggregations make end-user queries very fast; however, the more aggregations you design the longer it takes to process the cubes. Any structural changes to the cube require full processing of the cube, which involves rebuilding aggregations. Because some cubes take a very long time to process, you need to be careful about changing dimension structures or adding measures to a processed cube.

Pivoting Cube Data

You can browse cube data directly from the Analysis Manager by selecting the desired cube and clicking the data tab. However, this window cannot display the structure of the cube. A more practical place to view the cube data is within the Cube Editor.

The Cube Editor window can be activated by right-clicking a cube and choosing the Edit option. The sales cube within the FoodMart2000 database looks like the following figure.

Figure 1Figure 1

First, notice that you can easily switch to a different cube by simply choosing the cube name from the drop-down box in the upper-right corner. This is a nice feature if you have shared dimensions and you want to make changes to different cubes before reprocessing all of them. Note that if you change a shared dimension's structure, you must reprocess all cubes that include that dimension.

Now if you look at the bottom of the right pane on the previous screen, you'll see that the Cube Editor has two tabs: schema and data. Most people have seen the schema tab, but not the data tab. If you click the data tab, you can browse cube data directly from the Cube Editor, as shown in the following figure:

Figure 2Figure 2

The data grid contains some members that have no data: empty members. In this example, you view data in the tiny sample database. In the real world, you're likely to have many more empty cells. Viewing such empty cells can be quite annoying and inconvenient. Fortunately, you can click anywhere in the gray area of the data grid to get a small option box as shown below.

Figure 3Figure 3

If you uncheck Show empty members, the data grid will condense to only the nonempty members.

How could you go about doing the same using MDX? You could use the NON EMPTY function, as follows:

SELECT {[measures].members} ON COLUMNS,
NON EMPTY {[customers].[country].members} ON ROWS
FROM sales

The following figure shows the result you will see in MDX sample application.

Figure 4Figure 4

Let's go back to the Cube Editor. If you hold the left mouse button and drag the measures dimension to the time dimension, Analysis Manager will pivot the data based on the selected measure. For instance, instead of seeing all measures broken down by year, you can view the value of store sales for every store type and for every year in the time dimension, as shown below:

Figure 5Figure 5

This feature is particularly helpful if you're having trouble with one of your MDX queries and you want to browse the cube data to find the correct values. How do you pivot dimensions and measures within MDX? Simply treat measures as you would any other dimension. You can specify the measures on ROWS, on COLUMNS, or in the WHERE clause. For example, suppose that you wanted to see the breakdown of store sales for each store type during 1997 and 1998. The MDX query would look like this:

SELECT {[store type].members} ON ROWS,
{[time].[year].[1997], [time].[year].[1998]} ON COLUMNS
FROM sales
WHERE (measures.[store sales])

The output would be identical to what we saw in the Cube Editor on the previous screen shot.

If you wanted to see the value of all measures for each store type during 1997, you'd have to change the query slightly:

SELECT {[store type].members} ON ROWS,
{[measures].members} ON COLUMNS
FROM sales
WHERE ([time].[year].[1997])

The results are shown here:

Figure 6Figure 6

If you look closely at the MeasuresLevel heading within the Cube Editor you see that it includes calculated members profit and sales average; if you run an MDX query returning measures.members on columns (or rows), the output does not include calculated members. You can easily fix this by explicitly specifying the calculated members in the SELECT statement as follows:

SELECT {[store type].members} ON ROWS,
{measures.members, measures.profit, measures.[sales average] } ON COLUMNS
FROM sales
WHERE ([time].[year].[1997])

Alternatively, you could use the allmembers function to accomplish the same thing:

SELECT {[store type].members} ON ROWS,
{measures.allmembers } ON COLUMNS
FROM sales
WHERE ([time].[year].[1997])

NOTE

I discuss calculated members in greater detail later in this article.

Member Names and Keys

Now let's examine the left pane of the Cube Editor. This pane displays cube dimensions, measures, calculated members, calculated cells, actions, and named sets. For this example, click on the Customers dimension and then navigate to the Name level. In the Properties box on the bottom of the left pane you'll see the Member Key column and the Member Name column properties, as shown next:

Figure 7Figure 7

What is the difference between the two? Within Analysis Services, you can look up the dimension member by using either its key or name. In MDX, you can refer to a dimension member by specifying its name or by appending "&" to the key of that member. Hence the Foodmart database uses Customer.customer_id as the Member Key column and customer.fname + '' + customer.lname as the Member Name column. Data retrieval usually works faster if you know the key; however, there will be times when you don't know the key—if so, you can use the Name column to get to the same data. For example, using the sales cube, you can get the same information by using the following MDX:

SELECT {[measures].members } ON COLUMNS,
{[customers].[name].&[1058]} ON ROWS
FROM sales

Or

SELECT {[measures].members} ON COLUMNS,
{[customers].[name].[amy petranoff]} ON ROWS
FROM sales

The output in both cases would be the following:

Figure 8Figure 8

Whether you use member names or keys in your MDX queries depends on the parameters you can collect from your users.

Calculated Members

We saw calculated members in the left pane of the Cube Editor. In a nutshell, calculated members are measures derived from other measures. The Foodmart sample database contains very simple calculated members: profit and sales average. Profit is calculated by subtracting the cost from total sales, whereas sales average is total sales divided by count of sales. Notice that we could easily perform the same calculations within an MDX query, as follows:

WITH MEMBER [measures].[store profit] AS '[Measures].[Store Sales]-[Measures].[Store Cost]'
MEMBER [measures].[sales avg] AS '[measures].[store sales] / [measures].[sales count]'
SELECT {[store type].members} ON ROWS,
{measures.[store profit], measures.[sales avg] } ON COLUMNS
FROM sales
WHERE ([time].[year].[1997])

So what's the advantage of calculated members? Analysis Services treats the calculated members just as any other measure; so the aggregations for calculated members are calculated during cube processing. If you perform calculations directly within your query, Analysis Services has to calculate the summary values during query execution. Therefore, calculated members will perform better than equivalent calculations performed within queries.

In the real world, the calculated members can get much more complex than average or profit. For example, let's try adding a standard deviation calculation to the sales cube. Standard deviation can be calculated as follows:

  1. Calculate the sum of the square of each sale.

  2. Multiply the result of step 1 times the sales count

  3. Sum all sales.

  4. Square the result of step 3.

  5. Subtract the result of step 4 from the result of step 2.

  6. Multiply the sales count by one less than sales count ("sales_count" * ("sales_count" – 1)).

  7. Divide the result of step 5 by the result of step 6.

  8. Standard deviation will be the square root of step 7.

Sounds like fun, right? Indeed it's not simple, but it can be done. First, let's make sure that we have all the needed measures. Perhaps one of the challenges is to realize that sum of the squares is not the same as the square of the sum. In order to calculate the sum of the store sales squares, we need to add a measure store_sales_squared to the cube. The easiest way to accomplish that is to open the MS Access database containing Foodmart and add a column called store_sales_squared to the sales_fact_1997 table.

NOTE

you have to close the Cube Editor before you can modify the FoodMart2000 Access database. Foodmart2000 data resides in a Microsoft Access database called FoodMart2000.mdb found in the Analysis Services installation directory in the Samples folder.

Next, you can run a query within Access:

UPDATE sales_fact_1997 
SET store_sales_squared = store_sales * store_sales;

Next, you need to add a measure to the Sales cube; let's close the Access database and go back to the Cube Editor. Right-click Measures within the Cube Editor and choose New Measure to get the following dialog box:

Figure 9Figure 9

Here, choose the store_sales_squared column and click OK. Now you have all necessary measures: store sales, store sales squared and sales count. The standard deviation calculation will look as follows:

((([Measures].[sales count]*[Measures].[store sales squared]) - ([Measures].[store sales]*[measures].[store sales]))/ ([Measures].[sales count]*([Measures].[sales count]-1) )) ^ (.5)

To add a calculated member, right-click Calculated Members within the Cube Editor and choose New Calculated Member. Paste the standard deviation calculation within the Value expression box, as shown next:

Figure 10Figure 10

After you process the cube, the standard deviation and store sales squared will be available within your MDX queries. For example, the following query returns sales standard deviation, sales average, and sales squared for various store types within 1997:

SELECT {[store type].members} ON ROWS,
{measures.[store sales squared], measures.[standard_deviation], measures.[sales average] } ON COLUMNS
FROM sales
WHERE ([time].[year].[1997])

Results are shown as follows.

Figure 11Figure 11

Calculated members can thus be used effectively to extend the existing measures and perform calculations within the cubes. Calculated members can also be used to expose member properties as measures.

Member Properties

A member property is a characteristic of a dimension member that can be exposed to provide additional information to data warehouse users. For example, suppose that your users want to see customer occupation along with total sales per customer on their reports. You can edit the customer dimension and expose the occupation as the member property pointing to the Occupation column within the customer table. To do this, click Shared Dimensions within the Analysis Manager, right-click the customer dimension and choose Edit. Next, navigate to the name level within the customer dimension, right-click the Member Properties folder and choose New Member Property. From the dialog box that opens, simply choose Occupation. The dialog box you'll see is identical to the one I showed you when adding a measure, so I won't include another screenshot here.

Notice that any time you add a member property you must process the dimension because you are changing its structure. Any cubes that contain the changed dimension also have to be fully processed.

Process the customer's dimension and sales cube (this can be accomplished by right-clicking the cube and choosing Process). Then get back into the Cube Editor and add a calculated member called customer_occupation using the following formula:

Customers.CurrentMember.Properties("occupation")

Please consult the calculated members section of this article for detailed instructions for adding calculated members. After you add the occupation calculated member, you must save the cube before this member is available for MDX queries. However, you don't have to reprocesses the cube again; this happens because you're simply adding a calculated member—you're not changing the structure of the cube.

Next, run the following statement within the MDX sample application:

SELECT {[measures].members, measures.[customer_occupation] } ON COLUMNS,
FILTER({[customers].[name].members}, ISEMPTY(measures.[unit sales]) = FALSE) ON ROWS
FROM sales
WHERE ([store].[store name].[store 11])

The results will contain all measures as well as the customer occupation for each customer who has shopped in store 11. The abbreviated results are presented in the next figure.

Figure 12Figure 12

This output might allow a marketing manager to see the opportunities to target a particular occupation with more aggressive sales or adjust the current promotional campaign. So member properties allow you to add value to the reports and make them more meaningful.

Summary

This article detailed some features available with Microsoft Analysis Services. Analysis Manager allows you to create and modify cubes to suit your end users' needs. Analysis Manager makes building cubes very simple after you have built the cubes you need to browse the cube data and perhaps use calculated members and member properties to make the output of your reports more useful. The MDX sample application allows you to test your queries to ensure the correctness of your result sets. I also showed you how to derive standard deviation within your cubes using calculated members.

Data warehousing is not for the faint of heart; it takes much diligence and dedication to learn the various tools of the trade. This article showed you some functionality that isn't apparent to an inexperienced eye. Hopefully, it will be a good place for you to start digging into Microsoft's powerful data warehousing tools.

800 East 96th Street, Indianapolis, Indiana 46240