Creating a Local Cube from a Relational Source
Our second approach to creating a local cube returns us to Microsoft Query, which we also saw earlier in this lesson during the database and connection phases of the PivotTable report setup.
We use Microsoft Query to create a subset of a relational data source and then finish the cube-creation process using the OLAP Cube Wizard.
Start with a Relational Data Subset
Sometimes, we have neither a server-based cube nor an Analysis Server available, but we can access a relational data source. For example, we may become involved in the early phases of transitioning an enterprise to the use of local cubes for remote users, say a team of salespeople or managers who need analytical capabilities on their laptops. We may need to prototype an eventual solution by outfitting a representative test group with basic cubes in order to obtain useful feedback regarding the contemplated design for a server-based generation and distribution environment. After obtaining the specifics from the information consumers, we can more effectively place the final design in production and allow the server to generate the cubes in an automated fashion for field deployment.
The capability to design and create cubes directly from a relational data source provides an excellent opportunity to perform "proof-of-concept" exercises with a reality-based model that will prompt consumers to more clearly specify their requirements. Users who are unfamiliar with OLAP and its uses can see the model in action, and can relate it to their day-to-day needs and signal suggested improvements from the standpoint of both the overall cube structure (including the dimensions, measures, and other components that are critical to effective design) and the interface and usability considerations of a more mechanical nature.
Furthermore, local cubes offer the capability of "parallel testing" proposed model changes while a production cube build process takes place in an insulated manner, both as a troubleshooting and continuous-improvement process. For example, many clients with whom I have worked in the past have asked me to investigate issues with cube size and performance. The cube under consideration was often the Production model, so we were not afforded the luxury of simply taking it offline to modify and tune its build cycle. Using a "zero-based" approach to constructing a new cube (a rapid process using a local prototype) based upon the currently desired reporting output of the Production cube, many issues were subsequently resolved through a close investigation of what data in the cube was actually being used in the field; redundancies and abandoned elements often came to light through a rapid redesign that was based on current needs. Moreover, examination of the "layers" (which were gradually added onto the original cube structure as consumer needs became known) at which summarization was being attempted often led to the derivation of a creation cycle that was more finely attuned to the effective use of preprocessing and aggregation at the RDBMS level, where appropriate. In situations such as these, and in many others, the generation of a local cube from the relational source tables can be highly useful to the organization in a number of ways.
As discussed previously, we begin the direct-from-relational creation of a cube by first deriving a subset of the relational data. Preparation involves 1) setup of a data source for the relational database; then 2) creation of the query that specifies the selection criteria for the subset. A query tool is needed to precisely identify the parameters of that subset; once again, Microsoft Query provides a straightforward means of creating this subset (or rowset, as it is called when using the SQL-based Microsoft Query tool). Microsoft Query allows us to perform joins between the dimension and fact tables, to create calculated columns, and to otherwise prepare a "virtual warehouse" on a miniature scale from which to build a cube. After we establish the rowset, we use the OLAP Cube Wizard to design the cube and its member objects (including measures, dimensions, hierarchical levels, and so forth).
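To make the idea of a rowset concrete, here is a minimal sketch of the kind of join and calculated column that Microsoft Query builds for us behind its dialog boxes. It uses Python's built-in sqlite3 module rather than Access, and the table and column names (store, sales_fact, store_profit) are hypothetical stand-ins chosen for illustration, not the actual FoodMart 2000 schema.

```python
import sqlite3

# Hypothetical miniature star schema; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_country TEXT, store_city TEXT);
    CREATE TABLE sales_fact (store_id INTEGER, store_sales REAL, store_cost REAL);
    INSERT INTO store VALUES (1, 'USA', 'Seattle'), (2, 'Canada', 'Vancouver');
    INSERT INTO sales_fact VALUES (1, 10.0, 4.0), (1, 6.0, 2.5), (2, 8.0, 3.0);
""")

# Join the fact table to its dimension table and add a calculated column,
# yielding one flat rowset from which a cube could be built.
rowset = conn.execute("""
    SELECT s.store_country, s.store_city, f.store_sales, f.store_cost,
           f.store_sales - f.store_cost AS store_profit
    FROM sales_fact AS f
    JOIN store AS s ON s.store_id = f.store_id
    ORDER BY f.rowid
""").fetchall()

for row in rowset:
    print(row)
```

Each output row carries both descriptive columns (country, city) and numeric columns (sales, cost, profit), which is the flat shape the OLAP Cube Wizard expects as its input.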
Let's get started by defining the rowset upon which the cube will be based. We'll keep the structure fairly simple to allow us to focus on design and creation concepts as we proceed.
Open a new Microsoft Excel worksheet.
Select Data from the top menu.
Select Get External Data from the dropdown menu.
Click New Database Query from the flyout menu, as shown in Figure 21.
Figure 21 Initializing the rowset query.
Microsoft Query is initialized, and the Choose Data Source dialog box appears, as shown in Figure 22.
Figure 22 Choose Data Source dialog box for Microsoft Query.
On the Database tab of the dialog box (the default when the dialog box appears), choose Microsoft Access Database, as shown in Figure 22.
Notice that the Query Wizard is enabled by default (via the checkbox on the dialog box) to help guide us through the design process.
The Select Database dialog box appears, along with a Connecting to Data Source progress dialog box.
Find and select the FoodMart 2000 Access database (FoodMart 2000.mdb), which is typically installed in the Samples folder of the Microsoft Analysis Services directory under Program Files on the drive upon which the Typical installation took place. (In my case, for example, the .mdb is located at D:\Program Files\Microsoft Analysis Services\Samples.)
The Select Database dialog box should resemble Figure 23.
Figure 23 Select Database dialog box.
The Query Wizard Choose Columns dialog box appears, as shown in Figure 24.
Figure 24 Query Wizard Choose Columns dialog box.
Select the tables listed below from the Available Tables and Columns box on the left by double-clicking each (or by selecting each and clicking the > button) to move the respective columns to the right.
- product class
While we would probably expand each table and select the specific columns we needed for our cube in a real world scenario, we will select each of the tables in its entirety at this point, to make the process quicker for our lesson.
The Query Wizard Filter Data dialog box appears, as shown in Figure 25.
Figure 25 Query Wizard Filter Data dialog box.
We will simply click Next to skip this step, again noting the importance of filtering the data in a real-world scenario to keep cube size minimal when the relational data source is large.
The Query Wizard Sort Order dialog box appears, as shown in Figure 26.
Figure 26 Query Wizard Sort Order dialog box.
Again, we will pass on sorting the data; we will not be working with the end product in this lesson. This, too, would obviously be handled differently in the business environment.
The Query Wizard Finish dialog box appears, as shown in Figure 27.
Figure 27 Query Wizard Finish dialog box.
Click Save Query, and name the query Tutorial-LocalCube, placing it in a convenient directory or accepting the default.
The Save As dialog box resembles that shown in Figure 28.
Figure 28 Saving the underlying relational query.
The Query Wizard Finish dialog box returns.
Click the Create an OLAP Cube from this query radio button to select it (refer to Figure 27).
The OLAP Cube Wizard is launched, based on our selection in the Finish dialog box. It appears as shown in Figure 29, awaiting instructions about how we want it to create the cube for which we have created a source definition in Microsoft Query.
Figure 29 The OLAP Cube Wizard welcome dialog box appears.
Finish the Job with the OLAP Cube Wizard
We now enter the cube design and creation phase of our second approach for creating a local cube. Our next steps focus upon the organization of the external data we have defined for extraction, and the manner in which we want it to summarize and to appear for analysis and reporting. Many reporting options exist, including PivotTable reports, PivotTable lists, PivotChart reports, and others. Many options also exist outside of the Microsoft Office suite because various third-party reporting tools can access the OLAP cube that we will generate.
The OLAP Cube Wizard allows us to begin with the output (a flat series of records) of the query we have designed in Microsoft Query, and to then apply a hierarchical organization to the fields. It also allows us to define the summary values we want to calculate for optimal reporting purposes. In addition to summarized values, our cube will contain descriptive facts surrounding those values. The values to be summarized, or measures as we know them from other OLAP scenarios, are called data fields within the context of the OLAP Cube Wizard. The descriptive facts, such as the date and location of a transaction, are organized into the hierarchical levels of detail that we know as dimensions.
The successful definition of the dimensions and their associated levels depends upon determining the kinds of categories that the information consumers employ (or want to be able to employ) when they analyze the data in reports and browsers. We can organize data fields and dimensions to endow organizational reports with high-level summaries (such as total costs worldwide, or at country or regional levels); while also enabling the presentation of lower-level details, filtered for a myriad of criteria (such as locations or areas of management responsibility where costs are particularly high, or, alternatively, well-controlled and minimal).
As discussed, the local cube design and creation process is easy, flexible and (best of all from the perspective of "proof of concept" and other prototyping exercises) fast. After we create and view reports based upon a new version of a local cube, we can return immediately to the OLAP Cube Wizard to make changes to adjust for consumer suggestions and comments regarding usability and performance, as well as to test ideas we formulate on an ad hoc basis. The local cube means isolation of the development process and uninterrupted operation of any production cubes that we have in place. It also means ultimate portability and convenience, both in the design phase and in a distributed production scenario.
Let's begin exploring the process involved in working with the wizard to create our local cube. The steps consist of the following:
Defining the data fields
Defining the dimensions and levels
Selecting the type of cube
Click Next at the Welcome screen for the OLAP Cube Wizard.
The OLAP Cube Wizard Step 1 of 3 dialog box appears, as shown in Figure 30.
Figure 30 OLAP Cube Wizard Step 1 of 3 dialog box.
The Step 1 of 3 dialog box is a great example of the way the wizard makes design straightforward and rapid, assuming that planning (based upon a solid understanding of the business requirements of the information consumers) has taken place before we embark upon cube design. On one efficient, easy-to-use screen, we simply select from a list the data source fields that we wish to present, and how we wish to summarize each of those fields.
In this step, it is important that we decide which of our source data fields it makes sense to use as data fields. Data fields contain values (that is, they are measures) that we want to summarize, such as store costs for which information consumers have a need for totals. The wizard requires that we select at least one field to be a data field.
When the dialog box initially appears, the wizard has several boxes checked already. These are selected by the well-meaning (but not necessarily correct) wizard, based upon its conclusion that these fields appear to contain measure-like data. It "proposes" them, as a result, for selection in this step. It is critical to verify whether the wizard's proposals are correct and to make any changes to fit our business requirements. The fields that we leave unchecked in this step will comprise the set of available dimension fields in Step 2, from which we will select and organize those we need to design our dimension hierarchy structures.
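The sketch below illustrates why the wizard's proposals deserve a second look. It is only an assumption about how such a proposal might work (treating every all-numeric column as measure-like); the wizard's actual logic is not documented here, and the function name propose_fields and the sample columns are hypothetical.

```python
from numbers import Number


def propose_fields(rows):
    """Split columns into proposed data fields (all values numeric) and dimension fields."""
    data_fields, dimension_fields = [], []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # bool is a Number subclass in Python, so exclude it explicitly.
        numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
        (data_fields if numeric else dimension_fields).append(name)
    return data_fields, dimension_fields


# Hypothetical sample rows; store_id is a key, not a value we would want to total.
rows = [
    {"store_type": "Supermarket", "store_id": 1, "store_sales": 10.0},
    {"store_type": "Small Grocery", "store_id": 2, "store_sales": 8.0},
]
proposed, remaining = propose_fields(rows)
print(proposed)   # numeric columns, including the store_id key
print(remaining)  # everything else, left over as candidate dimension fields
```

A purely type-based guess sweeps in the numeric key column alongside the genuine sales value, which is exactly the kind of proposal we would clear before moving on to Step 2.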
Fill out the Step 1 of 3 dialog box, ensuring that only the setpoints in Table 1 below exist (clearing any unwanted checkboxes).
Table 1 Initial Measures List with Suggested New Names.
Data Field Name
Store Unit Sales
Total Sq. Ft.
Grocery Sq. Ft.
Frozen Sq. Ft.
Meats Sq. Ft.
In the above setpoints, we made minor modifications to the field names because we might wish to fit terminology that exists in current reports, and so forth.
In addition to the "typical" measures for sales, costs, and unit sales, we selected the square footage information to illustrate the use of non-measure information to derive summaries. The square footage data could be stored at the member level as a property were we designing a cube in MSSQL Server 2000 Analysis Services; this approach would offer numerous advantages, not the least of which might be in areas of optimization.
The OLAP Cube Wizard Step 1 of 3 dialog box now appears, with all relevant selections displayed, as shown in Figure 31.
Figure 31 Step 1 of 3 dialog box with our selections.
The OLAP Cube Wizard Step 2 of 3 dialog box appears.
In this step, we organize the descriptive data into dimensions, each of which can be used as a field in any reports we generate from our cube. The organization of the fields in levels of detail that we design at this stage should allow information consumers to select the level of detail to view, starting with high-level summaries, drilling to details, and zooming back to summaries as appropriate to meet their reporting needs.
The wizard requires that we designate at least one dimension for a cube. We can designate fields that provide isolated facts and do not belong in any particular hierarchy, such as the Store Type in our example, as dimensions with a single level. Rather obviously, our cube will be more useful for reports if we design some of the fields as levels to "roll up" to higher levels and dimensions.
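As a rough illustration of what "rolling up" means, the following sketch totals one data field at every level of a single dimension hierarchy. The level names (country, state, city), the sample rows, and the roll_up helper are hypothetical; the sketch shows the concept only, not how the PivotTable Service stores or computes aggregates.

```python
from collections import defaultdict


def roll_up(rows, levels, data_field):
    """Sum data_field for every prefix of the level list (top level first)."""
    totals = defaultdict(float)
    for row in rows:
        for depth in range(1, len(levels) + 1):
            key = tuple(row[level] for level in levels[:depth])
            totals[key] += row[data_field]
    return dict(totals)


# Hypothetical flat rowset with one three-level dimension and one data field.
rows = [
    {"country": "USA", "state": "WA", "city": "Seattle", "store_sales": 10.0},
    {"country": "USA", "state": "WA", "city": "Tacoma", "store_sales": 6.0},
    {"country": "USA", "state": "OR", "city": "Portland", "store_sales": 8.0},
]
totals = roll_up(rows, ["country", "state", "city"], "store_sales")
print(totals[("USA",)])                  # country-level summary
print(totals[("USA", "WA")])             # drill down to the state level
print(totals[("USA", "WA", "Seattle")])  # drill down to the city level
```

A single-level dimension such as Store Type would correspond to a level list with one entry, producing totals at that one level only.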
To create a level within a dimension, drag each field from the Source Fields list onto an existing dimension or level in the Dimensions box, as shown in the following steps. To rename a selection, simply right-click and select Rename from the shortcut menu that appears. (The "click label and wait" routine also enables the direct typing of changes.)
Move the selections shown in Table 2 from the Source Field list on the left to the appropriate position in the Dimension list on the right. (To correctly place the dimensions/levels under the dimensions, use the "template" guide that automatically adjusts itself to remain at the bottom of the existing Dimension list for each new dimension created.)
Rename each selection (the "source field" table name), with the suggested New Name below it, as shown in Table 2.
Table 2 Initial Dimensions List with Suggested New Names.
Source Field Table Name
Source Field Table Name
Source Field Table Name
The OLAP Cube Wizard Step 2 of 3 dialog box now resembles that partially illustrated in Figure 32.
Figure 32 Partial View of Step 2 of 3 dialog box with our selections.
The OLAP Cube Wizard Step 3 of 3 dialog box now appears.
Select the Save a cube file containing all data for the cube radio button by clicking it, if necessary.
Select a location in which to save the cube file.
The OLAP Cube Wizard Step 3 of 3 dialog box now resembles that shown in Figure 33.
Figure 33 Step 3 of 3 dialog box with cube type selection and file name/location.
With another selection in this dialog box, we decide what kind of cube we want to create. Our choice here depends on several factors in our operating environment, including the amount of data our cube will contain; the type and complexity of reports we plan to create based upon our cube; and the memory, disk space, and other resources on our systems (as well as those of the systems upon which the information consumers will be interacting with our cube). Experience and planning will be our best guides as we develop larger and more complex cubes for various organizational needs.
The Save a cube file containing all data for the cube option creates a separate cube file on our PCs, retrieving all the data we have designated within our cube design, and storing it in this file. This selection is not appropriate in all situations, but might represent a good choice when the following occur:
We are constructing and generating a cube for frequently changing interactive reports.
The amount of disk space used by the report is not a limiting concern.
We want to store the cube on a network server as a standalone source that information consumers can access to create their own reports (an alternative, and often innovative, use for a "local" cube). Local cubes also make great training sources for fledgling report writers.
A cube file can act as an intermediate source of data from the original relational database that excludes source data to which we might want to prevent access. A cube file can also provide a snapshot of some or all of the source database to facilitate offline access and analysis, either for a consumer community (for instance, on an isolated separate network), or for a sole remote consumer or consumers.
As in most scenarios involving variable processing and storage, resource and speed tradeoffs are factors to consider. With the Save a cube file containing all data for the cube option, we can expect more time and resources to be necessary for the initial creation of the cube; but read operations, such as opening and modifying reports, will likely be faster (although cube size is a factor to consider in read speeds). In addition, the fact that the cubes we generate are self-contained is often a deciding factor for the selection of the Save a cube file containing all data for the cube option.
The sheer amount of data we include in the cube, together with the number of dimensions and levels we attach to the model, is a key factor in predicting ultimate cube size. Flatter hierarchies and filtered data selections are considerations in reducing cube size, as are other cube type options that can be selected on the Step 3 of 3 dialog box. A close study of the options and prudent design of the cube, combined with testing in an appropriate development environment (and well-facilitated by the ability to create and modify local cubes quickly and easily, as I have emphasized), contribute heavily to efficient cube generation, delivery, and overall operations.
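To see why dimensions and levels weigh so heavily on size, consider a back-of-the-envelope sketch that counts the cells a cube could address per data field. It assumes each dimension contributes its members at every level plus one "All" member, and it gives only a logical upper bound; real cubes are sparse, and actual .cub file size depends on how the PivotTable Service stores the data. The member counts and the max_cell_count helper are hypothetical.

```python
from math import prod


def max_cell_count(dimensions):
    """Upper bound on addressable cells per data field.

    For each dimension, count the members at all levels plus one 'All' member,
    then multiply the counts across dimensions.
    """
    return prod(1 + sum(level_sizes) for level_sizes in dimensions.values())


# Hypothetical member counts per level, listed from the top level down.
deep = {"Store": [3, 10, 25], "Product": [3, 45, 110]}
flat = {"Store": [3], "Product": [3]}

print(max_cell_count(deep))  # three levels in each dimension
print(max_cell_count(flat))  # top level only
```

Even with these small, made-up counts, the deeper hierarchies multiply the number of possible cells by several hundred times, which is the effect that flatter hierarchies and filtered selections help to contain.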
For more information regarding the details of the various choices that appear on this dialog box, as well as optimization techniques for cube building in general, see the Books Online that are installed with the Typical MSSQL Server 2000/Analysis Services installation (or that can be accessed on the installation CDs or on the Microsoft MSSQL Server website).
The Save As dialog box appears, prompting us to name the definition file of the OLAP query (.oqy).
Name the file Tutorial-LocalCube, and navigate to store it in a convenient place.
The Save As dialog box appears, as shown in Figure 34.
Figure 34 Save As Dialog box, with location indicated for the new definition file.
Microsoft Query prompts us to save the cube definition (.oqy) file, which is separate from the cube file that we create to store data. We can reuse the .oqy file in Excel for report creation or for other possible purposes later. When we chose the Save a cube file containing all data for the cube option at the last dialog box, a file with a .cub extension was "scheduled" to be created in a location to be specified. The .cub file actually contains the data for the cube and is not created immediately when we click Finish. In our case, it is created when we save the cube definition as a file; once Microsoft Query creates the OLAP query file, it hands off instructions to the PivotTable Service to use the newly saved definition to kick off creation of the local cube.
To modify our initial cube design, we have only to open the .oqy file to initialize the OLAP Cube Wizard once more. For more details regarding the .oqy files, see the Microsoft Query Online Help.
Click the Save button in the Save As dialog box.
The .oqy file is quickly saved, and the Creating Offline Cube dialog box appears and remains until the cube is created, confirming that the build is taking place.
Microsoft Excel returns, in which we are greeted by the PivotTable and PivotChart Wizard Step 3 of 3 dialog box, as shown in Figure 35.
Figure 35 PivotTable and PivotChart Wizard Step 3 of 3 dialog box.
Choose where to place the new PivotTable: on the current worksheet or on a new worksheet.
We see the standard PivotTable "map" appear, signaling that 1) we have a connection to the new cube, and 2) the cube is ready to be reported upon using standard PivotTable report procedures. The PivotTable report appears, as shown in Figure 36.
Figure 36 PivotTable report, ready for reporting action.
The cube is now ready for reporting; and, from the perspective of a reporting application, is in many ways identical to a server-generated cube.
Save the Excel worksheet as desired.
Keep in mind that we can call Microsoft Query at any time to rapidly edit the initial .oqy file, making modifications based upon the results obtained in reporting efforts or as new requirements arise from the information consumers who test the new cube design in development. A quick rebuild of the cube will implant any changes, giving us the opportunity to immediately test for desired results again.