Home > Articles > Data > SQL Server

Dimensional Databases: Building A Data Warehouse

  • Print
  • + Share This
Building a data warehouse involves several complicated steps and can take time if the population routines aren't designed carefully. But the effort is worth it. Baya Pavliashvili addresses some of the challenges involved in building and maintaining dimensional databases that serve as the foundation of a data warehouse.
Like this article? We recommend

Building a data warehouse involves several complicated steps (some of which I described in my previous articles for InformIT). After the data warehousing architect locates all data elements necessary to support the data warehouse, it is time to build a dimensional model. As data change in transactional systems, the data warehouse needs to have a way of tracking and reflecting such changes. Populating fact and dimension tables can take considerable amount of time if the population routines aren't designed carefully. This article addresses some of the challenges involved in building and maintaining dimensional databases that serve as the foundation of a data warehouse.

NOTE

This article focuses on SQL Server-specific implementations, but concepts discussed here can also apply to data warehouses built using other database management systems.

Dimensional Modeling

Most developers are very familiar with relational data modeling principles. Although you can still find some "brilliant" systems out there that keep all data in a single table, many IT professionals know the normalization rules and adhere to them as they should. Normalized tables work very well in transactional systems. However, when it comes to reporting, joining dozens of tables isn't very efficient. Thus, most organizations use separate databases for reporting and for transaction processing. Reporting systems allow some denormalization, but are still built using relational modeling concepts.

Dimensional modeling is somewhat different from its relational counterpart. No, I'm not referring to the height, width, and length when I talk about dimensional tables. All tables consist of rows and columns, so don't expect any magic here. The term dimensional modeling stems from the fact that dimensional databases are used to slice data along certain dimensions. For example, a cake factory dimensional database could have dimensions of product, ingredient, shift, employee, supervisor, time, and so forth. Such database could be used to generate reports that break down total production by each sort of cake, by date and time when cakes were baked, by shift that produced the goods, and so on.

Dimensional models consist of one or several fact tables and many dimension tables. For this reason they are sometimes referred to as star schemas—one fact table surrounded by numerous dimension tables.

Fact tables record something tangible that can be reported. For example, a car dealership might record the fact that customer Jim White bought a 2002 Toyota Camry for $10,529 on February 20, 2005. Fact tables typically have foreign keys to all dimension tables; continuing with the dealership example, the sales fact table would have foreign keys to the customer dimension, the time dimension, and the vehicle dimension, among others.

Dimension tables are typically very simple—they contain the levels on which you want to group your reports. For example, the vehicle dimension could look similar to the following:

Vehicle_dimension
Vehicle_key
Vehicle_Identification_Number (VIN)
Vehicle_type (car, van, truck, SUV, RV)
Vehicle_class (luxury, sports, etc)
Brand 
Make
Model
Year_manufactured

Such a design would allow breaking down total sales by the vehicle class, type, brand, make, model, or year in which the vehicle was made.

Time dimension, which is found in most data warehouses, usually has the following structure:

Time_dimension

Time_key

Year

Quarter

Month

Day

Day_of_the_year


If you need a finer grain of detail on your reports, the time dimension can also include hours, minutes, and even seconds.

So how is a dimensional model different from a relational model? Well, fact or dimension tables aren't found in the relational database. Normalization rules command to keep each piece of data only in a single spot in the database. Hence, the vehicle dimension that you just examined would be represented by several normalized tables in the relational world, as shown next:

Table

Vehicle

Vehicle_type

Brand

Make

Model

Column

Vehicle_key

Vehicle_type_key

Brand_key

Make_key

Model_key

Column

Vehicle_type_key

Type_name

Brand_name

Brand_key

Make_key

Column

Model_key

 

 

Make_name

Model_name

Column

Year_made


A dimensional model doesn't celebrate each dimension level with a separate table. Rather, it allows redundant data within a dimension.

Some data warehouses are implemented using the "snowflake" schema, which is a special case of the star schema. Snowflake simply normalizes one or multiple dimensions, or each dimension might be made up of more than one table. For example, a human resources data warehouse might contain a job dimension, which could be made up of job title and job category tables. Job title might or might not be associated with the job category; therefore, the job dimension would look as follows:

Job_title

Job_category

Job_key

job_category_key

job_title_key

Job_category

Job_title

Grade_level

Job_description

 

Job_category_key

 


Most data warehouse implementations tend to begin with the star schema. If one of the dimensions grows large and could be more manageable by having multiple tables instead of one, the star schema evolves into the snowflake model.

Overall dimensional modeling isn't too complicated—mere mortals like you and me can get used to it with some practice. However, be sure to present the finished model to the end users at least twice: Changes to the model can be costly for a couple of reasons. If you have overlooked a dimension that users want to slice their data with, you have to do the following:

  1. Find where data for this dimension are located.

  2. Figure out how to extract this data.

  3. Determine how to maintain changes to this dimension (see more on this in the next section).

  4. Change fact table and DW population routines.

  • + Share This
  • 🔖 Save To Your Account