- Data Warehouse Lifecycle
- The Sample Scenario
- Dimensional Modeling
- Summary
The Sample Scenario
Suppose I work for Northwind trader’s company. This hypothetical company sells products around the world and records data into the sample database that is created when you install SQL Server 2000. Let’s further suppose that business owners would like to have analytical views, including graphs and charts that display the company’s performance broken down by customer, employee, supplier, and product. Having such a tool would help stakeholders to promote the products that fall short of selling in hot trading areas.
As I mentioned before, the example in this article is very simplistic and therefore considerably easier to build than a usual warehouse. Let me give you a few reasons why this is the case.
First, we have a set of reports in mind to be supported by the warehouse. That’s better than what a DW architect typically gets at the beginning of the project. Usually business users know they need a DW but can’t really give you a concrete idea of what the warehouse will do for them. It might take at least a few interviews to figure out exactly what type of reports and analytical views users would like to see.
Because data is already in a SQL Server database that has a fairly simple structure, the first few steps of a typical warehouse project are already done for us. In reality, you don’t always get this lucky: The DW architect usually has to identify multiple data sources that will be used to populate the warehouse. The organization’s data could be stored in various relational database management systems (Oracle, SQL Server, DB2, and MS Access being the most common), spreadsheets, email systems, and even in paper format. Once you identify all data sources, you need to create data extraction routines to transfer data from its source to a SQL Server database. Furthermore, depending on sources you’re working with, you might not be able to manipulate the data until it is in SQL Server format.
The Northwind database has intuitive object names; for example, the orders table tracks customer orders, employees table for records data about employees, and order details table tracking details of each order. Again, in the real world this might not be the case—you might have to figure out what cryptic object names mean and exactly which data elements you’re after. The DW architect often needs to create a list of data mappings and clean the data as it is loaded into the warehouse. For example, customer names might not always be stored in the same format in various data sources. The Oracle database might record a product name as "Sleeveless Tee for Men," whereas in Access you could have the same product referred to as "Men’s T-Shirt (sleeveless)." Similarly, the field used to record product names could be called "product" in one source, "product_name," in another and "pdct" in the other.
Once you have determined which data you need, you can create and populate a staging database and then the dimensional data model. Depending on the project, you may or might not have to have a staging database. If you have multiple data sources and you need to correlate data from these sources prior to populating a dimensional data structure, then having a staging database is convenient. Furthermore, staging database will be a handy tool for testing. You can compare a number of records in the original data source with the number of records in the staging tables to ensure that your ETL routines work correctly. Northwind database already has all data I need in easily accessible format; therefore, I won’t create a staging database.