What Is OLAP and Why Do We Need It?
Companies have large quantities of information from day-to-day operations and from other sources. This information is usually stored in relational databases called online transaction processing (OLTP) systems. OLTP systems are designed to store data in an efficient way and to keep track of the daily operations of a company or organization. These systems are capable of processing a large number of transactions at the same time because only small amounts of data are involved in each transaction.
In today's ever-changing market, it's very important for companies to gather data from every operational system as quickly as possible, transform that data into information, and use the information to create knowledge that will allow them to make better decisions. But analyzing data stored in OLTP systems is time-consuming and can consume large amounts of computing resources.
Online analytical processing (OLAP) systems are designed to discover trends and critical factors by extracting data from OLTP systems and then transforming and integrating the extracted data into useful information.
The following example will help you better understand the process of transforming data into knowledge. Suppose a certain police department collects data about the crimes committed in a certain city. Every crime registered is called a transaction. At the relational data level, such a transaction might look like Figure 1.
Figure 1 A transaction at the data level.
The data shown in Figure 1 is not particularly useful to the Chief of Police in helping the department in the crime-fighting process. He can only derive from this data that patrol car #003 went to a house at 1234 South 12th Avenue at 10:32 p.m. on 01/11/02 to investigate a reported robbery, and that the case is closed (1=closed, 0=open). It's impossible to determine from this data whether the police department is achieving its goals, what the efficiency of patrol car #003 is, or how many crimes occurred in 2002. To try to answer these sorts of questions, the police department will need to accumulate the data in the relational model into information; for example, by determining the types of crimes committed on a specific date, the number of times a specific crime occurred, the percentage of cases closed, and so on. With this information, it's easy to further determine which crime was committed most or least often. Figure 2 shows the information for 2002.
Figure 2 Accumulating data into information.
With this information, the police department can determine that in 2002 the crime that occurred most was auto theft, that only 27.7% of murder cases were closed, and that DUI has the highest percentage of cases closed.
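The step from individual transactions to the kind of summary described above can be sketched in a few lines of Python. This is only an illustration: the rows, the `summarize` function, and the field layout are hypothetical and are not the data behind Figure 2.

```python
from collections import Counter

# Hypothetical transactions: (crime_type, case_closed), with 1 = closed, 0 = open.
# These rows are invented for illustration only.
transactions = [
    ("auto theft", 0),
    ("auto theft", 1),
    ("auto theft", 0),
    ("house robbery", 1),
    ("house robbery", 0),
    ("DUI", 1),
    ("DUI", 1),
    ("murder", 0),
]

def summarize(rows):
    """Return {crime_type: (total_cases, percent_closed)}."""
    totals = Counter()
    closed = Counter()
    for crime_type, case_closed in rows:
        totals[crime_type] += 1
        closed[crime_type] += case_closed
    return {
        crime: (totals[crime], round(100.0 * closed[crime] / totals[crime], 1))
        for crime in totals
    }

summary = summarize(transactions)

# The crime with the most registered cases in the sample.
most_frequent = max(summary, key=lambda crime: summary[crime][0])

for crime, (total, pct_closed) in summary.items():
    print(f"{crime:15s} cases={total} closed={pct_closed}%")
print("Most frequent:", most_frequent)
```

Each transaction on its own says little, but counting and averaging over all of them produces the per-crime totals and closure percentages that the department can act on.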
But the police department still doesn't know whether it's achieving its goals, or the seasonal behavior of criminals. To answer these questions, the police department needs to use other tools that will allow it to perform a more complete analysis of the data.
OLAP tools can help the police department break down the information stored in the relational database by period. Figure 3 shows the OLAP analysis.
Figure 3 Analyzing the information.
From this analysis, the police department can conclude that house robbery and auto theft were influenced by period, while murder and DUI remained generally constant throughout all four periods.
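A breakdown by period of this kind amounts to grouping the same incidents along a time dimension. The sketch below assumes, purely for illustration, that a period is a three-month quarter and that each incident carries a month number; the source does not define the periods, and the data and function names here are hypothetical.

```python
from collections import defaultdict

# Hypothetical incidents: (crime_type, month). Invented for illustration only.
incidents = [
    ("house robbery", 1),
    ("house robbery", 5),
    ("house robbery", 6),
    ("auto theft", 2),
    ("auto theft", 8),
    ("auto theft", 11),
    ("auto theft", 12),
    ("DUI", 3),
    ("DUI", 9),
]

def period_of(month):
    """Assumed mapping: months 1-3 -> period 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4."""
    return (month - 1) // 3 + 1

def counts_by_period(rows):
    """Return {crime_type: [count_p1, count_p2, count_p3, count_p4]}."""
    table = defaultdict(lambda: [0, 0, 0, 0])
    for crime_type, month in rows:
        table[crime_type][period_of(month) - 1] += 1
    return dict(table)

cube = counts_by_period(incidents)
for crime_type, counts in cube.items():
    print(f"{crime_type:15s}", counts)
```

An OLAP tool performs this kind of grouping across many dimensions at once and usually precomputes the results, but the underlying idea is the same: one row per crime type, one column per period.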
The OLAP analysis answers almost all of the police department's questions, allowing the department to measure its performance, know whether it's achieving its goals, and determine whether it's consistent with crime-prevention strategies.
The department now has enough information to improve its crime-fighting performance. If the police department uses data mining techniques, it might discover the reasons behind the behavior observed with the OLAP tools. A data mining application may discover a relationship between the patterns of house robbery and auto theft, determining that house robbery and auto theft increase or decrease each period by almost the same percentage. This is called a rule. Figure 4 shows this analysis.
Figure 4 Identifying a pattern.
With this analysis, the police department can test the rule found by the data mining application and see that the rule was true for periods 2 and 3, but period 4 behaves differently: the percentage of change was greater for auto theft than for house robbery during this period. The department can further analyze the crime-fighting strategies in period 4, comparing them with the strategies used in previous periods to find the reason for this behavior.
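Testing such a rule comes down to comparing period-over-period percentage changes for the two crimes. The following is a minimal sketch with made-up counts chosen to mirror the pattern described above; the numbers, the `tolerance` threshold, and the function names are assumptions, not values taken from Figure 4.

```python
# Hypothetical counts for periods 1 through 4. Invented for illustration only.
house_robbery = [100, 120, 90, 99]
auto_theft = [200, 242, 180, 234]

def percent_changes(counts):
    """Percentage change from each period to the next, rounded to one decimal."""
    return [
        round(100.0 * (cur - prev) / prev, 1)
        for prev, cur in zip(counts, counts[1:])
    ]

def rule_holds(series_a, series_b, tolerance=2.0):
    """Return {period: True/False}, True when the two series change by
    nearly the same percentage (within `tolerance` percentage points)."""
    changes_a = percent_changes(series_a)
    changes_b = percent_changes(series_b)
    return {
        period: abs(a - b) <= tolerance
        for period, (a, b) in enumerate(zip(changes_a, changes_b), start=2)
    }

print(percent_changes(house_robbery))
print(percent_changes(auto_theft))
print(rule_holds(house_robbery, auto_theft))
```

With these sample counts the two crimes move together in periods 2 and 3 and diverge in period 4, where auto theft grows much faster. A divergence like that is the cue to look more closely at what changed in that period.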