Data mining can be applied to many different business requirements, and a whole new set of possibilities becomes available if we add the Page Hits fact table to the data warehouse. Doing so would mean handling a very large fact table, which has ramifications for the relational database design, the ETL processes, and even the cube structure. See Chapter 11, "Very Large Data Warehouses," for a full description of the issues associated with very large databases.
Sequence Clustering to Build Smarter Web Sites
Each visit has an associated path: the ordered list of pages that the user viewed on the Web site. We could feed this data to the Sequence Clustering algorithm, which groups cases that follow similar paths into clusters. The Web site could then query the resulting mining model to suggest the next page that the user might like to visit.
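To make the idea of suggesting a next page more concrete, the following is a minimal Python sketch. It is not the Sequence Clustering algorithm itself: it only counts how often one page follows another across all visits, without grouping visits into clusters. The page names, the sample visits, and the helper functions `build_transitions` and `suggest_next` are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(paths):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for path in paths:
        # Pair every page with the page that came directly after it.
        for current_page, next_page in zip(path, path[1:]):
            transitions[current_page][next_page] += 1
    return transitions


def suggest_next(transitions, current_page):
    """Return the most frequent follower of current_page, or None if unseen."""
    followers = transitions.get(current_page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical visit paths, one list of page names per visit.
visits = [
    ["home", "products", "cart"],
    ["home", "products", "reviews"],
    ["home", "products", "cart", "checkout"],
    ["home", "support"],
]

transitions = build_transitions(visits)
print(suggest_next(transitions, "products"))  # prints "cart"
```

A real sequence clustering model would go further by first grouping visits with similar paths, so that the suggestion depends on the kind of visitor as well as on the current page.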
Other Data Mining Possibilities
This chapter has given a basic introduction to the rich and broad set of applications that are possible using the algorithms in Analysis Services data mining. One area where we have only scratched the surface is prediction. For example, applications that need to predict the value of a discrete attribute can use the classification algorithms, including Decision Trees, Neural Network, and Naive Bayes. Continuous variables such as future profit levels can be predicted by the Time Series and Decision Trees algorithms.
Using Data Mining in Integration Services to Improve Data Quality
One major scenario that we have not looked at in this chapter is the use of data mining in Integration Services. We could train a clustering model on a subset of data that is already in the data warehouse and is known to have clean, correct values. An Integration Services package that loads new data could then query this model to determine the probability that each new record is valid. Records that are flagged as likely "bad data" can be split out during the load process into a separate table for further validation or for human checking and correction.
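The flow described above (learn what clean data looks like, score incoming records, and split out the suspicious ones) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a per-column mean and standard deviation stands in for a trained clustering model, and the column layout (quantity, unit price), the sample rows, the threshold of three standard deviations, and the function names are all hypothetical.

```python
from statistics import mean, stdev


def fit_profile(clean_rows):
    """Learn the mean and standard deviation of each column from clean rows."""
    columns = list(zip(*clean_rows))
    return [(mean(column), stdev(column)) for column in columns]


def is_suspect(row, profile, max_z=3.0):
    """Flag a row if any value is more than max_z standard deviations away."""
    for value, (mu, sigma) in zip(row, profile):
        if sigma == 0:
            # A column that never varied in the clean data: any change is odd.
            if value != mu:
                return True
            continue
        if abs(value - mu) / sigma > max_z:
            return True
    return False


def split_rows(new_rows, profile):
    """Separate incoming rows into likely-good and likely-bad lists."""
    good, suspect = [], []
    for row in new_rows:
        (suspect if is_suspect(row, profile) else good).append(row)
    return good, suspect


# Hypothetical (quantity, unit_price) rows already known to be clean.
clean_rows = [(1, 10.0), (2, 10.5), (3, 9.5), (2, 10.0), (1, 9.8)]
profile = fit_profile(clean_rows)

# Hypothetical rows arriving in a new load.
new_rows = [(2, 10.2), (2, 95.0), (40, 10.0)]
good, suspect = split_rows(new_rows, profile)
print(good)     # prints [(2, 10.2)]
print(suspect)  # prints [(2, 95.0), (40, 10.0)]
```

In the scenario from the text, the scoring would be done by querying the mining model inside the data flow, and the suspect rows would be written to a separate table rather than collected in a list.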
Integration Services also has other data mining features. For example, a data flow can load data directly into data mining models to train them, and it can select a sample of the data for training rather than using all of it.
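As a rough illustration of training on a sample rather than on every row, the Python sketch below shuffles the rows and keeps a fixed fraction for training, leaving the rest available for testing the model. It does not use Integration Services; the function name `split_sample`, the 30 percent fraction, and the seed value are assumptions chosen for the example.

```python
import random


def split_sample(rows, training_fraction, seed=0):
    """Randomly split rows into a training sample and the remaining rows."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(rows)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * training_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical source rows, identified here only by a row number.
all_rows = list(range(100))
training_rows, remaining_rows = split_sample(all_rows, training_fraction=0.3)
print(len(training_rows), len(remaining_rows))  # prints "30 70"
```

Holding back the rows that were not sampled is a common way to check how well a trained model performs on data it has not seen.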