Covers all the applications businesses have discovered for data warehousing, including supply chain applications, cross-selling, total quality management, profitability analysis, and much more.
Walks step-by-step through data warehouse development planning, project management, and deployment, helping students understand the decision-making process, in-depth, at each stage.
Using the Web as a delivery system, data store, and business intelligence portal.
"Lou Agosta's book is a guide to a better understanding of how collections of separately collected bits of information can be organized to serve the needs of an enterprise. Agosta's book offers technical and managerial insights into how to leverage the data warehousing concept for best advantage." - Paul A. Strassmann, Chairman and CEO, Software Testing Assurance Corporation
Data warehousing for everyone! The essential guide for all business people.
This is the only data warehousing book that speaks directly to business leaders and data warehousing newcomers -- explaining the benefits, risks, technologies, and processes with remarkable clarity and insight. Leading consultant and industry analyst Lou Agosta shows how data warehousing can dramatically reduce business uncertainty by transforming a tidal wave of information into knowledge you can act on.
Agosta presents the quantitative business case for (and against) data warehousing, and helps you evaluate every key data warehousing application in the context of your own enterprise. Learn how to use data warehousing to slash supply chain management costs, make cross-selling more effective, strengthen customer and brand relationships, promote product quality, and more. Discover how to align your business and technical goals for data warehousing; then review every stage of the data warehousing project lifecycle, from planning and design through deployment and optimization. Understand what can go wrong -- and how to keep it from happening to you! Coverage includes:
Read by business and technical leaders from Paul A. Strassmann to Arno Penzias, The Essential Guide to Data Warehousing is for every business executive and IT professional seeking to understand the benefits, risks, and technologies of data warehousing -- without the jargon and hype!
I. FUNDAMENTAL COMMITMENTS.
1. Basic Data Warehousing Distinctions.
An Architecture, Not A Product. The One Fundamental Question. The One Question -- The Thousand and One Answers…. The First Distinction: Transaction and Decision Support System. Data Warehouse Sources of Data. Dimensions. The Data Warehouse Fact. The Data Warehouse Model of the Business: Alignment. The Data Cube. Aggregation. Data Warehouse Professional Roles. The Data Warehouse Process Model. Summary.
2. A Short History of Data.
In the Beginning…. Fast Forward to Modern Times. The Very Idea of Decision Support. From Mainframes To PCs. The Promise of the Relational Database. Data Every Which Way. From Client-Server to Thin Client Computing. Why Will Things Be Different This Time? The More Things Change, the More They Stay the Same. Model of Technology Dynamics. Summary.
3. Justifying Data Warehousing.
Competition for Limited Resources. An Integrated Business and Technology Solution. Economic Value, Not Business Benefits. Selling the Data Warehouse. The Reporting Data Warehouse: Running Fewer Errands. The Supply Chain Warehouse. The Cross-Selling Warehouse. The Total Quality Management Data Warehouse. The Profitability Warehouse. Data Warehousing Case Vignettes in the Press. Summary.
4. Data Warehousing Project Management.
Simulating a Rational Design Process. Managing Project Requirements. Managing the Development of Architecture. Managing Project Schedule. Managing Project Quality. Managing Project Risks. Managing Project Documentation. Managing the Project Development Team. Managing Project Management. Summary.
II. DESIGN AND CONSTRUCTION.
5. Business Design: The Unified Representations of The Customer and Product.
The Critical Path: Alignment. A Unified Representation of the Customer. Data Scrubbing. The Cross-Functional Team. Hierarchical Structure. Customer Demographics. A Unified Representation of the Product. Data Marts: Between Prototype and Retrotype. Summary.
6. Total Data Warehouse Quality.
The Information Product. Data Quality as Data Integrity. Intrinsic Qualities. Ambiguity. Timeliness and Consistency in Time. Security. Secondary Qualities. Credibility. Quality Data, Quality Reports. Information Quality, System Quality. Performance. Availability. Scalability. Functionality. Maintainability. Reinterpreting the Past. Summary.
7. Data Warehousing Technical Design.
Use Case Scenarios. Abstract Data Types and Concrete Data Dimensions. Data Normalization: Relevance and Limitations. Dimensions and Facts. Primary and Foreign Keys. Design for Performance: Technical Interlude. Summary.
8. Data Warehouse Construction Technologies: SQL.
The Relational Database: A Dominant Design. Twelve Principles. Thinking in Sets: Declarative and Procedural Approaches. Data Definition Language. Indexing: B-Tree. Indexing: Hashing. Indexing: Bitmap. Indexing Rules of Thumb. Data Manipulation Language. Data Control Language. Stored Procedures. User-Defined Functions. Summary.
9. Data Warehouse Construction Technologies: Transaction Management.
The Case For Transaction Management: The ACID Test. The Logical Unit of Work. Two-tier and Three-tier Architectures. Distributed Architecture. Middleware: Remote Procedure Call Model. Middleware: Message-Oriented Middleware. The Long Transaction. Summary.
III. OPERATIONS AND TRANSFORMATIONS.
10. Data Warehouse Operation Technologies: Data Management.
Database Administration. Backing Up the Data (in the Ever-Narrowing Backup Window). Recovering the Database: Crash Recovery. Recovering the Database: Version (Point-in-Time) Recovery. Recovering the Database: Roll-Forward Recovery. Managing Lots of Data: Acres of Disk. Managing Lots of Data: System-Controlled Storage. Managing Lots of Data: Automated Tape Robots. RAID Configurations. Summary.
11. Data Warehousing Performance.
Performance Parameters. Denormalization for Performance. Aggregation For Performance. Buffering For Performance. Partitioning For Performance. Parallel Processing: Shared Memory. Parallel Processing: Shared Disk. Parallel Processing: Shared Nothing. Data Placement: Colocated Join. Summary.
12. Data Warehousing Operations: The Information Supply Chain.
A Process, Not an Application. The Great Chain of Data. Partitioning: Divide and Conquer. Determining Temporal Granularity. Aggregate Up To the Data Warehouse. Aggregates in the Data Warehouse. The Debate about the Data Warehouse Data Model. The Presentation Layer. Integrated Decision Support Processes. Summary.
13. Metadata and Metaphor.
Metaphors Alter Our Perceptions. A New Technology, a New Metaphor. Metadata are Metaphorical. Semantics. Forms of Data Normalization and Denormalization. Metadata Architecture. Metadata Repository. Models and Metamodels. Metadata Interchange Specification (MDIS). Metadata: A Computing Grand Challenge. Summary.
14. Aggregation.
On-line Aggregation, Real-Time Aggravation. The Manager's Rule of Thumb. A Management Challenge. Aggregate Navigation. Information Density. Canonical Aggregates. Summary.
IV. APPLICATIONS AND SPECULATIONS.
15. OLAP Technologies.
OLAP Architecture. Cubes, Hypercubes, and Multicubes. OLAP Features. The Strengths of OLAP. Limitations. Summary.
16. Data Warehousing and the Web.
The Business Case. The Web as a Delivery System. Key Internet Technologies. Web Harvesting: The Web as the Ultimate Data Store. The Business Intelligence Portal. Summary.
17. Data Mining.
Data Mining and Data Warehousing. Data Mining Enabling Technologies. Data Mining Methods. Data Mining: Management Perspective. Summary.
18. Breakdowns: What Can Go Wrong.
The Short List. The Leaning Cube of Data. The Data Warehouse Garage Sale. Will the Future be Like the Past? Model Becomes Obsolete. Missing Variables. Obsessive Washing. Combinatorial Explosion. Technology and Business Misalignment. Becoming a Commodity. Summary.
19. Future Prospects.
Enterprise Server Skills to be in High Demand. The Cross-Fictional, Oops, -Functional Team. Governance. The Operational Data Warehouse. Request for Update. The Web Opportunity: Agent Technology. The Future of Data Warehousing. Summary.
Glossary.
Naturally, as the author, I would like to give everyone permission to read this book from cover to cover. But if priorities require choices in what to read first, the following suggestions will be useful. The extensive and detailed Glossary is a valuable resource. Although a conscientious effort is made in this book to define technical terms upon first use, a reader who wishes to skip around is encouraged to consult the Glossary frequently for those terms whose introduction may have been skipped. Everyone will want to read Chapter One: Basic Data Warehousing Distinctions, Chapter Two: A Short History of Data, Chapter Five: Business Design, Chapter Six: Total Data Warehouse Quality, Chapter Twelve: The Information Supply Chain, Chapter Thirteen: Metadata and Metaphor, and Chapter Eighteen: Breakdowns: What Can Go Wrong. In addition, executives, business leaders, and information consumers of all kinds will benefit most directly from the Introduction: Data Warehousing Between Uncertainty and Knowledge, Chapter Three: Justifying Data Warehousing, and Chapter Nineteen: Prospects for the Future. Project managers, data architects, and designers should add Chapter Four: Data Warehouse Project Management, Chapter Five: Business Design (even though already mentioned), Chapter Seven: Technical Design, Chapter Fourteen: Aggregation, and Chapter Fifteen: OLAP Technologies. Developers will be interested in Chapter Eight: Data Warehouse Construction Technologies: SQL, and Chapter Nine: Transaction Management (a word of caution: this book does not teach SQL or coding, and the reader is referred to the extensive bibliography for help with that). Database administrators and information managers of all kinds will be interested in Chapter Ten: Data Warehouse Operation Technologies: Data Management and Chapter Eleven: Data Warehousing System Performance. Business analysts, technical specialists, and other information producers in various areas will want to consider Chapter Fifteen: OLAP Technologies, Chapter Sixteen: Data Warehousing and the Web, and Chapter Seventeen: Data Mining.
The real purpose of the Preface is to give the reader insight into the origin of the book. What could the author possibly have been thinking that led him to produce this result? In comparison with the book itself, the Preface functions rather like the distinction in science between the context of discovery and the context of justification. How an idea or discovery is first formulated is often irrelevant to how it is justified. Many readers benefit from and enjoy receiving background on how ideas and views originate, aside from their validity. The accidental human details and feelings that occasioned the emergence of something new, or at least an original synthesis of existing ideas, add texture and motivation to the drier, more impersonal logical structure that is required to validate or justify the undertaking objectively. However, if you, dear reader, are one who is not interested in such a background briefing and long to get into the details of data warehousing, then please feel free to skip immediately to Chapter One (Basic Data Warehousing Distinctions), with the option of returning later. The author, whose job after all it is to serve the reader, will not be offended, and no significant loss of continuity will occur.
Now, after having tried to discourage the reluctant reader, to those still reading this Preface, let me say this book is born out of three convictions. First, amidst paradigm shifts, data warehousing is not another paradigm shift. Second, the essential key to producing knowledge rather than more data by means of data warehousing is the alignment between the dimensions of the data warehousing system and basic business drivers and imperatives. Third, the data warehouse system is both an information product and a method of knowledge generation. In this, it is rather like a lens on a telescope, which incorporates knowledge of the principles of optics in the service of enabling us to see (and so come to know) things at a distance that we had not previously imagined. This sometimes also requires a new kind of seeing.
This finds confirmation in places that are more relevant than they might at first seem. So, for instance, when the Renaissance scientist Galileo pointed his newfangled telescope at the moon and saw mountains (as on Earth) instead of heavenly perfection, Galileo's seeing was informed by a framework for understanding a unified system of heavenly bodies to which the Earth and sun and moon all belong. But when the learned Church scholars of the day looked through the same strange device, the patterns detected by their differently informed seeing were not mountains but rather were designs so unfamiliar as to result in what they saw being labeled the work of the devil. Do not believe that such matters are trivial, because this resulted in poor Galileo being placed under house arrest by the Inquisition. Although the data warehousing architect and data miner can realistically expect to escape such a fate, they should be careful when raising data quality issues.
The point is that the origin of this essential guide, which differs from its justification, is to be found in three dynamic terms: "paradigm," "alignment," and "knowledge." These are protean words that have at times been badly abused. The promise here is that they will be carefully explored, rigorously defined, and used selectively. (All three terms are defined in the Glossary.)
The case of "paradigm shift" is of interest because it was first popularized in 1962 in a controversial book on how scientific knowledge develops, Thomas Kuhn's The Structure of Scientific Revolutions. Amid many engaging examples from the development of scientific knowledge, Kuhn focused on the celebrated example of the shift from regarding the Earth as the center of the solar system (and universe) to regarding the sun as the center. The shift from the sun going around the Earth to the Earth going around the sun is a fundamental one. This is the very paradigm of a paradigm shift. In fact, that the Earth revolves around the sun is on the list of the great discoveries of the past thousand years. (It is usually dated to 1543, with the publication of Nicolaus Copernicus's On the Revolutions of the Heavenly Spheres.) Many of the same facts as are contained in the older systems are accommodated by the new system, many new facts that previously resisted clarification are integrated, and additional facts are constituted ("constructed") by the framework of the new paradigm. This example underscores the point that how a discovery is made is often tangential (if not irrelevant) to how it is justified. Copernicus's paradigm shift (his new theory) was actually the result of an intricate logical speculation that was nearly as confusing and counter-intuitive as the system it was designed to supplant. For instance, Copernicus continued to reason that the path traced by the planets was a circle, not an ellipse. But much of the alleged simplification of the new candidate paradigm is lost without the use of elliptical motion. Another two centuries of step-by-step advances in physics and mathematics were required to satisfactorily justify what according to rigorous scientific criteria we now take for granted.
Indeed, a significant part of that justification was the construction of a data warehouse (yes, the term must be used) of accurate, consistent, conformed observations of the planets by Tycho Brahe and his student Johannes Kepler (published in 1627 as the Tabulae Rudolphinae (Rudolf's Tables) after the local monarch, Emperor Rudolf II). But when thus completed by a series of sustained, incremental improvements (including the history of modern science in the works of Galileo, Kepler, and Newton), the new candidate paradigm finally begins to look like a breakthrough.
Thus we get to the essential conviction with which this book began: data warehousing is not a paradigm shift. The enticing but superficial analogy between enterprise data and the graphical user interface (GUI), on the one hand, and the sun and the Earth, on the other, has its charms. But it is far too limiting. There is no shift of emphasis either from the mainframe to the presentation layer or from a data-centric to a GUI-centric model (or back). Both were available at the beginning, and both continue to be essential components of a complete computing system architecture. Hearing this may disappoint some readers for whom the term "paradigm shift" has come to represent the possibility of breakthrough or solution. On the other hand, those readers fatigued by being bombarded by paradigm shifts in the trade press and marketing releases will be relieved to learn that incremental progress is still possible. In either case, we are not dealing with something like client-server, the relational database, the network computer, or the Web-based Internet "revolutions." To be sure, data warehousing is a system architecture that builds on these designs and products in a variety of forms. But in every form we are dealing with the technological imperative to manage data (customer, product, market, information) as a corporate asset to be understood and applied to decision making. The injunction that the data is the business is one that has stood fast. It is as true today as it was in 1980 when the author first encountered it or when statistical (and other) methods were first applied to decision making in the late 1960s.
The second idea with which this book began is referred to as "alignment." One of the essential methods of producing knowledge rather than merely more data through data warehousing is the alignment between the data dimensions defining the warehouse and basic business imperatives and drivers. This idea leads in the direction of how we use language to describe the world. It turns out that everyday language is just the more general case of all kinds of systems of signs, including computing systems, used to express and constrain situations in the world.
Naturally, the uses of language are not restricted to the simple representations of factual situations in the world (here, at the risk of redundancy, "language" is a synonym for all kinds of computing system applications). Language is also both instrumental and pragmatic in its alignment with business imperatives. It leverages results through means-ends relations and the embedding of systems in the context of business processes in which coordination and communication are of the essence. Thus, "alignment" becomes a proxy for the way computing systems, and data warehousing in particular, represent the world of business interactions between customers, products, and markets. The data warehousing system represents and, as designers like to say, "exposes" the levers and dials of the targeted markets, products, and customers. These levers and dials, in turn, are structures that literally help the builders and operators of the data warehousing system find and define the real world references of the system. For example, the forecast of the demand planner is almost completely a system artifact. It may seem like a miracle that the world (mostly) operates according to it. Nevertheless, it is not all relative: accurate features of the business situation are really expressed, and the system objectively refers to things that are happening in the environment. If this seems circular, well, it is. The system designer (and builder) is an essential part of this loop, which is a productive, not vicious, circle. This suggests why the process is inherently iterative: an alignment between the data warehousing system and the context of its use is not merely discovered. It is constructed. (Of course, selected, given business practices will already always be in place in any situation. These are abstracted, captured, and sometimes transformed beyond recognition in being implemented or supported by the automated system.)
The resulting system both represents the business situation and makes it accessible. The part of the designer's science that remains part art is to know when to stop iterating, because the essential minimal characteristics of the situation have been captured in the system architecture, and the remaining features are distractions from the business imperatives. Thus, we have a two-way alignment of three things: the system, the world, and the system designer (and builder). The result is knowledge.
This leads us to the third of the three dynamic words, "knowledge," out of which this book was born. These days, knowledge means anything from content to patented intellectual property to an inventory of PowerPoint presentations on a consulting firm intranet. Intuitively, to say that something is "knowledge" confers on it a certain dignity, a mark of excellence, or a suggestion of high esteem. This intuition is unpacked and motivated in Chapter Six in the section entitled "The Information Product." The approach taken in this book suggests that a data warehousing system is an information product. Furthermore, the data warehouse is indeed a source of knowledge. It is an "enabler." It is a condition of possibility of knowledge. But, in itself, it is not knowledge. Rather, the knowledge in question emerges in the conversation between the staff and the data warehouse content. The knowledge is not "in the heads" of the staff who pose queries using SQL or other user interface methods. Rather, the knowledge is in the interaction between the query poser and the answer provider. The knowledge is in the relation between the question and answer. In short, the knowledge is in the coordination of information and actions that are reflected in a firm's commitment to answering questions, at least some of which haven't even been posed yet. Thus, the data warehousing system is an essential part of a business process whose outcome and results are knowledge.
The relationship between information and knowledge looks different depending on whether it is approached from a technology or a business perspective. From a technology perspective, knowledge is on a continuum with information. As the quality of the information improves, it gets closer and closer to being knowledge in the full sense. Knowledge is a point on the horizon toward which information is always improving and progressing. Information is just data that has been subjected to a defined process of improvement.
If this process is sufficiently extended, then (the argument goes) knowledge is the result. From a business perspective, knowledge is qualitatively different from information. There is a yawning abyss separating information, no matter how high the quality, from knowledge. The "best available information" never results in knowledge without something special mixed in to fortify it. There is a certain something that has to be added to information in order to yield knowledge. That something is commitment. When information is made the basis for a business decision, then a commitment is implied and mobilized. Decision support questions are addressed and answered by the data warehousing system by providing instrumental and pragmatic knowledge for action. This book will especially emphasize the practical aspects of knowledge in business: in knowing customers and knowing the behavior of product brands in the market. It is not a miracle that dirty data gets scrubbed in the information supply chain and knowledge is one of the results. But it does sometimes seem like a miracle because the commitment of those exceptional firms that make this happen is not in the headlines or on the surface. The result is instrumental knowledge in which means are applied to ends. Knowledge generates actionable results for the benefit of the business. This further entails pragmatic knowledge in which knowledge becomes the basis for commitments, in which business processes are coordinated, essential business imperatives addressed, and fundamental decision support questions answered by the data warehousing system. The latter way in which knowledge is defined (and fortified) by commitment is the singular "spin" that this work puts on our understanding of knowledge in a business context. This commitment of data warehousing to knowledge is what puts the decision back in decision support.