In this section, we first look at the role of MDM in the enterprise application landscape and then introduce an architecture overview. We start with the ecosystem as it existed before the rise of Big Data and then show how the architecture overview evolved under the impact of Big Data.
MDM as Central Nervous System for Enterprise Data
Although many MDM implementations historically focused on operational use cases, with the rise of Social MDM, an MDM system truly becomes the central nervous system for the enterprise, as shown in Figure 4.1. It is connected to the operational landscape as well as to a broad range of analytical applications. A key observation in Figure 4.1 is that many of the connections are bidirectional, because with Social MDM the MDM system becomes an essential part of the operational fabric. For example, social media analytics might enrich a particular customer record with insights gleaned from unstructured sources such as social media and customer interaction logs from the call center, but the starting point for that analysis is the set of customer records that define a “search scope” for the analysis. Similarly, as customers gain self-service capabilities to update their own master data records through various operational channels, the link between operational applications and MDM becomes more and more bidirectional, whereas a couple of years ago many MDM systems were simply fed by the source systems following a consolidation-style architecture pattern.
Figure 4.1 MDM—the central nervous system for enterprise data
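The bidirectional link between MDM and social media analytics can be illustrated with a minimal sketch. It assumes a simplified, hypothetical record shape (`CustomerRecord` with a set of known social handles) and a hypothetical helper `enrich_from_posts`; neither is part of any MDM product. The master records define the search scope, and the insights found within that scope flow back onto the records.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerRecord:
    """A simplified master data record (hypothetical shape for illustration)."""
    customer_id: str
    name: str
    social_handles: set[str]
    insights: list[str] = field(default_factory=list)


def enrich_from_posts(records, posts):
    """Attach posts to the master records whose handles define the search scope.

    `posts` is an iterable of (handle, text) pairs. Posts from handles that
    no master record knows about are outside the scope and are ignored.
    """
    # MDM -> analytics: the master records determine which handles are in scope.
    scope = {handle: rec for rec in records for handle in rec.social_handles}
    for handle, text in posts:
        record = scope.get(handle)
        if record is not None:
            # Analytics -> MDM: the insight flows back onto the master record.
            record.insights.append(text)
    return records


customers = [
    CustomerRecord("C1", "Ada Example", {"@ada"}),
    CustomerRecord("C2", "Bob Example", {"@bob", "@bobby"}),
]
posts = [
    ("@ada", "Loving the new phone"),
    ("@stranger", "Unrelated chatter"),
    ("@bobby", "Support was slow"),
]
enrich_from_posts(customers, posts)
```

A real implementation would resolve identities with probabilistic matching rather than exact handle lookup; the exact lookup here only keeps the two directions of the data flow easy to see.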
MDM: Architecture Overview
Now that you have a better understanding of the functional scope of the capabilities discussed in the previous chapter, let’s switch gears to implementation architecture. A few quick words regarding nomenclature will help convey the key messages in the drawings more easily. A functional area is a collection of related subsystems delivering a major IT function. A technical capability is a specialized type of technology performing a specific role; we introduced those relevant to us in Chapter 3. Capabilities can also form collections: information provisioning, for example, is a collection of mechanisms for locating, transforming, or aggregating information from all types of sources and repositories. A zone is a scope of concern describing a usage intent for a particular cross-cutting service. It has associated requirements and governance that any system in the zone must adhere to. Figure 4.2 shows the example icons we use for these concepts in the drawings.
Figure 4.2 Nomenclature
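For readers who prefer code to icons, the three concepts can be sketched as plain data structures. This is only one possible model: the class and field names are our own, the requirement strings are invented placeholders, and nesting zones inside a functional area is a modeling assumption that mirrors how the analytical sources area is drawn later in this section.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TechnicalCapability:
    """A specialized type of technology performing a specific role."""
    name: str


@dataclass
class Zone:
    """A scope of concern with requirements every system in it must meet."""
    name: str
    requirements: list[str]
    capabilities: list[TechnicalCapability] = field(default_factory=list)


@dataclass
class FunctionalArea:
    """A collection of related subsystems delivering a major IT function."""
    name: str
    capabilities: list[TechnicalCapability] = field(default_factory=list)
    zones: list[Zone] = field(default_factory=list)

    def all_capabilities(self):
        """Capabilities placed directly in the area plus those inside its zones."""
        found = list(self.capabilities)
        for zone in self.zones:
            found.extend(zone.capabilities)
        return found


# Requirement strings below are hypothetical placeholders.
warehouse_zone = Zone(
    name="Integrated warehouse and marts zone",
    requirements=["access is audited"],
    capabilities=[
        TechnicalCapability("Data warehouse"),
        TechnicalCapability("Operational data store"),
    ],
)
analytical_sources = FunctionalArea(
    name="Analytical sources",
    zones=[
        Zone("Landing zone", ["data is harmonized before it leaves"]),
        warehouse_zone,
        Zone("Exploration zone", ["exploratory use only"]),
    ],
)
capability_names = [c.name for c in analytical_sources.all_capabilities()]
```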
To understand what is changing with Social MDM, we first need to understand the deployment architectures that are common today, such as the one shown in Figure 4.3.
Figure 4.3 Architecture overview—a traditional viewpoint
In Figure 4.3, you can see two types of capabilities:
- Technical capabilities introduced in Chapter 3: Examples include (but are not limited to) Master Data Hubs and Reference Data Hubs, which were introduced in the Managed Operational Data Hub category of the Information Engine capability layer. Other capabilities are grouped in functional areas; for example, the Analytic Sources Area is composed of the capabilities in the Data Server category from the Information Engine capability layer as well as some analytical functions from the Insight capability layer.
- Technical capabilities external to the capabilities defined in Chapter 3: These are primarily well-known IT systems such as customer relationship management (CRM) applications.
In the functional area of traditional sources on the left side of Figure 4.3, the sources for master data comprise third-party data sources such as Dun & Bradstreet as well as operational applications such as customer relationship management (CRM), enterprise resource planning (ERP), human resources (HR), supply chain management (SCM), supplier relationship management (SRM), and eCommerce. In a typical enterprise, some of these applications are packaged applications from vendors like SAP and Oracle, some come from software as a service (SaaS) providers like Workday and Salesforce.com, and some are custom-built.
The functional area of information ingestion has transformation engines providing, for example, ETL or CDC capabilities. Using these transformation engines, master data can be moved from the sources to MDM or from MDM into the data harmonization processes feeding the analytical sources. The MDM system resides in the functional area of shared operational information systems alongside Reference Data Management and Content Management Systems.
The name “Master Data Hubs” is intentionally plural for several reasons. First, commercial software vendors historically provided Master Data Management software for a single domain only, such as customer or product, creating the two disciplines of customer data integration (CDI) and product information management (PIM). Early adopters of MDM therefore sometimes implemented multiple MDM products for different purposes, from the same or different vendors, resulting in multiple master data hubs. Today, many MDM software vendors provide multi-domain MDM software, which often reduces the number of distinct hubs. Second, multiple Master Data Hubs can be the result of a merger or acquisition where both companies already have an MDM system. Third, a company might have adopted different MDM software solutions from different vendors to address different MDM requirements.
The functional area for analytical sources is composed of the landing zone, where data is harmonized for the operational data stores and data warehouses, and the integrated warehouse and marts zone, where those stores and warehouses are located. For exploratory analytics such as pattern detection, a dedicated exploration zone exists. For the functional area of information consumption, where business users consume information, the figure shows various well-known technical capabilities such as data mining and reporting.
For governing the information architecture, the functional area of information governing systems provides, among other capabilities, a metadata catalog storing business, technical, and operational metadata. The functional area of security and business continuity management provides the necessary security features for controlling and auditing information access as well as features for backup and restore, high availability, disaster recovery, maintenance, and so on.
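To make the information ingestion step in this traditional architecture more concrete, the following minimal sketch shows the idea behind change data capture (CDC): instead of reloading a source in full, only the changes are propagated and applied to the master data hub. The event format (`op`, `id`, `data`) and the function `apply_changes` are assumptions made for illustration; real CDC tools typically read database logs and use their own formats.

```python
def apply_changes(hub, changes):
    """Apply change events, in order, to `hub`, a dict keyed by record id.

    Each change is a dict with an "op" of "insert", "update", or "delete",
    an "id", and (for insert and update) a "data" dict of changed fields.
    This event shape is hypothetical and chosen only for the sketch.
    """
    for change in changes:
        op, key = change["op"], change["id"]
        if op == "delete":
            hub.pop(key, None)
        elif op in ("insert", "update"):
            # Merge the changed fields into the existing record, or create it.
            hub[key] = {**hub.get(key, {}), **change["data"]}
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return hub


hub = {}
changes = [
    {"op": "insert", "id": "C1", "data": {"name": "Ada Example", "city": "Berlin"}},
    {"op": "update", "id": "C1", "data": {"city": "Munich"}},
    {"op": "insert", "id": "C2", "data": {"name": "Bob Example"}},
    {"op": "delete", "id": "C2"},
]
apply_changes(hub, changes)
```

After the four events are applied, the hub holds a single record for `C1` with the updated city. A production hub would additionally match, deduplicate, and govern incoming records, which this sketch deliberately leaves out.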
With the rise of Big Data, the implementation landscape changes to reflect the new sources and capabilities available as shown in Figure 4.4.
Figure 4.4 Architecture overview—impact of Big Data
Major changes in key functional areas are:
- Data sources: A whole new group of data sources has emerged. As internet-connected sensors and devices become more common (often called the Internet of Things), more information can be collected, integrated, and analyzed to improve operational efficiencies and quality of life across a number of areas. Examples include instrumentation for food transport (“farm to fork”), utility networks (smart water/gas/electricity networks and smart meters), and smarter homes, all of which deploy sensors that produce data at an unprecedented rate and in massive volume. New kinds of unstructured content sources have also emerged, including blogs and wikis. Social media sources are growing at a rapid pace as well; examples include Facebook, Twitter, LinkedIn, and Yelp.
- Information ingestion: A new technique known as streams processing has emerged to address new use cases where data is produced at speeds and in volumes too great for all of it to be persisted. A streams engine can apply real-time analytics as information is created to make timely decisions and to selectively store the most interesting information.
- Analytical sources: A new zone of deep analytics is added, which is the location of new analytical capabilities based on the MapReduce paradigm, as we will see. Using a Hadoop platform to implement MapReduce allows you to land the data, possibly perform some cleansing, run some analytics, and persist the results, which might also be moved to a DW. Such a system would hold all historic and current data. This possibly changes DW procedures: instead of archiving data from the DW, you can simply delete it because the full history is still in the Hadoop platform.1 A second major change is that the consumption of information is radically simplified, creating a true shared analytics information zone.
- Information consumption: New techniques of collaboration and new insight engines appear as novel technical capabilities. Examples include new matching engines to search for duplicates and nonobvious relationships, pattern mining, and natural language analytics.
- Information governing systems: Major functional enhancements include the extension of the metadata catalog to enable a broad class of users to find and provision the information they need from across the variety of systems and zones.
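The streams processing idea from the list above, analyzing data as it arrives and persisting only what is interesting, can be sketched in a few lines. This is a toy illustration, not how a streams engine is implemented: the function name `interesting_readings`, the moving-average rule, and the `window` and `threshold` parameters are all assumptions chosen for the example.

```python
from collections import deque


def interesting_readings(readings, window=5, threshold=1.5):
    """Yield only readings that deviate strongly from the recent average.

    Readings are consumed one at a time, so the full stream never has to be
    held in memory or persisted; only the last `window` values are kept.
    """
    recent = deque(maxlen=window)
    for value in readings:
        if len(recent) == window:
            average = sum(recent) / window
            if abs(value - average) > threshold:
                yield value
        # The deque drops its oldest value automatically once it is full.
        recent.append(value)


# Hypothetical temperature readings from a single sensor.
sensor_values = [20.0, 20.1, 19.9, 20.0, 20.2, 25.0, 20.1, 20.0]
kept = list(interesting_readings(sensor_values))
```

With these values, only the spike of 25.0 is kept; every other reading is analyzed and then discarded. Because the function is a generator, the same code works on an unbounded stream as long as the caller consumes results incrementally.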
Figure 4.5 shows the architecture overview from a Social MDM perspective. The integration and analysis of new sources of information, especially social media sources, is one of the most striking changes shown in Figure 4.5. Another key change is the introduction of activity hubs.
Figure 4.5 Architecture overview—focus on Social MDM