The DB2 for Linux, UNIX, and Windows Data Server
In the distributed environment, DB2 is available in a number of different packaging options, called editions. Furthermore, DB2 is also available as part of other packages that contain additional features and tooling rather than just the base data services provided by DB2.
The mainstream DB2 editions are shown in Figure 1-5:
Figure 1-5 The distributed DB2 family
For the most part, each edition builds on its child in this hierarchy. For example, if a feature or functionality is available in DB2 Workgroup Edition, it's likely that it's also a part of a higher-level edition, like DB2 Enterprise Edition.
DB2 Everyplace Edition
DB2 Everyplace (DB2e) is a tiny "fingerprint" database that's about 350K in size. It is designed for low-cost, low-power, small form-factor devices such as personal digital assistants (PDAs), handheld personal computers (HPCs), and embedded devices. DB2e runs on a wide variety of handheld devices, with support for Palm OS 5.x, Windows Mobile 2003 for Pocket PC, Windows Mobile 2005 for Pocket PC, Windows CE.NET, traditional Windows desktop platforms, Symbian OS Version 7/7s, QNX® Neutrino® 6.2, Linux distributions running with the 2.4 or 2.6 kernel, embedded Linux distributions (like BlueCat) running with the 2.4 or 2.6 kernel, and more.
The SQL API used to develop DB2e applications is a subset of that used for building full-fledged DB2 data server applications. This means that enterprise applications, for the most part, can be easily extended to include mobile devices. More importantly, it means that if you have DB2 skills, you have DB2e skills. In addition, DB2e is extremely flexible for developers, with support for Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), .NET (including the ADO.NET 2.0 API), and the DB2 Call Level Interface (CLI) APIs.
DB2e is a very simple-to-use data server that requires virtually no maintenance. Typical database administrator (DBA) operations like reorganizations and statistics collection are all performed automatically. Another nice thing about developing DB2e applications is that the database engine is platform independent, so it provides flexibility: You can seamlessly move DB2e databases between devices. For example, you could move a DB2e database populated on a Pocket PC device to a Symbian smartphone, or whatever other supported device you have, without the need to do anything. This feature, coupled with the rich support for application development, enables developers to quickly build, deploy, and support mobile applications on all platforms.
DB2e is available in two editions: DB2 Everyplace Database Edition (DB2e DE) and DB2 Everyplace Enterprise Edition (DB2e EE). The database component of DB2e DE is the same as DB2e EE; however, DB2e DE has no synchronization middleware to extend or synchronize data to back-end enterprise data servers (although it does come with command line-based import and export utilities). DB2e DE is primarily used for applications that require an embedded database or a local relational storage facility that is exposed to endusers through some sort of application (they never really see the database) yet have stringent footprint requirements because of the device.
DB2e EE distinguishes itself from DB2e DE in that it comes with a data synchronization component called the DB2e Synchronization Server (DB2e Sync Server). The DB2e Sync Server allows you to manage subscriptions and security controls for data that is distributed wirelessly to your hand-held devices and manage data changes on the client devices back into the data center. The DB2e Sync Server also comes with facilities for conflict resolution, application deployment, device identification controls, management policies, and more.
The DB2e Sync Server can synchronize DB2e and Apache Derby/IBM Cloudscape data servers with back-end JDBC-enabled compliant data servers (for example, DB2, Oracle, Informix, SQL Server™, and so on). In addition, there is a special DB2 family synchronization adapter that uses the Data Propagator™ (DPROPR) SQL-based replication technology (which is included in the distributed version of DB2).
The number of concurrent synchronizations that the DB2e Sync Server can support is dependent on the hardware configuration of that server, the associated workload, and data change rates. If you need to scale to handle very large concurrent synchronizations, you can install any Java application server (like IBM WebSphere Application Server). DB2e also supports enhanced scalability and high-availability through its support for DB2e Sync Server farm configurations that allow you to cluster a number of DB2e Sync Servers to provide load balancing and high-availability services.
In Figure 1-6 you can see the flow of data in a DB2e EE environment. For example, data is pulled from a database in Tier 3 (the far right of the figure) and placed on a mobile device in Tier 1 (the far left). Tier 1 is typically composed of occasionally connected clients that operate on data and then use the services provided by Tier 2 (the middle of the figure where the DB2e Sync Server resides) to push those changes back to Tier 3. Tier 2 handles issues like conflict remediation and subscription management to ensure that the data quality is maintained throughout its lifecycle until it's at rest.
Figure 1-6 A DB2e Enterprise Edition environment
Apache Derby/IBM Cloudscape
In 2005, IBM donated $85 million worth of relational database management system (RDBMS) code to the open source community, and the Apache Derby database was born. Apache Derby and IBM Cloudscape are the same databases; the difference is that IBM Cloudscape is sold by IBM with IBM's award-winning 24*7 support and has some add-on features as well.
If you hadn't heard of IBM Cloudscape before the donation news, you'll probably be surprised to learn how many partners, customers, and software packages use this data server. In fact, more than 80 different IBM products use the IBM Cloudscape data server for its portability, easy deployment, open standards-based Java engine, small footprint, and more. IBM Cloudscape is a component that is transparent to products such as WebSphere Application Server, DB2 Content Manager, WebSphere Portal Server, IBM Director, Lotus Workplace, and many others.
IBM Cloudscape is a Java-based RDBMS that has a 2MB footprint. It's compatible with DB2, supports advanced functions (such as triggers and stored procedures), is easy to deploy, and requires no DBA effort. These same characteristics hold true for the open source Apache Derby as well.
We chose to include the Apache Derby/IBM Cloudscape data servers in this discussion because their SQL API is 100% compatible with the DB2 data server editions in Figure 1-5. This means that you can take any Apache Derby/IBM Cloudscape database and application and move it to a full-fledged DB2 data server if you need more scalability, or you need to take advantage of features that aren't found in these data servers. In fact, a component of DB2 9, called the DB2 Developer Workbench, provides a built-in facility to migrate Apache Derby/IBM Cloudscape schemas and data to a DB2 data server.
DB2 Personal Edition
DB2 Personal Edition (DB2 PE) is a full-function database that enables single users to create databases on their workstations. Since it's limited to single users (it doesn't support inbound client request for code), it's generally not referred to as a data server (although the DB2 engine behind DB2 PE is that same DB2 engine for all editions in Figure 1-5). This product is only available on Linux and Windows. DB2 PE can also be used as a remote client to a DB2 data server. Applications written to execute on DB2 PE are fully portable to the higher-level editions of the DB2 family in Figure 1-5.
DB2 PE is often used by end users requiring access to local and remote DB2 databases, or developers prototyping applications that will be accessing other DB2 databases. In addition, since it includes the pureXML™ technology free of charge, DB2 PE is also a good choice for those looking to acquire DB2 9 pureXML skills. In many cases, because it includes replication features, DB2 PE is used for occasionally connected applications (like field research, sales force automation, and so on) where a richer feature set is required than what's offered by DB2e or Apache Derby/IBM Cloudscape.
The DB2 Express and DB2 Workgroup Editions
Both DB2 Express Edition (DB2 Express) and DB2 Workgroup Edition (DB2 Workgroup) offer the same functions, features, and benefits; they are differentiated only with regard to licensing restrictions, which have a direct affect on the amount of scalability and performance that can be derived from each.
This section will detail these editions, point out differences where they exist, and describe the set of add-on feature packs that you can purchase to extend the scalability, capability, and availability of these data servers.
DB2 Express Edition
DB2 Express is a full-function, Web-enabled client/server RDBMS. DB2 Express is only available for Windows- and Linux-based workstations (unlike DB2 Workgroup). DB2 Express provides a low-cost, entry-level server that is intended primarily for small business and departmental computing. As previously mentioned, it shares the same functions and features as DB2 Workgroup, but is mainly differentiated from DB2 Workgroup by the amount of memory available on the server and the server's Value Unit (VU) rating (which equates to the power of a server's processor cores) on which it can be installed.
DB2 Express can be licensed using the VU methodology, which applies a per-VU charge for the VU rating of a server, or by an Authorized User metric. A DB2 Express server cannot use more than 4 GB of RAM on the server where it is installed. Authorized Users represent individual users that are registered to access the services and data of a single DB2 data server in the environment. For example, if you had a user that needed to access two different DB2 Express 9 data servers and wanted to license this environment with Authorized Users, that single user would require two DB2 Express Authorized User licenses (one for each server).
You can also license DB2 Express using the VU model. No matter what licensing methodology you choose, you cannot install DB2 Express on a server with more than 200 VUs.
DB2 Express can play many roles in a business. It is a good fit for small businesses that need a full-fledged relational data server. A small business may not have the scalability requirements of some more mature or important applications, but they like knowing they have an enterprise quality data server backing their application that can easily scale (without a change to the application) if they need it to. As noted, an application written for any edition of DB2 is transparently portable to another edition on any distributed platform
DB2 Express-C: The Little Data Server that Could
DB2 Express-C isn't considered a "real" edition of DB2; however, we chose to include it here because it's very likely the case that this is the copy of DB2 that you're using to learn DB2.
DB2 Express-C is a free data server offering from IBM that you can download from www.ibm.com/software/data/db2/udb/db2express/ or order for free as part of the DB2 9 Discovery Kit. At the heart of the DB2 Express-C is the same scalable and robust data engine that you'll find in the other editions covered throughout this chapter. DB2 Express-C is optimized for two-way dual core servers with no more than 4 GB of memory. Because of the specific optimization in the code base for this architecture, many consider these the "license limitations" for this product.
DB2 Express-C was designed for the partner and development communities, but as you get to know this version, you'll realize it has applicability almost anywhere: as a student trying to learn or get certified in DB2, a hobbyist, and even large enterprises will find this product useful in their environments.
A defining characteristic of DB2 Express-C is that it's generally considered to be a no limits data server. DB2 Express-C doesn't have the limits that are typically associated with other competitor's free offerings (ironically, they also carry the Express moniker). For example, there is no database size limit with DB2 Express-C, you can address a 64-bit memory architecture, there are no limits on concurrency or on built-in self-managing features, and more. Where limits do exist, they are more than generous for the workloads for which DB2 Express-C has been optimized to run.
The main features that are not included in DB2 Express-C when compared to DB2 Express are:
- Support for high-availability clustering
- The ability to enhance the capabilities of the core data server using add-on feature packs (more on these later).
- Replication Data Capture
- 24x7 IBM Passport Advantage® support model (a special packaging of DB2 Express-C, called DB2 Express-C Fixed Term License, offers this support for DB2 Express-C servers, but details on this option are outside the scope of this book).
If you want to use any of these features in your environment, you need to, at a minimum, purchase DB2 Express.
For the most part, all of the features found in DB2 Express-C are also available in any of the higher editions found in Figure 1-5. The exception for this product is the pureXML component. To help proliferate and grow XML skills across the database community, IBM generously decided to make this feature available for free with the DB2 Express-C data server. You'll note as you read this chapter that this feature is a chargeable add-on feature pack for all other DB2 data servers.
DB2 Workgroup Edition
DB2 Workgroup is also a full-function, Web-enabled client/server database. Unlike its DB2 Express-C and DB2 Express counterparts, it is available on all supported flavors of UNIX (AIX, HP-UX, and Solaris), Linux, and Windows—this is the main non-resource differentiator between DB2 Workgroup and DB2 Express.
DB2 Workgroup provides a low-cost, entry-level server that is intended primarily for small business and departmental computing. DB2 Workgroup supports all the same features as DB2 Express. Additional features and capabilities can also be added, via feature packs, without having to purchase DB2 Enterprise.
DB2 Workgroup can be licensed using the same options as DB2 Express—it only differs with respect to the architecture limits, which in turn optimize it for specific workloads. For example, the RAM limit for DB2 Workgroup is 16 GB, which is four times the amount you are entitled to use with a DB2 Express data server (which generally translates into better performance, or more supported users). The VU restriction is also more generous. DB2 Workgroup cannot be installed on a server that has a rating of more than 400 VUs, whereas the limit for DB2 Express is 200 VUs. DB2 Workgroup can also be licensed via the Authorized User model and shares the same minimum (five Authorized User licenses) as DB2 Express.
DB2 Workgroup can play many roles in a business. It's a good fit for small- or medium-sized businesses (SMBs) that need a full-fledged relational data server that is scalable and available over a wide area network (WAN) or local area network (LAN). DB2 Workgroup is also useful for enterprise environments that need silo servers for lines of business, or for departments that need the ability to scale in the future. As previously noted, an application written for any edition of DB2 is transparently portable to another edition on any distributed platform.
Add-on Feature Packs for DB2 Express and DB2 Workgroup Editions
DB2 Express and DB2 Workgroup come with the unique flexibility to add enterprise-like services (generally found in DB2 Enterprise) without having to buy a more expensive edition of DB2. Generally the price of these feature packs is such that if you only need one or two specific features, you can save money by purchasing the appropriate feature packs instead of purchasing DB2 Enterprise (as long as you remain within the architectural limitations of the DB2 edition you are implementing). This isn't the case with other competitive data server offerings.
Features Packs for DB2 Express and DB2 Workgroup are licensed in the same manner as the underlying data server. In other words, if you licensed your DB2 Express data server using the VU metric, you have to license any add-on feature packs using the VU metric as well.
The following feature packs are available for DB2 Express and DB2 Workgroup data servers:
Workload Management Feature Pack
Allows you to use the Connection Concentrator and the DB2 Governor, as well as install DB2 Query Patroller on a DB2 Express or DB2 Workgroup data server.
The Connection Concentrator is useful for applications where multiple transient connections perform a limited amount of work in intervals. For example, think about a Web-based application where you browse around and selectively choose items to buy. You may be logged onto the system for a longer period of time while you browse potential items you wish to buy, but you're not making the data server work all the time because you're likely reading the page rather than continually clicking buttons. Concentrating a data server connection improves performance by allowing many more client connections to be processed efficiently, and it also reduces the memory used for each connection. This capability is part of a base DB2 Enterprise installation.
The DB2 Governor, also included by default with DB2 Enterprise, is used to reactively monitor the behavior of applications that run against a DB2 data server. Using the DB2 Governor, you can alter the behavior of applications or the data server by taking corrective actions in response to thresholds that you define in a configuration file. For example, if an application is using too much CPU, you can set a rule that when this threshold is breached, the application is terminated or given less CPU priority.
DB2 Query Patroller (DB2 QP) is used to proactively manage the workload of a data server, the opposite of the reactive DB2 Governor. With DB2 QP, you can define a set of user and group business policies that are proactively monitored. For example, if a user submitted a query that the optimizer estimated would cost 1,000,000 timerons, and you set a business rule stating that no queries can be larger than 100,000 timerons, DB2 QP would stop this query from being processed on the data server. You can also use DB2 QP to perform charge-back accounting because it tracks valuable usage information (this information can also be used for performance tuning and more). DB2 QP is detailed in the "DB2 Query Patroller" section later in this chapter.
Performance Optimization Feature Pack
Makes available the use of materialized query tables (MQTs), multi-dimensional clustering (MDC) tables, and query parallelism for DB2 Express and DB2 Workgroup servers. All of these features are used to provide exceptional performance and are part of a base DB2 Enterprise installation.
DB2 comes with a number of high-performance objects and capabilities that allow it to scale to hundreds of thousands of users and into the millions of transactions per minute or queries per hour. This feature pack provides the ability to create MDC tables and MQTs in your DB2 Express and DB2 Workgroup data servers. These objects provide immense benefits for applications running on DB2. In fact, we'd say that some of the most important components for any high-performing application are part of this feature pack. (If you're running a data warehouse be sure that you know what MDCs and MQTs are.) If you're looking to really boost the performance of an application running on the smaller servers for which DB2 Express and DB2 Workgroup were made, this feature pack has components that could prove very valuable to your business.
High-Availability Feature Pack
Gives DBAs a free two-node license of Tivoli System Automation (TSA) for high-availability failover clustering, the ability to run online table reorganizations, and the High-Availability Disaster Recovery (HADR) feature. All of the features in this feature pack are part of a base DB2 Enterprise installation.
HADR is a high-availability feature that provides a database availability protection plan that is very simple to set up and use. The best part about HADR is that you set it up with mere clicks of a button. The online table reorganization capability, as its name implies, allows you to reorganize tables online. Finally, this feature pack includes a two-node cluster license for Tivoli System Automation (TSA) for AIX and Linux—you can use it to cluster together your servers for high-availability or to automate the failover of an HADR environment.
pureXML Feature Pack
Provides the ability to create pureXML columns in a DB2 Express or DB2 Workgroup data server and use an associated set of XML services when working with this data.
You might be confused about the DB2 XML Extender (covered later in the "DB2 Extenders" section) and the pureXML add-on feature pack that's available in DB2 9. The DB2 XML Extender provides the XML capabilities that were part of the DB2 8 release. In contrast, the pureXML feature enables DB2 servers to exploit the new hybrid storage engine that stores XML naturally in DB2 9. The performance, usability, flexibility, and overall XML experience of pureXML can't even be compared to the older DB2 XML Extender technology; however, the DB2 XML Extender is still shipped in DB2 9 free-of-charge. If you are planning to use XML in your data environment we strongly recommended you use the pureXML feature.
The pureXML feature lets you store XML in a parsed tree representation on disk, without having to store the XML in a large object or shred it to relational columns as you are forced to with the DB2 XML Extender. This can be very beneficial for applications that need to persist XML data. Access to XML data via the pureXML feature pack is a very natural experience; for example, you can use SQL or XQuery to get to relational or XML data.
The pureXML feature also has facilities to store XML Schema Definition (XSD) documents in a native XML Schema Repository (XSR) service. It also supports schema annotations for document shredding, validation services, and more.
Homogeneous Federation Feature Pack
Provides the ability to create nicknames across the DB2 family of data servers. This feature allows developers to build applications that access DB2 tables that reside on different platforms without regard to their location. For example, you could use this feature to easily create an application that performs a join of data that resides on a DB2 for i5/OS data server with one that's running on DB2 for Windows. Even if you were working within an integrated development environment (IDE) such as IBM Rational Application Developer or Microsoft Visual Studio 2005, you still wouldn't be able to tell where each table actually resides—which is the whole point. The capability of this feature pack is a subset of the WebSphere Federated Server product covered later in this chapter.
DB2 Enterprise Edition
DB2 Enterprise Edition (DB2 Enterprise) is the premier data server offering from IBM that is the foundation of many mission-critical systems and the primary focus of this book. It is fully Web-enabled, scalable from single core servers to symmetric multicore servers and to massively parallel systems.
This edition of DB2 is considered to be the full-function offering from IBM (for example, it's the only DB2 edition that includes table partitioning). It is available on the same supported flavors of Linux, UNIX, and Windows as DB2 Workgroup. Applications built for DB2 Enterprise can scale upward and execute on massively parallel systems or downward to smaller servers.
DB2 Enterprise is meant for large and mid-sized departmental servers. DB2 Enterprise includes all the functions of the DB2 Express and DB2 Workgroup editions, and more. Additionally, there is a set of feature packs that are exclusive to this edition, such as the new DB2 9 Storage Optimization feature that provides deep row compression, and more.
DB2 Enterprise can be licensed using the same VU metric as DB2 Express and DB2 Workgroup; however, in the case of DB2 Enterprise, there is no RAM or VU server rating limits for the server on which this product is installed. DB2 Enterprise can also be licensed via the Authorized User metric. In contrast to DB2 Express and DB2 Workgroup, when licensing DB2 Enterprise using the Authorized User metric, the minimum number of Authorized Users for which it must be licensed is 25 Authorized Users for every 100 VUs on the server where it is installed. For example, if you installed DB2 Enterprise on an 8-core System p server (rated at 800 VUs), you would have to buy 200 Authorized User licenses at a minimum to use this licensing option, even if you intended for this data server to support fewer than 200 users.
Add-on Feature Packs for DB2 Enterprise Edition
DB2 Enterprise also comes with a set of add-on feature packs just like DB2 Express and DB2 Workgroup. For the most part (the exception being the pureXML feature pack), the extensibility features that can be purchased for DB2 Enterprise are unique to this edition of the DB2 family and provide even richer enterprise-capabilities to the data run-time environment.
The way you license DB2 Enterprise feature packs must be identical to how the DB2 Enterprise data server was licensed. This is one area where DBAs must pay particular attention when deciding how to license their DB2 Enterprise data servers because some feature packs aren't available with the DB2 Enterprise Authorized User license.
The feature packs that are available in DB2 Enterprise via Authorized User or Value Unit licensing are:
Performance Optimization Feature Pack
DBAs get licenses to install DB2 Query Patroller and the DB2 Performance Expert tool with this feature pack. The DB2 Query Patroller component of this feature pack is the same one that's part of the Workload Management feature pack available for DB2 Express and DB2 Workgroup.
The DB2 Performance Expert is a separately purchasable tool that's used to simplify performance management and tuning. It offers DBAs a consistent view into their instances, subsystems, databases, and applications across the DB2 family. It has a set of pre-canned reports to identify resource shortage and exception conditions in DB2 including locking conflicts, deadlocks, and application and SQL statements causing high workload. The DB2 Performance Expert also includes a set of detailed reports about SQL, database, and buffer pool activity with trend analysis and what-if hypothesis testing for performance evaluation.
pureXML Feature Pack
This feature pack provides the same capabilities as the one offered for DB2 Express and DB2 workgroup detailed earlier in this section.
Advanced Access Control Feature Pack
Provides label-based access control (LBAC) protection services to data stored in a DB2 Enterprise data server. Using this feature, data stewards can control the read and write access of a user at the table column and row level. LBAC implements data access controls by creating an affinity between columns and generated protection security labels. If users attempt to get data from a table, they must have matching label credentials (or a parent label) granted to them. When there's a match, access is permitted; without a match, access is denied.
You can use this feature pack to create a security framework whose architecture is built on a hierarchal representation that matches the data access hierarchy, an array of that business entity, or a mix of the two. LBAC can also be used to restrict access to XML documents in columns (though as of the time of this writing, you cannot attach labels to fragments within the document itself).
Geodetic Data Management Feature Pack
This feature pack includes the DB2 Geodetic Data Management Extender that can be used to provide advanced spatial analysis capabilities. What separates this feature from the free DB2 Spatial Extender that comes with all DB2 data servers is that the Geodetic Data Management feature pack contains a built-in set of algorithms that take into consideration the curvature of objects, such as the earth's surface and so on. For example, maps generally are associated with some sort of applied projection in consideration of the map's purpose. The Mercator projection is very popular for navigational maps. When you look at a map generated using this projection, you'll notice that the top and bottom of the map seem much bigger than they really are. The fact that Greenland is one fourteenth the size of Africa often comes as a surprise when people think back to their public school atlas—this feature pack compensates for these distortions.
While distortions caused by projections may not be of significance to applications that attempt to locate an address or division of a city, for weather pattern analysis or defense programs it could be very significant. If you can't afford to lose accuracy because of a projection, the Geodetic Data Management feature pack may be appropriate for your applications.
Real-Time Insight Feature Pack
Useful for managing large volumes of incoming data streams. Existing infrastructures can be easily overwhelmed when trying to manage large volumes of incoming data. Incoming data with message rates of tens to hundreds of thousands of messages per second can make it difficult to analyze this high volume of data.
The DB2 Real-Time Insight feature pack is powered by the DB2 Data Stream Engine (not discussed in this chapter since it's beyond the scope of this book) that enables organizations to store and forward high volumes of data from multiple data streams. The data messages from the feed can be aggregated, filtered, and enriched in real time before being stored or forwarded.
DB2 Data Stream Engine can load high volumes of data into the DB2 data server and make that data available to queries in real time through SQL. One example is a financial market data stream that provides information about financial transactions, such as stock trades and quotes.
The benefits of the Real-Time Insight Feature include:
- Scalable solution loads large volumes of data with high throughput and low latency
- Simultaneous storing and publishing of data from multiple feeds
- Insight into the data with filtering and aggregation from feeds before storing and publishing
- Maintenance of metadata, such as current state, for entities that are processed from the feeds
- Simultaneous persistence of data to multiple database servers on multiple hosts
- Real-time access by use of shared memory storage
- Easy access to both real-time and historical data through standard SQL, C-API, and Java API interfaces.
Mobility on Demand Feature Pack
This feature pack provides the components of DB2 Everyplace Enterprise Edition detailed earlier in this chapter. When using DB2e via the feature pack, you need to be aware that the DB2 SyncServer must be collocated with the DB2 Enterprise data server (this is the reason for its reduced cost when compared to a full licensed version of DB2 Everyplace Enterprise Edition).
Homogeneous Federation Feature Pack
This feature pack provides the same capabilities as the one offered for DB2 Express and DB2 Workgroup detailed earlier in this section.
Two feature packs available in DB2 Enterprise only through Value Unit licensing (in other words, you can't buy these feature packs with DB2 Enterprise servers that are licensed with authorized users) are:
Database Partitioning Feature (DPF)
DB2 Enterprise provides the capability to enable DB2 to partition data across clusters or massively parallel servers. To the end user or application developer, a partitioned database appears to be on a single system, yet SQL statements are processed in parallel across all servers, thus increasing the execution speed for any given query.
The DPF delivers the true principals of scalability to a DB2 Enterprise environment, namely:
- Double the resources, double the data: Each partition processes the same amount of data as before, and response times and throughput remain constant.
- Double the resources, keep data constant: Each partition processes half the amount of data as before, and response times will be cut in half, and throughput will double.
- Keep resources constant, double the data: Each partition processes double the amount of data as before, response times should double, and throughput will be cut in half.
You can partition data using the DPF across logical (within a larger SMP) and physical servers. An example of a partitioned database across multiple physical servers (though each server typically is small SMP server) is shown in Figure 1-7.
Figure 1-7 DB2 partitioned across multiple servers using DPF
In Figure 1-7 you can see that there are actually six copies of DB2. However, they all appear as a single copy to applications and administrators. Imagine the performance difference between a single copy of DB2 scanning 600,000 rows versus each copy of DB2 owning its own data and resources scanning just 100,000 rows. This is the power of the DPF.
Parallelism in DB2 is automatic and extended to the hash partitioning algorithm used in the DPF. If you are selecting just a few records (where partitioning key value = X), then DB2 will send that query directly to the node that contains that data. However, if you are scanning large amounts of data (as shown in Figure 1-7, typical in data warehousing), then DB2 will send the query to all partitions in the cluster and automatically parallelize the data access operations, driving more resources (RAM, CPU, and I/O) to get the job done faster.
It's not just query performance and faster maintenance operations that are delivered by the DPF. There is a significant resource savings per server because each server owns 1/nth of that data and generally requires fewer resources. Compare the partitioned database (shown on the left side in Figure 1-8) and the non-partitioned database on the right.
Figure 1-8 Saving resources with the Database Partitioning Feature
You can see that the servers comprising the partitioned database on the left side of Figure 1-8 require less memory, as each server is only responsible for 1/nth of the data. In contrast, the non-partitioned database on the right side of Figure 1-8 requires that much more memory be allocated to the buffer pool to accommodate the data.
Finally, DB2 can mix its intra-partition parallelism capabilities (where it runs components of an SQL statement in parallel on a single server) with its inter-partition parallelism (the DPF feature) as shown in Figure 1-9.
Figure 1-9 Two kinds of parallelism with DB2 and DPF
You should consider the DPF feature pack as an add-on to DB2 Enterprise in the following circumstances:
- The speed of database utility operations is key to your business operations. Operations like reorganizations, backups, and so on are parallelized with DPF so they can be performed much quicker. For example, if the six servers in Figure 1-7 all run the backup utility, they each only need to back-up one-sixth of the total data, and this operation should complete in one-sixth of the time when compared to a non-partitioned database.
- You need to shrink your batch window because of long extract, transform, and load processes (ETL). For example, data server load jobs are also parallelized. For example, if you had to load 600 GB of data, only 100 GB of data would need to be loaded into each server.
- Rolling window data update requirements for the warehouse make parallel SQL processing and additional log space essential. Not only does DPF give you more power to process the SQL, it gives you more resources (log space, memory, and so on).
- DPF should be considered if the database contains more than 400 GB of RAW data. Other indicators for using DPF are the total number of rows in the table and whether scan performance is critical to workload performance. Large databases can be supported with DB2 Enterprise Edition, but large databases typically benefit from the DPF.
- Your environment is characterized by complex queries that involve large aggregations, multi-table joins, and so on. Having multiple servers working on the same SQL problem can generally return results faster than a single server.
- Your servers have plenty of available memory. Even though DB2 has 64-bit memory support for non-partitioned databases, multiple partitions have proven to provide more linear scalability and more efficient usage of memory than SMP parallelism alone.
DB2 Storage Optimization Feature
This feature provides deep row compression for you DB2 tables, as well as a backup compression utility that can significantly reduce the on disk size of your database backups. Tests have shown that more than 70 percent of table compression (data only) can be achieved when using this feature pack. Note that indexes remain uncompressed for performance reasons.
A number of other benefits, in addition to disk savings from having smaller tables, arise from row compression. Backups will be smaller (even without backup compression); maintenance operations (like backup) should run faster since there are fewer data pages to backup up; Q/A and test environments will be smaller; heating, ventilation, and air conditioning (HVAC) charges will be decreased since you need fewer disks; and query performance likely will increase because each I/O to disk brings back more rows into memory.
Row compression will drive up the CPU utilization of a data server. However, most systems (especially data warehouse systems) are I/O bound and, since compression will allow more rows on a data page and therefore more rows in the resident memory buffers, you'll likely see improved overall performance of your application.
Backup compression is also a part of this feature pack. The database backup architecture in DB2 is based on an open pluggable interface that allows you to use either the default compression algorithm that's shipped with DB2 or one of your own. The compression algorithm you use is embedded within the backup image so that it can be restored on a different server. Quite simply, if disaster happens and you lose the server where the compression algorithm is located, you can still get your data back because the compression algorithm is part of the backup image.