XML Data Management: Native XML and XML-Enabled Database Systems

By Akmal B. Chaudhri, Awais Rashid, Roberto Zicari
Published Mar 12, 2003 by Addison-Wesley Professional.

Book

Sorry, this book is no longer in print.

Not for Sale

Description

Copyright 2003
Dimensions: 7-3/8" x 9-1/4"
Pages: 688
Edition: 1st

Book
ISBN-10: 0-201-84452-4
ISBN-13: 978-0-201-84452-8

Within a very short time, XML has become an essential part of almost every developer's arsenal of tools. It has affected every area of software. One of the fields where the impact of XML is still being worked out is databases and data management. Will XML and native XML databases replace traditional relational databases? How can XML be used as a tool to make relational databases even stronger? This book is intended to address these questions. It provides a discussion of the various XML data management approaches employed in a range of products and applications. The book is based on a series of presentations at last year's OOPSLA conference. Topics covered range from using XML with Oracle9i or SQL Server to embedded XML databases to Tamino. Individual chapters are written by experts in those fields. In all cases the authors use concrete, practical examples, explore alternative approaches, and examine their strengths and weaknesses. There is no other book with the breadth of coverage offered here.



Sample Content

Preface.

Acknowledgments.

I. WHAT IS XML?

1. Information Modeling with XML.

Introduction.

XML as an Information Domain.

How XML Expresses Information.

Patterns in XML.

Common XML Information-Modeling Pitfalls.

Attributes Used as Data Elements.

Data Elements Used as Metadata.

Inadequate Use of Tags.

A Very Simple Way to Design XML.

Conclusion.

II. NATIVE XML DATABASES.

2. Tamino: Software AG's Native XML Server.

Introduction.

Tamino Architecture and APIs.

XML Storage.

Collections and Doctypes.

Schemas.

Access to Other Databases: Tamino X-Node.

Mapping Data to Functions: Tamino X-Tension.

Internationalization Issues.

Indexing.

Organization on Disk.

Querying XML.

Query Language: Tamino X-Query.

Sessions and Transactions.

Handling of Results.

Query Execution.

Tools.

Database Browsing.

Schema Editing.

WebDAV Access.

X-Application.

Full Database Functionality.

Conclusion.

3. eXist Native XML Database.

Introduction.

Features.

Schema-less XML Data Store.

Collections.

Index-Based Query Processing.

Extensions for Full-Text Searching.

System Architecture Overview.

Pluggable Storage Backends.

Deployment.

Application Development.

Getting Started.

Query Language Extensions.

Specifying the Input Document Set.

Querying Text.

Outstanding Features.

Application Development.

Programming Java Applications with the XML:DB API.

Accessing eXist with SOAP.

Integration with Cocoon.

Technical Background.

Approaches to Query Execution.

Indexing Scheme.

Index and Storage Implementation.

Query Language Processing.

Query Performance.

Conclusion.

4. Embedded XML Databases.

Introduction.

A Primer on Embedded Databases.

Embedded XML Databases.

Building Applications for Embedded XML Databases.

Overview of Berkeley DB XML.

Configuration.

Indexing and Index Types.

XPath Query Processing.

Programming for Transactions.

Two-Phase Locking and Deadlocks.

Reducing Contention.

Checkpoints.

Recovery Processing after Failures.

Conclusion.

III. XML AND RELATIONAL DATABASES.

5. IBM XML-Enabled Data Management Product Architecture and Technology.

Introduction.

Product and Technology Offering Summaries.

DB2 Universal Database.

Information Integration Technology.

Current Architecture and Technology.

Shared Architecture and Technology.

XML Extender Architecture.

XML Extender Technology.

Using Both XML Collections and XML Columns.

Transforming XML Data.

Searching, Parsing, and Validating XML Data.

XML Extender Federated Support.

SQL XML Support Architecture.

SQL XML Support Technology.

Data Management Web Services Architecture.

Data Management Web Services Technology.

Information Integration-Specific Architecture and Technology.

Future Architecture and Technology.

The Vision.

Application Interface, Data Type, and API Goals.

Storage, Engine, and Data Manager Goals.

Why Support Both XML and Relational Storage in One System?

Why Not Object-Relational Long Term?

Impacted Technology Areas.

Conclusion.

Notices.

6. Supporting XML in Oracle9i.

Introduction.

Storing XML as CLOB.

Using CLOB and the OracleText Cartridge.

Search Predicates in OracleText.

XML-Specific Functionality.

Prerequisites.

XMLType.

Object Type XMLType.

Processing of XMLType in Java.

Using XSU for Fine-Grained Storage.

Canonical Mapping.

Retrieval.

Modifications.

Building XML Documents from Relational Data.

SQL Functions existsNode and extract.

The SQL Function SYS_XMLGen.

The SQL Function SYS_XMLAgg.

PL/SQL Package DBMS_XMLGen.

Web Access to the Database.

The Principle of XSQL.

Posting XML Data into the Database.

Parameterization.

Servlet Invocations.

Special Oracle Features.

URI Support.

Parsers.

Class Generator.

Special Java Beans.

Conclusion.

7. XML Support in Microsoft SQL Server 2000.

Introduction.

XML and Relational Data.

XML Access to SQL Server.

Access via HTTP.

Using the XML Features through SQLOLEDB, ADO, and .NET.

Serializing SQL Query Results into XML.

The Raw Mode.

The Auto and Nested Modes.

The Explicit Mode.

Providing Relational Views over XML.

SQLXML Templates.

Providing XML Views over Relational Data.

Annotated Schemata.

Querying Using XPath.

Updating Using Updategrams.

Bulk Loading.

Conclusion.

8. A Generic Architecture for Storing XML Documents in a Relational Database.

Introduction.

System Architecture.

Installing Xerces.

The Data Model.

DOM Storage in Relational Databases.

The Nested Sets Model.

Creating the Database.

The Physical Data Model.

Creating User-Defined Data Types.

Creating the Tables.

Serializing a Document out of the Repository.

Building an XML Document Manually.

Connecting to the Repository.

The xmlrepDB Class.

Uploading XML Documents.

The xmlrepSAX Class.

Stored Procedures for Data Entry.

The uploadXML Class.

The extractXML Class.

Querying the Repository.

Ad Hoc SQL Queries.

Searching for Text.

Some More Stored Procedures.

Generating XPath Expressions.

Further Enhancements.

Conclusion.

9. An Object-Relational Approach to Building a High-Performance XML Repository.

Introduction.

Overview of XML Use-Case Scenario.

High-Level System Architecture.

Detailed Design Descriptions.

Conclusion.

IV. APPLICATIONS OF XML.

10. Knowledge Management in Bioinformatics.

Introduction.

A Brief Molecular Biology Background.

Life Sciences Are Turning to XML to Model Their Information.

A Genetic Information Model.

NeoCore XMS.

Integration of BLAST into NeoCore XMS.

Sequence Search Types.

Conclusion.

11. Case Studies of XML Used with IBM DB2 Universal Database.

Introduction.

Case Study 1: “Our Most Valued Customers Come First”.

Company Scenario.

How This Business Problem Is Addressed.

Future Extensions.

Case Study 2: “Improve Cash Flow”.

Company Scenario.

How This Business Problem Is Addressed.

Future Extensions.

Conclusion.

Notices.

12. The Design and Implementation of an Engineering Data Management System Using XML and J2EE.

Introduction.

Background and Requirements.

Overview.

Security Service.

Query Service.

Image Query Service.

Print Service.

Design Choices.

Using XML in OAI.

Conversion of XML Input into Objects.

Conversion of Database Data into XML.

Conversion of Image Data into XML.

Database Access.

Validation.

Future Directions.

XSLT.

Web Services.

Mass Transfer Capability.

Messaging.

Conclusion.

13. Geographical Data Interchange Using XML-Enabled Technology within the GIDB System.

Introduction.

GIDB METOC Data Integration.

Background.

Implementation.

GIDB Web Map Service Implementation.

GIDB GML Import and Export.

Conclusion.

14. Space Wide Web by Adapters in Distributed Systems Configuration from Reusable Components.

Introduction.

Advanced Concept Description: The Research Problem.

Future Supporting Communications Satellites Constellations.

Integration of Components with Architecture.

Example.

Future Generation NASA Institute for Advanced Concepts, Space Wide Web Research, and Boundaries.

Advanced Concept Development.

The Research Approach.

The Research Tasks.

Conclusion.

15. XML as a Unifying Framework for Inductive Databases.

Introduction.

Past Work.

Extracting and Evaluating Association Rules.

Classifying Data.

Inductive Databases.

PMML.

The Proposed Data Model: XDM.

Basic Concepts.

Classification with XDM.

Association Rules with XDM.

Benefits of XDM.

Toward Flexible and Open Systems.

Related Work.

Conclusion.

16. Designing and Managing an XML Warehouse.

Introduction.

Why a View Mechanism for XML?

Contributions.

Outline.

Architecture.

Data Warehouse Specification.

View Model for XML Documents.

Graphic Tool for Data Warehouse Specification.

Managing the Metadata.

Data Warehouse.

View Definition.

Mediated Schema Definition.

Storage and Management of the Data Warehouse.

The Different Approaches to Storing XML Data.

Mapping XML to Relational.

View Storage.

Extraction of Data.

DAWAX: A Graphic Tool for the Specification and Management of a Data Warehouse.

Data Warehouse Manager.

The Different DAWAX Packages.

Related Work.

Query Languages for XML.

Storing XML Data.

Systems for XML Data Integration.

Conclusion.

V. PERFORMANCE AND BENCHMARKS.

17. XML Management System Benchmarks.

Introduction.

Benchmark Specification.

Benchmark Data Set.

Benchmark Queries.

Existing Benchmarks for XML.

The XOO7 Benchmark.

The XMach-1 Benchmark.

The XMark Benchmark.

Conclusion.

18. The Michigan Benchmark: A Micro-Benchmark for XML Query Performance Diagnostics.

Introduction.

Related Work.

Benchmark Data Set.

A Discussion of the Data Characteristics.

Schema of Benchmark Data.

Generating the String Attributes and Element Content.

Benchmark Queries.

Selection.

Value-Based Join.

Pointer-Based Join.

Aggregation.

Updates.

Using the Benchmark.

Conclusion.

19. A Comparison of Database Approaches for Storing XML Documents.

Introduction.

Data Models for XML Documents.

The Nontyped DOM Implementation.

The Typed DOM Implementation.

Databases for Storing XML Documents.

Relational Databases.

Object-Oriented Databases.

Directory Servers.

Native XML Databases.

Benchmarking Specification.

Benchmarking a Relational Database.

Benchmarking an Object-Oriented Database.

Benchmarking a Directory Server.

Benchmarking a Native XML Database.

Test Results.

Evaluation of Performance.

Evaluation of Space.

Conclusion.

Related Work.

Studies in Storing and Retrieving XML Documents.

XML and Relational Databases.

XML and Object-Relational Databases.

XML and Object-Oriented Databases.

XML and Directory Servers.

Benchmarks for XML Databases.

Guidelines for Benchmarking XML Databases.

Summary.

20. Performance Analysis between an XML-Enabled Database and a Native XML Database.

Introduction.

Related Work.

Methodology.

Database Design.

Discussion.

Experiment Result.

Database Size.

SQL Operations (Single Record).

SQL Operations (Mass Records).

Reporting.

Conclusion.

21. Conclusion.
References.
Contributors.
Editors.

Index.

Preface

The past few years have seen a dramatic increase in the popularity and adoption of XML: the eXtensible Markup Language. This explosive growth is driven by its ability to provide a standardized, extensible means of including semantic information within documents describing semi-structured data. This makes it possible to address the shortcomings of existing markup languages such as HTML and support data exchange in e-business environments.

Consider, for instance, the simple HTML document in Figure 1. The data contained in the document is intertwined with information about its presentation. In fact, the tags only describe how the data is to be formatted. There is no semantic information that the data represents a person's name and address. Consequently, an interpreter cannot make any sound judgments about the semantics, as the tags could just as well have enclosed information about a car and its parts. Systems such as WIRE (Aggarwal et al. 1998) can interpret the information by using search templates based on the structure of HTML files and the importance of information enclosed in tags defining headings, etc. However, such interpretation lacks soundness and its accuracy is context dependent.

Dynamic web pages, where the data resides in a back-end database and is served using pre-defined templates, reduce the coupling between the data and its representation. However, the semantics of the data can still be confusing when exchanging information in an e-business environment. A particular item could be represented using different names (in the simplest case) in two systems in a business-to-business transaction. This enforces adherence to complex, often proprietary, document standards.

XML provides inherent support for addressing the above problems, as the data in an XML document is self-describing. However, the increasing adoption of XML has also raised new challenges. One of the key issues is the management of large collections of XML documents. There is a need for tools and techniques for effective storage, retrieval and manipulation of XML data. The aim of this book is to discuss the state-of-the-art in such tools and techniques.

This chapter introduces the basics of XML and some related technologies before moving on to providing an overview of issues relating to XML data management and approaches addressing these issues. Only an overview of XML and related technologies is provided as there are several sources covering these concepts in depth.

What is XML?

XML is a W3C standard for document markup. It makes it possible to define custom tags describing the data enclosed by them. An example XML document containing data about a person is shown in Figure 2. Note that tags in XML can have attributes. However, for simplicity these have not been used in this example.

Unlike the HTML document in Figure 1, the document in Figure 2 contains only the data about the person and no representational information. The data and its meaning can be read from the document and formatted in a range of fashions as desired. One standard approach is to use XSL: the eXtensible Stylesheet Language.

The flexible nature of XML makes it an ideal basis for defining arbitrary languages. One such example is WML: the Wireless Markup Language. Similarly, the XML schema language used to describe the structure of XML documents is based on XML itself.

Well-Formed and Valid XML

Although XML syntax is flexible, it is constrained by a grammar that governs the permitted tag names, attachment of attributes to tags and so on. All XML documents must conform to these basic grammar rules. Such conformant documents are said to be well formed and can be interpreted by an XML interpreter. This avoids having to write an interpreter for each XML document instance.
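As a minimal sketch of what "well formed" means in practice, the following Python snippet uses the standard library's non-validating parser to accept or reject a document purely on the basic grammar rules. The helper name `is_well_formed` and the sample data are assumptions made for illustration; they do not come from the book's figures.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text obeys the basic XML grammar rules."""
    try:
        ET.fromstring(text)  # non-validating: checks well-formedness only
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents using the PERSON vocabulary.
good = "<PERSON><NAME><FIRSTNAME>John</FIRSTNAME><SURNAME>Smith</SURNAME></NAME></PERSON>"
bad = "<PERSON><NAME><FIRSTNAME>John</SURNAME></NAME></PERSON>"  # mismatched closing tag

print(is_well_formed(good))
print(is_well_formed(bad))
```

The same generic parser handles both inputs, which is the point made above: no document-specific interpreter has to be written. Checking a document against a DTD or schema is a separate step that this parser does not perform.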

In addition to being well formed, the structure of a particular XML document can be validated against a Document Type Definition (DTD) or an XML schema. An XML document conforming to a given DTD or schema is said to be valid.

Data-Centric and Document-Centric XML

XML documents can be classified on the basis of data they contain. Data-centric documents capture structured data such as that pertaining to a product catalog, order or invoice. Document-centric documents, on the other hand, capture unstructured data as in articles, books or emails. Of course, the two types can be combined to form hybrid documents that are both data-centric and document-centric. Figure 3 provides examples of data-centric and document-centric XML.
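The distinction can be made concrete with a rough heuristic, sketched here in Python. It assumes that document-centric XML tends to use mixed content (text interleaved with child elements), whereas data-centric XML tends to keep text only in leaf elements. This heuristic, the function name `has_mixed_content` and both sample documents are assumptions for illustration, not definitions taken from the text.

```python
import xml.etree.ElementTree as ET


def has_mixed_content(root: ET.Element) -> bool:
    """Return True if any element holds both child elements and non-blank text."""
    for elem in root.iter():
        if len(elem) == 0:
            continue  # leaf element: text only, nothing to mix with
        own_text = (elem.text or "").strip()
        tail_text = "".join((child.tail or "").strip() for child in elem)
        if own_text or tail_text:
            return True
    return False


# Hypothetical data-centric document: regular structure, text only in leaves.
data_centric = ET.fromstring(
    "<ORDER><ITEM>Widget</ITEM><QUANTITY>3</QUANTITY></ORDER>"
)
# Hypothetical document-centric fragment: prose with inline markup.
document_centric = ET.fromstring(
    "<PARA>XML is a <EM>standard</EM> for document markup.</PARA>"
)

print(has_mixed_content(data_centric))
print(has_mixed_content(document_centric))
```

A hybrid document would contain both kinds of regions, so a check like this is only a first indication of how a document might best be stored.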

XML Concepts

DTDs and XML Schemas

Both DTDs and XML schemas are mechanisms used to define the structure of XML documents. They determine what elements can be contained within the XML document, how they are to be used, what default values their attributes can have and so on. Given a DTD or XML schema and its corresponding XML document, a parser can validate whether the document conforms to the desired structure and constraints. This is particularly useful in data exchange scenarios as DTDs and XML schemas provide and enforce a common vocabulary for the data to be exchanged.

XML DTDs are subsets of SGML (Standard Generalized Markup Language) DTDs. An XML DTD lists the various elements and attributes in a document and the context in which they are to be used. It can also list any elements a document cannot contain. However, it does not define constraints such as the number of instances of a particular element within a document, the data type of data within each element and so on. Consequently, DTDs are inherently better suited to document-centric XML than to data-centric XML, because data typing and instantiation constraints are less critical in the former case. However, they can be and are being used for both types of documents.

Figure 4 shows a DTD for the simple XML document in Figure 2. It describes which primitive elements form valid components for the three composite ones: PERSON, NAME and ADDRESS. The keyword #PCDATA signifies that the element contains no tags or child elements, only parsed character data.

XML schemas differ from DTDs in that the XML schema definition language is based on XML itself. As a result, unlike DTDs, the set of constructs available for defining an XML document is extensible. XML schemas also support namespaces and richer and more complex structures than DTDs. In addition, stronger typing constraints on the data enclosed by a tag can be described, because a range of primitive data types such as string, decimal and integer is supported. This makes XML schemas highly suitable for defining data-centric documents. Another significant advantage is that XML schema definitions can exploit the same data management mechanisms as designed for XML; an XML schema is an XML document itself. This is in direct contrast with DTDs, which require specific support to be built into an XML data management system.

Figure 5 shows an XML schema for the simple XML document in Figure 2. The sequence tag is a compositor indicating an ordered sequence of sub-elements. There are other compositors for choice and all. Also, note that, as shown for the ADDRESS element, it is possible to constrain the minimum and maximum instances of an element within a document. Although not shown in the example, it is possible to define custom complex and simple types. For instance, a complex type Address could have been defined for the address element.

DOM and SAX

DOM and SAX are the two main APIs for manipulating XML documents in an application. They are now part of the Java API for XML Processing (JAXP version 1.1). DOM is the W3C standard Document Object Model, an operating system and programming language independent model for storing and manipulating hierarchical documents in memory. A DOM parser parses an XML document and builds a DOM tree, which can then be used to traverse the various nodes. However, the tree has to be constructed before traversal can commence. As a result, memory management is an issue when manipulating large XML documents. This is highly resource intensive especially in cases where only a small section of the document is to be manipulated.

SAX, the Simple API for XML, is a de-facto standard. It differs from DOM in that it uses an event-driven model. Each time a starting or closing tag, or a processing instruction is encountered the program is notified. As a result, the whole document does not need to be parsed before it is manipulated. In fact, sections of the document can be manipulated as they are parsed. Therefore, SAX is better suited to manipulating large documents as compared to DOM.
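The contrast between the two APIs can be sketched with Python's standard library, which ships both a DOM implementation (`xml.dom.minidom`) and a SAX parser (`xml.sax`). The sample document and the `SurnameHandler` class are assumptions for illustration; the preface itself discusses the Java APIs.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document using the PERSON vocabulary.
DOC = (
    "<PERSON><NAME><FIRSTNAME>John</FIRSTNAME><SURNAME>Smith</SURNAME></NAME>"
    "<ADDRESS>1 Main Street</ADDRESS></PERSON>"
)

# DOM: the whole tree is built in memory first, then traversed.
dom = minidom.parseString(DOC)
dom_surname = dom.getElementsByTagName("SURNAME")[0].firstChild.data


class SurnameHandler(xml.sax.ContentHandler):
    """SAX: collect SURNAME text from events while the parser streams the input."""

    def __init__(self):
        super().__init__()
        self.in_surname = False
        self.parts = []

    def startElement(self, name, attrs):
        if name == "SURNAME":
            self.in_surname = True

    def endElement(self, name):
        if name == "SURNAME":
            self.in_surname = False

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_surname:
            self.parts.append(content)


handler = SurnameHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_surname = "".join(handler.parts)

print(dom_surname, sax_surname)
```

Both approaches extract the same value. The DOM version holds every node in memory, while the SAX version keeps only the state the handler chooses to keep, which is why it scales better to large documents.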

XML-Related Technologies

XPath

XPath, the XML Path Language, provides common syntax and semantics for locating and linking to information contained within an XML document. Using XPath the information can be addressed in two ways:

A hierarchical fashion based on the ordering of elements in a document tree

An arbitrary manner relying on elements in a document tree having unique identifiers

A few example XPath expressions, based on the sample XML document in Figure 2, are shown in Figure 6. Example 1 selects all children named FIRSTNAME in the current focus element. Example 2 selects the child node SURNAME whose parent node is NAME within the current focus element, while example 3 tests whether an element is present in the union of the elements NAME and ADDRESS. Note that, although not shown in the examples, it is also possible to specify constraints such as the first ADDRESS of the third PERSON in the document.
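The kinds of path expressions described above can be tried out with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. The expressions below are one reading of the descriptions in the text, not a reproduction of Figure 6, and the PEOPLE wrapper element and sample data are assumptions. The union operator is not part of the subset ElementTree supports, so that case is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical document holding three PERSON elements.
doc = ET.fromstring(
    "<PEOPLE>"
    "<PERSON><NAME><FIRSTNAME>Ann</FIRSTNAME><SURNAME>Lee</SURNAME></NAME>"
    "<ADDRESS>1 First Street</ADDRESS></PERSON>"
    "<PERSON><NAME><FIRSTNAME>Bob</FIRSTNAME><SURNAME>Ray</SURNAME></NAME>"
    "<ADDRESS>2 Second Street</ADDRESS></PERSON>"
    "<PERSON><NAME><FIRSTNAME>Cy</FIRSTNAME><SURNAME>Fox</SURNAME></NAME>"
    "<ADDRESS>3 Third Street</ADDRESS><ADDRESS>4 Fourth Street</ADDRESS></PERSON>"
    "</PEOPLE>"
)

first_person = doc.find("PERSON")
name = first_person.find("NAME")

# Children named FIRSTNAME of the current focus element (here: NAME).
first_names = [e.text for e in name.findall("FIRSTNAME")]

# SURNAME children whose parent is NAME, relative to the focus element (here: PERSON).
surnames = [e.text for e in first_person.findall("NAME/SURNAME")]

# Positional constraint: the first ADDRESS of the third PERSON.
address = doc.find("PERSON[3]/ADDRESS[1]").text

print(first_names, surnames, address)
```

Each expression is evaluated relative to the element it is called on, which plays the role of the current focus element.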

XSL

Since an XML document does not contain any representational information, it can be formatted in a flexible manner. A standard approach to formatting XML documents is using XSL, the eXtensible Stylesheet Language. The W3C XSL specification is composed of two parts: XSL Formatting Objects (XSL FO) and XSL Transformations (XSLT).

XSL FO provides formatting and flow semantics for rendering an XML document. A rendering agent is responsible for interpreting the abstract constructs provided by XSL FO in order to instantiate the representation for a particular medium.

XSLT offers constructs to transform information from one organization to another. Although designed to transform an XML vocabulary to an XSL FO vocabulary, XSLT can be used for a range of transformations including those to HTML as shown in Figure 7. The example stylesheet uses a set of simple XSLT templates and XPath expressions to transform a part of the XML document in Figure 2 to HTML.

SOAP

SOAP is the Simple Object Access Protocol used to invoke code over the Internet using XML and HTTP. The mechanism is similar to Java Remote Method Invocation (RMI). In SOAP, method calls are converted to XML and transmitted over HTTP. SOAP was designed for compatibility with XML schemas though their use is not mandatory. Being based on XML, XML schemas offer a seamless means to describe and transmit SOAP types.
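To illustrate how a method call is converted to XML, the following Python sketch builds a SOAP 1.1 envelope with the standard library. The helper `build_soap_call`, the method name `getAddress` and the namespace `urn:example:people` are hypothetical; only the envelope namespace is the one defined for SOAP 1.1. The HTTP transport step is deliberately left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_soap_call(method: str, service_ns: str, params: dict) -> bytes:
    """Serialize a method call as a SOAP 1.1 envelope (no HTTP transport here)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method becomes an element in the service's namespace.
    call = ET.SubElement(body, f"{{{service_ns}}}{method}")
    # Each argument becomes a child element of the method element.
    for key, value in params.items():
        ET.SubElement(call, key).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical method and namespace, for illustration only.
message = build_soap_call("getAddress", "urn:example:people", {"surname": "Smith"})
print(message.decode("utf-8"))
```

In a real exchange the resulting bytes would be sent as the body of an HTTP POST, and the response would come back as another envelope to be parsed in the same way.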

XML Data Management

So far, we have discussed the basics of XML and some of its related technologies. The discussion brings to the fore the fundamental advantages of XML, providing an insight into the reasons behind its growing popularity and adoption. As more and more organizations and systems employ XML within their information management and exchange strategies, classical data management issues pertaining to its efficient and effective storage, retrieval, querying, indexing and manipulation arise. At the same time, previously uncharted information modeling challenges appear.

Database vendors have reacted to these new data and information management needs. Most commercial relational, object-relational and object-oriented database systems offer extensions, plug-ins and other mechanisms to support management of XML data. In addition to this XML support within existing database management systems, native XML databases have been born. These are designed for seamless storage, retrieval and manipulation of XML data and integration with related technologies.

With the large number of approaches and solutions available in the market, organizations and system developers with XML data management needs face a variety of challenges:

What are the various XML data management solutions available?

What are the features, services and tools offered by these different XML data management systems?

How can an in-house, custom solution be developed instead of using a commercially available system?

Which XML data management system or approach is the best in terms of performance and efficiency for a particular application?

Are there any good practice and domain or application-specific guidelines for information modeling with XML?

Are there other examples and applications of XML data management within a particular domain?

This book aims to help address the above challenges. It provides a discussion of the various XML data management approaches employed in a range of products and applications. It also offers some performance and benchmarking results and guidelines relating to information modeling with XML.

How this Book Is Organized

This book is divided into five parts, each containing a coherent and closely related set of chapters. The parts are self-contained and can be read in any order. The five parts are as follows.

Introduction
Native XML Databases
XML and Relational Databases
Applications of XML
Performance and Benchmarks

Each part is summarized below.

Part 1: Introduction

This part contains a chapter by Brandin which focuses on guidelines for achieving good grammar and style when modeling information using XML. The author argues that good grammar alleviates the need for redundant domain knowledge required for interpretation of XML by application programs. Good style, on the other hand, ensures improved application performance, especially when it comes to storing, retrieving and managing information. The discussion offers insight into information modeling patterns inherent to XML and common XML information modeling pitfalls.

Part 2: Native XML Databases

Two native XML database systems, Tamino and eXist, are covered in this part. In Chapter 2 Schoening provides an overview of Tamino's architecture and APIs before moving on to discussing its XML storage and indexing features. Querying, tool support and access to data in other types of repositories are also described. The chapter offers a comprehensive discussion of these features that are of key importance during the development of an XML data management application.

In a similar fashion Chapter 3 by Meier introduces the various features and APIs of the open source system eXist. However, in contrast with Chapter 2, the main focus is on how query processing works within the system. As a result, the author provides a deeper insight into its indexing and storage architectures. Together both chapters offer a balanced discussion, both on high level application programming features of the two systems and underlying indexing and storage mechanisms pertaining to efficient query processing.

Finally, in Chapter 4, we have included an example of an embedded XML database system. This is based upon the general-purpose embedded database engine Berkeley DB. Berkeley DB XML stores XML documents natively and provides indexing and an XPath query interface. Some of the capabilities of the product are demonstrated through code examples.

Part 3: XML and Relational Databases

This part provides an interesting mix of products and approaches to XML data management in relational and object-relational database systems. Chapters 5, 6 and 7 discuss three commercial products: IBM DB2, Oracle 9i and MS SQL Server 2000 respectively, while chapters 8 and 9 describe more general, roll-your-own strategies for relational and object-relational systems.

Chapter 5 by Benham highlights the technology and architecture of XML data management and information integration products from IBM. The focus is on the DB2 Universal Database and Xperanto. The former is the family of products providing relational and object-relational data management support for XML applications through the DB2 XML Extender, extended SQL and support for web services. The latter is the planned set of products and functions to address information integration requirements. These are aimed at complementing DB2 capabilities with additional support for XML and both structured and unstructured applications.

In Chapter 6, Hohenstein discusses similar features of Oracle 9i: the use of Oracle's CLOB functionality and OracleText Cartridge for handling data-centric XML documents and the XMLType, a new object type based on the object-relational functionality in Oracle 9i, for managing document-centric ones. He presents the Oracle SQL extensions for XML and provides examples of how to use these in order to build XML documents from relational data. Special features and tools for XML such as URI (Uniform Resource Identifier) support, parsers, class generator and Java Beans encapsulating these are also described.

In Chapter 7, Rys covers a feature set, similar to the ones in Chapters 5 and 6, for MS SQL Server 2000. He focuses on scenarios involving exporting and importing structured XML data. As a result, the emphasis is on the different building blocks such as HTTP and SOAP access, queryable and updateable XML views, rowset views over XML and XML serialization of relational results. Rowset views and XML serialization are aimed at providing XML support for users more familiar with the relational world. XML views, on the other hand, offer XML-based access to the database for users more comfortable with XML.

Collectively, Chapters 5, 6 and 7 furnish an interesting comparison of the functionality offered by the three commercial systems and the various similarities and differences in their XML data management approaches. In contrast, Chapters 8 and 9 by Edwards and Brown respectively focus on generic, vendor independent solutions.

Edwards describes a generic architecture for storing XML documents in a relational database. The approach is aimed at avoiding vendor-specific database extensions and providing the database application programmer an opportunity to experiment with XML data storage without recourse to implementing much new technology. The database model is based on merging DOM with the Nested Sets Model hence offering the ability to store any well-formed XML document and ease of navigation. This results in fast serialization and querying but at the expense of update performance.
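The trade-off described above can be seen in a small sketch of the Nested Sets Model, written here in Python with an in-memory SQLite database. Every element receives a (left, right) pair of numbers from a depth-first walk, so that all descendants of a node fall inside its interval. The table layout, column names and function `number_nodes` are assumptions for illustration; the chapter's actual schema is not given in this text.

```python
import sqlite3
import xml.etree.ElementTree as ET


def number_nodes(elem, counter=None, rows=None):
    """Depth-first walk assigning (left, right) nested-set bounds to each element."""
    if counter is None:
        counter, rows = [0], []
    counter[0] += 1
    row = [counter[0], None, elem.tag, (elem.text or "").strip()]
    rows.append(row)  # appended on entry, so rows stay in document order
    for child in elem:
        number_nodes(child, counter, rows)
    counter[0] += 1
    row[1] = counter[0]  # right bound is set once all children are numbered
    return rows


# Hypothetical sample document using the PERSON vocabulary.
doc = ET.fromstring(
    "<PERSON><NAME><FIRSTNAME>John</FIRSTNAME><SURNAME>Smith</SURNAME></NAME>"
    "<ADDRESS>1 Main Street</ADDRESS></PERSON>"
)

db = sqlite3.connect(":memory:")
# Hypothetical table layout, not the schema used in the chapter.
db.execute("CREATE TABLE node (lft INTEGER, rgt INTEGER, tag TEXT, content TEXT)")
db.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", number_nodes(doc))

# All descendants of NAME: nodes whose interval lies inside NAME's interval.
descendants = db.execute(
    "SELECT d.tag FROM node AS d JOIN node AS a ON d.lft > a.lft AND d.rgt < a.rgt "
    "WHERE a.tag = 'NAME' ORDER BY d.lft"
).fetchall()
print(descendants)
```

Reading the rows back ordered by the left bound yields the elements in document order, which makes serialization a single scan. Inserting a new element, by contrast, means renumbering every node to its right, which is where the update cost comes from.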

While Edwards' architecture is aimed at supporting the traditional relational database programmer, Brown's approach seeks to exploit the advanced features offered by the object-relational model and respective extensions of most relational database systems. He discusses object-relational schema design based on introducing, into the DBMS core, types and operators equivalent to the ones standardized in XML. The key functionality required of the DBMS core is an extensible indexing system allowing the comparison operator for built-in SQL types to be overloaded. The new SQL 3 types thus defined act as a basis during the mapping of XPath expressions to SQL 3 queries over the schema.

Part 4: Applications of XML

This part presents several applications and case studies in XML data management, ranging from bioinformatics, geographical and engineering data management, through customer services and cash flow improvement, to large-scale distributed systems, data warehouses and inductive database systems.

In Chapter 10, Direen and Jones discuss various challenges in bioinformatics data management and the role of XML as a means to capture and express complex biological information. They argue that the flexible and extensible information model employed by XML is well suited for the purpose and that database technology must exhibit the same characteristics if it is to keep in step with biological data management requirements. They discuss the role of the NeoCore XML management system in this context and the integration of a BLAST (Basic Local Alignment Search Tool) sequence search engine to enhance its ability to capture, manipulate, analyze and grow the information pertaining to the complex systems that make up living organisms.

Kowalski presents two case studies involving XML and IBM's DB2 Universal Database in Chapter 11. Her first case study is that of a customer services unit that needs to react to problems from the most important customers first. The second case study focuses on improving cash flow in a school by reducing the time for reimbursement from the Department of Education. For each case study the author presents the scenario and the particular problem to be solved. This is followed by an analysis identifying the existing conditions that prevent the problem from being solved. A description of how XML and DB2 have been used to devise an appropriate solution concludes each case study.

Chapter 12, by Eglin, Hendra and Pentakalos, describes the design and implementation of the JEDMICS Open Access Interface, an EJB-based API that provides access to image data stored on a variety of storage media and meta-data stored in a relational database. The JEDMICS system uses XML as a portable data exchange solution and the authors discuss issues relating to its integration with the object-oriented core of the system and the relational database providing the persistent storage. A very interesting feature of the chapter is the authors' reflection on their experiences with a range of XML technologies such as DOM, JDOM, JAXB, XSLT and Oracle XSU in the context of JEDMICS.

In Chapter 13, Wilson and her co-authors offer an insight into the use of XML to enhance the GIDB (Geospatial Information Database) system to exchange geographical data over the Internet. They describe the integration of meteorological and oceanographic data, received remotely via the METCAST system, into GIDB. XML plays a key role here as it is used to express the data model catalog for METCAST. The authors also describe their implementation of the OpenGIS Web Map Server (WMS) specification to facilitate displaying georeferenced map layers from multiple WMS-compliant servers. Another interesting feature of this chapter is the implementation of the ability to read and write vector data using the OpenGIS Geography Markup Language (GML), an XML-based language standard for data interchange in Geographic Information Systems (GISs).

Rine sketches his vision of an Interstellar Space Wide Web in Chapter 14. He contrasts the issues relating to the development and deployment of such a facility with the problems encountered in today's World Wide Web. Mainly, he focuses on adapters as configuration mechanisms for large-scale, next-generation distributed systems and as a means to increase the reusability of software components and architectures in this context. His approach to solving the problem is a configuration model and network-aware run-time environment called Space Wide Web Adapter Configuration eXtensible Markup Language (SWWACXML). The language associated with the environment captures component interaction properties and network-level QoS constraints. Adapters are automatically generated from the SWWACXML specifications. This facilitates reuse, as components are not tied to interactions or environments. Rine also discusses the role of the SWWACXML run-time system from this perspective as it supports automatic configuration and dynamic reconfiguration.

In Chapter 15, Meo and Psaila present an XML-based data model used to bridge the gap between various analysis models and the constraints they place on data representation, retrieval and manipulation in inductive databases. The model, called XDM (XML for Data Mining), allows simultaneous representation of source raw data and patterns. It also represents the pattern definition resulting from the pattern derivation process, thereby supporting pattern reuse by the inductive database system. One of the significant advantages of XML in this context is the ability to describe complex heterogeneous topologies such as trees and association rules. In addition, the inherent flexibility of XML makes it possible to extend the inductive database framework with new pattern models and data mining operators, resulting in an open system customizable to the needs of the analyst.

Chapter 16, the last chapter in this part, describes Baril and Bellahsene's experiences in designing and managing an XML data warehouse. They propose the use of a view model and a graphical tool for the warehouse specification. Views defined in the warehouse allow filtering and restructuring of XML sources. The warehouse is defined as a set of materialized views and provides a mediated schema that constitutes a uniform query interface. They also discuss mapping techniques to store XML data using a relational database system without redundancies and with optimized storage space. Lastly, the DAWAX system implementing these concepts is presented.

Part 5: Performance and Benchmarks

XML database management systems face the same stringent efficiency and performance requirements as any other database technology. Therefore, the final part of this book is devoted to a discussion of benchmarks and performance analyses of such systems.

Chapter 17 is driven by the need to design and adopt benchmarks to allow comparative performance analyses of the fast-growing number of XML database management systems. Here Bressan and his colleagues describe three existing benchmarks for this purpose, namely XOO7, XMach-1 and XMark. They present the database and queries for each of the three benchmarks and compare them against four quality attributes: simplicity, relevance, portability and scalability. The discussion is aimed at identifying the challenges facing the definition of a complete benchmark for XML database management systems.

In Chapter 18, Patel and Jagadish describe a benchmark that is aimed at measuring lower-level operations than those described in Chapter 17. The inspiration for their work is the Wisconsin Benchmark that was used to measure the performance of relational database systems in the early 1980s.

Schmauch and Fellhauer describe a detailed performance analysis in Chapter 19. They compare the time and space consumed by a range of XML data management approaches: relational databases, object-oriented databases, directory servers and native XML databases. XML documents are converted to DOM trees, hence reducing the problem to storing and extracting trees. Instead of using a particular benchmark, they derive their test suite from general requirements that the storage of XML documents has to meet. Differently sized XML documents are stored using the four types of systems; selected fragments and complete documents are extracted, and the performance and disk space used are measured. As in Chapter 20, the authors offer a thorough set of empirical results. They also provide a detailed insight into existing XML data management approaches using the four systems analyzed. Finally, the experiences presented in the chapter are used as a basis to derive guidelines for benchmarking XML data management systems.

In Chapter 20, Fong, Wong and Fong present a comparative performance analysis of a native XML database and a relational database extended with XML data management features. They do not use any existing benchmarks but instead devise their own methodology and database. The key contribution of this chapter is a detailed set of empirical results presented as bar graphs.

Who Should Read This Book

This book is primarily aimed at professionals who are experienced in database technology and possibly XML and wish to learn how these two technologies can be used together. We hope to achieve this through discussions about alternative architectural approaches, case studies and performance benchmarks. Since the book is divided into a number of self-contained sections, it can also be used as a reference, with readers turning only to the sections that interest them. The book may also be useful to students taking advanced database courses.


Index

Click below to download the Index file related to this title:
Index



