Enforcing Referential Integrity with XML Schema Identity Constraints

By Cliff Binstock

Date: Apr 11, 2003

Article is provided courtesy of Addison Wesley.



Discover how to keep your XML data consistent through three simple yet powerful XML Schema elements: key, unique, and keyref.

Introduction

A schema (database, XML, or otherwise) specifies the structure and content of data. For the most part, the XML Schema Recommendation provides mechanisms for validating individual data values (via simple types) and the structure of data (via complex types). XML Schema also provides identity constraints, which offer a simple but powerful way to ensure referential integrity.

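The examples in this article share a common scenario for discussing referential integrity: a catalog in which every entry has a part number, and orders whose items refer to catalog entries by part number and which, once shipped, carry a shipment identifier.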

While this article covers many details, just three XML Schema elements combine to provide referential integrity: key, unique, and keyref.

Due to the complexity of constructing meaningful paths for the selectors and fields of an identity constraint, the individual element examples in this article do not make much sense by themselves. An XML schema with identity constraints and a corresponding XML instance provide context for the rest of this document.

Terminology

Identity Constraint: An identity constraint identifies a set of nodes that must be unique or that require referential integrity. The grammar for locating these nodes is a subset of XPath. Each identity constraint has a specific scope, which is the enclosing element type; any element type may provide any number of identity constraints.

Node: The Post-Schema Validation Infoset (PSVI) describes a tree that represents an XML instance. The tree is made up of a set of nodes. Some of these nodes, particularly those associated with elements and attributes, provide the infrastructure for the validation provided by identity constraints.

Selector: In an XML schema, each identity constraint specifies a selector. Conceptually, the value of a selector is an XPath that identifies an element. From the perspective of writing the schema, the XPath identifies an element type. The XPath must specify an element type that is a descendant of the element type enclosing the identity constraint.

Field: In an XML schema, each identity constraint specifies one or more fields. Conceptually, the value of each field is an XPath that identifies a child element or an attribute of the element identified by the XPath of the corresponding selector. From the perspective of writing the schema, the XPath identifies an element type or attribute type. The XPath must specify the element type identified by the selector, a descendant of that element type, or an attribute type of either.

Target node set: During XML instance validation, the target node set is the set of nodes identified by the selector of an identity constraint.

Key sequence: During XML instance validation, the key sequence is the set of values associated with the elements and attributes identified by the fields of an identity constraint. There is one key sequence for each node in the target node set.

The key Element

The key identity constraint ensures that each key sequence (the set of values specified by the XPath expressions of the nested field elements) is unique. A key sequence is unique if no other key sequence has "equal" values for every field. Unlike the unique element, the key element requires a key sequence to exist for each node in the target node set: every field must identify a value. The key identity constraint is therefore appropriate, for example, for required elements or attributes.

A catalog part number, where every catalog entry must have a part number, is a great example of using a key:

<xsd:key name="partNumberKey">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      The part number uniquely identifies
      each orderable item.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:selector xpath="catalog/*"/>
  <xsd:field xpath="partNumber"/>
</xsd:key>
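To see the key at work, here is a hypothetical instance fragment. The entry element names unitEntry and bulkEntry are assumptions made for illustration, and other entry content is omitted; only idConstraintDemo, catalog, and partNumber come from the excerpts in this article. A validator should reject this fragment because two catalog entries share a part number. It would likewise reject an entry with no partNumber at all, since a key requires every field to be present.

```xml
<!-- Hypothetical instance fragment: unitEntry and bulkEntry are
     assumed element names for catalog entries. -->
<idConstraintDemo>
  <catalog>
    <unitEntry>
      <partNumber>P-100</partNumber>
    </unitEntry>
    <bulkEntry>
      <!-- Violates partNumberKey: P-100 is already used above. -->
      <partNumber>P-100</partNumber>
    </bulkEntry>
  </catalog>
</idConstraintDemo>
```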

The unique Element

The unique identity constraint ensures that each key sequence (the set of values specified by the XPath expressions of the nested field elements) is unique. A key sequence is unique if no other key sequence has "equal" values for every field. Unlike the key element, the unique element does not require a key sequence for each node in the target node set: a node whose fields do not all identify a value is simply skipped. The unique identity constraint is therefore appropriate, for example, for optional elements or attributes.

In practice, this identity constraint is slightly more difficult to explain, because absent values are allowed. The next example specifies that if an order has a shipmentID attribute, that shipment identifier must be unique:

<xsd:unique name="orderShippedUnique">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Each shipped order has a unique
      shipment identifier.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:selector xpath="order"/>
  <xsd:field xpath="@shipmentID"/>
</xsd:unique>
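The following hypothetical instance fragment shows how the unique constraint treats absent values; the content of each order is omitted for brevity. Orders without a shipmentID attribute have no key sequence, so any number of them may appear, and only the repeated S-1 is an error.

```xml
<!-- Hypothetical instance fragment; order content omitted. -->
<idConstraintDemo>
  <!-- Shipped: the shipmentID must not repeat. -->
  <order shipmentID="S-1"/>
  <!-- Not yet shipped: no shipmentID, so orderShippedUnique skips them. -->
  <order/>
  <order/>
  <!-- Violates orderShippedUnique: S-1 is already used. -->
  <order shipmentID="S-1"/>
</idConstraintDemo>
```

Had the constraint been declared with key instead of unique, the two orders without a shipmentID would also be errors.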

The keyref Element

The keyref identity constraint ensures that each key sequence specified by the keyref exists, by virtue of equality, as a key sequence of the referenced identity constraint, which is named by the refer attribute. The referenced identity constraint must be a key or unique identity constraint.

The following example specifies that the part number of every item in an order must match the part number of some catalog entry, as identified by the partNumberKey defined earlier:

<xsd:keyref name="partNumberRef"
      refer="partNumberKey">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Each order must refer to a known part number.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:selector xpath="compressedOrder/order/item"/>
  <xsd:field xpath="partNumber"/>
</xsd:keyref>
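Here is a hypothetical instance fragment for this keyref. The path compressedOrder/order/item follows the selector above; the unitEntry element name is an assumption, and other content is omitted. The first item passes because P-100 is a key sequence of partNumberKey; the second fails because no catalog entry has part number P-999.

```xml
<!-- Hypothetical instance fragment; unitEntry is an assumed name. -->
<idConstraintDemo>
  <catalog>
    <unitEntry>
      <partNumber>P-100</partNumber>
    </unitEntry>
  </catalog>
  <compressedOrder>
    <order>
      <!-- OK: matches the partNumberKey value P-100. -->
      <item><partNumber>P-100</partNumber></item>
      <!-- Violates partNumberRef: no catalog entry has P-999. -->
      <item><partNumber>P-999</partNumber></item>
    </order>
  </compressedOrder>
</idConstraintDemo>
```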

Note that while this article does not provide examples (the book does!), a unique, key, or keyref may specify multiple fields, yielding a multi-field key such as "last name" and "first name" combined.
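As an illustration only (these constraints are not part of the book's example schema), the sketch below declares a hypothetical two-field key on customer elements and a keyref from orders. The names customer, lastName, firstName, customerLast, and customerFirst are assumptions, and all four values are assumed to be declared as xsd:string so they can compare equal. Two customers conflict only if both fields are equal, and a keyref must list the same number of fields as the key it refers to, matched by position.

```xml
<!-- Hypothetical: a customer is identified by last name plus first name. -->
<xsd:key name="customerNameKey">
  <xsd:selector xpath="customer"/>
  <xsd:field xpath="lastName"/>
  <xsd:field xpath="firstName"/>
</xsd:key>

<!-- Hypothetical: each order names its customer with two attributes.
     Fields pair up by position: customerLast with lastName,
     customerFirst with firstName. -->
<xsd:keyref name="orderCustomerRef" refer="customerNameKey">
  <xsd:selector xpath="order"/>
  <xsd:field xpath="@customerLast"/>
  <xsd:field xpath="@customerFirst"/>
</xsd:keyref>
```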

Selectors and Fields

Each identity constraint specifies a selector and one or more fields. The XPath of the selector ultimately identifies a target node set in a corresponding XML instance. Each field is slightly more complicated: it identifies either an element type, whose simple content provides the value, or an attribute type.

The XPath specified by a field is evaluated relative to each element identified by the selector. A key sequence is the set of values, one per field, for a specific node in the target node set.

Here, again, is the selector from the key example:

<xsd:selector xpath="catalog/*"/>

Note that each catalog entry has one of the types unitCatalogEntryType, bulkCatalogEntryType, or assemblyCatalogEntryType. The '*' makes the selector match every child element of catalog, whatever its name.

The field for the partNumberKey key specifies partNumber, which appears in each catalog entry, regardless of the entry type:

<xsd:field xpath="partNumber"/>

The effect of the partNumberKey key is to enforce that every catalog entry has a part number and that no two entries share one, regardless of entry type.
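The '*' puts every child of catalog into the target node set. If only some entry types should participate, the selector can instead list alternative paths separated by '|', which the identity-constraint subset of XPath permits. The following is a hypothetical variant of partNumberKey; unitEntry and bulkEntry stand in for the element names of unitCatalogEntryType and bulkCatalogEntryType entries, which this article does not show.

```xml
<!-- Hypothetical variant of partNumberKey: assembly entries are not in
     the target node set, so only unit and bulk entries need distinct
     part numbers. unitEntry and bulkEntry are assumed element names. -->
<xsd:key name="partNumberKey">
  <xsd:selector xpath="catalog/unitEntry | catalog/bulkEntry"/>
  <xsd:field xpath="partNumber"/>
</xsd:key>
```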

XPath

The grammar for specifying a selector or field is a subset of the XPath grammar. While this article does not go into the details of this grammar, suffice it to say that selectors specify element types via a '/'-delimited path. The field grammar is identical to the selector grammar, with the addition of '@' to prefix an attribute type. Note the following special sub-expressions: '.' identifies the current element, './/' (allowed only at the start of a path) reaches any descendant of the current element, and '*' matches an element of any name.

The following entries give a set of paths, a description of each, and corresponding XML, as fairly encompassing examples of XPath for selectors and fields. "The current element" is the element from which the path is evaluated.

XPath: .
Description: The current element.
Structure:
<current>foo</current>

XPath: xyz
Description: Any xyz child element of the current element.
Structure:
<current>
 <xyz>foo</xyz>
 <xyz>bar</xyz>
</current>

XPath: abc/xyz
Description: Any xyz child element of any abc child element of the current element.
Structure:
<current>
 <abc>
  <xyz>foo</xyz>
 </abc>
 <abc>
  <xyz>bar</xyz>
 </abc>
</current>

XPath: .//xyz
Description: Any xyz descendant element of the current element.
Structure:
<current>
 <xyz>foo</xyz>
 <abc>
  <xyz>bar</xyz>
 </abc>
</current>

XPath: */xyz
Description: Any xyz child of any child of the current element.
Structure:
<current>
 <abc>
  <xyz>foo</xyz>
 </abc>
 <def>
  <xyz>bar</xyz>
 </def>
</current>

XPath: @attr
Description: The attr attribute of the current element.
Structure:
<current attr="1"/>

XPath: xyz/@attr
Description: The attr attribute of any xyz child element.
Structure:
<current>
 <xyz attr="1"/>
 <xyz attr="2"/>
</current>

XPath: .//@attr
Description: Any attr attribute of the current element or of any of its descendant elements.
Structure:
<current>
 <xyz attr="1"/>
 <abc attr="2">
  <xyz attr="3"/>
 </abc>
</current>

XPath: */@attr
Description: The attr attribute of any child of the current element.
Structure:
<current>
 <abc attr="1"/>
 <def attr="2"/>
</current>
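Two of these paths are handy in practice. A field of '.' uses the selected element's own text as the value, and a selector beginning with './/' reaches elements at any depth. The hypothetical constraint below (the keyword element is an assumption and must have simple content) requires every keyword anywhere below the declaring element to have distinct text.

```xml
<!-- Hypothetical: each keyword element, at any depth below the
     element that declares this constraint, must have distinct text.
     './/' may appear only at the start of a path. -->
<xsd:unique name="keywordUnique">
  <xsd:selector xpath=".//keyword"/>
  <!-- '.' means the keyword element itself, so its text is the value. -->
  <xsd:field xpath="."/>
</xsd:unique>
```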

Value Equality

The foundation of identity constraints is a test for equality. Each key sequence specified by a unique or key identity constraint must be distinct: the test is equality (or, more precisely, lack of equality). Additionally, each key sequence specified by a keyref identity constraint must exist as a key sequence of the identity constraint to which the keyref refers: again, the test is equality. In XML Schema, two values are equal when their datatypes are derived from the same primitive datatype and their values are equal in the value space of that primitive type.

For example, no two of the following are equal: the string "3", the integer 3, the float 3.0, and the double 3.0. Conversely, an unsignedInt, an unsignedLong, a nonNegativeInteger, an integer, and a decimal, each with the value 3, are all equal, because all of these types derive from decimal. The lexical representations do not have to be identical: for example, a float written as '3.0' is equal to a float written as '3'.
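Because equality depends on the primitive datatype, a keyref can fail to match even when the text is identical. In the hypothetical local declarations below, one inside a catalog entry type and one inside an order item type, partNumberRef would reject every item, since the string "3" never equals the integer 3. Declaring both from one shared simple type, here a hypothetical partNumberType, avoids the problem.

```xml
<!-- Problem, part 1: inside the (hypothetical) catalog entry type. -->
<xsd:element name="partNumber" type="xsd:string"/>

<!-- Problem, part 2: inside the (hypothetical) order item type.
     The string "3" and the integer 3 are never equal, so no item
     can match a catalog entry. -->
<xsd:element name="partNumber" type="xsd:integer"/>

<!-- Fix: one shared type used by both declarations
     (assumes the schema has no target namespace). -->
<xsd:simpleType name="partNumberType">
  <xsd:restriction base="xsd:string"/>
</xsd:simpleType>
<xsd:element name="partNumber" type="partNumberType"/>
```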

Scope

As noted under Terminology, each identity constraint identifies a set of nodes that must be unique or that require referential integrity, and its scope is the enclosing element type; any element type may provide any number of identity constraints. Key sequences are compared only within a single occurrence of that element.

In the example schema, the following schema excerpt specifies a key relative to the element type idConstraintDemo:

<xsd:element name="idConstraintDemo">
  <xsd:complexType>

* * * *

  </xsd:complexType>

* * * *

  <xsd:key name="partNumberKey">
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        The part number uniquely identifies
        each orderable item.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:selector xpath="catalog/*"/>
    <xsd:field xpath="partNumber"/>
  </xsd:key>

* * * *

</xsd:element>

The example schema for the book was written a while ago; in retrospect, the catalog should be a standalone element type in its own namespace (a good exercise!). Regardless, the existing schema demonstrates not only this key but also other identity constraints whose scope is idConstraintDemo.
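Scope matters because an identity constraint is checked separately within each occurrence of its enclosing element. As a hypothetical sketch (the order, item, and lineNumber names are assumptions, not taken from the book's schema), declaring a key on an order element lets line numbers repeat across orders while keeping them distinct within each one.

```xml
<!-- Hypothetical: the key is declared inside the order element,
     so its scope is a single order. -->
<xsd:element name="order">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="item" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:attribute name="lineNumber" type="xsd:positiveInteger"
                         use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Line numbers must differ within one order;
       two different orders may both have a line 1. -->
  <xsd:key name="lineNumberKey">
    <xsd:selector xpath="item"/>
    <xsd:field xpath="@lineNumber"/>
  </xsd:key>
</xsd:element>
```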

Examples
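A minimal, self-contained sketch that combines all three constraints follows. It is not the book's example schema: it assumes no target namespace, a single hypothetical entry element in the catalog, and orders placed directly under idConstraintDemo, so the keyref selector is order/item rather than compressedOrder/order/item. The instance document after it should validate with a conforming XML Schema 1.0 processor.

```xml
<!-- Hypothetical schema, no target namespace. -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="idConstraintDemo">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="catalog">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="entry" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="partNumber" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="order" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="item" maxOccurs="unbounded">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="partNumber" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
            <!-- Optional: present only once the order has shipped. -->
            <xsd:attribute name="shipmentID" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Every catalog entry has a distinct part number. -->
    <xsd:key name="partNumberKey">
      <xsd:selector xpath="catalog/*"/>
      <xsd:field xpath="partNumber"/>
    </xsd:key>
    <!-- Shipped orders have distinct shipment identifiers. -->
    <xsd:unique name="orderShippedUnique">
      <xsd:selector xpath="order"/>
      <xsd:field xpath="@shipmentID"/>
    </xsd:unique>
    <!-- Every ordered item names a known part number. -->
    <xsd:keyref name="partNumberRef" refer="partNumberKey">
      <xsd:selector xpath="order/item"/>
      <xsd:field xpath="partNumber"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
```

```xml
<!-- Hypothetical instance, valid against the schema above. -->
<idConstraintDemo>
  <catalog>
    <entry><partNumber>P-100</partNumber></entry>
    <entry><partNumber>P-200</partNumber></entry>
  </catalog>
  <order shipmentID="S-1">
    <item><partNumber>P-100</partNumber></item>
  </order>
  <order>
    <item><partNumber>P-200</partNumber></item>
    <item><partNumber>P-100</partNumber></item>
  </order>
</idConstraintDemo>
```

To see each constraint fail, change the second entry's part number to P-100 (partNumberKey), add shipmentID="S-1" to the second order (orderShippedUnique), or change an item's part number to P-999 (partNumberRef).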

More Information

This article contains summaries, excerpts, and examples from Chapter 13 of The XML Schema Complete Reference, Cliff Binstock, Dave Peterson, Mitchell Smith, Mike Wooding, Chris Dix, published by Addison-Wesley, 2002. Please see the book for far more explanation, greater details, and numerous examples.

Also, please visit http://www.XMLSchemaReference.com, which directly supports the book, and provides many online pointers, reference tables, and examples.

About the Author

Cliff Binstock, the owner of Robust Software, has more than twenty years of software development experience ranging from hands-on architecture and coding to mentoring and project leadership.

Cliff can be reached for questions or Java, XML, and Oracle consulting services at cliff@XMLSchemaReference.com.