Enforcing Referential Integrity with XML Schema Identity Constraints
Date: Apr 11, 2003
Article is provided courtesy of Addison Wesley.
Discover how to keep your XML schema consistent through three simple, yet powerful elements: key, unique, and keyref.
Introduction
A schema (database, XML, or otherwise) specifies the structure and content of data. For the most part, the XML Schema Recommendation provides mechanisms for validating individual data values (via simple types) and the structure of data (via complex types). XML Schema also provides identity constraints, which offer a simple but powerful way to ensure referential integrity.
The following scenario provides a common ground for discussing referential integrity:
There is a catalog consisting of part numbers, descriptions, prices, etc. Each part number must be unique, and the part number is a key against which other values (listed next) are validated.
Each order has a unique order identifier.
An order contains a distinct set of part numbers validated against the catalog. One slight complication of selecting the intended elements is that part numbers do not have to be unique across orders.
Each order has at most one (but often no) shipment identifier.
While there are lots of details covered in this article, there are just three keywords that combine to provide XML Schema referential integrity:
key specifies that the values for a particular element might be a key against which integrity is validated. Every key must be unique, and every key must exist. For example, a part number might be a key in a catalog. That is, every item in the catalog has a part number, and that part number is distinct from every other part number.
unique is exactly like key, except that the key does not have to exist. For example, for all orders, the shipping identifier must be unique. However, not every order has a shipping identifier (the order may be pending shipment).
keyref specifies a reference to an element specified by key or unique. For example, each part number in an order must refer to a part number in the catalog.
Due to the complexity of constructing meaningful paths for the selectors and fields of an identity constraint, the individual element examples in this article do not make much sense by themselves. An XML schema with identity constraints and a corresponding XML instance provide context for the rest of this document.
Terminology
Identity Constraint: An identity constraint identifies a set of nodes that must be unique or that require referential integrity. The grammar for locating these nodes is a subset of XPath. Each identity constraint has a specific scope, which is the enclosing element type; any element type may provide any number of identity constraints.
Node: The Post-Schema Validation Infoset (PSVI) describes a tree that represents an XML instance. The tree is made up of a set of nodes. Some of these nodes, particularly those associated with elements and attributes, provide the infrastructure for the validation provided by identity constraints.
Selector: In an XML schema, each identity constraint specifies a selector. Conceptually, the value of a selector is an XPath that identifies an element. From the perspective of writing the schema, the XPath identifies an element type. The XPath must specify an element type that is a descendant of the element type enclosing the identity constraint.
Field: In an XML schema, each identity constraint specifies one or more fields. Conceptually, the value of each field is an XPath that identifies a child element or an attribute of the element identified by the XPath of the corresponding selector. From the perspective of writing the schema, the XPath identifies an element type or attribute type. The XPath must specify an element type or attribute type that is the element type (or is a descendant of the element type) that encloses the identity constraint.
Target node set: During XML instance validation, the target node set is the set of nodes identified by the selector of an identity constraint.
Key sequence: During XML instance validation, the key sequence is the set of values associated with the elements and attributes identified by the fields of an identity constraint. There is one key sequence for each node in the target node set.
The key Element
The key identity constraint assures that each key sequence (the set of values specified by the XPath expressions of the nested field elements) is unique. A key sequence is unique if no two key sequences have "equal" values for all keys. Unlike the unique element, a key sequence must exist for each node in the target node set. The key identity constraint is appropriate, for example, for required elements or attributes.
A catalog part number, where every catalog entry must have a part number, is a great example of using a key:
<xsd:key name="partNumberKey">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      The part number uniquely identifies
      each orderable item.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:selector xpath="catalog/*"/>
  <xsd:field xpath="partNumber"/>
</xsd:key>
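To make the two halves of the key rule concrete (uniqueness and existence), here is a small instance fragment as a sketch. The entry element names unitPart, bulkPart, and assemblyPart, the description element, and the part number values are assumptions made up for illustration; the selector's '*' matches any child of catalog, whatever it is called.

```xml
<catalog>
  <!-- OK: first use of this part number -->
  <unitPart>
    <partNumber>UX001</partNumber>
  </unitPart>
  <!-- Invalid: repeats the key sequence of the entry above -->
  <bulkPart>
    <partNumber>UX001</partNumber>
  </bulkPart>
  <!-- Invalid: a key requires the field to exist for every selected node -->
  <assemblyPart>
    <description>Entry without a part number</description>
  </assemblyPart>
</catalog>
```

Under these assumptions, a validating parser should report both the duplicate and the missing part number against partNumberKey.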
The unique Element
The unique identity constraint assures that each key sequence (the set of values specified by the XPath expressions of the nested field elements) is unique. A key sequence is unique if no two key sequences have "equal" values for all keys. Unlike the key element, the key sequence does not have to exist for each node in the target node set. The unique identity constraint is appropriate, for example, for optional elements or attributes.
In practice, this identity constraint is slightly more difficult to explain. The next example specifies that if an order has a shipping identifier attribute, that shipping identifier must be unique:
<xsd:unique name="orderShippedUnique">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      The shipmentID, when present, uniquely
      identifies each shipment.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:selector xpath="order"/>
  <xsd:field xpath="@shipmentID"/>
</xsd:unique>
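A short instance fragment may help show how unique treats absent values. This is only a sketch: the wrapper element orders, the orderID attribute, and all attribute values are hypothetical, and the wrapper stands in for whichever element type encloses the constraint.

```xml
<orders>
  <!-- Valid: no shipmentID, so this order contributes no key sequence -->
  <order orderID="A1"/>
  <!-- Valid: first use of S100 -->
  <order orderID="A2" shipmentID="S100"/>
  <!-- Valid: absent values never collide with each other -->
  <order orderID="A3"/>
  <!-- Invalid: repeats the shipmentID of order A2 -->
  <order orderID="A4" shipmentID="S100"/>
</orders>
```

Had the constraint been a key rather than a unique, orders A1 and A3 would also be invalid, because a key requires the field to exist for every selected node.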
The keyref Element
The keyref identity constraint assures that each key sequence specified by the keyref exists, by virtue of equality, as a key sequence in the reference identity constraint. The reference identity constraint is always a unique or key identity constraint.
The following example specifies that every item in an order must exist in the list specified by the already specified partNumberKey:
<xsd:keyref name="partNumberRef"
            refer="partNumberKey">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Each order must refer to a known part number.
    </xsd:documentation>
  </xsd:annotation>
  <xsd:selector xpath="compressedOrder/order/item"/>
  <xsd:field xpath="partNumber"/>
</xsd:keyref>
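As a sketch of what the keyref accepts and rejects, consider the following instance fragment. The nesting follows the selector path above; the part number values, and the assumption that the catalog contains UX001 but not NOPE99, are made up for illustration.

```xml
<compressedOrder>
  <order>
    <!-- Valid, assuming some catalog entry has partNumber UX001 -->
    <item>
      <partNumber>UX001</partNumber>
    </item>
    <!-- Invalid, assuming no catalog entry has partNumber NOPE99 -->
    <item>
      <partNumber>NOPE99</partNumber>
    </item>
  </order>
</compressedOrder>
```

The same part number may appear in any number of items: a keyref only requires that each value exist in the referenced key, not that the referring values be distinct.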
Note that while this article does not provide examples (the book does!), a unique, key, and keyref each might specify multiple fields (that is, a multi-valued key such as "last name" and "first name" combined).
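For readers who want a rough picture of a multi-field constraint, the following is a minimal sketch and not an excerpt from the book. Every name in it (customerNameKey, customerNameRef, customers, customer, lastName, firstName) is hypothetical, as is the assumption that each order carries a customer child element.

```xml
<!-- Each customer is identified by the pair (lastName, firstName) -->
<xsd:key name="customerNameKey">
  <xsd:selector xpath="customers/customer"/>
  <xsd:field xpath="lastName"/>
  <xsd:field xpath="firstName"/>
</xsd:key>

<!-- The keyref lists the same number of fields, in the same order -->
<xsd:keyref name="customerNameRef" refer="customerNameKey">
  <xsd:selector xpath="order"/>
  <xsd:field xpath="customer/lastName"/>
  <xsd:field xpath="customer/firstName"/>
</xsd:keyref>
```

With multiple fields, two key sequences are equal only when every corresponding pair of values is equal, so two customers may share a last name as long as their first names differ.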
Selectors and Fields
Each identity constraint specifies a selector and one or more fields. The value of the XPath of each selector ultimately identifies a target node set in a corresponding XML instance. Each field, which is slightly more complicated, identifies one of the following:
The value of an element corresponding to a node in the target node set.
The value of an attribute of the element corresponding to a node in the target node set.
The value of a descendant of the element corresponding to a node in the target node set.
The value of an attribute of a descendant of the element corresponding to a node in the target node set.
Like a selector, a field ultimately identifies an element type or an attribute type. The XPath specified by the field is relative to the XPath specified by the selector. A key sequence is the set of values for a specific node in the target set that correspond to the set of fields specified by an identity constraint.
Here, again, is the selector from the key example:
<xsd:selector xpath="catalog/*"/>
Note that each catalog entry is one of unitCatalogEntryType, bulkCatalogEntryType, or assemblyCatalogEntryType. The selector selects every element directly below catalog elements via the '*'.
The field for the partNumberKey key specifies partNumber, which appears in each catalog entry, regardless of the entry type:
<xsd:field xpath="partNumber"/>
The effect of the partNumberKey key is to enforce that there is a distinct part number for each catalog entry, regardless of entry type.
XPath
The grammar for specifying a selector or field is a subset of the XPath grammar. While this article does not go into details regarding this grammar, suffice it to say that selectors specify element types via a '/' delimited path. The field grammar is identical to the selector grammar, with the addition of '@' to prefix an attribute type. Note the following special sub-expressions:
'.' denotes the current element type (much like a directory path)
'*' denotes "all" element types at a given level, as already demonstrated in the previous example.
'//' denotes all descendants: like '*', but recursive. In a selector or field it may appear only as a leading './/'. This is unlike directory paths, where '//' normally denotes root.
'@' is a prefix that indicates an attribute type. Only a field (not a selector) may specify an attribute type.
The next table provides a set of paths, descriptions, and corresponding XML as fairly encompassing examples of XPath for selectors and fields:
| XPath | Description | Structure |
|---|---|---|
| . | The current element. | <current>foo</current> |
| xyz | The child element xyz of the current element. | <current> <xyz>foo</xyz> <xyz>bar</xyz> </current> |
| abc/xyz | The xyz child element of the abc child element of the current element. | <current> <abc> <xyz>foo</xyz> </abc> <abc> <xyz>bar</xyz> </abc> </current> |
| .//xyz | Any xyz descendant element of the current element. | <current> <xyz>foo</xyz> <abc> <xyz>bar</xyz> </abc> </current> |
| */xyz | The xyz child of any child of the current element. | <current> <abc> <xyz>foo</xyz> </abc> <def> <xyz>bar</xyz> </def> </current> |
| @attr | The attr attribute of the current element. | <current attr="1"/> |
| xyz/@attr | The attr attribute of the xyz child element. | <current> <xyz attr="1"/> <xyz attr="2"/> </current> |
| .//@attr | Any attr attribute of any descendant element of the current element. | <current> <xyz attr="1"/> <abc attr="2"> <xyz attr="3"/> </abc> </current> |
| */@attr | The attr attribute of any child of the current element. | <current> <abc attr="1"/> <def attr="2"/> </current> |
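To connect these path forms back to actual constraints, here is a sketch of two declarations that use './/', '@', and '.'. The constraint names anyDepthShipmentUnique and partNumberValueKey are hypothetical; the element and attribute names are borrowed from the earlier examples.

```xml
<xsd:unique name="anyDepthShipmentUnique">
  <!-- './/' reaches order elements at any depth below the scope element -->
  <xsd:selector xpath=".//order"/>
  <!-- '@' may appear in a field, never in a selector -->
  <xsd:field xpath="@shipmentID"/>
</xsd:unique>

<xsd:key name="partNumberValueKey">
  <!-- The selector picks the partNumber elements themselves -->
  <xsd:selector xpath="catalog/*/partNumber"/>
  <!-- '.' uses the value of each selected element as the key -->
  <xsd:field xpath="."/>
</xsd:key>
```

The second form differs subtly from selecting catalog/* with a partNumber field: because only existing partNumber elements are selected, it no longer forces every catalog entry to have one.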
Value Equality
The foundation of identity constraints is a test for equality. Each key sequence specified by a unique or key identity constraint must be distinct: the test is equality (or more correctly, lack of equality). Additionally, each key sequence specified by a keyref identity constraint must exist as a key sequence in the target node set to which the keyref refers: again, the test is equality. In an XML schema, two values are equal when:
The simple types for the two values are identical, or one simple type is a derivation of the other. The derived simple type may be a built-in derived datatype or a user-derived simple type.
The value in the value space is equal.
For example, no two of the following are equal: the string '"3"', the integer '3', the float '3.0', and the double '3.0'. Conversely, any of the following with the value '3' are equal: an unsignedInt, an unsignedLong, a nonNegativeInteger, an integer, or a decimal. Logically, the value in the lexical space does not have to be identical. For example, a float with the value '3.0' is equal to a float with the value '3'.
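The difference between lexical and value-space comparison is easiest to see in an instance. The sketch below assumes, purely for illustration, that partNumber is declared with type xsd:decimal and is covered by a key such as partNumberKey; the entry element names are hypothetical.

```xml
<!-- Assumption: the schema declares partNumber with type xsd:decimal -->
<catalog>
  <unitPart>
    <partNumber>3</partNumber>
  </unitPart>
  <!-- Duplicate: "3.0" and "3" denote the same decimal value -->
  <bulkPart>
    <partNumber>3.0</partNumber>
  </bulkPart>
  <!-- Distinct: 3.5 is a different value -->
  <assemblyPart>
    <partNumber>3.5</partNumber>
  </assemblyPart>
</catalog>
```

If partNumber were instead declared as xsd:string, "3" and "3.0" would be different values and the same instance would satisfy the key.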
Scope
Each identity constraint identifies a set of nodes that must be unique or that require referential integrity. Each identity constraint has a specific scope, which is the enclosing element type; any element type may provide any number of identity constraints.
In the example schema, the following schema excerpt specifies a key relative to the element type idConstraintDemo:
<xsd:element name="idConstraintDemo">
  <xsd:complexType>
    * * * *
  </xsd:complexType>
  * * * *
  <xsd:key name="partNumberKey">
    <xsd:annotation>
      <xsd:documentation xml:lang="en">
        The part number uniquely identifies
        each orderable item.
      </xsd:documentation>
    </xsd:annotation>
    <xsd:selector xpath="catalog/*"/>
    <xsd:field xpath="partNumber"/>
  </xsd:key>
  * * * *
</xsd:element>
The example for this book was written a while ago: in retrospect, the catalog should be a standalone element type in its own namespace (a good exercise!). Regardless, the existing schema demonstrates not only this key, but others that have the same scope as idConstraintDemo.
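Scope also explains how the scenario's requirement that part numbers be distinct within an order, but not across orders, could be expressed. The following is a sketch under assumptions, not an excerpt from the example schema: the constraint name itemPartNumberUnique and the simplified content model of order are hypothetical.

```xml
<xsd:element name="order">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="item" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="partNumber" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Scope is one order element: uniqueness is checked per order -->
  <xsd:unique name="itemPartNumberUnique">
    <xsd:selector xpath="item"/>
    <xsd:field xpath="partNumber"/>
  </xsd:unique>
</xsd:element>
```

Because the constraint is declared on order, each order element is validated independently, so two different orders may contain the same part number. Declaring the same constraint on an enclosing element, with a selector such as order/item, would instead require part numbers to be distinct across all orders.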
Examples
An XML Schema that contains a number of identity constraint examples.
A corresponding XML instance that contains data that conforms to the identity constraints in the XML schema.
More Information
This article contains summaries, excerpts, and examples from Chapter 13 of The XML Schema Complete Reference, Cliff Binstock, Dave Peterson, Mitchell Smith, Mike Wooding, Chris Dix, published by Addison-Wesley, 2002. Please see the book for far more explanation, greater details, and numerous examples.
Also, please visit http://www.XMLSchemaReference.com, which directly supports the book, and provides many online pointers, reference tables, and examples.
About the Author
Cliff Binstock, the owner of Robust Software, has more than twenty years of software development experience ranging from hands-on architecture and coding to mentoring and project leadership.
Cliff can be reached for questions or Java, XML, and Oracle consulting services at cliff@XMLSchemaReference.com.