
Canonical XML Terminology

The specification defines the canonical form of an XML document as the physical representation that results when a certain algorithm is applied, changing the document in a number of ways:

  • The document is encoded in UTF-8

  • Line breaks are normalized to #xA on input, before parsing

  • Attribute values are normalized, as if by a validating processor

  • Character and parsed entity references are replaced

  • CDATA sections are replaced with their character content

  • The XML declaration and document type declaration (DOCTYPE) are removed

  • Empty elements are converted to start-end tag pairs

  • Whitespace outside of the document element and within start and end tags is normalized

  • All whitespace in character content is retained (excluding characters removed during line feed normalization)

  • Attribute value delimiters are set to quotation marks (double quotes)

  • Special characters in attribute values and character content are replaced by character references

  • Superfluous namespace declarations are removed from each element

  • Default attributes are added to each element

  • Lexicographic order is imposed on the namespace declarations and attributes of each element
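Several of the changes listed above can be observed with a short Python sketch. It assumes Python 3.8 or later, whose standard library provides `xml.etree.ElementTree.canonicalize`. That function implements the later C14N 2.0 revision, which is not necessarily the version of the specification discussed here, so treat this as an illustration of the steps the two share rather than a reference implementation. The sample document is invented for the example.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input: an XML declaration, single-quoted and unordered
# attributes, an empty-element tag, a CDATA section, and a character reference.
source = (
    '<?xml version="1.0"?>'
    "<doc b='2' a=\"1\"><empty/><![CDATA[x < y]]>&#33;</doc>"
)

# With no output stream given, canonicalize() returns the result as a string.
canonical = canonicalize(source)
print(canonical)
# prints: <doc a="1" b="2"><empty></empty>x &lt; y!</doc>
```

In the output, the XML declaration is gone, the attributes are sorted and delimited by double quotes, the empty element has become a start-end tag pair, the CDATA section has been replaced by its character content with the special character escaped, and the character reference has been replaced by the character itself.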

The term canonical XML refers to XML that is in canonical form. The XML canonicalization method is the algorithm, defined by the specification, that produces this form, and the process of applying that algorithm is called XML canonicalization.

Because the specification borrows the XPath data model, which defines seven node types (root, element, attribute, text, comment, processing instruction, and namespace), it also borrows the term node-set. XPath selects or rejects nodes based on expression evaluation (as we will see in Chapter 11). In this context, by contrast, the node-set directly determines whether each particular node is output in the canonical form, independent of the determination for its parent or descendant nodes.
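The independence of each node's inclusion from that of its parent can be illustrated with a toy model. The following Python sketch is not the specification's algorithm: the `Node` class and `render` function are hypothetical names introduced here, only element and text nodes are modeled, and attributes, namespaces, and character escaping are left out. It shows only that membership in the node-set is checked per node while the whole tree is still traversed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes fit in a set
class Node:
    kind: str   # "element" or "text" (simplified; other node types omitted)
    value: str  # tag name for elements, character content for text
    children: list = field(default_factory=list)


def render(node, node_set):
    """Serialize the tree, emitting each node only if it is in node_set."""
    included = node in node_set
    if node.kind == "text":
        return node.value if included else ""
    # Children are always visited, even when this element is omitted.
    inner = "".join(render(child, node_set) for child in node.children)
    if included:
        return f"<{node.value}>{inner}</{node.value}>"
    return inner


text = Node("text", "kept")
section = Node("element", "section", [text])
doc = Node("element", "doc", [section])

# The root element is excluded, yet its descendants are still output.
print(render(doc, {section, text}))  # prints: <section>kept</section>

# An intermediate element is excluded, but its parent and child survive.
print(render(doc, {doc, text}))      # prints: <doc>kept</doc>
```

Because an omitted element does not suppress its descendants, the output for an arbitrary node-set is not guaranteed to be well-formed XML on its own.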
