XML is probably the most fundamental of today's core technologies for implementing distributed applications. These applications often cross the traditional inter-organizational boundaries of security, heterogeneous computing platforms, and differing programming languages. As an open standard for structured, self-describing data, XML is ideal for cross-platform, language-neutral data representation in these applications.
This article demonstrates how to determine at runtime whether a string containing an XML document is valid, that is, whether it adheres to the expected rules for document structure and syntax. This technique is useful for ensuring that dynamically generated XML documents (such as those received or sent by a Web service, passed as a method argument, or returned as the result of a database query) are as expected.
There are two different measures of correctness of an XML document: well-formed and valid. A well-formed XML document adheres to a few basic rules:
A well-formed XML document has a single unique root element.
Elements must be properly nested, and cannot overlap.
XML is case-sensitive, so start and end tags must match exactly, including case.
All elements must be closed, including empty elements.
Reserved characters must be represented using appropriate entity codes.
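The rules above can be checked mechanically by any XML parser, because a parser refuses input that is not well-formed. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are illustrative assumptions, not part of any particular product's API.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a well-formed XML document.

    Hypothetical helper for illustration: it checks well-formedness only
    and says nothing about validity against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        # The parser rejects any violation of the well-formedness rules.
        return False
    return True


# One sample per rule listed above (all sample documents are made up).
samples = {
    "single root, properly nested": "<listing><price>100</price></listing>",
    "two root elements": "<listing/><listing/>",
    "overlapping elements": "<a><b></a></b>",
    "tag case mismatch": "<Listing></listing>",
    "unclosed empty element": "<a><br></a>",
    "unescaped reserved character": "<note>1 < 2</note>",
    "escaped reserved character": "<note>1 &lt; 2</note>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first and last samples are well-formed. Passing this check is a precondition for validity, not a substitute for it.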
A valid XML document is, first of all, a well-formed XML document. In addition, a valid XML document adheres to an agreed-upon set of rules for document structure, content, syntax, and hierarchical relationships that are domain-specific. For example, a group of trading partners in the real estate industry might jointly define a set of rules for the representation of property listings in XML.
These rules may be laid out in one of three primary formats: Document Type Definition (DTD), XML Schema, and XML-Data Reduced (XDR) schema. In all three cases, the basic approach is the same. The rules for valid structure, content, and syntax are decided in advance and are captured using a special syntax. This results in a document (DTD or schema) that is persisted as a text file and/or as a business rule.
As part of the validation process, software called a validating parser is used to examine the subject XML and compare it to these rules. The rules may be specified to the parser either as a reference to a file (a URI) or by passing the rules as part of the XML document itself, which is referred to as inline.
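To make the inline form concrete, the sketch below embeds a small DTD in the DOCTYPE declaration of a hypothetical property listing. The element names, the address, and the helper parse_without_validation are all assumptions made up for illustration. The document deliberately omits the price element that its own DTD requires. Python's standard xml.etree.ElementTree parser is non-validating, so it reads the inline rules without enforcing them and accepts the document anyway. This is the gap a validating parser is meant to close.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing with inline rules: the DTD says every <listing>
# must contain an <address> followed by a <price>, but <price> is missing.
LISTING_WITH_INLINE_DTD = """<?xml version="1.0"?>
<!DOCTYPE listing [
  <!ELEMENT listing (address, price)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<listing>
  <address>12 Example Street</address>
</listing>"""


def parse_without_validation(xml_text: str) -> ET.Element:
    """Parse with a non-validating parser (illustrative helper).

    Raises ET.ParseError only for well-formedness errors; violations of
    the inline DTD go undetected.
    """
    return ET.fromstring(xml_text)


root = parse_without_validation(LISTING_WITH_INLINE_DTD)

# The document is well-formed, so parsing succeeds, even though the
# required <price> element is absent and the document is therefore invalid.
print("root element:", root.tag)
print("price present:", root.find("price") is not None)
```

A validating parser given the same string would be expected to report the missing price element as a validation error rather than accept the document silently.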