Validating Documents with Namespaces

So then, how do we produce a valid document when namespaces are involved? Since namespace-enhanced XML is syntactically Good Old Fashioned XML that happens to have elements (and possibly attributes) with colons in their names, at least well-formedness is not at all affected.

As Ron Bourret points out, validity is a concept that is independent of the proper use of XML namespaces. He provides examples that are invalid (although you might think they should be valid), that are valid but use namespaces incorrectly, and that are invalid even though namespaces are handled properly (http://www.rpbourret.com/xml/NamespacesFAQ.htm#q7).

NOTE

To ensure that your documents both are valid XML 1.0 and comply with the Namespaces in XML Recommendation:

  • Declare in the DTD all xmlns attributes corresponding to namespaces used in the XML document;

  • Match qualified names in the DTD to universal names in the XML document (by using the same prefix and declaring the exact namespace names given as fixed values in xmlns attributes);

  • Limit yourself to one default XML namespace (at most); and

  • Ensure that prefixes are unique among document collections you will be processing.

For example, regarding the attribute-list described in the previous section, we would declare and use the Disc prefix in the XML instance like this:

<Disc:DiscountCatalog 
  xmlns:Disc="http://www.HouseOfDiscounts.com/namespaces/Discounts">

Consider the following document (invalid-internal.xml), which uses an internal subset for a DTD and correctly declares the xmlns attribute. It is not valid, however, because the instance document uses universal names only for the kbs:myRoot element, but not for child1 or child2. That conflicts with the DTD, which indicates that the content of kbs:myRoot is kbs:child1 followed by one or more kbs:child2 elements, not child1 and child2.

<?xml version="1.0" ?>
<!DOCTYPE kbs:myRoot [
 <!ELEMENT kbs:myRoot (kbs:child1, kbs:child2+) >
 <!ATTLIST kbs:myRoot
    xmlns:kbs CDATA #FIXED "http://www.example.com/">
 <!ELEMENT kbs:child1 (#PCDATA) >
 <!ELEMENT kbs:child2 (#PCDATA) >
]>
<kbs:myRoot>
 <child1>invalid</child1>
 <child2>doc</child2>
</kbs:myRoot>

We can make this document valid simply by attaching the appropriate prefix to each of the children (valid-internal.xml).

<?xml version="1.0" ?>
<!DOCTYPE kbs:myRoot [
 <!ELEMENT kbs:myRoot (kbs:child1, kbs:child2+) >
 <!ATTLIST kbs:myRoot
    xmlns:kbs CDATA #FIXED "http://www.example.com/">
 <!ELEMENT kbs:child1 (#PCDATA) >
 <!ELEMENT kbs:child2 (#PCDATA) >
]>
<kbs:myRoot>
 <kbs:child1>valid</kbs:child1>
 <kbs:child2>doc</kbs:child2>
</kbs:myRoot>

As long as we can guarantee that we'll be using a validating parser, which is required to process any external subset, we can move the DTD into an external subset. Suppose we store the DTD in the file valid-external.dtd.

<!ELEMENT kbs:myRoot (kbs:child1, kbs:child2+) >
<!ATTLIST kbs:myRoot
     xmlns:kbs CDATA #FIXED "http://www.example.com/">
<!ELEMENT kbs:child1 (#PCDATA) >
<!ELEMENT kbs:child2 (#PCDATA) >

We then alter the document type declaration of the document instance to reference the DTD, as usual:

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE kbs:myRoot SYSTEM "valid-external.dtd" >
<kbs:myRoot>
 <kbs:child1>valid</kbs:child1>
 <kbs:child2>doc</kbs:child2>
</kbs:myRoot>

However, this XML example, although valid, doesn't use namespaces at all; it merely uses element names that contain a colon. The external DTD subset once again declares an xmlns attribute, but the XML instance contains no matching namespace declaration. As a result, these elements aren't in any namespace: what appears to be a prefix is in fact just part of each element name. (Is your head hurting yet?)

Thus, to satisfy the constraints of both the Namespaces in XML Recommendation and the XML 1.0 Recommendation for validity, you have to explicitly declare the namespace in the instance, using the same prefix given in the xmlns attribute declaration in the DTD ("kbs" in this case). So the DTD, saved here as ns-valid-explicitNS.dtd, is the same as shown previously:

<!ELEMENT kbs:myRoot (kbs:child1, kbs:child2+) >
<!ATTLIST kbs:myRoot
     xmlns:kbs CDATA #FIXED "http://www.example.com/">
<!ELEMENT kbs:child1 (#PCDATA) >
<!ELEMENT kbs:child2 (#PCDATA) >

But we need to modify the root element in the XML instance (ns-valid-explicitNS.xml):

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE kbs:myRoot SYSTEM "ns-valid-explicitNS.dtd" >
<kbs:myRoot
   xmlns:kbs="http://www.example.com/">
 <kbs:child1>valid</kbs:child1>
 <kbs:child2>doc</kbs:child2>
</kbs:myRoot>

This poses a bit of an editing headache because not only must the DTD and XML instances use the same prefix (and namespace URI), but this method also requires all elements in the instance to use universal names. In this trivial example, that's no problem, but imagine having to add the prefix to every element name in a several-thousand-line XML document! One solution, when dealing with a single namespace, is to declare a default namespace in the instance, attached to the root element. In this case, we remove the prefixes from all elements in the DTD and use xmlns without the colon or prefix, as we saw earlier:

<!ELEMENT myRoot (child1, child2+) >
<!-- Declaration for what is the default namespace in instances. -->
<!ATTLIST myRoot
     xmlns CDATA #FIXED "http://www.example.com/">
<!ELEMENT child1 (#PCDATA) >
<!ELEMENT child2 (#PCDATA) >

Then make the corresponding change to the instance:

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE myRoot SYSTEM "ns-valid-defaultNS.dtd" >
<myRoot
   xmlns="http://www.example.com/">
 <child1>valid</child1>
 <child2>doc</child2>
</myRoot>

Now the only line that changes from a non-namespace, unprefixed version of the XML document is the document element itself, certainly a much more manageable editing task.

However, if we are dealing with multiple namespaces, we generally need a prefix for each namespace declared in the DTD and used in the document. We can declare multiple xmlns attributes for a single element; for example, a Recording element can carry declarations whose URIs correspond to the XLink and XHTML namespaces. In the fragment shown below, the Recording element does not need a prefix because it is not part of either language; it just uses elements from XHTML and attributes from XLink. (This fragment appears in a complete example in Chapter 13, about XLink.) The attribute-list declaration

<!ATTLIST Recording
  xmlns:xlink   CDATA    #FIXED "http://www.w3.org/1999/xlink"
  xmlns:xhtml   CDATA    #FIXED "http://www.w3.org/1999/xhtml"
  xlink:type   (extended) #FIXED  "extended"
  xlink:title   CDATA    #IMPLIED >

corresponds to an instance that begins:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE Recording SYSTEM "recording9.dtd" >

<Recording
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xlink:type="extended"
  xlink:title="Various versions of a recording by Wings" >

We could also declare one of the namespaces as the default and use explicit prefixes for the others. That is precisely what we saw in Listing 5-2. In that example, the default namespace was XHTML and prefixes were used to represent MathML, SVG, and XLink namespaces. (We did not concern ourselves with validity.)
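
A minimal sketch of that mixed approach, reusing the myRoot example rather than Listing 5-2, might look like the following. The file name mixed-ns.dtd, the ext prefix, the ext:note element, and the namespace name http://www.example.org/ext are all hypothetical, introduced only for illustration. The DTD declares one xmlns attribute for the default namespace and one for the prefixed namespace:

```xml
<!-- mixed-ns.dtd (hypothetical file name) -->
<!-- Unprefixed names belong to the default namespace in instances;
     names with the (assumed) ext prefix belong to a second namespace. -->
<!ELEMENT myRoot (child1, ext:note*) >
<!ATTLIST myRoot
     xmlns     CDATA #FIXED "http://www.example.com/"
     xmlns:ext CDATA #FIXED "http://www.example.org/ext">
<!ELEMENT child1 (#PCDATA) >
<!ELEMENT ext:note (#PCDATA) >
```

An instance intended to be valid against this DTD would declare both namespaces on the root element and attach the prefix only to the elements from the second namespace:

```xml
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE myRoot SYSTEM "mixed-ns.dtd" >
<myRoot
   xmlns="http://www.example.com/"
   xmlns:ext="http://www.example.org/ext">
 <!-- child1 picks up the default namespace; ext:note uses the prefix. -->
 <child1>valid</child1>
 <ext:note>doc</ext:note>
</myRoot>
```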

As you can see, using namespaces in valid documents can be a little tricky. You have to keep switching your point of orientation. When you consider validity, the normal rules apply and the focus is on the DTD and xmlns attributes; ignore that the fixed values of these attributes are namespace names (URIs). However, when you consider the document instances, namespace declarations must be added; prefixes must be attached to every element that has an xmlns attribute in the DTD, and the prefix must match the xmlns attribute. For a more detailed discussion of issues concerning namespace-aware validation, see the XML Namespaces FAQ by Bourret.
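
To make the point about matching prefixes concrete, here is a sketch of an instance that a namespace-aware application would treat as equivalent to ns-valid-explicitNS.xml, because it binds the same namespace name, yet that a validating parser should reject. The abc prefix is hypothetical; the DTD is the one shown earlier, which declares only kbs:-prefixed names:

```xml
<?xml version="1.0" standalone="no" ?>
<!DOCTYPE kbs:myRoot SYSTEM "ns-valid-explicitNS.dtd" >
<!-- Same namespace name as before, but bound to a different
     (hypothetical) prefix. The DTD knows nothing about abc:* names. -->
<abc:myRoot
   xmlns:abc="http://www.example.com/">
 <abc:child1>invalid</abc:child1>
 <abc:child2>doc</abc:child2>
</abc:myRoot>
```

Validation compares the literal names abc:myRoot, abc:child1, and abc:child2 (and the xmlns:abc attribute) against the DTD, finds none of them declared, and reports errors, even though the namespace name itself is unchanged.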
