This chapter is from the book

Understanding XML

An XML document is a text document. It can be written using any tool that is capable of producing text. Your programs can also write XML as a form of output. While the purpose of creating an XML document is normally to provide data to software, a human with a trained eye can read it. This is very comforting to programmers because it makes debugging simpler. If you have a problem in the XML document, you can open it in any text editor and look at the characters that compose the file.

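The textual representation of an XML document, in a nutshell, is a text file full of data and markup. Markup is text delimited by a small set of special characters: the ampersand (&), the less-than symbol (<), the greater-than symbol (>), the apostrophe ('), and the quotation mark ("). Because the parser treats these characters as markup, you cannot use them freely in your data. The ampersand and the less-than symbol must always be escaped in data; the other three need escaping only where they could be mistaken for markup, such as a quotation mark inside an attribute value delimited by the same character. Each one has a predefined entity reference, which begins with an ampersand and ends with a semicolon:

  • &amp; replaces &

  • &lt; replaces <

  • &gt; replaces >

  • &apos; replaces '

  • &quot; replaces "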
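
As an illustration of these substitutions, here is a minimal Python sketch that uses the standard library's xml.sax.saxutils module. The sample string and the variable names are made up for this example; the chapter does not prescribe any particular tool for escaping.

```python
from xml.sax.saxutils import escape, unescape

# Hypothetical data containing several of the special characters.
raw = 'Tom & Jerry say "1 < 2"'

# escape() always replaces &, <, and >. The extra mapping asks it to
# replace the quotation mark and the apostrophe as well.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# unescape() reverses the substitution when given the inverse mapping.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
print(restored == raw)
```

The escaped string is what belongs in the XML file; the restored string is what a program sees after the parser has expanded the entity references.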

You can define your own entities that get expanded to the strings that you define. This is very useful for headers, footers, and other boilerplate items that you commonly include by reference.
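
A user-defined entity might look like the following sketch, which declares an entity in the document's internal DTD subset and lets Python's xml.etree.ElementTree expand it during parsing. The entity name footer, the element names, and the replacement text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE declaration defines a hypothetical entity named "footer".
# Every &footer; reference in the document expands to the declared string.
document = """<?xml version='1.0'?>
<!DOCTYPE page [
  <!ENTITY footer "Copyright Example Press. All rights reserved.">
]>
<page>
  <body>Chapter text goes here.</body>
  <legal>&footer;</legal>
</page>"""

root = ET.fromstring(document)

# The parser has already replaced the reference with its definition.
legal_text = root.find("legal").text
print(legal_text)
```

Changing the one declaration changes every place the entity is referenced, which is what makes the technique attractive for boilerplate.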

The list of special characters is short. Whenever you see one of them, know that you are looking at markup of some sort. The trick is to understand what sort of special instruction you are looking at. Conceptually, the process is very simple. One program combines its data with instructions (markup) that tell what the data means. Another program reads that document and uses the markup to navigate through the data to find the part that it needs.
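
The two halves of that process can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The order, customer, and total names are invented for this example, and in practice the producer and the consumer would usually be separate programs exchanging a file or a message.

```python
import xml.etree.ElementTree as ET

# Producer: combine the data with markup that says what each value means.
order = ET.Element("order", attrib={"id": "1001"})
ET.SubElement(order, "customer").text = "Smith & Sons"
ET.SubElement(order, "total", attrib={"currency": "USD"}).text = "42.50"

# Serializing escapes the ampersand in the customer name automatically.
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)

# Consumer: parse the text and use the markup to find the part it needs.
parsed = ET.fromstring(xml_text)
total = parsed.find("total")
customer = parsed.find("customer").text
print(total.text, total.get("currency"))
print(customer)
```

The consumer never needs to know the position of the total within the file; it asks for the element by name and ignores everything else.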

Every XML document must follow a set of rules:

  • Each start tag must have a corresponding end tag.

  • Attribute values must be enclosed in quotes.

  • Some characters in data must be represented by entity references. If they appear in text as ordinary characters, the XML parser tries to interpret them as markup and usually reports an error.

  • Overlapping tags are not permitted. If you start a tag sequence <a><b>, it must end </b></a>, not </a></b>.

  • It should begin with the XML declaration: <?xml version='1.0'?>. XML 1.0 parsers accept a document that omits it, but including it is recommended.

Documents that follow these rules are considered "well formed." If a document is not well formed, it will cause errors to be thrown during parsing.
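
One way to see these rules enforced is to hand a few fragments to a parser and note which ones it rejects. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed and the sample fragments are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Hypothetical fragments, one per rule.
samples = {
    "nested correctly": "<a><b>data</b></a>",
    "overlapping tags": "<a><b>data</a></b>",
    "missing end tag": "<a><b>data</b>",
    "unquoted attribute": "<a size=1>data</a>",
    "bare ampersand": "<a>fish & chips</a>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```

Only the first fragment parses; each of the others breaks one rule and causes the parser to raise ParseError.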
