What Is XML?
Extensible Markup Language (XML) is a simple data storage language. It represents data using a series of straightforward tags that you can define in any way you like. It is text-based, so it takes more space than a binary data format or even a delimited format, but it is extremely easy to use and understand.
XML provides no features for manipulating data, and anyone who tells you otherwise is selling something. In contrast, true databases such as Access, Oracle, and SQL Server provide a host of powerful data selection and analysis features such as indexing, sorting, searching, relational integrity, and cross-table selection. XML just holds data.
In fact, XML has only one real advantage over previous forms of data representation: It is extremely simple. That may seem like a trivial benefit, but in this case it makes all the difference in the world.
XML's great simplicity makes it easy for almost any application to read and write data. That puts XML in a unique position to become the lingua franca of data exchange. Although different applications may support a variety of data exchange formats, soon they can all support XML. That means your programs can easily combine information produced under Windows, Mac OS, Linux, mainframes, and any other platform you can think of. Then you can easily load XML data into your program, analyze it, and produce output in portable XML format.
What Is XML Good For?
The previous paragraph might have tipped you off that XML is good for exchanging data between different applications. If all your corporate systems produce XML output, you can easily combine the data to provide company-wide analyses.
XML files are also good for making small databases. Not too long ago, programs used INI files to store configuration information, user preferences, and other small amounts of data. Then Microsoft introduced the system registry and said developers should no longer use INI files. Since then, support for INI files has been dwindling in Visual Basic. Unfortunately, the registry has several disadvantages. It's not a simple text file, so it's hard to read; it can become bloated when you install and remove many applications from your system; and if it somehow becomes corrupted, the registry can make your system unusable.
Putting configuration information in XML files prevents these problems. You can even mount XML files on a shared file system so users on different computers can share the data they contain, a proposition that is tricky at best using the registry.
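As a minimal sketch of this idea, the following Python code saves a few settings to an XML file and reads them back with the standard library's xml.etree.ElementTree module. The file name, the Settings and Setting tag names, and the two helper functions are assumptions made up for illustration; nothing here is a standard configuration format.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_settings(path, settings):
    """Write a dict as <Settings><Setting Name="..." Value="..." /></Settings>."""
    root = ET.Element("Settings")  # hypothetical root tag name
    for name, value in settings.items():
        ET.SubElement(root, "Setting", Name=name, Value=str(value))
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_settings(path):
    """Read the file written by save_settings back into a dict of strings."""
    root = ET.parse(path).getroot()
    return {node.get("Name"): node.get("Value") for node in root.findall("Setting")}


# Hypothetical location; a real program might point at a shared network path.
config_path = os.path.join(tempfile.gettempdir(), "example_settings.xml")
save_settings(config_path, {"WindowWidth": 800, "Theme": "Dark"})
loaded = load_settings(config_path)
print(loaded)
```

Note that every value comes back as a string, so a program that needs numbers has to convert them itself.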
ASP.NET, the successor to ASP, lets you build XML directly into a Web page to form a "data island." You can then attach data-bound controls directly to the data, and display information automatically.
You can certainly do without XML. Text files, the registry, and relational databases can do everything XML can do and more. XML just adds one more tool to your data storage and retrieval arsenal.
XML syntax is extremely simple, with a few exceptions that I'm going to ignore in this article. XML documents are composed of nodes. You represent a node using opening and closing tags that are very similar to the tags used by HTML. One big difference is that you get to define the tag names. For example, the following tags might represent a telephone number:
<Phone>987-654-3210</Phone>
You don't need to declare the tag names anywhere; you can just start using them.
The start and end tags must have exactly the same name with the same capitalization. If the node begins with <Phone>, it must end with </Phone>, not </phone> or </PHONE>. One common mistake is to copy and paste the opening tag to use as a closing tag and then to forget the slash, like this:
<Phone>987-654-3210<Phone>
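A parser rejects both of these mistakes. The following Python sketch checks three small strings with the standard xml.etree.ElementTree module; the is_well_formed helper and the sample strings are illustrative names chosen here, not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<Phone>987-654-3210</Phone>"
wrong_case = "<Phone>987-654-3210</phone>"    # closing tag capitalization differs
missing_slash = "<Phone>987-654-3210<Phone>"  # second tag opens a new node instead of closing

for sample in (good, wrong_case, missing_slash):
    print(sample, "->", is_well_formed(sample))
```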
A node can include attributes in its opening tag. Simply include the attribute name, an equals sign, and the attribute value enclosed in quotation marks. For instance, the following code gives a Phone node the attribute Type. In this example, the node's Type attribute has the value WorkFax.
<Phone Type="WorkFax">987-654-3210</Phone>
If you don't need to include a value in a node, you can omit the closing tag and end the node with a slash before the closing bracket. For instance, the following code stores a phone number in the node's Number attribute, so it doesn't need separate start and end tags.
<Phone Type="WorkFax" Number="987-654-3210" />
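To see how a program reads these two forms, here is a short Python sketch using the standard xml.etree.ElementTree module. The variable names are arbitrary. The first string stores the number as the node's value; the second is the empty-node form shown above.

```python
import xml.etree.ElementTree as ET

# A node with an attribute and a value between its start and end tags.
with_value = ET.fromstring('<Phone Type="WorkFax">987-654-3210</Phone>')

# An empty node that keeps everything in attributes.
empty_node = ET.fromstring('<Phone Type="WorkFax" Number="987-654-3210" />')

print(with_value.get("Type"), with_value.text)    # attribute, then node value
print(empty_node.get("Number"), empty_node.text)  # text is None for an empty node
print(empty_node.attrib)                          # all attributes as a dict
```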
XML documents have a tree-like, hierarchical structure. A document must have a single root data node that contains all the other data nodes. Each node can contain other nodes nested to any depth.
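The single-root rule and arbitrary nesting can be checked with a few lines of Python. This is only a sketch using the standard xml.etree.ElementTree module; the depth helper and the tag names A, B, and C are invented for the example.

```python
import xml.etree.ElementTree as ET

two_roots = "<Entry>first</Entry><Entry>second</Entry>"
one_root = "<Addresses>" + two_roots + "</Addresses>"

# Two top-level nodes are not a valid document, so the parser raises an error.
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

# Wrapping the same nodes in a single root makes the document parse.
root = ET.fromstring(one_root)


def depth(node):
    """Count the levels of nesting below and including this node."""
    return 1 + max((depth(child) for child in node), default=0)


print(two_roots_ok, len(root), depth(ET.fromstring("<A><B><C /></B></A>")))
```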
That's about all there is to XML syntax. Here's a small example document:
<Addresses>
    <Entry Type="Personal">
        <FirstName>Andy</FirstName>
        <LastName>Fickle</LastName>
        <Street>1234 Programmer Place</Street>
        <City>Bugsville</City>
        <State>CO</State>
        <Zip>82379</Zip>
        <Phone Type="Home">354-493-9489</Phone>
    </Entry>
    <Entry Type="Work">
        <FirstName>Betty</FirstName>
        <LastName>Masterson</LastName>
        <Phone Type="Work">937-878-4958</Phone>
        <Phone Type="WorkFax">937-878-4900</Phone>
    </Entry>
    ...
</Addresses>
Note that similar nodes in the tree do not necessarily contain the same information. In this example, the first Entry node contains address information and a home phone number. The second entry contains no address information, and has Work and WorkFax phone numbers that are missing from the first Entry node.
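The following Python sketch loads the two entries from the example document and lists what each one actually contains. It uses the standard xml.etree.ElementTree module; the "..." placeholder from the example is left out, and the SAMPLE and summary names are arbitrary.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<Addresses>
    <Entry Type="Personal">
        <FirstName>Andy</FirstName>
        <LastName>Fickle</LastName>
        <Street>1234 Programmer Place</Street>
        <City>Bugsville</City>
        <State>CO</State>
        <Zip>82379</Zip>
        <Phone Type="Home">354-493-9489</Phone>
    </Entry>
    <Entry Type="Work">
        <FirstName>Betty</FirstName>
        <LastName>Masterson</LastName>
        <Phone Type="Work">937-878-4958</Phone>
        <Phone Type="WorkFax">937-878-4900</Phone>
    </Entry>
</Addresses>
""".strip()

root = ET.fromstring(SAMPLE)

summary = []
for entry in root.findall("Entry"):
    child_tags = [child.tag for child in entry]  # tags present in this entry
    phones = {p.get("Type"): p.text for p in entry.findall("Phone")}
    summary.append((entry.get("Type"), child_tags, phones))
    print(entry.get("Type"), child_tags, phones)
```

Because the entries differ, code that reads them should not assume a given child node exists. With ElementTree, for example, entry.find("Street") returns None for the second entry.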
The root node Addresses need not contain only one type of node, either. For an address book application, it might make sense for the root to hold only Entry nodes. In another application, the root's children could be nodes of any type.
As you can see from the previous example, XML syntax is quite simple. So simple, in fact, that you could probably hack together an XML parser in only a few hours. Fortunately, there's no real need to do that because XML tools are available on a wide variety of platforms, including Windows, where you run your Visual Basic programs.
It is these tools, not XML itself, that add the power and complexity to XML. Different parsers let you load an XML document all at once or one node at a time in a forward-only view. Similarly, XML writers let you write out a complete document or build one a node at a time.
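The two loading styles can be compared in a few lines. The sketch below uses Python's standard xml.etree.ElementTree module rather than the Visual Basic tools the article has in mind, and the DOC contents and variable names are made up for illustration. ET.fromstring loads the whole tree, ET.iterparse reports nodes as the parser moves forward through the input, and the last part builds an output document one node at a time.

```python
import io
import xml.etree.ElementTree as ET

DOC = (b"<Addresses>"
       b"<Entry><FirstName>Andy</FirstName></Entry>"
       b"<Entry><FirstName>Betty</FirstName></Entry>"
       b"</Addresses>")

# All at once: the whole tree is in memory before any node is examined.
root = ET.fromstring(DOC)
all_names = [node.text for node in root.iter("FirstName")]

# Forward-only: each node is handled as soon as its end tag has been read.
streamed_names = []
for event, node in ET.iterparse(io.BytesIO(DOC), events=("end",)):
    if node.tag == "FirstName":
        streamed_names.append(node.text)
    elif node.tag == "Entry":
        node.clear()  # drop the finished entry's children to save memory

# Writing: build the output tree a node at a time, then serialize it.
out_root = ET.Element("Addresses")
for name in streamed_names:
    entry = ET.SubElement(out_root, "Entry")
    ET.SubElement(entry, "FirstName").text = name
written = ET.tostring(out_root, encoding="unicode")

print(all_names, streamed_names)
print(written)
```

The forward-only style matters mostly for large files, where holding the whole tree in memory is wasteful.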
Document object model (DOM) parsers let you load, clone, rearrange, modify, and save XML files. They let you search for nodes with certain names or attributes, order the results, and iterate over the matching nodes. Although they are not as powerful as a true relational database, these features are quite useful.
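As a rough illustration of these operations, the following sketch uses xml.dom.minidom, a small DOM implementation in Python's standard library. It stands in for whichever DOM parser you actually use; the sample data is a reordered subset of the address book example, and the "Archived" value is an invented attribute value.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<Addresses>'
    '<Entry Type="Work"><LastName>Masterson</LastName>'
    '<Phone Type="Work">937-878-4958</Phone>'
    '<Phone Type="WorkFax">937-878-4900</Phone></Entry>'
    '<Entry Type="Personal"><LastName>Fickle</LastName>'
    '<Phone Type="Home">354-493-9489</Phone></Entry>'
    '</Addresses>'
)

# Search: find nodes by name, then filter on an attribute value.
phones = doc.getElementsByTagName("Phone")
fax_numbers = [p.firstChild.data for p in phones
               if p.getAttribute("Type") == "WorkFax"]

# Order: sort the last names found in the entries.
entries = doc.getElementsByTagName("Entry")
sorted_names = sorted(
    e.getElementsByTagName("LastName")[0].firstChild.data for e in entries
)

# Clone and modify: deep-copy the first entry, change it, and append it.
copy = entries[0].cloneNode(True)
copy.setAttribute("Type", "Archived")  # hypothetical attribute value
doc.documentElement.appendChild(copy)

entry_types = [e.getAttribute("Type") for e in doc.getElementsByTagName("Entry")]
xml_text = doc.toxml()  # serialize the modified document back to text

print(fax_numbers, sorted_names, entry_types)
```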
XSD (XML schema definition) and other schema languages let you define the format of an XML file to ensure that the data makes sense. For example, an XSD file can prevent someone from entering a street address in a Zip code field. XSL (extensible stylesheet language) lets you define how to convert an XML document into another format such as an HTML page for viewing on a Web browser.
These tools are actually much more complex than XML itself, so XML books spend a fair amount of time explaining them. For more information on these topics, see my book, Visual Basic.NET and XML (Wiley, 2002, http://www.vb-helper.com/xml.htm).