An Introduction to XML Syntax
Introduction
XML stands for the eXtensible Markup Language. It is a new markup language, developed by the W3C (World Wide Web Consortium), mainly to overcome limitations in HTML. The W3C is the organization in charge of the development and maintenance of most Web standards, most notably HTML. For more information on the W3C, visit their web site at http://www.w3.org/.
HTML is an immensely popular markup language. According to some studies there are 800 million Web pages, all based on HTML. HTML is supported by thousands of applications including browsers, editors, email software, databases, contact managers, and more.
Originally the Web was a solution to publish scientific documents. Today it has grown into a fully fledged medium, equal to print and TV. More importantly, the Web is an interactive medium because it supports applications such as online shops, electronic banking, trading, and forums.
To accommodate this phenomenal popularity, HTML has been extended over the years. Many new tags have been introduced. The first version of HTML had a dozen tags; the latest version (HTML 4.0) has close to 100 tags (not counting browser-specific tags).
Furthermore, a large set of supporting technologies has also been introduced: JavaScript, Java, Flash, CGI, ASP, streaming media, MP3, and more. Some of these technologies were developed by the W3C while others were introduced by vendors.
However, not everything is rosy with HTML. It has grown into a complex language. At almost 100 tags, it is definitely not a small language. The combinations of tags are almost endless, and the result of a particular combination of tags might differ from one browser to another.
Finally, despite all the tags already included in HTML, more are needed. Electronic commerce applications would need tags for product references, prices, names, addresses, and more. Streaming would need tags to control the flow of images and sound. Search engines would need more precise tags for keywords and descriptions. Security would need tags for signing. The list of applications that would need new HTML tags is almost endless.
However, adding even more tags to an overblown language is hardly a satisfactory solution. It appears that HTML is already on the verge of collapsing under its own weight, so why continue adding tags?
Worse, while many applications need more tags, some applications would greatly benefit if there were fewer, not more, tags in HTML. The W3C expects that by the year 2002, 75% of surfers won't be using a PC. Rather, they will access the Web from a personal digital assistant, such as the popular PalmPilot, or from so-called smart phones.
These machines are not as powerful as PCs. They cannot process a complex language like HTML, much less a version of HTML that would include more tags.
Another, related problem is that it takes many tags to format a page. It is not uncommon to see pages that have more markup than content! These pages are slow to download and to display.
In conclusion, even though HTML is a popular and successful markup language, it has some major shortcomings. XML was developed to address these shortcomings. It was not introduced for the sake of novelty.
XML exists because HTML was successful. Therefore XML incorporates many successful features of HTML. XML also exists because HTML could not live up to new demands. Therefore XML breaks new ground where it is appropriate.
It is difficult to change a successful technology like HTML so, not surprisingly, XML has raised some level of controversy.
Let's make it clear: XML is unlikely to replace HTML in the near or medium term. XML does not threaten the Web but introduces new possibilities. XML and HTML have been combined in XHTML 1.0, an XML version of HTML. (For more information about XHTML, see Que’s XHTML by Example (0-7897-2385-9), published October 2000.)
Some of the areas where XML will be useful in the near term include:
- large Web site maintenance, where XML would work behind the scenes to simplify the creation of HTML documents
- exchange of information between organizations
- offloading and reloading of databases
- syndicated content, where content is being made available to different Web sites
- electronic commerce applications where different organizations collaborate to serve a customer
- scientific applications with new markup languages for mathematical and chemical formulas
- electronic books with new markup languages to express rights and ownership
- handheld devices and smart phones with new markup languages optimized for these "alternative" devices
There are two classes of applications for XML: publishing and data exchange. Data exchange applications are currently the most popular.
To learn XML for data exchange, you should be familiar with the Web, insofar as you can read, understand, and write basic HTML pages as well as read and understand a simple JavaScript application. XML programming can be done with either a scripting language, such as JavaScript, or a fully compiled language, such as Java. However, you don't have to be a master of HTML to learn XML. Nor do you need to be a guru of JavaScript or Java.
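To give a feel for what XML programming in Java can look like, here is a minimal sketch of a data exchange scenario. It assumes a small, made-up price list: the element names (products, product, name, price), the sample values, and the class and method names are all hypothetical and chosen only for illustration. The parser calls come from the standard Java XML API (javax.xml.parsers and org.w3c.dom).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceListExample {
    // Hypothetical XML document: the tag names are invented for this sketch,
    // which is exactly what XML allows and HTML does not.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<products>"
        + "<product><name>XML Editor</name><price>499.00</price></product>"
        + "<product><name>DTD Checker</name><price>199.00</price></product>"
        + "</products>";

    // Parses the document and returns the sum of all <price> elements.
    static double totalPrice(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList prices = doc.getElementsByTagName("price");
            double total = 0.0;
            for (int i = 0; i < prices.getLength(); i++) {
                total += Double.parseDouble(prices.item(i).getTextContent());
            }
            return total;
        } catch (Exception e) {
            // Malformed XML or a non-numeric price ends up here.
            throw new IllegalArgumentException("Could not read price list", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Total: " + totalPrice(SAMPLE));
    }
}
```

Notice that the application reads prices by name rather than by guessing at page layout. That is the appeal of XML for data exchange: the markup describes what the data is, not how it should look.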