HTML and XML

The explosive adoption of the Internet was strongly aided by the invention of the browser, which displays information by interpreting the instructions and data in HTML documents (a process known as rendering). While HTML is lightweight and easy to learn, it's a poor language for business integration, primarily because it is a markup language that mixes business data with formatting instructions (which are specified by using a predefined set of tags). Automating the retrieval of business information stored in an HTML document requires an extraction program that can differentiate between the formatting instructions and the business information. While this may sound trivial, the real difficulty is that HTML gives the author of the document no way to specify the structure of the business information, so the extraction program can't reliably locate the desired data. This limitation exists because HTML defines tags only for displaying information, and the language is not extensible.
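To make the extraction problem concrete, here is a minimal Python sketch. The two HTML fragments and the scrape_order_number helper are hypothetical, invented for illustration only. Both fragments present the same order, but because the tags describe only formatting, the extractor has to guess where the order number lives.

```python
import re

# Two hypothetical pages showing the same order with different formatting.
PAGE_V1 = "<p>Order <b>A100</b>: 25 x Foobar at <i>50</i></p>"
PAGE_V2 = "<table><tr><td>A100</td><td>Foobar</td><td>25</td><td>50</td></tr></table>"


def scrape_order_number(html):
    """Guess the order number by assuming it is the first bold text."""
    match = re.search(r"<b>(.*?)</b>", html)
    return match.group(1) if match else None


first_result = scrape_order_number(PAGE_V1)   # the guess happens to hold here
second_result = scrape_order_number(PAGE_V2)  # same data, new layout: nothing found
```

The extractor works only as long as the page designer keeps the order number in bold. Nothing in either fragment says which value is the order number, so any change in presentation can silently break the extraction.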

To address this and other limitations, XML (Extensible Markup Language) was invented. Unlike HTML, which is a markup language with a fixed set of tags, XML is a meta-language: a language from which to create other languages. Authors can define their own tags to describe the data itself. Thus, to produce a document that contains information about an order, we might use the following syntax. (This is not a complete XML document, but it gets the point across.)

<order>
  <order_number>A100</order_number>
  <price>50</price>
  <product>Foobar</product>
  <quantity>25</quantity>
</order>

To extract the information from the document, we would create a template that defines the syntax and context of this information. (In this example, the information between the <order_number> and </order_number> tags is the order number.) With a template that defines a predetermined information structure, it's straightforward to write an extraction program to process each incoming document. The template can be either a document type definition (DTD) or a schema.
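Such an extraction program might look like the following Python sketch, which uses the standard library's xml.etree.ElementTree module. The ORDER_FIELDS dictionary and the extract_order function are hypothetical names chosen for illustration. The dictionary only stands in for a real DTD or schema: ElementTree does not validate documents, so the checks here are done by hand, and the type chosen for each field is an assumption.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order>
  <order_number>A100</order_number>
  <price>50</price>
  <product>Foobar</product>
  <quantity>25</quantity>
</order>
"""

# Hypothetical stand-in for a template: each expected element name is
# mapped to the Python type its text should convert to (assumed types).
ORDER_FIELDS = {
    "order_number": str,
    "price": float,
    "product": str,
    "quantity": int,
}


def extract_order(xml_text):
    """Return the order as a dict, or raise ValueError if the structure is wrong."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "order":
        raise ValueError(f"expected <order>, found <{root.tag}>")
    order = {}
    for name, convert in ORDER_FIELDS.items():
        element = root.find(name)
        if element is None or element.text is None:
            raise ValueError(f"missing or empty <{name}> element")
        order[name] = convert(element.text.strip())
    return order


order = extract_order(ORDER_XML)
```

Because each value is located by the name of its tag rather than by its position or appearance, the same function handles any incoming order document that follows the agreed structure.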

Note that XML doesn't define any tags for output; hence, another language is required (in many cases, the language is XSL) to define style sheets and output format.
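The separation of data from presentation can be sketched in Python as well. This is not XSL, and the Python standard library does not include an XSLT processor. The hypothetical render_order_as_html function below merely plays the role a style sheet would play, turning the same order data into HTML for display.

```python
import xml.etree.ElementTree as ET
from html import escape

ORDER_XML = """
<order>
  <order_number>A100</order_number>
  <price>50</price>
  <product>Foobar</product>
  <quantity>25</quantity>
</order>
"""


def render_order_as_html(xml_text):
    """Render each child element of the order as one row of an HTML table."""
    root = ET.fromstring(xml_text.strip())
    rows = [
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in root
    ]
    return "<table>" + "".join(rows) + "</table>"


html_output = render_order_as_html(ORDER_XML)
```

The XML document itself carries no formatting. A different rendering function, or a different style sheet, could present the same data in another layout without touching the order document.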

The example above is a very simple demonstration of the power of XML. As discussed in the next article, web services leverage XML's extensibility to deliver a standards-based integration platform that solves many large-scale integration problems with a considerably gentler learning curve than their predecessor technologies (OOT, distributed computing, and CBD).
