Sams Teach Yourself XML in 21 Days

By Steven Holzner

All About Markup Languages

The term markup refers to codes or tokens you put into a document to indicate how the document's data (everything that is not markup) should be interpreted. In other words, markup describes the data and tells software what to do with it. The markup language most people have heard of is HTML, which is used to create Web pages; you can see a sample HTML Web page in Example 1.1.

Example 1.1. A Sample HTML Web Page (ch01_01.html)

<HTML>
    <HEAD>
        <TITLE>Hello From HTML</TITLE>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
               An HTML Document
            </H1>
        </CENTER>
        This is an HTML document!
    </BODY>
</HTML>

The markup in this HTML document tells a browser how to interpret the document's data: which data is a header, which is text for the body of the document, and so on. This markup is made up of HTML tags such as <HEAD> and <BODY>, and those tags give directions to the browser. You can see this HTML page in Netscape Navigator in Figure 1.1. Note in particular that because the markup is only there to give directions to the browser, none of it appears directly in the browser's display of the document.

Figure 1.1 An HTML page in a browser.
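
To make the split between markup and data concrete, here is a minimal sketch in Python that feeds the page from Example 1.1 to the standard library's html.parser module and sorts what it finds into two piles: tags and text. The class name MarkupSplitter and the variable names are illustrative assumptions, not part of the example above, and a real browser does far more than this.

```python
from html.parser import HTMLParser

# The page from Example 1.1, reproduced as a string.
PAGE = """<HTML>
    <HEAD>
        <TITLE>Hello From HTML</TITLE>
    </HEAD>
    <BODY>
        <CENTER>
            <H1>
               An HTML Document
            </H1>
        </CENTER>
        This is an HTML document!
    </BODY>
</HTML>"""


class MarkupSplitter(HTMLParser):
    """Hypothetical helper: records opening tags and text separately."""

    def __init__(self):
        super().__init__()
        self.tags = []  # markup: names of the opening tags, in order
        self.text = []  # data: the non-blank text between the tags

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        self.tags.append(tag)

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:  # skip runs of pure indentation and newlines
            self.text.append(stripped)


splitter = MarkupSplitter()
splitter.feed(PAGE)
splitter.close()

print("Markup:", splitter.tags)
print("Data:  ", splitter.text)
```

The tags end up in one list and the readable text in the other, which mirrors what you see in Figure 1.1: the browser acts on the first list and displays only the second.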

When you think about it, there are already many markup languages around. For example, you might use a word processor like Microsoft Word or a text editor like Windows WordPad, either of which can store text in Rich Text Format (RTF) files. RTF files are filled with markup that indicates how to display the text and gives directions to the word processor. For example, here's the RTF markup for a file created with Microsoft Word holding the text "No worries!" in bold (hint: the "No worries!" text is at the very end):

{\rtf1\ansi\ansicpg1252\uc1 \deff0\deflang1033\deflangfe1033
{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 02020603050405020304}
Times New Roman;}{\f153\froman\fcharset238\fprq2 Times New Roman CE;}
{\f154\froman\fcharset204\fprq2 Times New Roman Cyr;}
{\f156\froman\fcharset161\fprq2 Times New Roman Greek;}
{\f157\froman\fcharset162\fprq2 Times New Roman Tur;}
{\f158\froman\fcharset177\fprq2 Times New Roman (Hebrew);}
{\f159\froman\fcharset178\fprq2 Times New Roman (Arabic);}
{\f160\froman\fcharset186\fprq2 Times New Roman Baltic;}}
{\colortbl;\red0\green0\blue0;\red0\green0\blue255;\red0\green255\blue255;
\red0\green255\blue0;\red255\green0\blue255;\red255\green0\blue0;
\red255\green255\blue0;\red255\green255\blue255;\red0\green0\blue128;
\red0\green128\blue128;\red0\green128\blue0;\red128\green0\blue128;
\red128\green0\blue0;\red128\green128\blue0;\red128\green128\blue128;
\red192\green192\blue192;}{\stylesheet{\ql \li0\ri0\widctlpar\aspalpha
\aspnum\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1033
\cgrid\langnp1033\langfenp1033 \snext0 Normal;}{\*\cs10 \additive
Default Paragraph Font;}}{\info{\title No worries}{\author Steven Holzner}
{\operator Steven Holzner}{\version1}{\edmins0}{\nofpages1}{\nofwords0}
{\nofchars0}{\*\company Your Company Name}{\nofcharsws0}{\vern8269}}
\widowctrl\ftnbj\aenddoc\noxlattoyen\expshrtn\noultrlspc\dntblnsbdb
\nospaceforul\formshade\horzdoc\dgmargin\dghspace180\dgvspace180
\dghorigin1701\dgvorigin1984\dghshow1\dgvshow1
{\*\pnseclvl1\pnucrm\pnstart1\pnindent720\pnhang{\pntxta .}}
{\*\pnseclvl2\pnucltr\pnstart1\pnindent720\pnhang{\pntxta .}}
{\*\pnseclvl3\pndec\pnstart1\pnindent720\pnhang{\pntxta .}}
{\*\pnseclvl4\pnlcltr\pnstart1\pnindent720\pnhang{\pntxta )}}{\*\pnseclvl5
\pndec\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}
{\*\pnseclvl6\pnlcltr\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}
{\*\pnseclvl7\pnlcrm\pnstart1\pnindent720\pnhang{\pntxtb (}{\pntxta )}}
{\*\pnseclvl8\pnlcltr\pnstart1\pnindent720\pnhang
{\pntxtb (}{\pntxta )}}{\*\pnseclvl9\pnlcrm\pnstart1\pnindent720\pnhang
{\pntxtb (}{\pntxta )}}\pard\plain \ql \li0\ri0\widctlpar\aspalpha\aspnum
\faauto\adjustright\rin0\lin0\itap0 \fs24\lang1033\langfe1033\cgrid
\langnp1033\langfenp1033 {\b No worries!\par }}

All the codes you see here are markup. As you can see, markup is just the general name for directives indicating how you want your data treated.

You might think of HTML (which stands for Hypertext Markup Language) first when someone mentions markup languages, but the fact is that HTML is a very limited language. It's fine for creating standard Web pages, but it can't go much further than that.

For example, HTML is great for creating Web pages that display standard text and some images, and HTML tags like <img> and <table> are fine for that. But as things got more complex, HTML couldn't keep up. The original version, HTML 1.0, had only about a dozen tags. The current version, HTML 4.01, has nearly 100, and many more are still needed (if you count the nonstandard ones that various browsers support to fill in some holes, there are over 120 HTML tags in current use).

Even so, to really fill the needs of Web developers, HTML could use hundreds of additional tags, and even those could not cover every situation. For example, what if you wanted to store information about your close friends? There are no HTML tags like <firstname>, <lastname>, <phone>, or <age>. What if you are a bank that offers loans and you want tags like <amount>, <term>, <rate>, and <accountID>? There's no way HTML could fit in all these kinds of tags. In other words, there are as many reasons to create markup as there are ways of handling data, and that number is unlimited. That's where XML comes in, because the whole idea behind XML is to let you create your own markup.
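
As a rough illustration of what creating your own markup can look like, the sketch below defines a tiny XML document that uses the <firstname>, <lastname>, <phone>, and <age> tags mentioned above and reads it with Python's standard xml.etree.ElementTree module. The enclosing <friends> and <friend> tags and the sample values are assumptions made up for this example; XML lets you pick whatever names suit your data.

```python
import xml.etree.ElementTree as ET

# A small XML document using custom tags. The <friends>/<friend> wrappers
# and the sample values are hypothetical, chosen only for illustration.
FRIENDS_XML = """<?xml version="1.0"?>
<friends>
    <friend>
        <firstname>Ada</firstname>
        <lastname>Example</lastname>
        <phone>555-0100</phone>
        <age>36</age>
    </friend>
</friends>"""

# Parse the document into a tree of elements.
root = ET.fromstring(FRIENDS_XML)

# Turn each <friend> element into a dictionary keyed by tag name.
records = [
    {child.tag: child.text for child in friend}
    for friend in root.findall("friend")
]

for record in records:
    print(record["firstname"], record["lastname"], record["phone"], record["age"])
```

Because the tag names describe the data, the program can ask for a friend's phone number by name instead of guessing from the layout of the text, which is something the presentation-oriented tags of HTML cannot offer.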
