XSL Jumpstart: Creating Style Sheets
In this chapter
- XSL Processing
- Creating the Style Sheet
- Templates and Template Rules
- Understanding Patterns
- Creating Text
- Getting the Content of an Element
- Outputting the Results
- Applying Style Sheets Dynamically
- Retrieving Attributes
- Adding New Template Rules
- In Practice
- Troubleshooting
XSL Processing
This chapter is designed to give you a quick start in creating XSL style sheets, so it keeps theory to a minimum. However, before you can create even your first style sheet, you need to understand the basics of style sheet processing. As with the rest of this book, the emphasis is on creating XSL transformations.
When an XML document is loaded, the parser reads the document and scans all of its components, which may include the following:
- Elements
- Attributes
- Entities
- CDATA sections
- Processing instructions
As each markup component is scanned, it is placed in a hierarchical tree structure in memory. Once the entire document is scanned, the document tree can be accessed through Application Programming Interfaces (APIs) such as the Document Object Model (DOM).
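To make the idea of an in-memory tree concrete, here is a minimal sketch that uses the DOM API bundled with Java (JAXP) to parse a small invoice (a cut-down version of the one used later in this chapter) and print its element hierarchy. The class name SourceTreeDemo and the abbreviated invoice are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SourceTreeDemo {
    // Hypothetical, abbreviated invoice document used only for this sketch
    static final String INVOICE =
        "<invoice num='2317'>"
        + "<clientName>ACME Programming Company</clientName>"
        + "<contact>Kris Butler</contact>"
        + "<costOfServices>1000</costOfServices>"
        + "</invoice>";

    // Parse the markup into an in-memory DOM tree
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Walk the tree depth-first, writing one line per element, indented by depth
    static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    // Produce an indented outline of the element tree, starting at the root element
    static String outline(String xml) {
        StringBuilder out = new StringBuilder();
        walk(parse(xml).getDocumentElement(), 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(outline(INVOICE));
        // Attributes are part of the tree too
        System.out.println("num = " + parse(INVOICE).getDocumentElement().getAttribute("num"));
    }
}
```

Text nodes are visited but not printed, so the outline shows only the element structure that a style sheet will later navigate.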
In the case of XSL (both formatting objects and transformations), you can write style sheets that also access this in-memory tree. From an XSL perspective, this is called the source tree because it represents the source document. The goal in XSL processing is to create a second tree that contains the output you desire. This second tree is called the result tree. To create the result tree, you use rules in your XSL style sheet (called templates) to walk through the source tree, select components of the tree you wish to process, and transform them. The result of applying a style-sheet template is placed in the result tree. In the case of formatting objects, the result tree will contain a formatted version of your XML document. In the case of a transformation, the result tree will contain the transformed XML document.
To clearly understand how this process works, consider the XML document in Listing 2.1.
Listing 2.1 A Typical Invoice Record Represented as an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="invoice.xsl"?>
<invoice num="2317" invoiceDate="07-09-01">
  <clientName>ACME Programming Company</clientName>
  <contact>Kris Butler</contact>
  <address>
    <streetAddress>123 Fourth Street</streetAddress>
    <city>Sometown</city>
    <state>CA</state>
    <zip>12345</zip>
    <province />
    <country>USA</country>
  </address>
  <descriptionOfServices>
    XML Training
  </descriptionOfServices>
  <costOfServices>1000</costOfServices>
</invoice>
This XML document, which may have been the result of some database operation, represents a typical invoice containing client information, a description of services, cost of services, and so on. Although in practice, this document might or might not be stored as a physical file, you may give it a filename, invoice.xml, for the purposes of running this example.
For this first example, you would like to transform this document into HTML so that you can display the information in a browser.
Conceptually, the source tree looks like Figure 2.1.
Figure 2.1 This conceptual view of the source tree shows how an XML document is broken down into its constituent parts.
Now you would like to walk this tree and create the result tree shown in Figure 2.2.
Notice that the result tree in Figure 2.2 does not contain the XML elements from the source document. Rather, it contains HTML elements.
How the result tree gets streamed into a document depends on how the style sheet is applied. Recall from Chapter 1, "The Essence of XSL," that the XML document instance may reference the style sheet statically, through an xml-stylesheet processing instruction like the one in Listing 2.1. In this case, the output is handled by the application that loads the document, such as a browser with a built-in XML parser and XSLT processor. On the other hand, the style sheet may be applied dynamically by an application program. In this case, it is up to your program to stream the results out to a file, a browser, or some other device.
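When your program applies the style sheet, it also decides what happens to the result tree. The following sketch uses the JAXP transformation API that ships with Java to show two choices: keeping the result tree in memory as a DOM document (DOMResult) or serializing it through a Writer (StreamResult). The tiny style sheet and the class name ResultTreeDemo are hypothetical stand-ins, not the chapter's listings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class ResultTreeDemo {
    // Hypothetical one-template style sheet (much smaller than the chapter's listings)
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>"
        + "<HTML><BODY><xsl:value-of select='invoice/contact'/></BODY></HTML>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";
    static final String XML = "<invoice num='2317'><contact>Kris Butler</contact></invoice>";

    // Compile the style sheet into a reusable Transformer
    static Transformer compile() throws Exception {
        return TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
    }

    // Choice 1: keep the result tree in memory as a DOM Document
    static Document resultAsTree() {
        try {
            DOMResult result = new DOMResult(); // no node supplied, so the processor creates a Document
            compile().transform(new StreamSource(new StringReader(XML)), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Choice 2: serialize the result tree to a Writer chosen by the program
    // (a StringWriter here; a FileWriter, for example, would work the same way)
    static String resultAsText() {
        try {
            StringWriter out = new StringWriter();
            compile().transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Root of result tree: " + resultAsTree().getDocumentElement().getNodeName());
        System.out.println("Serialized result: " + resultAsText());
    }
}
```

The transformation is identical in both cases; only the destination of the result tree changes.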
Figure 2.2 The output from the XSLT processor is a result tree. In this case, the result tree represents an HTML document.
Creating the Style Sheet
Let's look at a typical style sheet that might be used to transform the XML document in Listing 2.1 into HTML. Listing 2.2 shows the style sheet.
Listing 2.2 This Transformation (invoice.xsl) Takes Listing 2.1 and Converts It into HTML for Viewing in a Browser
<?xml version="1.0" ?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html" />

  <!-- Root template rule -->
  <xsl:template match="/">
    <HTML>
      <HEAD>
        <TITLE>First XSLT Example</TITLE>
      </HEAD>
      <BODY>
        <P><B>Company: </B>
          <xsl:value-of select="invoice/clientName" />
        </P>
        <P><B>Contact: </B>
          <xsl:value-of select="invoice/contact" />
        </P>
        <P><B>Services Rendered: </B>
          <xsl:value-of select="invoice/descriptionOfServices" />
        </P>
        <P><B>Total Due: </B> $<xsl:value-of select="invoice/costOfServices" />
        </P>
      </BODY>
    </HTML>
  </xsl:template>

</xsl:stylesheet>
For simplicity, the goal for this style sheet is to transform just four elements from the source document: clientName, contact, descriptionOfServices, and costOfServices. This also brings up a good point: You only have to transform those parts of a document you wish. Therefore, this transformation represents a departure from the structure of the original source document.
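If you want to try Listing 2.2 without a browser, the following sketch applies it to Listing 2.1 using the JAXP API that ships with Java. Both listings are embedded as strings, with attribute values switched to single quotes so they fit in Java string literals; the class name InvoiceToHtml is an assumption. Printing the output also confirms the point made above: nothing from the <address> element appears in the result, because no part of the template asks for it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class InvoiceToHtml {
    // Listing 2.1 (invoice.xml)
    static final String INVOICE_XML =
        "<?xml version='1.0'?>"
        + "<?xml-stylesheet type='text/xsl' href='invoice.xsl'?>"
        + "<invoice num='2317' invoiceDate='07-09-01'>"
        + "<clientName>ACME Programming Company</clientName>"
        + "<contact>Kris Butler</contact>"
        + "<address><streetAddress>123 Fourth Street</streetAddress><city>Sometown</city>"
        + "<state>CA</state><zip>12345</zip><province/><country>USA</country></address>"
        + "<descriptionOfServices>XML Training</descriptionOfServices>"
        + "<costOfServices>1000</costOfServices>"
        + "</invoice>";

    // Listing 2.2 (invoice.xsl)
    static final String INVOICE_XSL =
        "<?xml version='1.0'?>"
        + "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<HTML><HEAD><TITLE>First XSLT Example</TITLE></HEAD><BODY>"
        + "<P><B>Company: </B><xsl:value-of select='invoice/clientName'/></P>"
        + "<P><B>Contact: </B><xsl:value-of select='invoice/contact'/></P>"
        + "<P><B>Services Rendered: </B><xsl:value-of select='invoice/descriptionOfServices'/></P>"
        + "<P><B>Total Due: </B> $<xsl:value-of select='invoice/costOfServices'/></P>"
        + "</BODY></HTML>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Apply a style sheet to a source document and return the serialized result tree
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(INVOICE_XML, INVOICE_XSL));
    }
}
```

Note that the xml-stylesheet processing instruction is ignored here: when a program applies a style sheet dynamically, it supplies the style sheet itself.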
The first thing you'll notice about this XSLT style sheet is that its first line is an XML declaration. An XSLT style sheet is itself an XML document, so it must be well formed. How does the processor know that this document is a style sheet and which of its elements are XSL instructions? In most XML documents, a DOCTYPE declaration ties the document to its DTD. An XSLT style sheet normally has no DOCTYPE; instead, a namespace declaration on the <stylesheet> element identifies the XSL elements. The namespace URI serves as an identifier; the processor does not fetch a DTD or schema from it.
A Word on Namespaces
The namespaces mechanism allows you to uniquely identify element types that you create. For example, imagine that you have created an XML document describing a book chapter. You might create element types such as <chapterTitle>, <subHead1>, <subHead2>, <chapterText>, <codeListing>, <sidebar>, <footer>, and so on. Now imagine that you want to merge the content from this document with a document taken from a training manual. That document might also use element type names such as <chapterText> or <sidebar> but define a completely different structure for them. When you merge the two, you wind up with name collisions between your document and the document you're attempting to merge.
From the perspective of the document author, a namespace lets you add a prefix to your element names that uniquely identifies them. Typically, the namespace is a Uniform Resource Identifier (URI) belonging to an organization, such as your company's Web address, or to a specification document. Because these URIs can contain long path names, a namespace declaration lets you create an alias that serves as shorthand for the full namespace URI. For example, I might create a document that sets up the following:
xmlns:myNS="http://www.beyondhtml.com"
The xmlns portion of the statement says, "I'm creating an XML namespace." The :myNS portion is optional and user defined. When included, it sets up the alias (called a prefix) for the longer URI; when omitted, the declaration instead sets up a default namespace that applies to unprefixed element names. The portion after the equals sign is the fully qualified URI. So, this statement declares the http://www.beyondhtml.com namespace and assigns it to the alias myNS.
The following shows how the namespace is used:
<myNS:chapter xmlns:myNS="http://www.beyondhtml.com">
  <myNS:chapterTitle>...</myNS:chapterTitle>
  <myNS:chapterText>
    ...
  </myNS:chapterText>
</myNS:chapter>
As you can see, prefixing elements with myNS helps to create a unique name for the elements in this document.
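The prefix itself carries no meaning to a namespace-aware parser; what matters is the URI it is bound to. The following sketch, again using Java's bundled DOM parser, parses one-element documents that use two different prefixes (myNS and a hypothetical book) for the same URI, plus one with no namespace, and reports each element's namespace URI and local name. The class and method names are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceDemo {
    // Parse with namespace processing turned on (JAXP leaves it off by default)
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // "{namespace URI}localName" identifies an element independently of its prefix
    static String expandedName(String xml) {
        Element e = parseRoot(xml);
        String ns = e.getNamespaceURI(); // null when the element is in no namespace
        return "{" + (ns == null ? "" : ns) + "}" + e.getLocalName();
    }

    public static void main(String[] args) {
        String[] docs = {
            "<myNS:chapter xmlns:myNS='http://www.beyondhtml.com'/>",
            "<book:chapter xmlns:book='http://www.beyondhtml.com'/>",
            "<chapter/>"
        };
        for (String doc : docs) {
            System.out.println(parseRoot(doc).getTagName() + " -> " + expandedName(doc));
        }
    }
}
```

The first two elements have different tag names but the same expanded name, while the unprefixed <chapter> is a different element type altogether.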
In XSL, the <stylesheet> element must declare the XSL namespace URI. The declaration tells the XSLT processor that the elements bound to that namespace are XSL instructions, not just another XML vocabulary. The namespace URI varies depending on the version of XSL you're using. The current XSLT specification (the November 1999 Recommendation) requires conforming XSLT style sheets to use http://www.w3.org/1999/XSL/Transform.
TIP
Note in Listing 2.2 that an alias, xsl, is established. Because the alias is user defined, you can choose any alias name you wish, as long as it is bound to the XSL namespace URI. However, xsl is the de facto name used by virtually all style sheet developers.
Also, because the alias is optional, you can leave it out by declaring the XSL namespace as the default namespace (xmlns="http://www.w3.org/1999/XSL/Transform"). Omitting the alias means you can also omit the xsl: that's prefixed to all XSL element type names. This can save you some typing and eliminate a few hundred bytes from the size of your document. However, every unprefixed element in the style sheet is then treated as an XSL element, including the elements you want to write to the result tree, such as <HTML> or <P>, and the processor will reject or misinterpret them. Therefore, it is always prudent to include the xsl alias in your style sheets.
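The points in this tip can be checked with a sketch like the one below, which uses Java's built-in XSLT processor. The same one-template style sheet is written three ways: with the usual xsl prefix, with an arbitrary prefix (t, chosen here only for illustration), and with the XSL namespace declared as the default so that no prefix is needed. All three should produce the same text output. The class name and the shortened invoice are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PrefixChoiceDemo {
    static final String XML = "<invoice><contact>Kris Butler</contact></invoice>";

    // 1. Conventional xsl prefix
    static final String XSL_PREFIX =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='invoice/contact'/></xsl:template>"
        + "</xsl:stylesheet>";

    // 2. Any prefix works, as long as it is bound to the XSLT namespace URI
    static final String T_PREFIX =
        "<t:stylesheet version='1.0' xmlns:t='http://www.w3.org/1999/XSL/Transform'>"
        + "<t:output method='text'/>"
        + "<t:template match='/'><t:value-of select='invoice/contact'/></t:template>"
        + "</t:stylesheet>";

    // 3. No prefix: XSLT is the default namespace, so every unprefixed element is an XSL element.
    //    A literal result element such as <HTML> added here would also land in the XSL
    //    namespace, and an XSLT 1.0 processor would normally reject it as an unknown XSL element.
    //    XPath names such as invoice/contact are not affected by the default namespace.
    static final String NO_PREFIX =
        "<stylesheet version='1.0' xmlns='http://www.w3.org/1999/XSL/Transform'>"
        + "<output method='text'/>"
        + "<template match='/'><value-of select='invoice/contact'/></template>"
        + "</stylesheet>";

    // Compile the given style sheet and apply it to the sample invoice
    static String transform(String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL_PREFIX));
        System.out.println(transform(T_PREFIX));
        System.out.println(transform(NO_PREFIX));
    }
}
```

The no-prefix version only works here because the template writes plain text; as soon as you need literal HTML elements, the prefix becomes necessary.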
CAUTION
Before XSLT became a W3C Recommendation in November 1999, processors were forced to use non-standard URIs in their namespace declarations. If you run into an error when using the current namespace, check which version of the XSL specification your processor supports and consider the following alternative namespaces.
XSL processors that follow the December 1998 working draft use the following namespace definition:
xmlns:xsl = "http://www.w3.org/TR/WD-xsl"
Interim processors (such as MSXML 1) use the following:
xmlns:xsl = "http://www.w3.org/XSL/Transform/1.0"
The November 1999 (current) specification requires the following:
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
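When a processor rejects a style sheet, a quick first check is which of these URIs the style sheet actually declares on its root element. The following is a minimal sketch in Java using a namespace-aware DOM parse; the helper describe and its labels are hypothetical, and it only reports the declared URI. It cannot tell you which URI your particular processor expects.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class XslNamespaceCheck {
    // Returns the namespace URI of a style sheet's root element
    static String rootNamespace(String stylesheet) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(stylesheet)))
                    .getDocumentElement()
                    .getNamespaceURI();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Maps the URI to the namespace generations listed in the caution above
    static String describe(String stylesheet) {
        String ns = rootNamespace(stylesheet);
        if ("http://www.w3.org/1999/XSL/Transform".equals(ns)) {
            return "XSLT 1.0 Recommendation (November 1999)";
        } else if ("http://www.w3.org/TR/WD-xsl".equals(ns)) {
            return "December 1998 working draft";
        } else if ("http://www.w3.org/XSL/Transform/1.0".equals(ns)) {
            return "interim draft";
        }
        return "unrecognized namespace: " + ns;
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/TR/WD-xsl'/>"));
    }
}
```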
Returning to Listing 2.2, the <stylesheet> element is the root element of the document and is therefore the container for the rest of the style sheet. You will learn about all of the elements that <stylesheet> supports in Chapter 4, "The XSL Transformation Language." One important element type, however, is <output>, which lets style sheet authors specify how the result tree should be output. You can specify the result tree to be output as XML, HTML, or text. Listing 2.2 instructs the processor to output the result tree as HTML.
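To see what the method attribute of <output> changes, the following sketch compiles the same small style sheet three times with method set to xml, html, and text, using Java's built-in XSLT processor. The template and the class name OutputMethodDemo are hypothetical. With xml you should see an XML declaration ahead of the <P> element, with html the element without a declaration, and with text only the character data.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OutputMethodDemo {
    static final String XML = "<invoice><contact>Kris Butler</contact></invoice>";

    // %s is replaced by xml, html, or text before the style sheet is compiled
    static final String XSL_TEMPLATE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='%s'/>"
        + "<xsl:template match='/'><P><xsl:value-of select='invoice/contact'/></P></xsl:template>"
        + "</xsl:stylesheet>";

    // Apply the style sheet with the requested output method and return the serialized result
    static String transformWith(String method) {
        try {
            String xsl = String.format(XSL_TEMPLATE, method);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String method : new String[] {"xml", "html", "text"}) {
            System.out.println(method + ": " + transformWith(method));
        }
    }
}
```

The result tree is the same in all three runs; <output> only controls how that tree is serialized.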