Selecting Data in XSLT
Day 3: Selecting Data
Yesterday you learned what a stylesheet is and how to use it. You also learned about using templates and getting values from an Extensible Markup Language (XML) document. So far, however, the expressions you've used to match templates and select data have been rudimentary, which limits what you can do at this point.
Today's lesson will focus on getting more control over the data you select. Today you will learn the following:
- How the XML document tree works
- What XPath is
- How you can select single elements
- How you can select multiple elements
- How you can select attributes
Understanding the XML Document Tree
An XML document is a hierarchical structure of elements. Each element in an XML document can have zero or more child elements, each of which can in turn have child elements of its own. Also, each element can have zero or more attributes. No surprises so far, but it is significant that an XML document is structured this way: every element and every attribute has a uniquely identifiable place within the document tree. Because all elements and attributes are uniquely identifiable, you can address a single element or attribute and get its value. Figure 3.1 clarifies this structure.
Figure 3.1 Graphical representation of a tree.
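Here is a minimal Python sketch that makes the idea of a unique place concrete. It uses the standard library's xml.etree.ElementTree module on a hypothetical document loosely modeled on Figure 3.1; the figure's exact tree is not reproduced. The sketch gives every element and attribute an XPath-style path from the root and numbers same-named siblings, so no two nodes share a path. The function name unique_paths is illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document loosely modeled on Figure 3.1: letters stand in for element names.
DOC = "<A><B id='b1'><C>c under B</C><D><E>e</E></D></B><F><C>c under F</C><G><H>h</H></G></F></A>"

def unique_paths(root):
    """Map an absolute, position-numbered path to every element and attribute."""
    paths = {}

    def walk(elem, path):
        paths[path] = elem
        # An attribute is addressed through the element that owns it.
        for name, value in elem.attrib.items():
            paths[f"{path}/@{name}"] = value
        # Number children per name, so same-named siblings would still get distinct paths.
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    walk(root, f"/{root.tag}")
    return paths

root = ET.fromstring(DOC)
paths = unique_paths(root)
for path in paths:
    print(path)
```

The two C elements end up at /A/B[1]/C[1] and /A/F[1]/C[1]. Their names are the same, but their places in the tree are not.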
In Figure 3.1, each element in the tree is shown as a circle and is identified by a letter. The letters distinguish the children of an element from one another, but as you can see, different elements can have children identified by the same letter. This means that you can't say "Give me the value of element C," because element C can be the child element of either element B or element F. This doesn't mean, however, that you can't address this element at all; you just have to be more specific. To get the value of a specific element, you would have to say "Give me the value of element C, the child of element B, which is the child of the root element A." Addressing an element in this manner is called absolute addressing, as shown in Figure 3.1. Absolute addressing means that you specify the exact location of an element within a tree, starting from the root, so with absolute addressing you always specify a unique location.
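The ambiguity and the absolute-addressing fix can be sketched with the same module and a similar hypothetical tree. ElementTree paths are evaluated relative to the element you call find() on. Starting from the root element A, the path B/C therefore plays the role of the XPath expression /A/B/C.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: an element named C appears under both B and F.
DOC = "<A><B><C>c under B</C><D><E>e</E></D></B><F><C>c under F</C><G><H>h</H></G></F></A>"
root = ET.fromstring(DOC)  # root is element A

# "Give me element C" is ambiguous: a search of the whole tree finds two candidates.
all_c = root.findall(".//C")
print(len(all_c))  # 2

# Absolute addressing spells out every step from the root: A, then B, then C.
c_under_b = root.find("B/C")
print(c_under_b.text)  # c under B
```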
Another way of addressing is relative to an element. Say that element E in Figure 3.1 is the starting point. If you want to address the same element as before, you can say "Give me the value of element C, the sibling element of my parent element." This type of addressing is called relative addressing, as shown in Figure 3.1. Relative addressing means that you specify the location of an element within a tree relative to the current location.
With relative addressing, you don't specify a unique location within the document tree. Which element is specified by the preceding query actually depends on the starting point of the query.
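Relative addressing can be sketched the same way. ElementTree elements do not keep a reference to their parent, so this sketch first builds a child-to-parent dictionary. The helper sibling_of_parent is hypothetical. It answers "element C, the sibling of my parent," which is roughly what the XPath expression ../../C expresses. Running the same query from two starting points shows that the answer depends on where you start.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: E sits under B/D, and H sits under F/G.
DOC = "<A><B><C>c under B</C><D><E>e</E></D></B><F><C>c under F</C><G><H>h</H></G></F></A>"
root = ET.fromstring(DOC)

# ElementTree has no parent pointers, so build a child -> parent lookup once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def sibling_of_parent(start, name):
    """Relative query: 'element <name>, the sibling of my parent'."""
    parent = parent_of[start]
    grandparent = parent_of[parent]
    for candidate in grandparent:
        if candidate is not parent and candidate.tag == name:
            return candidate
    return None  # no such sibling from this starting point

e = root.find("B/D/E")
h = root.find("F/G/H")
# The same relative query, evaluated from two different starting points:
print(sibling_of_parent(e, "C").text)  # c under B
print(sibling_of_parent(h, "C").text)  # c under F
```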
What Is a Node?
Until now, I have been talking about elements and attributes. The difference between elements and attributes is not that great, however. The most important difference is that an element can have child elements; an attribute cannot. Hence, an attribute always has a single (text) value, whereas the value of an element also includes any descendant elements (and attributes).
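The difference shows up in a short Python sketch using ElementTree on a hypothetical chapter element. The attribute's value is a single string. The element's content includes child elements, and its text is gathered from all of its descendants.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with one attribute and two child elements.
chapter = ET.fromstring('<chapter number="3"><title>Selecting Data</title><topic>Nodes</topic></chapter>')

# An attribute is a name-value pair: its value is always a single string.
number = chapter.get("number")
print(repr(number))  # '3'

# An element can have children, each of which is the root of its own subtree.
print([child.tag for child in chapter])  # ['title', 'topic']

# The element's text value combines the text of all its descendants.
print("".join(chapter.itertext()))  # Selecting DataNodes
```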
Because elements and attributes aren't very different, they can be represented as the same thing in a diagram of the XML document tree. Element E in Figure 3.1, for example, could just as well be an attribute because it doesn't have any child elements. In fact, some people think that attributes shouldn't be used at all, because attributes are just special cases of elements. Attributes and elements are interchangeable, as long as the element doesn't have child elements (or attributes) of its own. There is one important difference, however. Attributes are simply names with associated values, also known as name-value pairs, and an element cannot have two attributes with the same name. An element, on the other hand, can contain multiple child elements with the same name. This distinction is very important when you're designing XML documents, especially when they might have to change in the future.
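Any conforming XML parser enforces this rule: attribute names must be unique on an element, while element names may repeat. The Python sketch below uses a hypothetical person record and a hypothetical helper named is_well_formed. It shows ElementTree accepting repeated phone elements and rejecting a repeated phone attribute.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts text, False if it reports a parse error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Repeating an element name among siblings is allowed...
repeated_elements = "<person><phone>555-0100</phone><phone>555-0199</phone></person>"
# ...but repeating an attribute name on one element is not well-formed XML.
repeated_attributes = '<person phone="555-0100" phone="555-0199"/>'

print(is_well_formed(repeated_elements))    # True
print(is_well_formed(repeated_attributes))  # False
```

If a value might need to occur more than once later on, a child element leaves room for that. An attribute does not.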
Within the Document Object Model (DOM), as well as Extensible Stylesheet Language Transformations (XSLT, or actually XPath), the distinction between an element and an attribute is so small that they are treated more or less as being the same. In DOM Level 2, several properties of the Node interface work equally well on elements and attributes. The nodeName and nodeValue properties, for instance, make no distinction between elements and attributes, although the result may differ based on the type of node they are used on. Because elements and attributes are so similar, both are referred to as nodes. A node is a single item that contains data within the document tree.
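Python's standard xml.dom.minidom module implements the DOM Node interface, so it can show nodeName and nodeValue at work on both kinds of node. The document is hypothetical. The element's nodeValue is None, because in the DOM an element's text lives in a separate child text node.

```python
from xml.dom import minidom

doc = minidom.parseString('<chapter number="3">Selecting Data</chapter>')
chapter = doc.documentElement                # an Element node
number = chapter.getAttributeNode("number")  # an Attr node

# The same property works on both kinds of node...
print(chapter.nodeName, number.nodeName)     # chapter number
# ...but the result depends on the node type.
print(chapter.nodeValue, number.nodeValue)   # None 3
# The element's text is the value of its child text node.
print(chapter.firstChild.nodeValue)          # Selecting Data
```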
Current Node
On Day 2, I used the term current element tentatively. Although this concept is somewhat self-explanatory, some clarification is in order. Also, because of the similarities between elements and attributes, from now on I will use the term current node.
On Day 2, you saw that when a stylesheet processes an XML document, elements of the source XML are matched against templates in the stylesheet. What you haven't learned yet is that you can also create match expressions that match an attribute. So, actually, nodes of the source XML are matched against the templates. Each time a match occurs and a template is invoked, the node that fired the template becomes the current node, which basically is a pointer to a node within the XML tree. This pointer keeps track of which node is being processed.
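A deliberately simplified Python model can illustrate the idea of the current node as a pointer. It is not how an XSLT processor is built. Here the templates are plain functions keyed by a node name, with @ marking an attribute pattern, and the walker visits attributes automatically. In real XSLT, the built-in template rules do not apply templates to attributes, so a stylesheet has to select them explicitly, for example with xsl:apply-templates select="@*". What the sketch does show is that each time a template fires, the matching node is handed to it as the current node.

```python
import xml.etree.ElementTree as ET

fired = []  # log of (pattern that matched, what the template saw as the current node)

def chapter_template(current):
    # While this template runs, 'current' is the matched chapter element.
    fired.append(("chapter", current.get("number")))

def number_attr_template(current):
    # For an attribute, the current node is the attribute itself, modeled as (name, value).
    fired.append(("@number", current))

# Hypothetical "stylesheet": pattern -> template function.
TEMPLATES = {"chapter": chapter_template, "@number": number_attr_template}

def apply_templates(elem):
    """Visit elem, then its attributes, then its children, firing matching templates."""
    if elem.tag in TEMPLATES:
        TEMPLATES[elem.tag](elem)
    for name, value in elem.attrib.items():
        if "@" + name in TEMPLATES:
            TEMPLATES["@" + name]((name, value))
    for child in elem:
        apply_templates(child)

apply_templates(ET.fromstring('<book><chapter number="3"/><chapter number="4"/></book>'))
print(fired)
```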
NOTE
If you're working with an XSLT debugger that enables you to perform the transformation process step by step, you can see which node is the current node. The debugger keeps track of the current node and the template that is being fired and shows that information to you.
Because the current node is just a pointer to the node being processed, a template is not limited to accessing the value of that node alone. Within a template, you can use absolute addressing or relative addressing to get the value of any node in the XML document. As I said earlier, this value isn't necessarily a single value. If an element has attributes and descendant elements, they are also part of that value. Such a value is called a tree fragment, which is a part of an XML document tree, starting at a specific node. A tree fragment is itself a well-formed XML structure or document.
You already saw tree fragments in action on Day 2, when you learned about text(), which extracts only the text directly inside an element rather than the text of its descendants. If you just specify the value of an element, the text of the element and all its descendants is written to the output. That is, in fact, the text value of the tree fragment. To get a better idea of what a tree fragment is, look at Figure 3.2.
Figure 3.2 Tree fragment of node T.
Figure 3.2 is a graphical representation of a tree fragment. In this case, the tree fragment belongs to node T. This is actually the same as the value of node T.
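You can pull out a tree fragment and inspect it in Python with ElementTree. In this sketch the document is hypothetical, and its element T stands in for node T in Figure 3.2. Serializing T yields T together with its attributes and descendants. That serialization parses on its own as a well-formed document. Joining the text of T and its descendants gives the text value described above. ElementTree's tostring() also writes an element's tail text, so the sample deliberately has no text after the closing T tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element T plays the role of node T in Figure 3.2.
root = ET.fromstring('<R><S/><T kind="x"><U>one</U><V><W>two</W></V></T></R>')
t = root.find("T")

# The tree fragment of T: T itself, its attributes, and all of its descendants.
fragment = ET.tostring(t, encoding="unicode")
print(fragment)  # <T kind="x"><U>one</U><V><W>two</W></V></T>

# The fragment is well-formed by itself, so it parses as a standalone document.
standalone = ET.fromstring(fragment)
print(standalone.tag)  # T

# Its text value joins the text of T and all its descendants in document order.
print("".join(t.itertext()))  # onetwo
```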
What Is a Node-Set?
Now that you know what a node is, you probably think that a node-set isn't hard to explain. It's a set of nodes, right? Well, yes, but that's not all of it.
When you make a selection based on an expression, the expression doesn't necessarily match one node; it may match several. These nodes together are called a node-set. The most common node-set is the set of child nodes of a single element. Some people think this is the only kind of node-set, but it isn't. You can easily create an expression that yields a node-set with nodes in different sections of an XML document. Figure 3.3 shows an example.
Figure 3.3 Node-set containing nodes scattered throughout an XML document.
Figure 3.3 represents the node-set you would get if you were to say "Give me all nodes named B." As you can see, the nodes' locations in the XML document tree are not relevant: any node matching your query is part of the node-set. The node-set in Figure 3.3 is composed of several nodes, and from each of those nodes you have access to the tree fragment composed of that node and its descendants. If the expression targeted only attributes, the node-set would consist only of nodes with a single value.
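ElementTree's findall() can sketch a node-set whose members come from different branches. On the hypothetical document below, the path .//B behaves like the XPath //B. It returns every B element in document order, regardless of depth. ElementTree paths cannot select attribute nodes, so a list comprehension stands in for the XPath //B/@n. It shows that an attributes-only selection yields plain single values.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: elements named B sit in different branches and at different depths.
root = ET.fromstring(
    '<A><B n="1"><C><B n="2"/></C></B><D><E><B n="3"/></E></D><B n="4"/></A>'
)

# Like the XPath //B: every B element in the tree, in document order.
b_nodes = root.findall(".//B")
print(len(b_nodes))  # 4

# Each member still gives access to its own tree fragment: the first B contains C and another B.
print([e.tag for e in b_nodes[0].iter()])  # ['B', 'C', 'B']

# Standing in for the XPath //B/@n: selecting only attributes yields single values.
n_values = [b.get("n") for b in root.iter("B")]
print(n_values)  # ['1', '2', '3', '4']
```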
Node-sets are essential in XSLT. They enable you to create tables of contents, indexes, and all sorts of other documents that use data scattered throughout an XML document. This capability enables you to create different outputs for different purposes from the same XML source document.
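Here is a small illustration of the table-of-contents idea. This Python sketch gathers chapter and section titles that sit at different depths of a hypothetical book document. The function name table_of_contents and the element names are assumptions made for the example. In an XSLT stylesheet you would express the same selections with XPath expressions instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical book: titles occur at more than one level of nesting.
DOC = """<book>
  <chapter><title>Templates</title>
    <section><title>Matching</title></section>
  </chapter>
  <chapter><title>Selecting Data</title>
    <section><title>Nodes</title></section>
    <section><title>Node-sets</title></section>
  </chapter>
</book>"""

def table_of_contents(root):
    """Build numbered TOC lines from chapters and the sections found anywhere inside them."""
    lines = []
    for chapter_no, chapter in enumerate(root.iter("chapter"), start=1):
        lines.append(f"{chapter_no} {chapter.findtext('title')}")
        for section_no, section in enumerate(chapter.iter("section"), start=1):
            lines.append(f"  {chapter_no}.{section_no} {section.findtext('title')}")
    return lines

for line in table_of_contents(ET.fromstring(DOC)):
    print(line)
```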