This chapter is from the book

3.9 Master document

In the previous chapter (2.1.2.1, page 49), we found that any web site consisting of more than one page must have a master document providing shared content and a site directory. In this section, we'll look at some practical examples of constructs in a typical web site's master document.

You may find the sample master document described here (see Example 3.2, page 143, for a complete listing) somewhat eclectic. This eclecticism, however, stems from the real-world practice of XML web sites. In fact, the master document is more of a database than a document (1.2). The layout of components in this database is rarely important, as they are not processed sequentially but accessed in arbitrary order. For lots of ideas on how to access and use the master document content from the stylesheet, see Chapter 5.

A master document represents a new document type, with its root element type different from that of a page document, and most other element types usable only in a master document. However, if you don't use DTDs (2.2.4) or XSDL, this distinction has little practical value, and you can use one schema to validate all of your XML (both page documents and the master document). Such a schema written in Schematron is shown in Example 3.3, page 149 (see also 5.1.3 for advanced Schematron checks).

3.9.1 Site structure

The role of the master document is that of a hub that all other documents refer to when they need to figure out a wider context of the web site or establish mutual links. Whenever the stylesheet needs some information that is not supplied by the currently processed document, it will consult the master document to find either that information or a link to it.

Therefore, the most important part of a master document is the site directory — a collection of information about all pages of the site and their organization. This directory is used for building the site's navigation as well as for resolving abbreviated internal links (3.5.3).

Besides pages, other components of the site may also be mentioned in the master document, such as all Flash animations you have or all images of a specific kind used on the site. Units of orthogonal content must be listed in the master document as well (3.9.1.3) so that pages can reference and incorporate them. Finally, sources of dynamic content must be registered for the stylesheet to know what to insert into static page templates (3.9.1.4).

3.9.1.1 Menu structure

A flat list of all pages is not sufficient for building a usable site. We also need to represent the structure of the site's menu and the correspondence between menu items and pages.

A simple site's menu may be little more than a linear list of links to each of its pages. However, most sites require more complex menu structures. Hierarchical menus are common: some top-level items encompass multiple subpages, nested submenus, or both. Such a structure is straightforward to express in XML.

Some sites may have more than one menu. For example, there may be a menu of topics (content sections) and another independent menu of tools (pages that help navigate the site, such as search and site map). Such orthogonal menu hierarchies can be stored in independent XML subtrees within the master document.

3.9.1.2 Menu items and pages

What do we need to store in the master document for each menu item? To build a clickable menu element, we must know at least its label (the visible text displayed in the menu) and the page that it is linked to. A label may contain inline markup and should therefore be stored in a child element. As for the link, it is natural to use the general linking attributes with abbreviated addresses that we've developed for in-flow links on site pages (3.5.1).

Items vs. pages. A menu item is not the same as a page of the site. Some pages may not be available through the menu, while others may be linked from more than one menu item. Therefore, the page itself must be represented by a separate element that the menu item element will link to.

However, that does not mean that these page elements must be stored in a different part of the master document. You can still categorize all your pages under the branches of the menu tree: Even if a page is not linked from the menu, usually you can find a branch where it logically belongs (unless it is orthogonal content, 3.9.1.3). The stylesheet will thus be able to read the menu structure both hierarchically (when looking for menu items) and sequentially (when looking for pages).

Here's a possible representation of a menu item:

<item link="products">  
  <label>Products</label>
  <page id="products" title="Our products" 
        src="products/"/>
  <page id="software" title="Our software" 
        src="products/software/"/>
  <page id="hardware" title="Our hardware" 
        src="products/hardware"/>
</item>

In addition to a label and one or more pages, an item may also contain other item children. A complete menu description would thus consist of a hierarchy of items under one parent, e.g. menu. Note that in each page element, the id attribute provides a unique identifier of not only that element, but of the page itself. It is these identifiers that are used as abbreviated addresses (3.5.3) in internal links.
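Reading the menu both ways. The two access patterns mentioned above (hierarchical for menu items, sequential for pages) can be tried outside the stylesheet. The following is only a sketch in Python, not the XSLT code the book develops in Chapter 5; the nested software item and the contact page are made up for illustration, and menu is the parent name suggested above.

```python
import xml.etree.ElementTree as ET

# Hypothetical master-document fragment: a <menu> parent with nested <item>s.
# The nested "software" item and the "contact" page are illustrative only.
MENU_XML = """
<menu>
  <item link="products">
    <label>Products</label>
    <page id="products" title="Our products" src="products/"/>
    <item link="software">
      <label>Software</label>
      <page id="software" title="Our software" src="products/software/"/>
    </item>
  </item>
  <item link="contact">
    <label>Contact</label>
    <page id="contact" title="Contact us" src="contact"/>
  </item>
</menu>
"""

def menu_outline(parent, depth=0):
    """Hierarchical reading: yield (depth, label text, linked page id)."""
    for item in parent.findall("item"):
        # A label may hold inline markup, so gather all of its text.
        label = "".join(item.find("label").itertext())
        yield depth, label, item.get("link")
        yield from menu_outline(item, depth + 1)

def all_pages(menu):
    """Sequential reading: every page id in document order, ignoring nesting."""
    return [page.get("id") for page in menu.iter("page")]

menu = ET.fromstring(MENU_XML)
outline = list(menu_outline(menu))
pages = all_pages(menu)
```

The outline keeps the nesting depth needed to render submenus, while the flat page list is what link resolution would search.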

How unabbreviation works. When resolving a link, the stylesheet translates the page identifier into the location of that page taken from the src attribute. However, that attribute's value is also somewhat "abbreviated" in that it omits irrelevant technical information such as the filename extension and the default filename (usually index.html) in a directory. These omitted parts are easy to restore by applying simple rules, so the three page elements in the above example would yield these page locations:

/products/index.html
/products/software/index.html
/products/hardware.html

Note that a location ending with a "/" is considered a directory and has "index.html" appended; other locations only receive the ".html" extension.
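The rule is small enough to state as code. This is a minimal sketch in Python rather than the stylesheet's own XSLT; the function name html_location is hypothetical, and the leading "/" simply mirrors the locations listed above.

```python
def html_location(src: str) -> str:
    """Expand an abbreviated src value into a site-relative HTML location."""
    if src.endswith("/"):
        # A trailing slash marks a directory: append the default filename.
        return "/" + src + "index.html"
    # Anything else is a file: only the extension is missing.
    return "/" + src + ".html"

# The three src values from the menu item example.
locations = [html_location(s) for s in
             ("products/", "products/software/", "products/hardware")]
```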

Accessing the source. There is one more reason to store page pathnames without extensions. When locations are resolved for the purpose of accessing the source XML documents rather than creating an HTML link, the same src values are transformed into *.xml file locations (assuming the directory structure of the site source is similar to that of the transformed site, 3.9.3). For stylesheet code examples to access this menu structure, see Chapter 5 (5.1.1, 5.7).

Storing page metadata. Sometimes, a more complex layout for the page elements may be necessary. For example, if your bilingual site provides two language versions of each page, a page element could hold both metadata that is common to all language versions of the page (e.g., the page's identifier and source location) and language-specific metadata (e.g., title):

<page id="software" src="products/software/">
  <translation lang="en">Our software</translation>
  <translation lang="fr">Nos logiciels</translation>
</page>
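Looking up the language-specific title then becomes a matter of matching the lang attribute. The sketch below uses Python for illustration only (the book's stylesheet would do this in XSLT), and the helper name page_title is hypothetical.

```python
import xml.etree.ElementTree as ET

PAGE = ET.fromstring("""
<page id="software" src="products/software/">
  <translation lang="en">Our software</translation>
  <translation lang="fr">Nos logiciels</translation>
</page>""")

def page_title(page, lang):
    """Return the title of a page element for the requested language."""
    for translation in page.findall("translation"):
        if translation.get("lang") == lang:
            return "".join(translation.itertext())
    raise KeyError("no " + lang + " title for page " + page.get("id"))

title_fr = page_title(PAGE, "fr")
```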

Some of the metadata (3.1.1) may also be moved from page documents into the master document for convenient access. For example, if you want to control which pages of the site are to be seen by search engine spiders and which are hidden from them, you could add a corresponding value to each page's source document. However, since this information will be pulled from all pages of the site simultaneously, it is more convenient to add a spider control attribute to the page element in the master document. This way, the stylesheet will be able to produce a site-wide robots.txt file for external spiders and/or a configuration update for a local search engine spider without accessing all page documents.
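As a rough illustration of why this is convenient, the following Python sketch builds a robots.txt from the master document alone. The attribute name spider and its value hide are assumptions (the text does not name the spider control attribute), as are the two hidden pages.

```python
import xml.etree.ElementTree as ET

# The spider attribute and its "hide" value are hypothetical names;
# the drafts and internal pages are made up for this example.
MASTER_XML = """
<menu>
  <item link="products">
    <label>Products</label>
    <page id="products" src="products/"/>
    <page id="drafts" src="products/drafts/" spider="hide"/>
    <page id="internal" src="products/internal" spider="hide"/>
  </item>
</menu>
"""

def robots_txt(master):
    """Build robots.txt content from the page elements alone."""
    lines = ["User-agent: *"]
    for page in master.iter("page"):
        if page.get("spider") == "hide":
            src = page.get("src")
            # Directories are disallowed as a whole; files get their extension.
            path = "/" + src if src.endswith("/") else "/" + src + ".html"
            lines.append("Disallow: " + path)
    return "\n".join(lines) + "\n"

robots = robots_txt(ET.fromstring(MASTER_XML))
```

No page document is opened at any point, which is the benefit the master document offers here.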

3.9.1.3 Orthogonal content

Along with all pages, a master document should also list all the units of orthogonal content that your site will use (2.1.2.2, page 51). However, unlike pages, orthogonal content references cannot be categorized under the menu hierarchy (that is why this content is orthogonal, after all). You'll need to create a separate construct to associate orthogonal content identifiers with corresponding (abbreviated) source locations - for example,

<blocks>
  <block id="news" src="news/latest"/>
  <block id="subscribe" src="scripts/subscribe"/>
  <block id="donate" src="scripts/donate"/>
</blocks>

Now if the stylesheet processing a page document encounters a block that has no content of its own but references some orthogonal content unit - for example, by specifying idref="news" — the document at news/latest.xml will be retrieved and inserted into the current document, formatted as appropriate for an orthogonal content block.

It is important that the id and src attributes of a master document's block element have the same names and semantics as the attributes of page elements (3.9.1.2). We will rely on this when writing stylesheet code to unabbreviate links or search through all pages of the site, since every page must be registered as either a page in the menu or a source of an orthogonal block (or both).

Extracting orthogonal content. In the last example, each orthogonal block was stored in its own file - but this is not always the best approach. You may want to reuse parts of regular pages as orthogonal content.

For instance, the news page of a site is often a list of news items in reverse chronological order. You may want to automatically extract the most recent news item and display it in an orthogonal content block on other pages of the site. Another example is a "featured product" blurb extracted from that product's own page and reused on the front page of the site.

For these situations, what we need is a way to specify what part of the original page document is to be reused as orthogonal content on other pages. Since this part will most likely also be a block, we only need to indicate the id of the block we are interested in. Thus, if the most recent news block on the news page always has id="last", we could write in the master document:

<block id="last-news" src="news/" select="last"/>

Now any page can place a copy of the latest news item by referencing the corresponding orthogonal block by its identifier, last-news. For example, your page document might contain

<block idref="last-news"/>

Likewise, the featured product blurb could be extracted from the block with id="blurb" on that product's page:

<block id="feature" src="products/foobar" select="blurb"/>

Here, the featured product is identified by the path to the corresponding document (products/foobar.xml). When you want to feature a different product, all you need to do is change this value so it points to another product's page (assuming each product page has exactly one block with id="blurb"). After that, all pages that use

<block idref="feature"/>

will (after you rerun the transformation) display the blurb for the new product.

Logically, without the select attribute, a master document's block will reference the entire content of the document pointed to by the src attribute. Your Schematron schema could also check that the referenced elements actually exist in the referenced documents (see 5.3.3.1, page 224 for how to code this).
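Resolving a block reference. The lookup described in this section (idref to master entry, src to source document, optional select to a block inside it) can be sketched as follows. This is Python used for illustration, not the book's XSLT; the in-memory SOURCES table stands in for real files, the page root element and p children are invented, and index.xml as the default source filename for directories is an assumption.

```python
import xml.etree.ElementTree as ET

# In-memory stand-ins for source files, keyed by resolved location.
# The <page> root, the <p> children and the index.xml default are assumptions.
SOURCES = {
    "news/index.xml": """
<page>
  <block id="last"><p>Version 2.0 released</p></block>
  <block id="older"><p>Version 1.9 released</p></block>
</page>""",
}

MASTER = ET.fromstring("""
<blocks>
  <block id="last-news" src="news/" select="last"/>
  <block id="all-news" src="news/"/>
</blocks>""")

def source_location(src):
    """Turn an abbreviated src into the location of its XML source."""
    return src + "index.xml" if src.endswith("/") else src + ".xml"

def resolve_block(idref):
    """Return the element that a <block idref="..."/> reference stands for."""
    for entry in MASTER.iter("block"):
        if entry.get("id") == idref:
            break
    else:
        raise KeyError("no orthogonal block registered as " + idref)
    doc = ET.fromstring(SOURCES[source_location(entry.get("src"))])
    select = entry.get("select")
    if select is None:
        return doc  # no select: the whole document is the block content
    for block in doc.iter("block"):
        if block.get("id") == select:
            return block
    raise LookupError("no block with id=" + select + " in " + entry.get("src"))

latest = resolve_block("last-news")
```

The LookupError branch corresponds to the existence check that the text suggests delegating to the Schematron schema.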

No perfection in this world. It would be even more natural to use XPath expressions for extracting orthogonal blocks. Then we could use not only the id attribute value but any XPath test for identifying the block we need. For instance, for the first block on the page, we would write

<block id="news" src="news/" xpath="//block[1]"/>

Selecting the last block that has a section inside would be as simple as

<block id="lastsection" src="dir/page" 
       xpath="//block[section][last()]"/>

There's only one problem with this kind of selector: In XSLT, you can't take a string and treat it as an XPath expression - and what the master document (or any other document) stores in its attributes is always just strings from the XSLT processor viewpoint.

Saxon offers the saxon:evaluate() extension function (4.4.2.1) that might save the idea, but its implementation is quite limited, not to mention nonportable to other XSLT processors. Much better is the dyn:evaluate() function from EXSLT (4.4.1), which is currently supported by several processors but not by Saxon.

3.9.1.4 Registering dynamic content

Recall our discussion of dynamic sites in 1.5. We found that a dynamic web page is produced from two main parts - static templates and dynamic values - and that both can (and should) use XML markup. It's now time to see how these concepts fit into the source definition we are building.

One way of many. There exist different ways to aggregate dynamic content and static templates. Some of them come before XSLT transformation, which is usually the last stage in a dynamic XML web site workflow; in these cases, you don't need any special source markup because your stylesheet will get complete seamless page source with both static and dynamic content. However, in some situations (notably offline XSLT processing, 1.4.1) implementing dynamic content aggregation in XSLT is convenient. This section shows one approach to organizing such transformation-time incorporation of dynamic content.

Reusing blocks. An orthogonal content block that the stylesheet extracts from another document may be considered a special case of a composite dynamic value. Therefore, it makes sense to extend our blocks' markup constructs so that they cover the "truly dynamic" content as well - content that is calculated or compiled by some external process and not just stored in a static document.

We can define a number of block conventions that will allow us to use blocks not only for enveloping independent bits of content but also as links to external sources of information. Once again, our guiding principle is: Let the page author use short mnemonic identifiers and hide all the gory details of accessing data in the master document and/or stylesheet.

Calling a process. Suppose we want to build a site map page that automatically compiles a hierarchical list of all pages of the site. The first thing we need is the static part of that page - a document that stores all the static bits unique to the page, such as an introductory paragraph and heading(s). This is a normal page that is listed in the menu hierarchy in the master, just like any other page.

Wherever we want to insert our dynamic content into that static frame, we place a block reference, e.g.:

<block idref="sitemap"/>

In the master, however, we cannot associate the sitemap identifier with any source file, since no such file exists - the list of pages is generated dynamically.

Instead, we must associate our dynamic block identifier (sitemap) with an identifier of some abstract process that generates its data. You can think of a process as a kind of a script or application; it may accept some parameters that affect its output. Thus, if we write in the master document (within the same blocks envelope used for orthogonal blocks)

<block-process id="sitemap" process="sitemap" mode="text" depth="2"/>

then the stylesheet will know that a sitemap block needs to be filled in with data generated by the sitemap process with parameters mode="text" and depth="2". This process can be, for example, a callable template within the stylesheet (4.5.1) or an external program. With this approach, document authors don't need to know anything about processes or parameters; they use identifiers to refer to data sources, and the master document associates each source with a process and its set of parameters.
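One possible shape for this association is a registry that maps process identifiers to callables and passes the remaining attributes as parameters. The sketch below is in Python for illustration only; the PROCESSES registry, the run_block helper, and the placeholder body of sitemap_process are all hypothetical, and a real implementation would live in the stylesheet or an external program as described above.

```python
import xml.etree.ElementTree as ET

MASTER = ET.fromstring("""
<blocks>
  <block-process id="sitemap" process="sitemap" mode="text" depth="2"/>
</blocks>""")

def sitemap_process(mode, depth):
    # Placeholder body: a real process would walk the menu hierarchy.
    # It returns XML data that merely records the parameters it received.
    return ET.Element("sitemap", {"mode": mode, "depth": depth})

# Hypothetical registry mapping process identifiers to callables.
PROCESSES = {"sitemap": sitemap_process}

def run_block(idref):
    """Find the block-process registered under idref and call its process."""
    for entry in MASTER.iter("block-process"):
        if entry.get("id") == idref:
            # Every attribute except id and process is a process parameter.
            params = {name: value for name, value in entry.attrib.items()
                      if name not in ("id", "process")}
            return PROCESSES[entry.get("process")](**params)
    raise KeyError(idref)

data = run_block("sitemap")
```

Note that parameter values arrive as strings, exactly as they are stored in the attributes.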

Watching a directory. A stylesheet can access external files even if the list of these files is changing dynamically. For example, an external process (which may or may not be another stylesheet) might be dropping its output XML documents into a directory. Your stylesheet would then read the list of files in that directory (5.3.2) and do what it pleases with their content - such as dump all available content from all files into one page or perform some elaborate selection, filtering, or rotation.

If, for example, your stylesheet implements a list-titles process that takes a directory as a parameter and returns the list of title elements from all XML documents in the directory, then you could define a block to perform this operation on all (dynamically updated) documents in the news directory by writing in the master document

<block-process id="news-list" process="list-titles" dir="news/"/>

In a page document that wants to use this list, you would then write simply

<block idref="news-list"/>

XML, not HTML. Note that processes similar to sitemap or list-titles should only aggregate content, not format it. This means that the corresponding templates or functions in your stylesheet must produce valid XML data (nodesets), not HTML renditions. You would then feed these nodesets to the regular formatting templates in the same stylesheet by chaining templates together. If a process is implemented as an external program, it should return serialized XML data or plain text that the stylesheet will be able to convert to nodesets.
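A list-titles process written as an external program might look like the following Python sketch. It is an illustration under assumptions, not the book's implementation: the titles wrapper element and the page root of the sample files are made up, and a temporary directory stands in for news/.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def list_titles(dir):
    """Collect the title elements of all XML documents in a directory.

    Returns XML elements, not HTML: formatting is left to a later stage.
    """
    result = ET.Element("titles")  # hypothetical wrapper element
    for path in sorted(Path(dir).glob("*.xml")):
        root = ET.parse(path).getroot()
        result.extend(list(root.iter("title")))
    return result

# Demo with a throwaway directory standing in for news/.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.xml").write_text("<page><title>First</title></page>")
    Path(tmp, "b.xml").write_text("<page><title>Second</title></page>")
    Path(tmp, "notes.txt").write_text("ignored")
    titles = [t.text for t in list_titles(tmp)]
```

Because the directory is read on every run, documents added by another process are picked up without touching the master document.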

3.9.2 Common content and site metadata

On a typical web site, all pages contain bits of information that either remain the same or change predictably from page to page. Some of this repeating data, such as the company logo or tag line, actually belongs to the domain of presentation rather than content and therefore needs to be filled in by the stylesheet rather than stored in the source. Other components, such as webmaster email links, "designed by" signatures, copyright or legal notices, etc., are natural to store in the master document.

It is recommended that you envelop all such bits of content in one or more umbrella elements, each containing data with similar roles or positions on the pages. Here's a master document fragment defining the footer to be placed at the bottom of each page:

<page-footer>
  <designed-by>Site design: <ext link="www.kirsanov.com">Dmitry
    Kirsanov Studio</ext></designed-by>
  <legal linktype="internal" link="legal">Legal notices</legal>
  <contact linktype="internal" link="contact">Contact us</contact>
</page-footer>

Note that the elements inside page-footer may have mixed content with any of the text markup, linking, or other elements that were developed for page documents. In particular, we see internal and external links used in this example, each with its own address abbreviation scheme (3.5.3).

The page-footer parent element makes the stylesheet simpler and more bullet-proof: Instead of providing templates for each of the individual footer elements, you can program the stylesheet to process all items within a page-footer in turn, and only provide separate templates for those that differ from others in formatting. With this approach, you'll be able to add a new element type for a new footer object even without changing the stylesheet.

Similarly, we can create an envelope for storing metadata that applies to the entire site. Examples of such metadata include site-wide keyword lists (which could be merged with page-specific keywords supplied by the page documents, 3.1.1) and extended credits (which could be put in comments in the HTML code of the site's front page).

3.9.3 Processing parameters

Your stylesheet will need to know some parameters of the environment in which it is run as well as the environment where its HTML output will be placed. The most frequently required processing parameter is the base URI that the stylesheet will prepend to all the image and link pathnames. By changing this parameter, you can turn all internal link URIs from relative to absolute with an arbitrary base, which is useful for testing the site in different environments. Other parameters may provide the path to the source tree and the operating system under which the stylesheet is run (which, in turn, may affect the syntax of pathnames).

Grouping parameters into environments. It is important that the same set of source files may be processed on different computers - for example, on a developer's personal system, then in a temporary (staging) location on the server, and finally in the publicly accessible area on the target server. Each of these environments will require its own set of processing parameters. It is therefore convenient to define several groups of parameter values, one for each environment, and select only one of the groups by its identifier when running the transformation.

Where to store the environment groups? Obviously, the need to group parameters and assign a unique identifier to each group makes using XML very convenient - as opposed to, say, storing the values within scripts used to run the site build process (6.5.1). Note also that scripts are the most OS-dependent part of the site setup, so it is best to keep them as simple and therefore as portable as possible. And of all the XML documents of a web site, the two most likely choices are the XSLT stylesheet and the master document.

Your stylesheet is more likely to be shared (in whole or in part) among different projects, so it is not wise to use it for storing information that is too project-specific. Also, even though you can use XSLT variables for storing processing parameters, it is more convenient to use custom element hierarchies for structuring and accessing this data. For these reasons, the master document emerges as the most natural storage for processing parameters.

This does not mean that your master document will differ among environments. Instead, all identical copies of it will have information on all environments, and each environment will extract the relevant set of data by passing a parameter to the stylesheet.

Here's an example group of parameters that define the processing environment called staging (see 3.10.2 for the meanings of the elements):

<environment id="staging">
  <os>Linux</os>
  <src-path>/var/website/src/</src-path>
  <out-path>/var/website/out/</out-path>
  <target-path>/test/</target-path>
  <img-path>img</img-path>
</environment>
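Selecting one group by its identifier is then a simple lookup. The sketch below shows the idea in Python rather than XSLT; the master root element, the second environment named local, and its values are hypothetical and included only so there is something to choose between.

```python
import xml.etree.ElementTree as ET

# The <master> root and the "local" environment are illustrative assumptions;
# only the staging group comes from the example above.
MASTER = ET.fromstring("""
<master>
  <environment id="staging">
    <os>Linux</os>
    <src-path>/var/website/src/</src-path>
    <out-path>/var/website/out/</out-path>
    <target-path>/test/</target-path>
    <img-path>img</img-path>
  </environment>
  <environment id="local">
    <os>Linux</os>
    <src-path>/home/dev/site/src/</src-path>
    <out-path>/home/dev/site/out/</out-path>
    <target-path>/</target-path>
    <img-path>img</img-path>
  </environment>
</master>""")

def load_environment(master, env_id):
    """Return the parameters of one environment as a name-to-value mapping."""
    for env in master.iter("environment"):
        if env.get("id") == env_id:
            return {child.tag: child.text for child in env}
    raise KeyError("unknown environment: " + env_id)

# The identifier would normally arrive as a parameter of the build run.
params = load_environment(MASTER, "staging")
```

Every copy of the master document stays identical; only the identifier passed at run time differs between environments.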

3.9.4 Site-wide content and formatting

Normally, formatting of web pages is created by the stylesheet. Sometimes, however, formatting is dependent on certain parameters that, being more content than style, belong in the site's source and not in the stylesheet. Also, sometimes the stylesheet may need to create objects that are used on many pages but do not belong to any one page in particular. In both these situations, the master document is a convenient place to store data.

Site-wide buttons. An example of such an object is a pair of graphic buttons - "next" and "prev" - used on sequential pages (such as chapters in an online book). If your stylesheet generates other graphic buttons on the site (5.5.2), design consistency and maintainability will be much better if all buttons are done in the same way.

These buttons are not specific to any particular page; moreover, pages that use them don't even need to mention the buttons in the source because the stylesheet can automatically create the page sequence, including appropriate navigation. All we need is to store the button labels somewhere so the stylesheet can generate the buttons. It makes sense to use the master document for this.

You can store the button labels in a separate element in the master and program the stylesheet to regenerate the buttons when run with the corresponding parameter. For example,

<buttons>
  <button>prev</button>
  <button>next</button>
</buttons>
