This chapter is from the book

XML Schemas

As you know, all XML documents must be well formed. For example, elements cannot overlap; they must nest within a strict hierarchy. But often well-formedness alone isn't enough. Saying that XML elements cannot overlap is not as useful as saying that they cannot overlap and must also follow a specific order or use certain tag names. The XML Schema specification defines an XML vocabulary that you can use to describe other XML vocabularies. In doing so, you tell consumers of your schema how your XML must be constructed to be considered valid by your design.

Understanding XML Schemas

As you develop XML documents, you often need to place constraints on the way data is represented in them. You might need a particular set of XML elements to follow a specific order, or you might want to identify an XML element as containing text that represents a specific datatype, such as a floating-point number.

To place constraints on data, you must build Document Type Definitions (DTDs) or XML Schemas to provide data about the data, also known as metadata. DTDs were an early XML constraint mechanism. Although DTDs are beneficial to many XML applications, they lack the characteristics necessary for describing constructs such as inheritance or complex datatypes. To overcome these limitations, a working group was formed to produce XML Schemas based on an original draft from Microsoft. The XML Schema specification is divided into two parts. The first part, XML Schema Part 1: Structures, defines a way to structure and constrain document content, including complex types. The second part, XML Schema Part 2: Datatypes, defines the built-in primitive and derived datatypes and a way to derive new simple datatypes from them.

The XML Schema specification establishes how the XML Schema language describes the structure and content of XML documents. A desirable feature of XML Schemas is that they are themselves written in XML, so standard XML parsers can be used to navigate them.

At this point, you are already familiar with XML and many of the terms used to identify the concepts behind XML. The XML Schema draft defines several new terms that help describe the semantics of using and understanding schemas.

Instances and Schema

An XML instance document refers to the document element, including elements, attributes, and content contained within the document, that conforms to an XML Schema. Instances in the more general sense may refer to any element (including its attributes and content) that conforms to an XML Schema. An instance that conforms to a schema is considered to be schema-valid.

Schemas can be independent XML documents, or they can be embedded inside other XML with references to the schema. Schemas take this form:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <!-- type definitions, element declarations, etc. -->
</xsd:schema>

Definitions and Declarations

A great advantage of schemas is that they enable you to create simple or complex types for applying classifications to elements. As in most programming languages, this is called type definition and is shown as follows:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:complexType name="Person">
      <xsd:sequence>
         <xsd:element name="FirstName" type="xsd:string" />
      </xsd:sequence>
      <xsd:attribute name="Age" type="xsd:integer" />
   </xsd:complexType>
</xsd:schema>

The preceding example also shows an element declaration (for the element <FirstName/>) and an attribute declaration (for the attribute Age) that are local to a particular type named Person.

Beyond participating in type definitions, elements also may be declared as top-level elements of a particular type, as shown in the following example, where the BaseballPlayer element is declared to be of type Person:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <!-- ...Type definition... -->

   <xsd:element name="BaseballPlayer" type="Person" />
</xsd:schema>

Attributes, however, can be of only simple types, as defined in XML Schema Part 2: Datatypes, such as string, boolean, and float.

Target Namespace

Because element and attribute declarations are used to validate instances, their namespaces must match those of the elements and attributes in the instance. Each declaration is therefore associated either with a target namespace URI or with no namespace at all, depending on whether the corresponding name in the instance is namespace-qualified. For a schema to specify a target namespace, it must use the targetNamespace attribute, as follows:

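<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="SomeNamespaceURI" elementFormDefault="qualified">
   <xsd:element name="ElementInNS" type="xsd:string" />
   <xsd:complexType name="TypeInNS">
      <xsd:sequence>
         <xsd:element name="LocalElementInNS" type="xsd:integer" />
      </xsd:sequence>
      <xsd:attribute name="LocalAttrInNS" type="xsd:string" />
   </xsd:complexType>
</xsd:schema>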

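Global declarations such as ElementInNS and TypeInNS are associated with SomeNamespaceURI. Local element declarations such as LocalElementInNS are associated with it only when the schema sets elementFormDefault="qualified"; by default, local elements and attributes are unqualified. If the targetNamespace attribute is absent, the declarations are associated with no namespace.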
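On the instance side, what a validator compares against the target namespace is the element's expanded name. The following Python sketch uses the standard library's xml.etree.ElementTree, which reports expanded names as {namespace-uri}local-name. The instance documents are hypothetical, and the snippet only compares names; it performs no schema validation.

```python
import xml.etree.ElementTree as ET

TARGET_NS = "SomeNamespaceURI"                # targetNamespace value from the schema
expected_tag = f"{{{TARGET_NS}}}ElementInNS"  # expanded name of the global element

# Hypothetical instances. The prefix is arbitrary; only the namespace URI matters.
qualified = ET.fromstring(
    '<ns:ElementInNS xmlns:ns="SomeNamespaceURI">text</ns:ElementInNS>'
)
default_ns = ET.fromstring('<ElementInNS xmlns="SomeNamespaceURI">text</ElementInNS>')
unqualified = ET.fromstring("<ElementInNS>text</ElementInNS>")

print(qualified.tag)                    # {SomeNamespaceURI}ElementInNS
print(qualified.tag == expected_tag)    # True
print(default_ns.tag == expected_tag)   # True: a default namespace works too
print(unqualified.tag == expected_tag)  # False: no namespace, so no match
```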

Datatypes and Schema Constraints

Datatypes consist of a value space, lexical space, and facets. The value space is the datatype's permitted set of values, and it can have various properties associated with it. A set of valid literals for a datatype makes up the lexical space of that datatype. Finally, a facet is a single dimension of a concept that enables you to distinguish among different datatypes. Two kinds of facets are used to describe datatypes, fundamental and constraining.

Fundamental facets enable you to describe the order, bounds, cardinality, exactness, and numeric properties of a given datatype's value space.

Constraining facets enable you to describe the constraints on a datatype's value space. Possible constraints include minimum and maximum length, pattern matching, upper and lower bounds, and enumeration of valid values.
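The order of operations implied here (check the literal against the lexical space, map it to a value, then apply the constraining facets to that value) can be sketched in Python. This is a hand-rolled illustration, not a schema processor: the function check_bounded_integer is hypothetical and handles only the xsd:integer lexical form with the minInclusive and maxInclusive facets.

```python
import re

# Lexical space of xsd:integer: an optional sign followed by one or more digits.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")
XML_WHITESPACE = " \t\r\n"

def check_bounded_integer(literal, min_inclusive, max_inclusive):
    """Return True if literal is a valid xsd:integer within the inclusive bounds."""
    # xsd:integer collapses whitespace, so leading/trailing whitespace is ignored.
    literal = literal.strip(XML_WHITESPACE)
    # Step 1: the literal must belong to the lexical space.
    if INTEGER_LEXICAL.fullmatch(literal) is None:
        return False
    # Step 2: map the literal into the value space ("7", "07", "+7" -> 7).
    value = int(literal)
    # Step 3: apply the constraining facets to the value, not to the text.
    return min_inclusive <= value <= max_inclusive

print(check_bounded_integer("07", 1, 12))   # True: same value as "7"
print(check_bounded_integer("13", 1, 12))   # False: above maxInclusive
print(check_bounded_integer("7.5", 1, 12))  # False: not an integer literal
```

Because the facets apply to values rather than text, "07" and "7" are treated identically, while "7.5" is rejected before any bound is checked.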

The following is an example of a simple type definition:

<xsd:simpleType name="HourType">
   <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1" />
      <xsd:maxInclusive value="12" />
   </xsd:restriction>
</xsd:simpleType>

In this case, HourType is derived by restriction from the built-in integer datatype and is constrained to values from 1 through 12, inclusive. This new type can then be used in other type definitions, as in the following Hour attribute:

<xsd:complexType name="Time">
   <xsd:attribute name="Hour" type="HourType" />
   <xsd:attribute name="Minute" type="MinuteType" />
</xsd:complexType>

The instance for this type might look something like this:

<Time Hour="7" Minute="30" />

That was also an example of a complex type definition. A complex type definition combines elements and attributes, whose types may themselves be simple or complex, to form something new. Here is another complex type example:

<xsd:element name="cars" type="CarsType"/>

<xsd:complexType name="CarsType">
   <xsd:sequence>
      <xsd:element name="car" type="CarType"
                   minOccurs="0" maxOccurs="unbounded"/>
   </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="CarType">
   <xsd:sequence>
      <xsd:element name="make" type="xsd:string"/>
      <xsd:element name="model" type="xsd:string"/>
   </xsd:sequence>
</xsd:complexType>

This type can be represented by an instance as follows:

<cars xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <car><make>...</make><model>...</model></car>
</cars>

minOccurs and maxOccurs

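Element declarations enable you to specify the minimum and maximum number of times that an element may appear in the instance. Both minOccurs and maxOccurs default to 1, so the following example, which requires the Author and Title elements to appear exactly once, simply makes that default explicit:

<xsd:element name="Book">
   <xsd:complexType><xsd:sequence>
      <xsd:element name="Author" type="A" minOccurs="1" maxOccurs="1" />
      <xsd:element name="Title" type="T" minOccurs="1" maxOccurs="1" />
   </xsd:sequence></xsd:complexType>
</xsd:element>

The maxOccurs attribute can also be set to unbounded to denote that the element can appear any number of times, and setting maxOccurs to 0 prevents the element from appearing at all. Attributes don't take minOccurs or maxOccurs, because an attribute can appear at most once on a given element; instead, the use attribute of an attribute declaration marks it as required, optional, or prohibited.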
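Counting occurrences is the heart of a minOccurs/maxOccurs check. The Python sketch below, built on xml.etree.ElementTree, compares how many times a child element appears against a pair of bounds. The helper occurs_ok and the instance snippets are hypothetical, and the sketch ignores element order and everything else a real validator checks.

```python
import xml.etree.ElementTree as ET

UNBOUNDED = None  # stands in for maxOccurs="unbounded"

def occurs_ok(parent, tag, min_occurs=1, max_occurs=1):
    """Check how often a child element named tag appears directly under parent."""
    # The defaults mirror XML Schema, where minOccurs and maxOccurs default to 1.
    count = len(parent.findall(tag))  # direct children only
    if count < min_occurs:
        return False
    return max_occurs is UNBOUNDED or count <= max_occurs

# Hypothetical instance snippets.
book = ET.fromstring("<Book><Author>A</Author><Title>T</Title></Book>")
twice = ET.fromstring(
    "<Book><Author>A</Author><Author>B</Author><Title>T</Title></Book>"
)
cars = ET.fromstring("<cars/>")

print(occurs_ok(book, "Author"))             # True: exactly one Author
print(occurs_ok(twice, "Author"))            # False: more than maxOccurs="1"
print(occurs_ok(cars, "car", 0, UNBOUNDED))  # True: zero car elements allowed
```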

Deriving Type Definitions

Similar to the way object-oriented programming languages work, schemas enable you to derive types from other types in a controlled way. When defining a new type, you may choose to extend or restrict an existing type definition.

When extending another type definition, you can introduce additional elements and attributes, as shown in the following example:

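<xsd:complexType name="Book">
   <xsd:sequence>
      <xsd:element name="Title" type="xsd:string" />
      <xsd:element name="Author" type="xsd:string" />
   </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="ElectronicBook">
   <xsd:complexContent>
      <xsd:extension base="Book">
         <xsd:sequence><xsd:element name="URL" type="xsd:string" /></xsd:sequence>
      </xsd:extension>
   </xsd:complexContent>
</xsd:complexType>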

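Sometimes an instance needs to indicate its type explicitly, for example when an element carries a type derived from the type declared for it. To do this, the instance uses the xsi:type attribute from the XML Schema instance namespace, as follows (SportsCar must be derived from the type declared for Car):

<Car xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:type="SportsCar">...</Car>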
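The following Python sketch shows, under simplifying assumptions, how a consumer might honor xsi:type: it reads the attribute from the XML Schema instance namespace and selects the content model of the named type, where the ElectronicBook model is the Book model with URL appended, as extension does. The CONTENT_MODELS table and both helpers are hypothetical; a real validator would also resolve the QName prefix against in-scope namespaces and confirm that the named type really is derived from the declared one.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical content models: extension appends the new particle (URL)
# after those inherited from the base type (Title, Author).
CONTENT_MODELS = {"Book": ["Title", "Author"]}
CONTENT_MODELS["ElectronicBook"] = CONTENT_MODELS["Book"] + ["URL"]

def effective_type(elem, declared_type):
    """Return the type named by xsi:type on elem, or the declared type."""
    # ElementTree keys namespaced attributes as "{namespace-uri}local-name".
    qname = elem.get(f"{{{XSI_NS}}}type")
    if qname is None:
        return declared_type
    # Simplification: drop any prefix instead of resolving it to a namespace.
    return qname.split(":")[-1]

def child_sequence_ok(elem, declared_type):
    """Check that elem's children appear exactly in its type's content order."""
    expected = CONTENT_MODELS[effective_type(elem, declared_type)]
    return [child.tag for child in elem] == expected

doc = ET.fromstring(
    '<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="ElectronicBook">'
    "<Title>T</Title><Author>A</Author><URL>http://example.com/book</URL>"
    "</Book>"
)
print(effective_type(doc, "Book"))     # ElectronicBook
print(child_sequence_ok(doc, "Book"))  # True
```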

Although this hasn't been exhaustive coverage of schemas, you should now realize the following:

  • You can constrain your XML documents and their content using XML Schema.

  • The XML Schema specification provides you with a set of built-in datatypes.

  • You can use the built-in datatypes or create your own datatypes.

  • Schemas may be standalone documents or may be combined within other XML documents.

This information is here because XML Schemas are important to .NET. Let's see why.

.NET Web Services and XML Schemas

If you happen to glance at the SOAP specification, you'll find a section, Section 5, that describes how to encode method parameters. This section was necessary when the SOAP specification was introduced because there was no other way to describe the SOAP XML. If you couldn't somehow validate the incoming SOAP packets, you could not extract the method's parameter data and actually invoke the method on your local systems.

Section 5 is becoming far less important today because of the Web Services Description Language (WSDL), which you'll see in detail in Chapter 5, "Web Service Description and Discovery." WSDL serves as an interface description document that you can use to determine what XML information the Web Service will accept. In other words, you can change the XML formatting for your Web Service by changing the way you describe the Web Service in its WSDL file. As it happens, there is a schema embedded within the WSDL file.

This actually makes a lot of sense. If you think about it, handing someone an arbitrary XML document and expecting that person to figure out by inspection just what you're asking him to do is a very complex undertaking, if that person even has enough information to make an informed decision. On the other hand, if you hand that same person a document that outlines the datatypes of your method parameters and the order in which they can be found, you've given that person enough information to decipher the XML instance documents that you intend to transmit.

For this reason, you'll find an XML schema embedded within the WSDL document that describes your Web Service. Essentially, with your WSDL document, you're telling the other side what datatypes you expect and how you want them ordered, as well as how they should appear within the SOAP packet. Web Services are now significantly more flexible.

Now that you know how XML documents are formed and how they are validated, how do you get the values associated with the XML elements back out of the XML document? This is the job of XPath.

We will discuss XPath in some detail, mainly because many people who have worked with XML to some degree still might need a little XPath brush-up. XPath also makes your work easier if you need to reach into a SOAP packet and modify what you find, as you might do with a .NET SoapExtension. Why would you modify a SOAP packet? For interoperability, if nothing else, although you might have other reasons as well, depending upon your individual system requirements. For example, you might want to retrieve a SOAP parameter value and encrypt it.

Let's take a more detailed look at XPath to see what it can offer when you're tweaking XML.

XPath Drilldown

Imagine that you have this XML document you created for yourself to remind you how to access a couple of your favorite Web Services:

<?xml version="1.0"?>
   <Server name="Gumby">
      <WebService wsdl="?wsdl">
   <Server name="Pokey">
      <WebService wsdl=".wsdl">

This totally fictitious XML document describes two imaginary Web Service servers. The first provides some sort of calculator service, as its family indicates, and the second gives the time. The calculator service returns its WSDL when you add ?WSDL to the endpoint URL, which is how .NET works. The second does the same when you append .WSDL, which is how the SOAP Toolkit works. Of course, only two servers are shown in this case. You could have hundreds or more, so the corresponding XML document could grow to be quite large. How will you find the one particular server's information that is of interest to you?

Now let's say that you want to retrieve all the servers that are part of the Time family. You could use this XPath query string:


The result of this query is a nodeset, which is a set of XML elements that match the query. The query itself can be called a location step, which can be broken into three parts:

  • The axis

  • The node test

  • The predicate

Let's take a closer look at each.
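Before that, the whole location step can be tried out in Python with the standard library's xml.etree.ElementTree, which implements only a limited subset of XPath (it is not the .NET XPath support covered later). The document below is a hypothetical stand-in for the server list: the root element name Servers and the family attribute are assumptions, and endpoint URLs are omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the server list (Servers and family are assumed names).
root = ET.fromstring(
    "<Servers>"
    '<Server name="Gumby"><WebService family="Calculator" wsdl="?wsdl"/></Server>'
    '<Server name="Pokey"><WebService family="Time" wsdl=".wsdl"/></Server>'
    "</Servers>"
)

# Implied child axis, node test Server/WebService, predicate [@family='Time'],
# then .. to step back up from each matching WebService to its Server.
time_servers = root.findall("./Server/WebService[@family='Time']/..")
print([s.get("name") for s in time_servers])  # ['Pokey']
```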

The XPath Axis

The axis is optional; in fact, this particular location step has no axis identified. When you omit the axis, XPath uses the default, child::, so the query returns a nodeset containing children of the current context node. You specify a different axis when you want to move through the XML document in some other manner than from the top down. The context node is the XML node that you happen to be examining, which, in this case, is the root node. Other possible axes include ancestor::, parent::, following::, and several others, all shown in Table 3.1.

Table 3.1 XPath Axis Values

Axis                   Description
ancestor::             Ancestors of the context node, including the root node if the context node is not already the root node
ancestor-or-self::     Same as ancestor, but includes the context node
attribute::            Attributes of the context node (the context node should be an element)
child::                All immediate children of the context node
descendant::           All descendants of the context node (children, grandchildren, and so on), regardless of depth
descendant-or-self::   Same as descendant, but includes the context node
following::            All nodes after the context node in document order, excluding its descendants
following-sibling::    Only the sibling nodes that follow the context node, in document order
namespace::            Namespace nodes of the context node (the context node should be an element)
parent::               Immediate parent of the context node, if any (the root node has no parent)
preceding::            All nodes before the context node in document order, excluding its ancestors
preceding-sibling::    Same as preceding, but for sibling nodes only
self::                 The context node itself

You probably noticed the term document order mixed into Table 3.1. Document order refers not to the ordering and hierarchy of XML elements, but instead to the literal order in which the element is found in the document, whether it is a parent, sibling, or whatever. Essentially, when you access nodes in document order, you're flattening any hierarchy that might be present.
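The axes in Table 3.1 and the idea of document order can be sketched on top of Python's xml.etree.ElementTree, which itself navigates only to children and, inside find expressions, to parents. Element.iter() yields elements in document order, and a parent map supplies the upward links. This is a simplified model: it covers elements only (no separate root node and no attribute, text, or namespace nodes), and the small tree is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree: <a> contains <b> (with <c>, <d>) and <e> (with <f>).
root = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")

order = list(root.iter())                      # every element, in document order
parent = {kid: p for p in order for kid in p}  # child -> parent links

def ancestor(node):
    result = []
    while node in parent:          # the top element has no entry
        node = parent[node]
        result.append(node)
    return result                  # nearest ancestor first

def descendant(node):
    return list(node.iter())[1:]   # iter() yields node itself first

def following_sibling(node):
    if node not in parent:
        return []
    siblings = list(parent[node])
    return siblings[siblings.index(node) + 1:]

def following(node):
    # Everything after node in document order, minus node's own descendants.
    inside = set(node.iter())
    return [n for n in order[order.index(node) + 1:] if n not in inside]

def preceding(node):
    # Everything before node in document order, minus node's ancestors.
    above = set(ancestor(node))
    return [n for n in order[:order.index(node)] if n not in above]

def tags(nodes):
    return [n.tag for n in nodes]

c, e = root.find(".//c"), root.find(".//e")
print(tags(ancestor(c)))           # ['b', 'a']
print(tags(following(c)))          # ['d', 'e', 'f']
print(tags(following_sibling(c)))  # ['d']
print(tags(preceding(e)))          # ['b', 'c', 'd']
print(tags(descendant(root)))      # ['b', 'c', 'd', 'e', 'f']
```

Note how following(c) flattens the hierarchy: it reaches e and f even though they sit in a different branch than c.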

The XPath Node Test

The node test is effectively a road map that shows the element names in progression, from the start of the document to the particular element in question. It's literally a path from the document element (the root XML element) to the data that you're testing for inclusion in the result nodeset. XPath, as usually implemented, is often more efficient if you specify the complete element path. However, you could have written the example location step as follows:


The initial double slash, //, tells the XPath processor to start at the document element, search recursively for the <WebService/> element, and, after finding it, execute the predicate.

The XPath Predicate

The predicate, sometimes referred to as the filter, is a Boolean test that you apply to make a final decision about the particular XML element that XPath is examining. If the predicate returns a true result, the XML element is added to the result nodeset. If not, the element is discarded from the nodeset. Essentially, you're fine-tuning an XML element filter. For example, given the two servers shown in the example XML document, the initial nodeset returned from the axis yields both servers, as does the result of the node test. That is, both <Server/> elements have children <WebService/> elements. It's the predicate that distinguishes them, in this case, because only the second server, Pokey, exposes a Time family Web Service.
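The same filtering can be spelled out by hand. In this Python sketch (a hypothetical document, with ElementTree standing in for a full XPath engine), the axis and node test produce the candidate nodeset, and a plain Python function plays the role of the predicate, keeping only the servers that expose a Time family Web Service. The helper name has_time_service is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical server list; element and attribute names are assumptions.
root = ET.fromstring(
    "<Servers>"
    '<Server name="Gumby"><WebService family="Calculator" wsdl="?wsdl"/></Server>'
    '<Server name="Pokey"><WebService family="Time" wsdl=".wsdl"/></Server>'
    "</Servers>"
)

# Axis + node test: every Server child that has a WebService child.
candidates = root.findall("Server[WebService]")

# Predicate: a Boolean test applied to each candidate in turn.
def has_time_service(server):
    return any(ws.get("family") == "Time" for ws in server.findall("WebService"))

result = [s for s in candidates if has_time_service(s)]
print([s.get("name") for s in candidates])  # ['Gumby', 'Pokey']
print([s.get("name") for s in result])      # ['Pokey']
```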

The node test, being a pathway into the XML document, is sensitive to both the alphabetical case and the namespace of the particular XML element shown in the path. That is, imagine that you mistyped the example location step in this manner:


The resulting nodeset would be empty, whereas before it contained the element for the Pokey server. Notice in the second XPath query that all the text is lowercase, which is why it would fail.
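Python's ElementTree shows the same sensitivity to case and to namespaces. The documents and the urn:example:servers namespace are hypothetical; ElementTree matches a namespaced element either by its {uri}name form or through a prefix map passed to findall.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, trimmed to one server.
root = ET.fromstring(
    '<Servers><Server name="Pokey"><WebService family="Time"/></Server></Servers>'
)

print(len(root.findall("./Server/WebService[@family='Time']")))  # 1
# Element names and compared attribute values are both case-sensitive.
print(len(root.findall("./server/webservice[@family='time']")))  # 0

# The namespace is part of the name: a bare "Server" no longer matches.
ns_root = ET.fromstring(
    '<s:Servers xmlns:s="urn:example:servers"><s:Server/></s:Servers>'
)
print(len(ns_root.findall("Server")))                                  # 0
print(len(ns_root.findall("{urn:example:servers}Server")))             # 1
print(len(ns_root.findall("s:Server", {"s": "urn:example:servers"})))  # 1
```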

To help with XPath query generation, we wrote the application that you see in Figure 3.2. The XPathExerciser is a utility that enables you to load an XML document, display its contents so that you can see what your queries should produce, type in an XPath location step, and display the resulting nodeset using a tree control.

You'll examine the source code for the XPathExerciser when we discuss .NET's XML handling capabilities, starting with the upcoming section ".NET and XPath." You'll find yourself using this tool often, because you can never be too expert at writing XPath expressions.

Figure 3.2 The XPathExerciser user interface.

XPath Operators

XPath is a language all its own. Like any programming language, XPath has a set of operators. The operators represent intrinsic capabilities that XPath can perform upon request; Table 3.2 lists them.

Table 3.2 XPath Intrinsic Operators

Operator   Description
/          Child operator, which selects child nodes or, at the start of a path, specifies the root node
//         Recursive descent, which looks for a specified element at any depth
.          Current context node (akin to C++ this or VB Me)
..         Shorthand notation for the parent of the current context node (akin to moving up a file directory)
*          Wildcard, which selects all elements regardless of their element name
:          Namespace operator, which separates a prefix from a local name (same use as in XML proper)
@          Attribute operator, which prefixes an attribute name
@*         Attribute wildcard, which selects all attributes regardless of their name
+          Addition indicator
-          Subtraction indicator
*          Multiplication indicator
div        Floating-point division indicator
mod        Modulo (remainder from a truncating division operation)
( )        Precedence operator
[ ]        Operator that applies a filter (akin to a Boolean test)
[ ]        Set subscript operator (akin to an array index specification)

Not much in Table 3.2 should be surprising. The square brackets, [ and ], indicate either a subscript (like an array index) or a filter pattern, depending upon how you use them. The single period, ., indicates the current context node, much like the same notation in a Windows file path, and the double period, .., likewise indicates the parent of the context node. The @ prefixes an attribute name. Otherwise, you have operators that you would expect to see, such as the wildcard operator, *, and the arithmetic operators.
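Several of these operators have direct counterparts in ElementTree's limited XPath subset: ., .., *, //, @ inside a predicate, and a 1-based [n] subscript. Arithmetic operators and selecting attribute nodes directly are not supported, so attribute values are read with Element.get instead. The document and the names helper below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document.
root = ET.fromstring(
    "<Servers>"
    '<Server name="Gumby"><WebService wsdl="?wsdl"/></Server>'
    '<Server name="Pokey"><WebService wsdl=".wsdl"/></Server>'
    "</Servers>"
)

def names(elems):
    # Show the name attribute when present, otherwise the element's tag.
    return [e.get("name", e.tag) for e in elems]

print(names(root.findall("*")))                        # ['Gumby', 'Pokey']
print(names(root.findall(".")))                        # ['Servers']
print(names(root.findall("./Server/WebService/..")))   # ['Gumby', 'Pokey']
print(names(root.findall(".//WebService")))            # ['WebService', 'WebService']
print(names(root.findall("Server[2]")))                # ['Pokey']
print(names(root.findall("Server[@name='Gumby']")))    # ['Gumby']
```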

Returning to the previous example, you could locate all the SOAP Toolkit Web Services stored in the XML document using this XPath location step:


You could accomplish the same task with these two location steps:



If some of the servers had no wsdl attribute but others did, you could test merely for the presence of the attribute, like so:


In this case, the nodeset would contain both servers shown in the example XML document, but this is only because both servers have a wsdl attribute. If one server had no wsdl attribute, the predicate would fail for that particular node, and that XML element would be removed from the result nodeset.
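The difference between testing an attribute's value and testing for its mere presence can be seen in this Python sketch. The hypothetical document adds a third server, Blockhead, whose WebService element has no wsdl attribute, so the presence test drops it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which one server lacks a wsdl attribute.
root = ET.fromstring(
    "<Servers>"
    '<Server name="Gumby"><WebService wsdl="?wsdl"/></Server>'
    '<Server name="Pokey"><WebService wsdl=".wsdl"/></Server>'
    '<Server name="Blockhead"><WebService/></Server>'
    "</Servers>"
)

def names(elems):
    return [e.get("name") for e in elems]

# Presence test: [@wsdl] keeps only WebService elements carrying the attribute.
print(names(root.findall("Server/WebService[@wsdl]/..")))          # ['Gumby', 'Pokey']
# Value test: [@wsdl='.wsdl'] keeps only SOAP Toolkit-style services.
print(names(root.findall("Server/WebService[@wsdl='.wsdl']/..")))  # ['Pokey']
```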

XPath Intrinsic Functions

In addition to operators, XPath has an entire suite of intrinsic functions that it exposes to help with your queries. There are a lot of these, so the more commonly used functions are distilled in Table 3.3.

Table 3.3 Commonly Used XPath Intrinsic Functions

Function         Description
ceiling()        Returns the smallest integer not less than the argument
count()          Returns the number of nodes in the nodeset argument
contains()       Returns true if the first argument contains the second
false()          Always returns Boolean false
floor()          Returns the largest integer not greater than the argument
last()           Returns the context size (the number of nodes in the context nodeset)
local-name()     Returns the local name of the first node (in document order) in the nodeset argument
name()           Returns the QName of the first node (in document order) in the nodeset argument
node()           Node test (rather than a function) that is true for any type of node
not()            Returns the logical negation of its argument
number()         Converts its argument to a number
position()       Returns the position of the context node within the context nodeset, starting at 1
starts-with()    Returns true if the first argument starts with the second
string()         Converts its argument to a string
sum()            Converts each node in the nodeset argument to a number and returns the total
true()           Always returns Boolean true

For a complete list of XPath intrinsic functions, you should refer to a good XPath reference. You'll probably find that these functions will handle most of your XPath needs, however.

Using the Web Service server example, you could identify the first server, or an arbitrary server, using this location step:


Similarly, you find the last server like so:


If you want all the servers but the last one, you query the document in this way:


Many people want to locate XML information within a document based upon string values or string searches. Say, for example, that you want all the servers that have names starting with the letter G:


Of course, this will return the Gumby server's XML information.
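The positional and string queries above map onto Python as follows. ElementTree supports [1] and [last()] directly but has no position() or starts-with(), so plain Python stands in for those. The XPath expressions in the comments are illustrative, written relative to a hypothetical Servers element, and the names helper is hypothetical too.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; endpoint details omitted.
root = ET.fromstring('<Servers><Server name="Gumby"/><Server name="Pokey"/></Servers>')

def names(elems):
    return [e.get("name") for e in elems]

print(names(root.findall("Server[1]")))       # Server[1]: ['Gumby']
print(names(root.findall("Server[last()]")))  # Server[last()]: ['Pokey']

servers = root.findall("Server")
# Stand-in for Server[position() < last()]: everything but the last server.
print(names(servers[:-1]))                    # ['Gumby']
# Stand-in for Server[starts-with(@name, 'G')].
print(names([s for s in servers if s.get("name", "").startswith("G")]))  # ['Gumby']
```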

As you can see, you can produce a wide variety of queries, especially if you combine an axis with a node test and a filter. This becomes important later if you need to crack open a SOAP packet to examine and modify the contents by hand.

SOAP uses another XML technology called XPointer. Let's now turn to that technology and see what it offers the Web Service.
