
This chapter is from the book

1.4 XML and information systems

The first thing to realize is that the arrival of XML does not mean that all information systems that are not based on XML become obsolete all of a sudden. In fact, the reality is very much the opposite; XML and classical information systems are complementary and can be used together. Classical information systems are classical because they are extraordinarily useful, and XML will not change that. What XML is likely to change is the amount of interoperability between information systems. In some cases, it will also change what such systems can do and how they are put together.

This section examines how XML can be used with information systems, particularly classical ones, but also how it makes it possible to create new kinds of applications and uses.

1.4.1 XML in traditional information systems

Traditional information systems follow the basic anatomy outlined in Figure 1–3: a central data store, with the applications that access it clustered around it. The exact form of this data store may vary with the application, and the arrival of XML has a number of consequences for it.

1.4.1.1 XML files

The most obvious way of basing an information system on XML is simply to use a set of XML documents, stored as files in the file system, as the central data store. This approach has been widely used in document-oriented systems and is implicitly assumed by the standard interfaces of many XML tools, which expect to be run from the command line and to be passed file names as arguments. The main benefit of this approach is that it requires no setup work at all, and any developer or user can understand it.

The first consequence of this approach is that the XML documents in the file system become the primary representation of the information in the system. The applications clustered around this data store will generally take one or more XML documents and produce some output from them. Very often this will be HTML or some other publishing format. Any updates to the information in the system must be made to the XML files, since all other renditions of the information are derived from these files. To have the updates reflected in the published files, one simply runs the translating applications again.
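A minimal sketch of such a translating application, written in Python with the standard library's `xml.etree.ElementTree`, might look like the following. The `catalog`/`book`/`title`/`author` element names are a hypothetical document structure assumed for illustration, as is the convention of writing `foo.html` next to each `foo.xml`.

```python
import sys
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source structure, assumed for illustration:
# <catalog><book id="..."><title>...</title><author>...</author></book>...</catalog>

def catalog_to_html(path: str) -> str:
    """Parse one XML source file and render it as a simple HTML list."""
    root = ET.parse(path).getroot()
    rows = []
    for book in root.iter("book"):
        title = book.findtext("title", default="")
        author = book.findtext("author", default="")
        rows.append(f"<li>{escape(title)} by {escape(author)}</li>")
    return "<html><body><ul>\n" + "\n".join(rows) + "\n</ul></body></html>"

def main(argv: list[str]) -> None:
    # Like many XML tools, take the source file names as command-line arguments
    # and write foo.html next to each foo.xml (an assumed naming convention).
    for path in argv:
        out_path = path.rsplit(".", 1)[0] + ".html"
        with open(out_path, "w", encoding="utf-8") as out:
            out.write(catalog_to_html(path))

if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the HTML is derived entirely from the XML, republishing after an edit is just a matter of running the script on the changed files again.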

In general, every application that wishes to make use of the information in the XML files will use an XML parser to read the information into its own internal data structure (see 2.3.2, "The parser model," on page 57). This process must be repeated every time an application is started, which may be very awkward if the volume of information is large. Any application that wishes to change the information must first load the documents, then change its internal structure, and finally write the information back out in XML form so that other applications can access it.
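The load, change, and write-back cycle can be sketched as follows, again with `ElementTree` and the same hypothetical catalog structure; the `status` attribute and the `mark_on_loan` function are illustrative assumptions. Note that the whole document is parsed and reserialized just to change a single attribute.

```python
import xml.etree.ElementTree as ET

def mark_on_loan(path: str, book_id: str) -> bool:
    """Load the whole document, change one element, and write everything back."""
    tree = ET.parse(path)                  # 1. parse the entire file into memory
    for book in tree.getroot().iter("book"):
        if book.get("id") == book_id:
            book.set("status", "on-loan")  # 2. modify the in-memory structure
            # 3. serialize the complete document so other applications see the change
            tree.write(path, encoding="utf-8", xml_declaration=True)
            return True
    return False                           # no matching book: file left untouched
```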

When modifying the source XML documents in this way, it is important to preserve all significant aspects of the documents. Just as in the email example, however, this may be difficult, since the programs operate on an internal representation of the documents rather than on their external form. Since the internal representation contains less information than the original documents did, necessary information may be lost. We will return to this problem (and the solutions to it) in more detail later.
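The following sketch shows one way this kind of loss can happen in practice. Python's `ElementTree` parser discards comments by default and always writes attribute values in double quotes, so a parse-and-serialize round trip does not reproduce the source exactly. The document is hypothetical, and other parsers and data models lose different details; this is only one illustration.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical source document containing details an author may care
# about: a comment and attribute values written with single quotes.
SOURCE = """<catalog>
  <!-- Reviewed by the cataloguing team -->
  <book status='available' id='b1'><title>Dune</title></book>
</catalog>"""

def round_trip(text: str) -> str:
    """Parse into ElementTree's internal model, then serialize that model again."""
    root = ET.fromstring(text)
    return ET.tostring(root, encoding="unicode")

result = round_trip(SOURCE)
print(result)
# The comment is absent from the output, and the attribute values are now
# written with double quotes: the internal model never recorded either detail.
```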

Of course, updating shared information in this way will often be dangerous, since multiple applications may attempt to modify the same document at the same time, which can cause information to be lost or corrupted. Another problem is that although one can make a schema for the data in the form of a DTD or an XML Schema definition,15 nothing prevents an application (or a user with a text editor) from modifying the XML files in a way that does not conform to the schema.
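The lost-update problem is easy to reproduce. In this sketch, two simulated applications (really just two `ElementTree` objects in one Python process, an assumption made to keep the example deterministic) read the same file before either writes it back, and the second write silently discards the first application's change. The file name and attribute values are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def lost_update_demo() -> str:
    """Simulate two applications that load, modify, and save the same XML file."""
    path = os.path.join(tempfile.mkdtemp(), "catalog.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write('<catalog><book id="b1"/><book id="b2"/></catalog>')

    # Both applications read the file before either of them writes it back.
    app_a = ET.parse(path)
    app_b = ET.parse(path)

    # Application A marks b1 as on loan and saves.
    app_a.getroot().find("book[@id='b1']").set("status", "on-loan")
    app_a.write(path, encoding="utf-8")

    # Application B, working from its stale copy, marks b2 as lost and saves,
    # overwriting the file and with it A's change.
    app_b.getroot().find("book[@id='b2']").set("status", "lost")
    app_b.write(path, encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        return f.read()

print(lost_update_demo())  # b2 is marked lost, but b1's "on-loan" status is gone
```

Nothing in the file system detects the conflict; preventing it would require locking or some other coordination that the applications must agree on themselves.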

1.4.1.2 XML databases

Databases were invented to solve the problems with concurrent access to large volumes of information, and provide proven solutions to these problems. This makes them highly desirable for applications that either involve concurrent access or work with large volumes of data, and in fact also for many applications that do neither.

To use databases with XML one must implement the XML data model in a database and then use this to store the tree structure of the XML documents in the database. One approach to this is to use an existing database system, whether relational, object-oriented, or something else, and implement an XML storage system on top of it. (Note that this approach confuses the information model/data model distinction somewhat, since the data model of the database is now used to implement the XML data model.) Another approach is to develop a database specifically based on the XML data model. Such databases are often called native XML databases, since the XML data model is their only data model.
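As a rough sketch of the first approach, an XML tree can be "shredded" into rows of a relational database. The example below uses Python's built-in `sqlite3` module; the `node` and `attr` tables are a hypothetical mapping chosen for brevity, and it ignores mixed content (element tails), comments, and namespaces, all of which a real XML storage layer would have to handle.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical relational encoding of the XML tree structure.
SCHEMA = """
CREATE TABLE node (
    id     INTEGER PRIMARY KEY,
    parent INTEGER REFERENCES node(id),  -- NULL for the document element
    pos    INTEGER NOT NULL,             -- position among siblings
    tag    TEXT NOT NULL,
    text   TEXT
);
CREATE TABLE attr (node INTEGER REFERENCES node(id), name TEXT, value TEXT);
"""

def store(conn: sqlite3.Connection, elem: ET.Element,
          parent: Optional[int] = None, pos: int = 0) -> int:
    """Recursively write an element, its attributes, and its children as rows."""
    cur = conn.execute(
        "INSERT INTO node (parent, pos, tag, text) VALUES (?, ?, ?, ?)",
        (parent, pos, elem.tag, elem.text),
    )
    node_id = cur.lastrowid
    for name, value in elem.attrib.items():
        conn.execute("INSERT INTO attr VALUES (?, ?, ?)", (node_id, name, value))
    for i, child in enumerate(elem):
        store(conn, child, node_id, i)
    return node_id

def load(conn: sqlite3.Connection, node_id: int) -> ET.Element:
    """Rebuild an element tree from the rows, starting at node_id."""
    tag, text = conn.execute(
        "SELECT tag, text FROM node WHERE id = ?", (node_id,)).fetchone()
    attrs = dict(conn.execute(
        "SELECT name, value FROM attr WHERE node = ?", (node_id,)).fetchall())
    elem = ET.Element(tag, attrs)
    elem.text = text
    children = conn.execute(
        "SELECT id FROM node WHERE parent = ? ORDER BY pos", (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(load(conn, child_id))
    return elem
```

Here the relational model is being used purely as an implementation vehicle for the XML tree, which is exactly the blurring of information model and data model noted above.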

In both cases the solution has much in common with the "XML files" solution, the main difference being the location of the XML documents. The central data store still uses the XML data model and can also use the same kinds of schemas. When an application now wishes to use an XML document from the central data store, it no longer loads it into memory using a parser, but instead connects to the database. Once connected, it is presented with some API that represents the XML document inside the database, and it accesses the document information through this API (see 3.4, "Virtual documents," on page 92 for more information on this). This removes the problems with large XML documents that do not fit in memory and take a long time to load, since documents are no longer loaded into memory in their entirety and the database handles memory management transparently.

The manner in which the XML documents are updated is also changed completely, since the applications are in direct contact with a document that lives inside a database. To change a document, the application will make the change through the document API and then commit it to the database. The costly and risky operation of writing the document back out to disk is done away with; instead, the database updates its internal structure, taking care of any concurrency and data integrity issues.
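SQLite is not an XML database, but it can stand in for one to illustrate the commit behavior described above. In this hypothetical sketch, an application makes two related changes through the database API inside a single transaction; if it fails between them, the database rolls back the first change instead of leaving a half-updated document. The table layout and node ids are assumptions for illustration.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    """Create a tiny in-memory store with two hypothetical document nodes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, tag TEXT NOT NULL, text TEXT)")
    conn.execute("INSERT INTO node VALUES (1, 'title', 'Dune')")
    conn.execute("INSERT INTO node VALUES (2, 'status', 'available')")
    conn.commit()
    return conn

def retitle_and_lend(conn: sqlite3.Connection, new_title: str,
                     fail_midway: bool = False) -> None:
    """Apply two related changes as one unit: either both are committed or neither."""
    with conn:  # commits on success, rolls back if an exception escapes the block
        conn.execute("UPDATE node SET text = ? WHERE id = 1", (new_title,))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE node SET text = 'on-loan' WHERE id = 2")
```

The application never writes a file or handles partial failure itself; it states what should change and leaves atomicity to the database.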

The main disadvantages of this solution are that it takes longer to set up and requires more know-how. Another possible drawback is that XML database solutions may not support all programming languages the way the "XML files" solution does. However, for large-scale projects, using files is generally not an option at all, which makes the choice obvious.

1.4.1.3 Traditional databases

However, it is definitely possible to use XML in an information system without having to use XML as the data model for the central data repository. Instead, the data store can use traditional databases and their data models, but map data back and forth between the database model and XML as needed. This has the advantage that existing systems can continue as they are today.

Imagine that the national library of some country decides one day that all libraries in the country must allow their users to search for the books they seek not only in the local library, but in all libraries in the country. Users should then be able to order any book not held by the local library from another library and have it delivered to the local library to be picked up there.16 This means that the library information system in Figure 1–3 must be extended with more applications. It must now be able to produce, at regular intervals, a report in serialized form that shows the updates to the local database since the last report. This report will be sent to the national library, which will use it to prepare a report of nationwide updates to be sent to all libraries in the country. The system must therefore also be able to receive a similar report from the national library that provides the corresponding updates to the national database of books. Figure 1–4 shows the information system updated to handle this new situation.

Figure 1–4 The library system with XML reporting
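The reporting application in Figure 1–4 has to map relational rows to XML. A minimal sketch in Python, assuming a hypothetical `book` table with an ISO 8601 `updated` timestamp column (so timestamps compare correctly as strings) and an invented `<updates>` report format, might look like this:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical local catalogue table; `updated` holds ISO 8601 timestamps,
# which sort correctly when compared as strings.
SCHEMA = "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, author TEXT, updated TEXT)"

def export_updates(conn: sqlite3.Connection, library: str, since: str) -> str:
    """Serialize every catalogue row changed after `since` as an XML report."""
    report = ET.Element("updates", {"library": library, "since": since})
    rows = conn.execute(
        "SELECT isbn, title, author, updated FROM book WHERE updated > ? ORDER BY updated",
        (since,),
    )
    for isbn, title, author, updated in rows:
        book = ET.SubElement(report, "book", {"isbn": isbn, "updated": updated})
        ET.SubElement(book, "title").text = title
        ET.SubElement(book, "author").text = author
    return ET.tostring(report, encoding="unicode")
```

The database remains the primary store; XML exists only at the boundary, as the agreed format for the report sent to the national library.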

This information system also uses XML, but in a less direct way than the other approaches discussed so far. However, for information systems with more traditional data, this may be a much better solution than putting all data into the XML data model, since the data models of traditional databases are much better suited to such data.
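The receiving side performs the opposite mapping, from an incoming XML report back into the local database. The following sketch assumes an invented `<updates>` report format with a hypothetical `library` attribute identifying the holding library, and uses SQLite's `INSERT OR REPLACE` so that a repeated ISBN updates the existing row rather than duplicating it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical local copy of the national catalogue.
SCHEMA = """CREATE TABLE national_book (
    isbn TEXT PRIMARY KEY, title TEXT, author TEXT, library TEXT)"""

def apply_national_report(conn: sqlite3.Connection, report_xml: str) -> int:
    """Map each <book> in the report onto a row, inserting or replacing as needed."""
    root = ET.fromstring(report_xml)
    rows = [
        (b.get("isbn"), b.findtext("title"), b.findtext("author"), b.get("library"))
        for b in root.iter("book")
    ]
    with conn:  # apply the whole report in one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO national_book (isbn, title, author, library) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)
```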

1.4.2 Bridging information systems

The discussion of information systems given so far in this chapter is based on the traditional view of a database system, where there is a clearly defined information system and the database itself at the heart of that information system. However, most organizations do not have just one information system. Most of them have lots of information systems, and these are usually isolated from one another. XML promises to solve this problem, by making it possible to build bridges between these systems. Or, to take an entirely different view of the same thing, XML does not require a central database, or even a clearly defined information system, and so it provides a completely different way of creating applications.

The XML equivalent of an information system is what is known as an XML application. An XML application consists of three things: an information model, an XML representation of the information model (often formalized in a DTD or schema definition) and all the programs that can work with data marked up according to the information model.

The result is that the traditional concept of one application or one information system does not apply to XML-based systems. With XML, the information becomes the focal point, and the software exists as a cloud of independent components and systems that interact with one another and accept or emit the XML-encoded data. How they interact with one another is not defined by XML at all, and many different arrangements are possible.

One example of this is RSS (Rich Site Summary), a very simple XML application developed by Netscape for its my.netscape.com site. The idea behind this site was that it would allow Web site publishers to add simple news channels to their sites, which people could subscribe to through the my.netscape.com site. Each user would register to get a user name and password, and then subscribe to a selection of channels of interest to them.

When logging into the site later, the user would be shown the current news from each channel he or she subscribed to. Effectively, this would be a personalized news system with content delivered by outside sources. The RSS DTD was developed to enable site publishers to mark up their news channels, which consist of news items, each with a title and a link to some Web page with more information. (RSS is described in more detail in 6.4, "RSS: An example application," on page 149.) This application quickly became a big hit with site owners, and hundreds of RSS channels were established, which in turn led others to start building more RSS client systems. Today you can also subscribe to RSS channels through my.userland.com and geekboys.org, and at least three dedicated RSS clients are available for use on the desktop.
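To make the structure concrete, the site owner's publishing system could generate a channel document along these lines. This is a sketch of an RSS 0.91-style document built with `ElementTree`; the `build_rss` function and all channel and item values are placeholders, and the full RSS format defines further optional elements (such as `image` and `pubDate`) that are omitted here.

```python
import xml.etree.ElementTree as ET

def build_rss(channel_title: str, channel_link: str, description: str,
              items: list[tuple[str, str]], language: str = "en-us") -> str:
    """Build an RSS 0.91-style document: one channel holding items with title and link."""
    rss = ET.Element("rss", {"version": "0.91"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, "language").text = language
    for title, link in items:  # each news item carries just a title and a link
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```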

Figure 1–5 shows a conceptual view of RSS as an information system. As can be seen, it incorporates the following software components:

Figure 1–5 RSS as an information system

  • The publishing system of the site owner (manual or automated) that produces the site itself and the accompanying RSS document.

  • The RSS subscription and publishing system of my.userland.com, developed with no knowledge at all of the site owner's publishing system, but which can still work with it, through the information provided by the RSS document. Effectively, the RSS document becomes an interface with unusually loose coupling.

  • my.netscape.com has an equivalent system, developed independently of both my.userland.com and the site owner's system (www.geekboys.org is another example, and there are probably more).

  • The RSS client running on the end-user's computer is yet another software component independent of the others. In a sense, the Web browser could also be described as part of the system, even though it doesn't understand RSS at all.
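The loose coupling described above can be illustrated from the consumer's side. In the sketch below, the subscriber code knows nothing about how any publisher produces its channel; it relies only on the RSS structure (a `channel` element containing `item` elements with `title` and `link`). The channel URLs, the `fetched` dictionary standing in for HTTP retrieval, and the plain-text page format are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def read_channel(rss_text: str) -> tuple[str, list[tuple[str, str]]]:
    """Extract the channel title and (title, link) pairs from an RSS 0.91-style document."""
    channel = ET.fromstring(rss_text).find("channel")
    items = [(i.findtext("title", ""), i.findtext("link", "")) for i in channel.iter("item")]
    return channel.findtext("title", ""), items

def personal_page(subscriptions: list[str], fetched: dict[str, str]) -> str:
    """Render a plain-text news page for one user from the channels they subscribe to.

    `fetched` maps a channel URL to the RSS text retrieved from it; how that text
    is obtained (HTTP, a cache, ...) is outside the scope of this sketch.
    """
    lines = []
    for url in subscriptions:
        if url not in fetched:
            continue  # channel unavailable: skip it rather than fail the whole page
        title, items = read_channel(fetched[url])
        lines.append(title)
        lines.extend(f"  - {t} <{link}>" for t, link in items)
    return "\n".join(lines)
```

Any publisher whose documents follow the shared structure can be added to a user's subscriptions without changing this code, which is the sense in which the RSS document acts as the interface between otherwise independent systems.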

To summarize, XML applications do not need to be information systems in the traditional sense; they can be something that joins together previously separate information systems in new ways.
