
Definitive XML Application Development

  • Sorry, this book is no longer in print.
Not for Sale



  • CD-ROM full of tools, code, applications, and frameworks—The accompanying CD-ROM contains an extensive library of tools to simplify XML programming, plus complete, thoroughly annotated XML applications and frameworks.
    • Gives students a single, convenient source for virtually all the XML development resources they need.

  • XML development with both Java and Python—Offers practical coverage of two of today's most popular and productive object-oriented languages.
    • Serves the needs of a far wider range of students, and enables students to work successfully in a wider range of development environments.

  • Comprehensive, expert coverage—Includes in-depth coverage of the XML processing model; document views; both SAX and DOM; XSLT; architectural forms; schemas; and much more.
    • Helps students with an exceptionally broad cross-section of the XML development challenges they are likely to encounter.

  • By one of the leaders of the global XML development community—Author Lars Marius Garshol co-edited the ISO Topic Map Query Language standard, and has long been active in the XML and topic map communities as a speaker, consultant and developer. He is widely known for his Free XML Tools web site, his translation of SAX to Python, and his xmlproc validating XML parser.
    • Students benefit from an authoritative, insider's look at state-of-the-art XML development.

  • Detailed coverage of the best Java and Python XML tools—Introduces the RSS Development Kit, the tabproc framework, and other powerful resources.
    • Introduces students to powerful tools for streamlining XML development and building richer, more robust XML applications.


  • Copyright 2002
  • Edition: 1st
  • Premium Website
  • ISBN-10: 0-13-088902-4
  • ISBN-13: 978-0-13-088902-7

  • Complete developer's guide to XML programming by a leading XML developer
  • Teaches core concepts using Python for examples
  • Shows how to apply concepts in Java(tm)
  • DOM, SAX, XSLT, XPath, schemas, and much more
  • Plus a quick Python introduction for experienced developers

The start-to-finish guide to XML development for every experienced developer!

In this book, leading XML developer Lars Marius Garshol covers every essential aspect of XML programming, from basic principles through advanced techniques, utilizing DOM, SAX, XSLT, XPath, schemas, and other key XML standards. Garshol presents scores of code examples based on Python, a cross-platform language that is exceptionally well suited for XML development. Garshol also presents new insights into XML application design and optimization, as well as complete sample applications. Coverage includes:

  • XML for programmers: the XML processing model, namespaces, parsing, document views, and more
  • Serialization/deserialization, translation, validation, modification, and information extraction
  • SAX event-based processing: basic techniques, data structures, sample applications, tips, tricks, optimization, and advanced APIs
  • Event-based alternatives to SAX: native XML parser APIs of Pyexpat, xmlproc, xmllib, and XP
  • DOM tree-based processing: fundamental and extended interfaces, serialization, DOM Level 2, performance techniques, and more
  • Tree-based alternatives to DOM: qp_xml, groves, and JDOM
  • Declarative processing with XSLT and XPath, including advanced XSLT topics: combining multiple stylesheets, precedence, cross-references, messages, and more
  • Embedding XSLT engines in applications and writing XSLT and XPath extensions
  • XML development in Java with SAX, DOM, JDOM, and XSLT engines
  • Processing in depth: schemas, DTD programming, creating XML from HTML and SGML, RSS, and more

You'll even find a quick introductory course in Python and an XML developer's glossary.

Whatever your application, from content management through enterprise application integration, Definitive XML Application Development gives you the resources, skills, insights, and example code you need to build it right.

"The range of XML application domains is growing dramatically, but there are common strategies and techniques for XML development that apply to all of them. This book provides a systematic and thorough grounding—and a real understanding—that will make you productive quickly."

—Charles F. Goldfarb



Download the source code (tar.gz file, 48 kb).

Sample Content

Online Sample Chapter

Working with XML and Information Systems

Table of Contents


1. XML and Information Systems.

Representing Data Digitally. XML and Digital Data. Information Systems. XML and Information Systems.

2. The XML Processing Model.

A Bit of XML History. An Introduction to XML namespaces. Documents and Parsers. The Result of Parsing.

3. Views of Documents.

Documents Viewed as Events. Documents Viewed as Trees. Virtual Views. Virtual Documents.

4. Common Processing Tasks.

Serialization and Deserialization. Transformation. Validation. Modification. Information Extraction.

5. Characters—The Atoms of Text.

Terminology. Digital Text. Important Character Standards. Characters in Programming Languages. Further Problems.


6. Event-Based Processing.

Benefits and Disadvantages. Writing Event-based Applications. Tools for Event-based Processing. RSS: An Example Application.

7. Using The XML Parsers.

Xmlproc. Pyexpat. Xmllib. Xerces-C/Pirxx. Working in Jython. Choosing a Parser.

8. SAX: An Introduction.

Background and history. Introduction. The SAX classes. Two Example Applications. Python SAX Utilities.

9. Using SAX.

An Introduction to XBEL. Thinking in SAX. Application-specific data representations. Example Applications. Tips and Tricks. Speed.

10. Advanced SAX.

The Advanced Parts of the API. Parser filters. Working with Entities. Mapping non-XML data to XML.


11. DOM: An Introduction.

Tree-based Processing. Getting to Know DOM. A DOM Overview. Fundamental DOM Interfaces. A Simple Example Application. Extended DOM Interfaces.

12. Using DOM.

Creating DOM trees. DOM Serialization. Some Examples. An Example: A Tree Walker.

13. Advanced DOM.

Other DOM Implementations. The HTML Part of the DOM. DOM Level 2. Future Directions for DOM. DOM Performance.

14. Other Tree-Based APIs.

qp_xml. Groves.


15. XSLT: Introduction.

Declarative Processing. XSLT Background. Introducing XSLT. Two Complete XSLT Examples.

16. XSLT in More Detail.

XPath in Detail. Advanced XSLT Topics. Advanced XSLT Examples. XSLT Performance.

17. Using XSLT In Applications.

The XSLT Processor APIs. Larger Examples of XSLT Programming. Using XPath in Software. The Future of XSLT.

18. Architectural Forms.

Introduction to Architectural Forms. Uses of Architectural Forms. Architectural Forms Software. An Example.


19. SAX in Java.

XML and Java. Java XML Parsers. The Java Version of SAX. JAXP. Java SAX APIs. Java SAX Examples.

20. DOM in Java.

JAXP and the DOM. The Java DOM APIs. Using Some Java DOMs. JDOM.

21. Using XSLT In Java Applications.

Using JAXP. The Saxon XSLT Processor. The Xalan XSLT Processor.


22. Other Approaches to Processing.

Pull APIs. RXP. Hybrid Event/Tree-based Approaches. Simplified Approaches.

23. Schemas.

Schemas and XML. Validating Documents. DTD Programming.

24. Creating XML.

Creating XML from HTML. Creating XML from SGML. Creating XML from Other Document Formats. Creating XML from Data Formats.

25. The Tabproc Framework.

Input Handling. Generating XML from Tables. A SAX XMLReader Output. Examples of Use.

26. The RSS Development Kit.

The RSS Object Structure. The Client Kit. The Config Module. The RSS email client. The GUI RSS client. The RSS editor.


Appendix A. A Lightning Introduction to Python.

A Quick Introduction. Basic Building Blocks. An Example Program. Classes and Objects. Various Useful APIs.

Appendix B. Glossary of Terms.

CDATA Marked Sections. Character Data. Character References. Document Element. Document Entity. Document Order. Mixed Content. Processing Instruction. Replacement Text. Standalone Declaration. Text. Text Declaration. XML Declaration.

Appendix C. The Python XML Packages.

The Python Interpreter. The Python XML-SIG package. 4Suite. Sab-pyth. RXP. Pysp. The Easy Ones. Java Packages.




This book was written to help you develop applications that use XML. It focuses on general principles and techniques, aiming to give you knowledge that will remain valuable even after the standards and tools described have evolved a few iterations further than they are today.

The text approaches XML by asking questions like: What problems is XML used to solve? What general approaches to these problems exist, and what tools support them? What other technologies is XML related to? How can XML be used with these other technologies? After reading the book you should, whenever you need to do something with XML, be able to think of several different ways of solving your problem and to choose the best of these.

Who is this book for?

This book is written for developers, and much of it requires knowledge of programming. In general, the reader is expected to have done enough object-oriented programming to know what a class or a method is. The chapters in the first part of the book, as well as the first two chapters on XSLT, do not require programming knowledge and could probably be useful to anyone who is familiar with XML.

This book is not an introduction to XML, as it assumes that you know what XML is and have some familiarity with its main features. It does not assume that you are an expert, however, and will explain many of the subtler aspects of XML that have consequences for software development.

Although the book uses Python in the source code examples, knowing Python is not a prerequisite, since the book includes Appendix A, "A Lightning Introduction to Python," on page 1054. Readers who are not familiar with Python are strongly encouraged to read this appendix before going on to the rest of the book.

What the book covers

The book begins with a look at XML from the point of view of software development, comparing it to other related technologies. Many of the subtler aspects of XML, the XML family of standards, as well as their relationship to software development are also examined. Much space is devoted to the principles of XML software development, using parsers, and the existing techniques for development.

Three chapters are dedicated to each of the two most important XML programming APIs: SAX and the DOM. Two chapters are dedicated to XSLT. In addition to these standards, several lesser-known APIs, tools, and technologies are described. Some are included because of their utility, others to put the main technologies in perspective.
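To make the difference between the two APIs concrete, here is a minimal sketch that extracts the same information once through SAX events and once through a DOM tree. It assumes a current Python 3 standard library (xml.sax and xml.dom.minidom) rather than the tool versions the book was written against; the sample document, the ItemCollector class, and both function names are invented for illustration.

```python
import xml.sax
from xml.dom import minidom

# A made-up sample document, used only for this illustration.
DOC = "<feed><item>First</item><item>Second</item></feed>"


class ItemCollector(xml.sax.ContentHandler):
    """Collects the text of every <item> element from SAX events."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # None means we are currently outside an <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        # Character data may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None


def items_via_sax(text):
    """Event-based: the parser calls the handler as it reads the document."""
    handler = ItemCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items


def items_via_dom(text):
    """Tree-based: build the whole tree first, then navigate it."""
    document = minidom.parseString(text)
    return [
        "".join(node.data for node in element.childNodes
                if node.nodeType == node.TEXT_NODE)
        for element in document.getElementsByTagName("item")
    ]
```

Both functions return the same list for the sample document. The SAX version has to keep its own state between callbacks, while the DOM version holds the whole document in memory and queries it afterwards.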

The last part of the book describes XML application design issues in more detail and provides some larger examples that present complete XML applications or toolkits.

In Appendix C, "Python XML packages," there is a description of various distributions of Python XML software and how to install each of these distributions. If you are new to XML processing with Python, it is probably a good idea to look over this appendix before starting to read the tool-related parts of the book. Installing the tools so that you have them available and can play around with them as you read may also be a good idea.

The programming language

Python is a very high-level programming language that is unusually well suited for information-centric program development, since it has excellent support for creation and manipulation of data structures. It is a simple language, in many ways similar to the more widespread languages, such as Java, C++, and Visual Basic, but easier to understand and use.

This means that even though you may not understand Python now, you will be able to learn it quickly. In general, I have found that developers need to study Python for two days in order to be able to contribute usefully to projects. And since Python has so much in common with other languages, you should be able to make use of what you learn even if you usually develop in other languages.

This book mainly uses Python in examples and does in fact have a general bent towards Python. Why this is so, and what is so interesting about Python, may not be immediately obvious to you, so this section explains what Python is and why it is so interesting. However, even though the book uses Python, it is intended to be useful to all XML programmers, regardless of what programming languages they know or want to do XML programming in.

What is it?

Python is a programming language. It has often been called a scripting language, but I think this is a little misleading. The image the term "scripting language" evokes in me is of a simple little language, dynamically typed and easy to use for amateurs, unsuitable for large applications, not as powerful as a "real" programming language, and definitely slower.

Python, however, is very much a "real" programming language, but at the same time it has some of the characteristics of a scripting language. It is simple, it is very dynamic, it is easy to use for amateurs, and it is slower than compiled languages such as C++, Eiffel, and Common Lisp. At the same time, however, it is very powerful, certainly every bit as powerful as Java, if not more, and eminently suitable for large applications. Among the things that have been written in Python are CORBA ORBs, Web browsers, relational database engines, validating XML parsers, and a full XSLT engine.

I often describe it as "Perl done right," and Python does have a lot in common with Perl. It is a scripting-like language, very suitable for text processing and systems programming, with excellent operating system integration and with many of the same features. (In fact, Perl's object-oriented features are modeled on Python's object model.) Python is also like Perl in that it was created by a single person (Guido van Rossum) for his own needs, it used to be distributed as a single widely-ported open source interpreter implementation (there is now more than one), it is closely connected to the Internet and Unix, etc., etc.

At the same time, Python has much in common with Java, in that it is dynamic (much more so than Java) and object-oriented, has exceptions, has a very similar package model, supports in-program documentation, and has byte-code that can be transferred across a network and executed in a restricted environment.

I am something of a programming language freak and have done development in at least a dozen different programming languages, and studied many more. In my experience, Python stands out because it is so easy and natural to develop in, something that makes Python development just plain nice and fun. Returning to Java or C++ after doing Python development simply feels painful and awkward. I think this is because Python is so clean, simple, and predictable, with few surprises or restrictions and with a large set of ready-made and easy-to-use libraries. Paul Prescod (affectionately known as the "St. Paul" of Python evangelism) has said that "Python is a language that gets its tradeoffs exactly right," which sums it up pretty well.

A common denominator

Another reason for choosing Python is that no matter which programming language the reader is already used to, Python should be easy to pick up, at least well enough to read. The syntax is clear and simple, and the concepts in the language are very similar to those of mainstream languages such as Java, C, C++, Visual Basic, and Perl. So Python should not be an obstacle for any reader. In fact, it has often been described as "executable pseudo-code," and you will see it used as pseudo-code in some parts of the book.

Furthermore, using Python does not limit us to a single platform. Python runs just as well on Unix as it does on Mac or Windows, or even on a Psion palmtop or a VMS machine.

Python can talk to anything

One of the most appealing aspects of Python is that it is very well integrated with the rest of the world. This means that choosing Python hardly ever shuts you off from some technology or system that you would like your programs to interact with. For example, Microsoft fans will quickly discover that the Windows version of Python can talk to COM objects, create COM servers, connect to ActiveX, DDE, the Win32 API, the Windows registry, MFC, Windows Scripting Host, ADO, ODBC, and so on and so forth. In other words, even though Python is highly portable, you don't have to give up anything under Windows just because you use Python.

Many people, however, prefer to use something other than Windows, such as the Mac. Python is technologically agnostic, so it allows these people to have their way as well. Python runs on Mac, and the Mac version can access the communications toolbox, the font manager, the speech manager, the sound manager, the QuickTime services, and so on.

Other people believe in Unix and would rather use Python there. Again, this is no problem: the Unix versions of Python fit very well into Unix, and there are bindings for things such as Qt, KDE, Gtk, GNOME, Irix and Solaris sound modules, special Linux APIs, etc.

Yet others would like to remain pure, platform-wise, and prefer a strictly operating system-independent platform such as Java. Python can accommodate these people too! Jython (the interpreter formerly known as JPython) is an implementation of Python written in 100% Java, which lets you run Python programs inside the Java virtual machine. You can use this as an embedded scripting language for an application, or simply write Python programs with full access to the nice Java stuff such as Swing, JDBC, Jini, RMI, etc.

And of course, apart from the platform issues, most of us would like to be able to speak Internet protocols and connect to other independent technologies. Again, Python can help. There are several ways to connect Python with CORBA, a standardized relational database API (a JDBC for Python), lots of XML tools (of course), LDAP modules, and so on. And the interpreter comes with libraries supporting FTP, HTTP, gopher, NNTP, SMTP, IMAP, POP, HTML, URL parsing, and much more out of the box.

To put it another way, Python is buzzword-friendly and TLA-compatible. (TLA stands for Three-Letter Acronym, a term often used as a synonym for technologies, since many of them have three-letter acronyms.)
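As one small illustration of the bundled libraries mentioned above, the sketch below uses the URL parsing support that ships with the interpreter. It assumes a current Python 3, where that support lives in urllib.parse (the module layout has changed since the book was written); the describe_url helper and the example.org address are invented for this example.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Break a URL into the pieces a program usually needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parse_qs(parts.query),  # maps each key to a list of values
        "fragment": parts.fragment,
    }


# A hypothetical address, used only to show the shape of the result.
info = describe_url("http://www.example.org/docs/index.html?lang=en#top")
```

No third-party package is needed here; everything used comes with the standard distribution.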

Python is a natural fit for XML programming

Whenever I have to write an XML program of some sort, I usually think of Python first as the programming language to write it in. There are several reasons for this, the most important being that Python is so easy and natural to program in and that it is very well suited for text processing. It is also very easy to build data structures in Python, something that is very important for XML processing.
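The point about data structures can be sketched in a few lines. The example below turns a small bookmark document into ordinary Python lists and dictionaries. It assumes a current Python 3 and uses xml.etree.ElementTree, a standard-library module that is not one of the tools listed in this book's description; the bookmark vocabulary, SAMPLE, load_bookmarks, and titles_by_href are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up bookmark document; the element names are illustrative only.
SAMPLE = (
    '<bookmarks>'
    '<bookmark href="http://www.python.org/"><title>Python</title></bookmark>'
    '<bookmark href="http://www.w3.org/XML/"><title>XML at W3C</title></bookmark>'
    '</bookmarks>'
)


def load_bookmarks(text):
    """Parse the document into a list of plain dictionaries."""
    root = ET.fromstring(text)
    return [
        {"href": node.get("href"), "title": node.findtext("title", default="")}
        for node in root.iter("bookmark")
    ]


def titles_by_href(bookmarks):
    """Re-index the same data as a dictionary keyed on the link target."""
    return {entry["href"]: entry["title"] for entry in bookmarks}


bookmarks = load_bookmarks(SAMPLE)
index = titles_by_href(bookmarks)
```

Once the data is in lists and dictionaries, the rest of the program no longer needs to know that it came from XML.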

Another thing is that for anything that involves moving information between different systems, Python is a natural choice, given that whatever these systems may be, Python can very likely talk to them.

Also, Python is highly suitable for the many little programs and scripts that you write to do the small but necessary tasks that usually appear during a project. Doing everything in Python makes it easy to turn prototypes into full programs, and it also means that whenever a little script has to be developed further, the full toolbox implemented for that application is already available to you.


Of course, I made changes after all the reviewers read the text, and in so doing no doubt introduced new errors. The honour for these errors, as well as for those so subtly hidden that they escaped the eyes of all my reviewers, and even the Editor himself, I should like to reserve for myself. You will find the best of these listed on http://www.garshol.priv.no/download/text/ph1/errata.html. If you spot one that is not on the list already, I would like to hear about it.


The main problem with books about Internet technology is that the technology changes so quickly that books rapidly become dated. To help you decide whether the book is up to date or not, here is a list of the various standards and tools covered in this book, and the version of each that the book is based on.

The versions of tools and standards covered in this book



[Table: only fragments survive. Column heading: Release date. Entries: XML Infoset; PyXML package. Version and status values: 1.0 2nd ed.; 1.1 WD; 1.0 WD; 1.0 CR; Not yet published.]

In the table above, WD is used as an abbreviation for "Working Draft," and CR for "Candidate Recommendation," both meaning W3C specifications that are still work in progress.

