
Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX



  • Sorry, this book is no longer in print.
Not for Sale



World-renowned, best-selling author Elliotte Rusty Harold teaches Java programmers how to put XML to work!

° Author is uniquely qualified to write this book! He is the author of numerous well-received books on Java and has written two of the best-selling books on XML. This is the book that brings his skills together.

° Harold has a fantastic reputation, is skilled, and has excellent publicity channels.

° A complete tutorial about writing Java programs that read and write XML documents.


  • Copyright 2003
  • Dimensions: 7-3/8" x 9-1/4"
  • Pages: 1120
  • Edition: 1st
  • Book
  • ISBN-10: 0-201-77186-1
  • ISBN-13: 978-0-201-77186-2

Java is the ideal language for processing XML documents. Consequently, more XML tools have been written in Java than in any other language. More open source XML tools are written in Java than in any other language. Processing XML with Java fills an immediate need for developers who are working with XML in Java. It is a comprehensive tutorial and reference to the major APIs. This book shows developers how to: save XML documents from their applications written in Java; read XML documents produced by other programs; communicate with network servers that send and receive XML data; validate documents they receive against DTDs, schemas, and business rules; and integrate XSLT into their programs.


Related Article

Elliotte Rusty Harold's 10 Must-Have Technical Books

Author's Site

Click below for Web Resources related to this title:
Author's Web Site

Sample Content

Online Sample Chapters

Converting Flat Files to XML

Processing XML with Java: Reading XML

Downloadable Sample Chapter


Click below for Sample Chapter(s) related to this title:

Chapter 4: Converting Flat Files to XML

Chapter 5: Reading XML

Table of Contents

(NOTE: Each chapter concludes with a Summary.)

List of Examples.

List of Figures.


Who You Are.

What You Need to Know.

What You Need to Have.

How to Use This Book.

The Online Edition.

Some Grammatical Notes.

Contacting the Author.



1. XML for Data.

Motivating XML.

A Thought Experiment.




XML Syntax.

XML Documents.

XML Applications.

Elements and Tags.



XML Declaration.


Processing Instructions.







The Last Mile.



Associating Stylesheets with XML Documents.


2. XML Protocols: XML-RPC and SOAP.

XML as a Message Format.


Data Representation.

HTTP as a Transport Protocol.

How HTTP Works.

HTTP in Java.


Customizing the Request.

Query Strings.

How HTTP POST Works.


Data Structures.


Validating XML-RPC.


A SOAP Example.

Posting SOAP Documents.


Encoding Styles.

SOAP Headers.

SOAP Limitations.

Validating SOAP.

Custom Protocols.

3. Writing XML with Java.

Fibonacci Numbers.

Writing XML.

Better Coding Practices.


Producing Valid XML.


Output Streams, Writers, and Encodings.

A Simple XML-RPC Client.

A Simple SOAP Client.


4. Converting Flat Files to XML.

The Budget.

The Model.


Determining the Output Format.



Building Hierarchical Structures from Flat Data.

Alternatives to Java.

Imposing Hierarchy with XSLT.

The XML Query Language.

Relational Databases.

5. Reading XML.

InputStreams and Readers.

XML Parsers.

Choosing an XML API.

Choosing an XML Parser.

Available Parsers.









6. SAX.

What Is SAX?


Callback Interfaces.

Implementing ContentHandler.

Using the ContentHandler.

The DefaultHandler Adapter Class.

Receiving Documents.

Receiving Elements.

Handling Attributes.

Receiving Characters.

Receiving Processing Instructions.

Receiving Namespace Mappings.

“Ignorable White Space”.

Receiving Skipped Entities.

Receiving Locators.

What the ContentHandler Doesn't Tell You.

7. The XMLReader Interface.

Building Parser Objects.




Exceptions and Errors.


The ErrorHandler Interface.

Features and Properties.

Getting and Setting Features.

Getting and Setting Properties.

Required Features.

Standard Features.

Standard Properties.

Xerces Custom Features.

Xerces Custom Properties.


8. SAX Filters.

The Filter Architecture.

The XMLFilter Interface.

Content Filters.

Filtering Tags.

Filtering Elements.

Filtering Attributes.

Filters That Add Content.

Filters versus Transforms.

The XMLFilterImpl Class.

Parsing Non-XML Documents.

Multihandler Adapters.


9. The Document Object Model.

The Evolution of DOM.

DOM Modules.

Application-Specific DOMs.


Document Nodes.

Element Nodes.

Attribute Nodes.

Leaf Nodes.

Nontree Nodes.

What Is and Isn't in the Tree.

DOM Parsers for Java.

Parsing Documents with a DOM Parser.

JAXP DocumentBuilder and DocumentBuilderFactory.

DOM3 Load and Save.

The Node Interface.

Node Types.

Node Properties.

Navigating the Tree.

Modifying the Tree.

Utility Methods.

The NodeList Interface.

JAXP Serialization.


Choosing between SAX and DOM.

10. Creating XML Documents with DOM.


Locating a DOMImplementation.

Implementation-Specific Class.

JAXP DocumentBuilder.

DOM3 DOMImplementationRegistry.

The Document Interface as an Abstract Factory.

The Document Interface as a Node Type.

Getter Methods.

Finding Elements.

Transferring Nodes between Documents.


11. The DOM Core.

The Element Interface.

Extracting Elements.


The NamedNodeMap Interface.

The CharacterData Interface.

The Text Interface.

The CDATASection Interface.

The EntityReference Interface.

The Attr Interface.

The ProcessingInstruction Interface.

The Comment Interface.

The DocumentType Interface.

The Entity Interface.

The Notation Interface.

12. The DOM Traversal Module.


Constructing NodeIterators with DocumentTraversal.


Filtering by Node Type.



13. Output from DOM.

Xerces Serialization.


DOM Level 3.

Creating DOMWriters.

Serialization Features.

Filtering Output.


14. JDOM.

What Is JDOM?

Creating XML Elements with JDOM.

Creating XML Documents with JDOM.

Writing XML Documents with JDOM.

Document Type Declarations.


Reading XML Documents with JDOM.

Navigating JDOM Trees.

Talking to DOM Programs.

Talking to SAX Programs.

Configuring SAXBuilder.


Java Integration.

Serializing JDOM Objects.

Synchronizing JDOM Objects.

Testing Equality.

Hash Codes.

String Representations.


What JDOM Doesn't Do.

15. The JDOM Model.

The Document Class.

The Element Class.


Navigation and Search.


The Attribute Class.

The Text Class.

The CDATA Class.

The ProcessingInstruction Class.

The Comment Class.


The DocType Class.

The EntityRef Class.

V. XPath/XSLT.

16. XPath.


The XPath Data Model.

Location Paths.


Node Tests.


Compound Location Paths.

Absolute Location Paths.

Abbreviated Location Paths.

Combining Location Paths.





XPath Engines.

XPath with Saxon.

XPath with Xalan.

DOM Level 3 XPath.

Namespace Bindings.


Compiled Expressions.


17. XSLT.

XSL Transformations.

Template Rules.


Taking the Value of a Node.

Applying Templates.

The Default Template Rules.


Calling Templates by Name.


Thread Safety.

Locating Transformers.

The xml-stylesheet Processing Instruction.


XSLT Processor Attributes.

URI Resolution.

Error Handling.

Passing Parameters to Stylesheets.

Output Properties.

Sources and Results.

Extending XSLT with Java.

Extension Functions.

Extension Elements.


Appendix A: XML API Quick Reference.






The DOM Data Model.



















Appendix B: SOAP 1.1 Schemas.

The SOAP 1.1 Envelope Schema.

The SOAP 1.1 Encoding Schema.

W3C Software Notice and License.

Appendix C: Recommended Reading.



Index.


One night five developers, all of whom wore very thick glasses and had recently been hired by Elephants, Inc., the world's largest purveyor of elephants and elephant supplies, were familiarizing themselves with the company's order processing system when they stumbled into a directory full of XML documents on the main server. "What's this?" the team leader asked excitedly. None of them had ever heard of XML before so they decided to split up the files between them and try to figure out just what this strange and wondrous new technology actually was.

The first developer, who specialized in optimizing Oracle databases, printed out a stack of FMPXMLRESULT documents generated by the FileMaker database where all the orders were stored, and began poring over them. "So this is XML! Why, it's nothing novel. As anyone can see who's able, an XML document is nothing but a table!"

"What do you mean, a table?" replied the second programmer, well versed in object oriented theory and occupied with a collection of XMI documents that encoded UML diagrams for the system. "Even a Visual Basic programmer could see that XML documents aren't tables. Duplicates aren't allowed in a table relation, unless this is truly some strange mutation. Classes and objects is what these documents are. Indeed, it should be obvious on the very first pass. An XML document is an object and a DTD is a class."

"Objects? A strange kind of object, indeed!" said the third developer, a web designer of some renown, who had loaded the XHTML user documentation for the order processing system into Mozilla. "I don't see any types at all. If you think this is an object, then it's your software I refuse to install. But with all those stylesheets there, it should be clear to anyone not sedated, that XML is just HTML updated!"

"HTML? You must be joking," said the fourth, a computer science professor on sabbatical from MIT, who was engrossed in an XSLT stylesheet that validated all the other documents against a Schematron schema. "Look at the clean nesting of hierarchical structures, each tag matching its partner as it should. I've never seen HTML that looks this good. What we have here is S-expressions, which is certainly nothing new. Babbage invented this back in 1882!"

"S-expressions?" queried the technical writer, who was occupied with documentation for the project written in DocBook. "Maybe that means something to those in your learned profession. But to me, this looks just like a FrameMaker MIF file. However, locating the GUI does seem to be taking me awhile."

And so they argued into the night, none of them willing to give an inch, all of them presenting still more examples to prove their points, none of them bothering to look at the others' examples. Indeed, they're probably still arguing today. You can even hear their shouts from time to time on xml-dev. Their mistake, of course, was in trying to force XML into the patterns of technologies they were already familiar with rather than taking it on its own terms. XML can store data, but it is not a database. XML can serialize objects, but an XML document is not an object. Web pages can be written in XML, but XML is not HTML. Functional (and other) programming languages can be written in XML, but XML is not a programming language. Books are written in XML, but that doesn't make XML desktop publishing software.

XML is something truly new that has not been seen before in the world of computing. There have been precursors to it, and there are always fanatics who insist on seeing XML through database (or object, or functional, or S-expression) colored glasses. But XML is none of these things. It is something genuinely unique and new in the world of computing; and it can only be understood when you're willing to accept it on its own terms, rather than forcing it into yesterday's pigeon holes.

There are a lot of tools, APIs, and applications in the world that try to pretend XML is something more familiar to programmers; that it's just a funny kind of database, or just like an object, or just like remote procedure calls. These APIs are occasionally useful in very restricted and predictable environments. However, they are not suitable for processing XML in its most general format. They work well in their limited domains, but they fail when presented with XML that steps outside the artificial boundaries they've defined. XML was designed to be extensible, but it's a sad fact that many of the tools designed for XML aren't nearly as extensible as XML itself.

This book is going to show you how to handle XML in its full generality. It pulls no punches. It does not pretend that XML is anything except XML, and it shows you how to design your programs so that they handle real XML in all its messiness: valid and invalid, mixed and unmixed, typed and untyped, and both all and none of these at the same time. To that end, this book focuses on those APIs that don't try to hide the XML. In particular, there are three major Java APIs that correctly model XML, as opposed to modeling a particular class of XML documents or some narrow subset of XML. These are:

  • SAX, the Simple API for XML
  • DOM, the Document Object Model
  • JDOM, a Java native API

These APIs are the core of this book. In addition I cover a number of preliminaries and supplements to the basic APIs including:

  • XML syntax
  • DTDs, schemas, and validity
  • XPath
  • XSLT and the TrAX API
  • JAXP, a combination of SAX, DOM, and TrAX with a few factory classes

And, since we're going to need a few examples of XML applications to demonstrate the APIs with, I also cover XML-RPC, SOAP, and RSS in some detail. However, the techniques this book teaches are hardly limited to just those three applications.
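To make the difference between two of the APIs named above more concrete, here is a minimal sketch, not taken from the book, that counts the elements in a small document twice: once with SAX, where the parser calls back into a handler as it reads, and once with DOM, where the parser builds a tree that the program then queries. It relies only on the JAXP factory classes that ship with the JDK; the class name `ElementCounter` and the sample document are assumptions made up for illustration. JDOM is left out because it is a separate library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter {

    // Hypothetical sample document, invented for this sketch.
    public static final String SAMPLE =
        "<order><item>hay</item><item>peanuts</item></order>";

    // SAX: the parser pushes events to a handler; no tree is kept in memory.
    public static int countWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++; // called once per start-tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrap parser configuration, I/O, and well-formedness errors.
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    // DOM: the parser builds the whole tree first, then the program queries it.
    public static int countWithDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // "*" matches every element in the document.
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SAX: " + countWithSax(SAMPLE));
        System.out.println("DOM: " + countWithDom(SAMPLE));
    }
}
```

Both methods report three elements for the sample (one `order` and two `item` elements). The SAX version never holds the document in memory, while the DOM version can be navigated and modified after parsing.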

Who You Are

This book is written for experienced Java programmers who want to integrate XML into their systems. Java is the ideal language for processing XML documents. Its strong Unicode support in particular made it the preferred language for many early implementers. Consequently, more XML tools have been written in Java than in any other language. More open source XML tools are written in Java than in any other language. More programmers process XML in Java than in any other language.

Processing XML with Java™ will teach you how to:
  • Save XML documents from applications written in Java
  • Read XML documents produced by other programs
  • Search, query, and update XML documents
  • Convert legacy flat data into hierarchical XML
  • Communicate with network servers that send and receive XML data
  • Validate documents against DTDs, schemas, and business rules
  • Combine functional XSLT transforms with traditional imperative Java code
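As a rough illustration of the first item in this list, the sketch below writes a small XML document from Java using nothing but a `Writer`. It is not the book's code: the class name `SimpleXmlWriter`, the `order` and `item` element names, and the helper methods are all hypothetical. The one point it tries to show is that text must be escaped before it is placed in element content.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class SimpleXmlWriter {

    // Escape the characters that are special in element content.
    // The ampersand must be replaced first so that the entity
    // references produced below are not escaped a second time.
    public static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Write a hypothetical <order> document with one <item> per entry.
    // The declaration names no encoding, so the caller is assumed to
    // supply a Writer that encodes its output as UTF-8.
    public static void writeOrder(Writer out, String... items) throws IOException {
        out.write("<?xml version=\"1.0\"?>\n");
        out.write("<order>\n");
        for (String item : items) {
            out.write("  <item>" + escape(item) + "</item>\n");
        }
        out.write("</order>\n");
    }

    // Convenience wrapper that returns the document as a String.
    public static String toXml(String... items) {
        StringWriter out = new StringWriter();
        try {
            writeOrder(out, items);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml("hay & straw", "peanuts"));
    }
}
```

Writing to a `Writer` rather than directly to `System.out` keeps the same method usable for files, sockets, and in-memory strings.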

This book is meant for Java programmers who need to do anything with XML. It teaches the fundamentals and advanced topics, leaving nothing out. It is a comprehensive course in processing XML with Java that takes developers from little knowledge of XML to designing sophisticated XML applications and parsing complicated documents. The examples cover a wide range of possible uses including file formats, data exchange, document transformation, database integration, and more.

What You Need to Know

This is not an introductory book with respect to either Java or XML. I assume you have substantial prior experience with Java and preferably some experience with XML. On the Java side, I will freely use advanced features of the language and its class library without explanation or apology. Among other things, I assume you are thoroughly familiar with:

  • Object oriented programming including inheritance and polymorphism
  • Packages and the CLASSPATH. You should not be surprised by classes that do not have main() methods or that are not in the default package.
  • I/O including streams, readers, and writers. You should understand that System.out is a horrible example of what really goes on in Java programs.
  • The Java Collections API including hash tables, maps, sets, iterators, and lists.

In addition, in one or two places in this book I'm going to use some SQL and JDBC. However, these sections are relatively independent of the rest of the book; and chances are if you aren't already familiar with SQL, then you don't need the material in these sections anyway.

What You Need to Have

XML is deliberately architecture, platform, operating system, GUI, and language agnostic (in fact, more so than Java). It works equally well on Mac OS, Windows, Linux, OS/2, various flavors of Unix, and more. It can be processed with Python, C++, Haskell, ECMAScript, C#, Perl, Visual Basic, Ruby, and of course Java. No byte order issues need concern you if you switch between PowerPC, X86, or other architectures. Almost everything in this book should work equally well on any platform that's capable of running Java.

Most of the material in this book is relatively independent of the specific Java version. Java 1.4 bundles SAX, DOM, and a few other useful classes into the core JDK. However, these are easily installed in earlier JVMs as open source libraries from the Apache XML Project and other vendors. For the most part, I used Java 1.3 and 1.4 when testing the examples; and it's possible that a few classes and methods have been used that are not available in earlier versions. In most cases, it should be fairly obvious how to backport them. All of the basic XML APIs except TrAX should work in Java 1.1 and later. TrAX requires Java 1.2 or later.

How to Use This Book

This book is organized as an advanced tutorial that can also serve as a solid and comprehensive reference. The first chapter covers the bare minimum material needed to start working with XML, though for the most part this is intended more as a review for readers who've already read other, more basic books than as a comprehensive introduction. The second chapter introduces RSS, XML-RPC, and SOAP, the XML applications we'll be using for examples in the rest of the book. This is followed by two chapters on generating XML from your own programs (a subject which is all too often presented as a lot more complicated than it actually is). The first covers generating XML directly from code. The second covers converting legacy data in other formats to XML. The remaining bulk of the book is devoted to the major APIs for processing XML:

  • The event-based SAX API
  • The tree-based DOM API
  • The tree-based JDOM API
  • XPath APIs for searching XML documents
  • The TrAX API for XSLT processing

Finally, the book finishes with an appendix providing quick references to the main APIs.

If you have limited experience with XML, I suggest you read at least the first five chapters in order. From that point forward, if you have a particular API preference, you may begin with the part covering the major API you're interested in:

  • Chapters 6-8 cover SAX
  • Chapters 9-13 cover DOM
  • Chapters 14 and 15 cover JDOM

Once you're comfortable with one or more of these APIs, you can read Chapters 16 and 17 on XPath and XSLT. However, those APIs and chapters do require some knowledge of at least one of the three major APIs.
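For a sense of what XSLT processing through TrAX looks like from Java, here is a minimal sketch that uses the `javax.xml.transform` classes bundled with the JDK. It is an illustration under stated assumptions rather than an excerpt from the book: the class name `TraxSketch`, the inline stylesheet, and the `item` element it selects are all hypothetical. The stylesheet uses the XPath expression `//item` to pick out elements and emits their text, one per line.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TraxSketch {

    // Hypothetical stylesheet: outputs the text of every <item> element.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//item'>"
      + "<xsl:value-of select='.'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to an XML string and return the result as text.
    public static String transform(String xml) {
        try {
            // The factory compiles the stylesheet into a Transformer.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            // Sources and Results abstract over streams, DOM trees, and SAX events.
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(
            transform("<order><item>hay</item><item>peanuts</item></order>"));
    }
}
```

The same `Transformer` call accepts a `DOMSource` or `SAXSource` in place of the `StreamSource`, which is one reason familiarity with SAX or DOM helps before working with XSLT from Java.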


