Gives students a single, convenient source for virtually all the XML development resources they need.
Serves the needs of a far wider range of students, and enables students to work successfully in a wider range of development environments.
Helps students with an exceptionally broad cross-section of the XML development challenges they are likely to encounter.
Students benefit from an authoritative, insider's look at state-of-the-art XML development.
Introduces students to powerful tools for streamlining XML development and building richer, more robust XML applications.
The start-to-finish guide to XML development for every experienced developer!
In this book, leading XML developer Lars Marius Garshol covers every essential aspect of XML programming, from basic principles through advanced techniques, utilizing DOM, SAX, XSLT, XPath, schemas, and other key XML standards. Garshol presents scores of code examples based on Python, a cross-platform language that is exceptionally well suited for XML development. Garshol also presents new insights into XML application design and optimization, as well as complete sample applications. Coverage includes:
You'll even find a quick introductory course in Python and an XML developer's glossary.
Whatever your application, from content management through enterprise application integration, Developing XML Applications gives you the resources, skills, insights, and example code you need to build it right.
"The range of XML application domains is growing dramatically, but there are common strategies and techniques for XML development that apply to all of them. This book provides a systematic and thorough grounding, and a real understanding, that will make you productive quickly."
Charles F. Goldfarb
I. WORKING WITH XML.
1. XML and Information Systems.
Representing Data Digitally. XML and Digital Data. Information Systems. XML and Information Systems.
2. The XML Processing Model.
A Bit of XML History. An Introduction to XML namespaces. Documents and Parsers. The Result of Parsing.
3. Views of Documents.
Documents Viewed as Events. Documents Viewed as Trees. Virtual Views. Virtual Documents.
4. Common Processing Tasks.
Serialization and Deserialization. Transformation. Validation. Modification. Information Extraction.
5. Characters—The Atoms of Text.
Terminology. Digital Text. Important Character Standards. Characters in Programming Languages. Further Problems.
II. EVENT-BASED PROCESSING.
6. Event-Based Processing.
Benefits and Disadvantages. Writing Event-based Applications. Tools for Event-based Processing. RSS: An Example Application.
7. Using The XML Parsers.
Xmlproc. Pyexpat. Xmllib. Xerces-C/Pirxx. Working in Jython. Choosing a Parser.
8. SAX: An Introduction.
Background and history. Introduction. The SAX classes. Two Example Applications. Python SAX Utilities.
9. Using SAX.
An Introduction to XBEL. Thinking in SAX. Application-specific data representations. Example Applications. Tips and Tricks. Speed.
10. Advanced SAX.
The Advanced Parts of the API. Parser filters. Working with Entities. Mapping non-XML data to XML.
III. TREE-BASED PROCESSING.
11. DOM: An Introduction.
Tree-based Processing. Getting to Know DOM. A DOM Overview. Fundamental DOM Interfaces. A Simple Example Application. Extended DOM Interfaces.
12. Using DOM.
Creating DOM trees. DOM Serialization. Some Examples. An Example: A Tree Walker.
13. Advanced DOM.
Other DOM Implementations. The HTML Part of the DOM. DOM Level 2. Future Directions for DOM. DOM Performance.
14. Other Tree-Based APIs.
IV. DECLARATIVE PROCESSING.
15. XSLT: Introduction.
Declarative Processing. XSLT Background. Introducing XSLT. Two Complete XSLT Examples.
16. XSLT in More Detail.
Xpath in Detail. Advanced XSLT Topics. Advanced XSLT Examples. XSLT Performance.
17. Using XSLT In Applications.
The XSLT Processor APIs. Larger Examples of XSLT Programming. Using Xpath in Software. The Future of XSLT.
18. Architectural Forms.
Introduction to Architectural Forms. Uses of Architectural Forms. Architectural Forms Software. An Example.
V. XML DEVELOPMENT IN JAVA.
19. SAX in Java.
XML and Java. Java XML Parsers. The Java Version of SAX. JAXP. Java SAX APIs. Java SAX Examples.
20. DOM in Java.
JAXP and the DOM. The Java DOM APIs. Using Some Java DOMs. JDOM.
21. Using XSLT In Java Applications.
Using JAXP. The Saxon XSLT Processor. The Xalan XSLT Processor.
VI. XML PROCESSING IN DEPTH.
22. Other Approaches to Processing.
Pull APIs. RXP. Hybrid Event/Tree-based Approaches. Simplified Approaches.
23. Schemas.
Schemas and XML. Validating Documents. DTD Programming.
24. Creating XML.
Creating XML from HTML. Creating XML from SGML. Creating XML from Other Document Formats. Creating XML from Data Formats.
25. The Tabproc Framework.
Input Handling. Generating XML from Tables. A SAX XMLReader Output. Examples of Use.
26. The RSS Development Kit.
The RSS Object Structure. The Client Kit. The Config Module. The RSS email client. The GUI RSS client. The RSS editor.
VII. APPENDICES.
Appendix A. A Lightning Introduction to Python.
A Quick Introduction. Basic Building Blocks. An Example Program. Classes and Objects. Various Useful APIs.
Appendix B. Glossary of Terms.
CDATA Marked Sections. Character Data. Character References. Document Element. Document Entity. Document Order. Mixed Content. Processing Instruction. Replacement Text. Standalone Declaration. Text. Text Declaration. XML Declaration.
Appendix C. The Python XML Packages.
The Python Interpreter. The Python XML-SIG package. 4Suite. Sab-pyth. RXP. Pysp. The Easy Ones. Java Packages.
Index.
This book was written to help you develop applications that use XML. It focuses on general principles and techniques, aiming to give you knowledge that will remain valuable even after the standards and tools described have evolved a few iterations further than they are today.
The text approaches XML by asking questions like: What problems is XML used to solve? What general approaches to these problems exist, and what tools support them? What other technologies is XML related to? How can XML be used with these other technologies? After reading the book you should, whenever you need to do something with XML, be able to think of several different ways of solving your problem and to choose the best of these.
This book is written for developers, and much of it requires knowledge of programming. In general, the reader is expected to have done enough object-oriented programming to know what a class or a method is. The chapters in the first part of the book, as well as the first two chapters on XSLT, do not require programming knowledge and could probably be useful to anyone who is familiar with XML.
This book is not an introduction to XML, as it assumes that you know what XML is and have some familiarity with its main features. It does not assume that you are an expert, however, and will explain many of the subtler aspects of XML that have consequences for software development.
Although the book uses Python in the source code examples, knowing Python is not a prerequisite, since the book contains Appendix A, "A lightning introduction to Python." Readers who are not familiar with Python are strongly encouraged to read this appendix before going on to the rest of the book.
The book begins with a look at XML from the point of view of software development, comparing it to other related technologies. Many of the subtler aspects of XML, the XML family of standards, as well as their relationship to software development are also examined. Much space is devoted to the principles of XML software development, using parsers, and the existing techniques for development.
Three chapters are dedicated to each of the two most important XML programming APIs: the SAX and the DOM. Two chapters are dedicated to XSLT. In addition to these standards, several lesser-known APIs, tools and technologies are described. Some are included because of their utility, others were meant to put the main technologies in perspective.
The last part of the book describes XML application design issues in more detail and provides some larger examples that present complete XML applications or toolkits.
In Appendix C, "Python XML packages," there is a description of various distributions of Python XML software and how to install each of these distributions. If you are new to XML processing with Python, it is probably a good idea to look over this appendix before starting to read the tool-related parts of the book. Installing the tools so that you have them available and can play around with them as you read may also be a good idea.
Python is a very high-level programming language that is unusually well suited for information-centric program development, since it has excellent support for creation and manipulation of data structures. It is a simple language, in many ways similar to the more widespread languages, such as Java, C++, and Visual Basic, but easier to understand and use.
This means that even though you may not understand Python now, you will be able to learn it quickly. In general, I have found that developers need to study Python for two days in order to be able to contribute usefully to projects. And since Python has so much in common with other languages, you should be able to make use of what you learn even if you usually develop in other languages.
This book mainly uses Python in examples and does in fact have a general bent towards Python. Why this is so, and what is so interesting about Python, may not be immediately obvious to you, so this section explains what Python is and why it is so interesting. However, even though the book uses Python, it is intended to be useful to all XML programmers, regardless of what programming languages they know or want to do XML programming in.
Python is a programming language. It has often been called a scripting language, but I think this is a little misleading. The image the term "scripting language" evokes in me is of a simple little language, dynamically typed and easy to use for amateurs, unsuitable for large applications, not as powerful as a "real" programming language, and definitely slower.
Python, however, is very much a "real" programming language, but at the same time it has some of the characteristics of a scripting language. It is simple, it is very dynamic, it is easy to use for amateurs, and it is slower than compiled languages such as C++, Eiffel, and Common Lisp. At the same time, however, it is very powerful, certainly every bit as powerful as Java, if not more, and eminently suitable for large applications. Among the things that have been written in Python are CORBA ORBs, Web browsers, relational database engines, validating XML parsers, and a full XSLT engine.
I often describe it as "Perl done right," and Python does have a lot in common with Perl. It is a scripting-like language, very suitable for text processing and systems programming, with excellent operating system integration and with many of the same features. (In fact, Perl's object-oriented features are modeled on Python's object model.) Python is also like Perl in that it was created by a single person (Guido van Rossum) for his own needs, it used to be distributed as a single widely-ported open source interpreter implementation (there is now more than one), it is closely connected to the Internet and Unix, and so on.
At the same time, Python has much in common with Java, in that it is dynamic (much more so than Java) and object-oriented, has exceptions, has a very similar package model, and supports in-program documentation. Python byte-code can also be transferred across a network and executed in a restricted environment.
I am something of a programming language freak and have done development in at least a dozen different programming languages, and studied many more. In my experience, Python stands out because it is so easy and natural to develop in, something that makes Python development just plain nice and fun. Returning to Java or C++ after doing Python development simply feels painful and awkward. I think this is because Python is so clean, simple, and predictable, with few surprises or restrictions and with a large set of ready-made and easy-to-use libraries. Paul Prescod (affectionately known as the "St. Paul" of Python evangelism) has said that "Python is a language that gets its tradeoffs exactly right," which sums it up pretty well.
Another reason for choosing Python is that no matter which programming language the reader is already used to, Python should be easy to pick up, at least well enough to read. The syntax is clear and simple, and the concepts in the language are very similar to those of mainstream languages such as Java, C, C++, Visual Basic, and Perl. So Python should not be an obstacle for any reader. In fact, it has often been described as "executable pseudo-code," and you will see it used as pseudo-code in some parts of the book.
Furthermore, using Python does not limit us to a single platform. Python runs just as well on Unix as it does on Mac or Windows, or even on a Psion palmtop or a VMS machine.
One of the most appealing aspects of Python is that it is very well integrated with the rest of the world. This means that choosing Python hardly ever shuts you off from some technology or system that you would like your programs to interact with. For example, Microsoft fans will quickly discover that the Windows version of Python can talk to COM objects, create COM servers, connect to ActiveX, DDE, the Win32 API, the Windows registry, MFC, Windows Scripting Host, ADO, ODBC, and so on and so forth. In other words, even though Python is highly portable, you don't have to give up anything under Windows just because you use Python.
Many people, however, prefer to use something other than Windows, such as the Mac. Python is technologically agnostic, so it allows these people to have their way as well. Python runs on Mac, and the Mac version can access the communications toolbox, the font manager, the speech manager, the sound manager, the QuickTime services, and so on.
Other people believe in Unix and would rather use Python there. Again, this is no problem: the Unix versions of Python fit very well into Unix, and there are bindings for things such as Qt, KDE, Gtk, GNOME, Irix and Solaris sound modules, special Linux APIs, etc.
Yet others would like to remain pure, platform-wise, and prefer a strictly operating system-independent platform such as Java. Python can accommodate these people too! Jython (the interpreter formerly known as JPython) is an implementation of Python written in 100% Java which lets you run Python programs inside the Java virtual machine. You can use this as an embedded scripting language for an application, or simply write Python programs with full access to the nice Java stuff such as Swing, JDBC, Jini, RMI, etc.
And of course, apart from the platform issues, most of us would like to be able to speak Internet protocols and connect to other independent technologies. Again, Python can help. There are several ways to connect Python with CORBA, a standardized relational database API (a JDBC for Python), lots of XML tools (of course), LDAP modules, and so on. And the interpreter comes with libraries supporting FTP, HTTP, gopher, NNTP, SMTP, IMAP, POP, HTML, URL parsing, and much more out of the box.
To put it another way, Python is buzzword-friendly and TLA-compatible. (TLA stands for Three-Letter Acronym, a term often used as a synonym for technologies, since many of them have three-letter acronyms.)
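As a small illustration of the "out of the box" library support described above, here is a minimal sketch of URL parsing using only the standard library. It assumes a current Python 3 interpreter (where the relevant module is `urllib.parse`) rather than the Python version the book itself targets, and the URL is a hypothetical one invented for the example.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical URL, used purely for illustration.
url = "http://www.example.org/feeds/news.rss?lang=en#top"

# Split the URL into its named components.
parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.netloc)    # www.example.org
print(parts.path)      # /feeds/news.rss
print(parts.query)     # lang=en
print(parts.fragment)  # top

# Resolve a relative reference against the base URL.
resolved = urljoin(url, "archive/2001.rss")
print(resolved)        # http://www.example.org/feeds/archive/2001.rss
```

No third-party packages are needed for this; the other protocols listed above have their own standard-library modules.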
Whenever I have to write an XML program of some sort, I usually think of Python first as the programming language to write it in. There are several reasons for this, the most important being that Python is so easy and natural to program in and that it is very well suited for text processing. It is also very easy to build data structures in Python, something that is very important for XML processing.
Another thing is that for anything that involves moving information between different systems, Python is a natural choice, given that whatever these systems may be, Python can very likely talk to them.
Also, Python is highly suitable for the many little programs and scripts that you write to do the small but necessary tasks that usually appear during a project. Doing everything in Python makes it easy to turn prototypes into full programs, and it also means that whenever a little script has to be developed further, the full toolbox implemented for that application is already available to you.
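To make the point about building data structures from XML concrete, the following is a minimal sketch, not taken from the book. It assumes a current Python 3 interpreter and its standard `xml.sax` module; the bookmark document, the element names, and the `BookmarkHandler` class are all hypothetical and invented for illustration.

```python
import xml.sax

# Hypothetical bookmark document, invented for this illustration.
DOCUMENT = b"""<bookmarks>
  <bookmark href="http://www.python.org/"><title>Python</title></bookmark>
  <bookmark href="http://www.w3.org/XML/"><title>XML at W3C</title></bookmark>
</bookmarks>"""


class BookmarkHandler(xml.sax.ContentHandler):
    """Collects each bookmark as a dictionary with 'href' and 'title' keys."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "bookmark":
            # Start a new record; the title is filled in as text arrives.
            self.bookmarks.append({"href": attrs.get("href"), "title": ""})
        elif name == "title":
            self._in_title = True

    def endElement(self, name):
        if name == "title":
            self._in_title = False

    def characters(self, content):
        # Character data may arrive in several chunks, so append, don't assign.
        if self._in_title and self.bookmarks:
            self.bookmarks[-1]["title"] += content


handler = BookmarkHandler()
xml.sax.parseString(DOCUMENT, handler)

for bookmark in handler.bookmarks:
    print(bookmark["title"], "->", bookmark["href"])
```

The result is an ordinary list of dictionaries, so the rest of a program can work with the data without knowing it came from XML.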
Of course, I made changes after all the reviewers read the text, and in so doing no doubt introduced new errors. The honour for these errors, as well as for those so subtly hidden that they escaped the eyes of all my reviewers, and even the Editor himself, I should like to reserve for myself. You will find the best of these listed on http://www.garshol.priv.no/download/text/ph1/errata.html. If you spot one that is not on the list already, I would like to hear about it.
The main problem with books about Internet technology is that the technology changes so quickly that books rapidly become dated. To help you decide whether the book is up to date or not, here is a list of the various standards and tools covered in this book, and the version of each that the book is based on.
1.0 2nd ed.
Not yet published
Not yet published
Not yet published
In the table above, WD is used as an abbreviation for "Working Draft," and CR for "Candidate Recommendation," both meaning W3C specifications that are still work in progress.