Sams Teach Yourself XML in 21 Days
- Table of Contents
- About the Author
- Acknowledgments
- We Want to Hear from You!
- Introduction
- Part I: At a Glance
- Day 1. Welcome to XML
- Day 2. Creating XML Documents
- Day 3. Creating Well-Formed XML Documents
- Day 4. Creating Valid XML Documents: DTDs
- Declaring Attributes in DTDs
- Day 6. Creating Valid XML Documents: XML Schemas
- Day 7. Creating Types in XML Schemas
- Part I. In Review
- Day 8. Formatting XML by Using Cascading Style Sheets
- Day 9. Formatting XML by Using XSLT
- Day 10. Working with XSL Formatting Objects
- Part II. In Review
- Part III: At a Glance
- Day 11. Extending HTML with XHTML
- Day 12. Putting XHTML to Work
- Day 13. Creating Graphics and Multimedia: SVG and SMIL
- Day 14. Handling XLinks, XPointers, and XForms
- Part III. In Review
- Part IV: At a Glance
- Day 15. Using JavaScript and XML
- Day 16. Using Java and .NET: DOM
- Day 17. Using Java and .NET: SAX
- Day 18. Working with SOAP and RDF
- Part IV. In Review
- Part V: At a Glance
- Day 19. Handling XML Data Binding
- Day 20. Working with XML and Databases
- Day 21. Handling XML in .NET
- Part V. In Review
- Appendix A. Quiz Answers
Understanding Canonical XML
Infosets are only abstract formulations of the information in an XML document. So without reducing an XML document to its infoset, how can you actually approach the goal of being able to actually compare XML documents character by character? You can write your documents in canonical XML.
Canonical XML is a companion specification to XML, and you can read all about it at http://www.w3.org/TR/xml-c14n. Canonical XML is a very strict XML syntax, which lets documents in canonical XML be compared directly.
Using this strict syntax makes it easier to see whether two XML documents are the same. For example, a section of text in one document might read Black & White, whereas the same section of text might read Black & White in another document, and even <![CDATA[Black & White]]> in another. If you compare those three documents byte by byte, they'll be different. But if you write them all in canonical XML, which specifies every aspect of the syntax you can use, these three documents would all have the same version of this text (which would be Black & White) and could be compared without problem.
As you might imagine, the canonical XML syntax is very strict; for example, canonical XML uses UTF-8 character encoding only, carriage-return linefeed pairs are replaced with linefeeds (that is, ), tabs in CDATA sections are replaced by spaces, all entity references must be expanded, and much more, as specified in http://www.w3.org/TR/xml-c14n.
That completes today's discussion on constructing XML documents. We've covered everything we need to know before we start discussing how to create valid XML documents—and we're going to start doing that tomorrow.
Summary | Next Section

Account Sign In
View your cart