What is XHTML?
XML expert Benoît Marchal discusses the differences among XML, HTML, and XHTML, including coherence, modularization, and where we go from here.
The name says it all: XHTML combines XML with HTML. More formally, XHTML is an XML rewriting of HTML. What does that mean in practice?
XML and HTML have a lot in common. The main difference (and it's an important one) is that XML is a generic markup language whereas HTML is a specific language for hypertext documents.
Understanding the difference between XML and HTML is essential to understanding XHTML, so let me take an example. HTML is specific because it defines a fixed set of elements: there is an element for paragraphs (<P>), an element for images (<IMG>), and an element for bold text (<B>).
XML, on the other hand, defines no elements. That's why it's generic. It is up to the author to define the elements he needs in his document. For example, DocBook, which is an XML vocabulary for technical documentation, defines a paragraph element (<para>), but MathML, an XML vocabulary for mathematics, does not define an element for paragraphs. There is no need for paragraphs in mathematical equations, so there is no paragraph element in MathML! Instead, MathML defines elements for sums (<sum>), the exponential function (<exp>) and other mathematical concepts.
Both DocBook and MathML, which are specific languages, are built on top of XML's generic facilities. In fact, many other languages have been created on top of XML. There are XML vocabularies for multimedia, graphics, real estate, electronic commerce and more.
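To make the "generic" point concrete, here is a minimal sketch in Python showing one generic XML parser reading two different vocabularies without knowing anything about either. The two fragments are simplified illustrations (namespaces and document type declarations are left out), and the helper name element_names is hypothetical, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragments: real documents would also carry
# namespace or DOCTYPE declarations, omitted here for brevity.
docbook = "<article><para>XML defines no elements of its own.</para></article>"
mathml = "<apply><exp/><ci>x</ci></apply>"

def element_names(markup):
    """Return the element names found in the markup, in document order."""
    root = ET.fromstring(markup)
    return [element.tag for element in root.iter()]

print(element_names(docbook))  # ['article', 'para']
print(element_names(mathml))   # ['apply', 'exp', 'ci']
```

The parser applies the same syntax rules to both fragments; what the elements mean is left entirely to the vocabulary.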
This raises an interesting question: if XML is a generic language that is used to create specific languages, and if HTML is a specific language, then why not build HTML on top of XML? It has been done, and it's called XHTML.
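If you read the XHTML 1.0 recommendation, you will recognize the familiar HTML 4 elements (paragraphs, bold, images, etc.). No new elements have been added. However, XHTML follows the XML syntax, so every element must be closed explicitly, either with a matching end-tag or, for empty elements such as <br />, with the XML empty-element form. HTML, by contrast, lets you omit the end-tag for a number of elements, such as <P> and <LI>, and uses no end-tag at all for empty elements such as <IMG>.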
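The practical effect of the end-tag rule can be sketched with a generic XML parser. The example below is an illustration under assumptions: the two markup strings are made up for the purpose, and is_well_formed is a hypothetical helper that only checks XML well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML habits: the end-tags for <p> are omitted and <br> is left open.
html_style = "<body><p>First paragraph<p>Second paragraph<br></body>"

# XHTML style: every element is closed; <br /> uses the empty-element form.
xhtml_style = "<body><p>First paragraph</p><p>Second paragraph<br /></p></body>"

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

An HTML browser would typically accept both strings, but only the second one can be processed by XML tools.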