- What Is Well-Formedness?
- Change Name to Lowercase
- Quote Attribute Value
- Fill In Omitted Attribute Value
- Replace Empty Tag with Empty-Element Tag
- Add End-tag
- Remove Overlap
- Convert Text to UTF-8
- Escape Less-Than Sign
- Escape Ampersand
- Escape Quotation Marks in Attribute Values
- Introduce an XHTML DOCTYPE Declaration
- Terminate Each Entity Reference
- Replace Imaginary Entity References
- Introduce a Root Element
- Introduce the XHTML Namespace
Quote Attribute Value
Put quotes around all attribute values.
<div id=speech1>
  <span class=speaker>PROSPERO</span>
  <blockquote cite=
   http://www-tech.mit.edu/Shakespeare/tempest/tempest.4.1.html>
    <span class=verse id=a4s1v1>If I have too austerely punish'd you,</span>
    <span class=verse id=a4s1v2>Your compensation makes amends, for I</span>
    <span class=verse id=a4s1v3>Have given you here a third of mine own life,</span>
    <span class=verse id=a4s1v4>Or that for which I live; who once again</span>
    <span class=verse id=a4s1v5>I tender to thy hand: all thy vexations</span>
    <span class=verse id=a4s1v6>Were but my trials of thy love and thou</span>
    <span class=verse id=a4s1v7>Hast strangely stood the test here, afore Heaven,</span>
  </blockquote>
</div>

<div id="speech1">
  <span class="speaker">PROSPERO</span>
  <blockquote cite=
   "http://www-tech.mit.edu/Shakespeare/tempest/tempest.4.1.html">
    <span class="verse" id="a4s1v1">If I have too austerely punish'd you,</span>
    <span class="verse" id="a4s1v2">Your compensation makes amends, for I</span>
    <span class="verse" id="a4s1v3">Have given you here a third of mine own life,</span>
    <span class="verse" id="a4s1v4">Or that for which I live; who once again</span>
    <span class="verse" id="a4s1v5">I tender to thy hand: all thy vexations</span>
    <span class="verse" id="a4s1v6">Were but my trials of thy love and thou</span>
    <span class="verse" id="a4s1v7">Hast strangely stood the test here, afore Heaven,</span>
  </blockquote>
</div>
Motivation
XHTML requires every attribute value to be quoted, even those that don't contain whitespace.
Potential Trade-offs
Absolutely no browsers are in the least bit confused by a properly quoted attribute value.
This can add roughly two bytes per attribute value to the file size. If you're Google and are counting every byte on your home page because you serve gigabytes per second, this may matter. This should not concern anybody else.
Mechanics
Manually, all you have to do is type a single or double quote before and after the attribute value. For example, consider this start-tag:
<a class=q href=http://www.example.com>
You simply turn that into this:
<a class="q" href="http://www.example.com">
Or this:
<a class='q' href='http://www.example.com'>
There's no reason to prefer single or double quotes. Use whichever one you like. Mechanically, both Tidy and TagSoup will fill these quotes in for you. It's probably easiest to let them do the work.
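When neither tool is at hand, the manual step can be approximated in a few lines of Python. The following is only a minimal sketch and is not how Tidy or TagSoup work internally. The function name quote_attribute_values is made up for illustration, and the code assumes that no attribute value contains a literal < or > and that comments and scripts contain no tag-like text.

```python
import re

# A start-tag: "<", an element name, then everything up to the next ">".
# Assumption: attribute values never contain a literal "<" or ">".
START_TAG = re.compile(r"<([a-zA-Z][^\s<>/]*)([^<>]*)>")

# One attribute: leading whitespace, a name, and optionally "=" followed by
# a double-quoted, single-quoted, or unquoted value.
ATTRIBUTE = re.compile(
    r"""(\s+)([^\s=<>/"']+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'<>]+))?"""
)


def _quote_attribute(match):
    space, name, value = match.group(1), match.group(2), match.group(3)
    if value is None or value[0] in "\"'":
        # Valueless or already quoted: leave the attribute untouched.
        return match.group(0)
    # An unquoted value cannot contain a double quote, so wrapping is safe.
    return f'{space}{name}="{value}"'


def quote_attribute_values(html):
    """Return html with every unquoted attribute value double-quoted."""
    def fix_tag(tag_match):
        name, attributes = tag_match.group(1), tag_match.group(2)
        return "<" + name + ATTRIBUTE.sub(_quote_attribute, attributes) + ">"
    return START_TAG.sub(fix_tag, html)


print(quote_attribute_values("<a class=q href=http://www.example.com>"))
```

Because each attribute is consumed as a whole, text such as item=15314 inside an already quoted value is left alone. Attributes with a missing closing quote are not repaired; they still need to be found and fixed by hand.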
Regular expressions are a little tricky because you also need to consider the case where there's whitespace around the equals sign. For instance, you don't just have to handle the preceding examples. You have to be ready for this:
<a class = q href = http://www.example.com>
And even this:
<a class= q href = http://www.example.com>
Finding the cases without whitespace is not too hard. This will do it:
[a-zA-Z]+=[^'"><\s]+
However, the preceding code snippet will also find lots of false positives. For instance, it will find this tag because of item=15314 in the query string:
<a href="http://www.cybout.com/cgi-bin/product_info?item=15314">
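The false positive is easy to reproduce. This short Python sketch runs the expression above against an unquoted start-tag and against the properly quoted link; the variable names are illustrative only.

```python
import re

# The expression from the text: a name, "=", then an unquoted value.
UNQUOTED = re.compile(r"""[a-zA-Z]+=[^'"><\s]+""")

unquoted_tag = '<a class=q href=http://www.example.com>'
quoted_tag = '<a href="http://www.cybout.com/cgi-bin/product_info?item=15314">'

# Genuine hits: both attribute values lack quotes.
print(UNQUOTED.findall(unquoted_tag))

# False positive: the match comes from inside the quoted URL.
print(UNQUOTED.findall(quoted_tag))
```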
We can improve this a little bit by requiring whitespace before the name, like so:
\s+[a-zA-Z]+=[^'"><\s]+
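As a quick check, the Python sketch below requires whitespace before the attribute name and confirms that the query-string match disappears. The second pattern, LOOSE, is not from the text: it is a hypothetical extension that also tolerates whitespace around the equals sign, as in the spaced-out examples shown earlier. The helper find_unquoted is likewise made up for illustration.

```python
import re

# Whitespace, then a name, "=", and an unquoted value.
STRICT = re.compile(r"""\s+[a-zA-Z]+=[^'"><\s]+""")

# Hypothetical extension (an assumption, not from the text):
# also allow whitespace on either side of the equals sign.
LOOSE = re.compile(r"""\s+[a-zA-Z]+\s*=\s*[^'"><\s]+""")


def find_unquoted(pattern, tag):
    """Return the matches with the leading whitespace trimmed."""
    return [m.strip() for m in pattern.findall(tag)]


quoted_tag = '<a href="http://www.cybout.com/cgi-bin/product_info?item=15314">'
spaced_tag = '<a class = q href = http://www.example.com>'

print(find_unquoted(STRICT, quoted_tag))   # the query string no longer matches
print(find_unquoted(STRICT, spaced_tag))   # misses the spaced-out attributes
print(find_unquoted(LOOSE, spaced_tag))    # finds them
```

Neither pattern is airtight. A quoted value that itself contains whitespace followed by name=value text will still match, as will ordinary prose such as "x = 5", so review the hits before changing anything.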
You may discover a few cases where the attribute value contains whitespace and is not quoted. Similarly, you may find a few places where the initial quote is present but the closing quote is not. These are problematic, and you need to fix them. Browsers do not always interpret these as you might expect, and different browsers handle them differently. What makes no difference in Internet Explorer may cause Firefox to hide content and vice versa.