Peachpit Press

Integrated Web Design: Seven Deadly Markup Sins

Date: Jul 9, 2004

HTML or XHTML? If you're in Web design and development, you're using something to mark up your pages. And, if you've been in the industry a while, no doubt you've learned some standards-based techniques. But despite all our growth and knowledge, some nagging problems remain. Molly Holzschlag helps you home in on the most common markup mistakes and shows you how to repair them with ease.

Despite all the attention on Web standards in the past few years, there are ongoing problems with document conformance when it comes to HTML and XHTML.

I was recently involved as a judge at the HOW design contest, and have served as a judge for the Webby Awards—which recently announced their winners for 2004. Although both contests place emphasis on quality design and content, there's little attention paid to standards. I'm sure that readers can just picture me viewing source and validating sites because hey, that's what I do. Alas, the results were very distressing. Out of several hundred sites—many of them extremely prominent, well-known web sites—only a mere handful conformed to HTML or XHTML specifications.

What's encouraging about many of the sites is that some attempts by designers and developers to adhere to standards are obvious. What's discouraging is the kind of error showing up. Many of these errors are incredibly easy to repair! Of course, some sites rely on problematic content management systems (CMSs), and others are served markup by their ad server provider. In those cases, errors are introduced that are beyond the developer's immediate control.

No matter the cause, all developers would do well to take a look at the top seven mistakes I found within these sites. By looking at what we're doing wrong and how to do it right, we can quickly address poor markup practices and create more compliant documents.

1. Document Encoding Problems

Although document encoding does not influence a document's validity per se, it does influence the capability of that document to be validated and properly displayed. Unfortunately, it seems that many designers and developers are unaware of the need for encoding.

Document encoding describes the character set that is being used. Documents in English, for example, have long been identified with the character encoding ISO-8859-1 (Latin-1), which covers most Western European languages. In recent years, we've been able to tap into UTF-8, an encoding of Unicode that can represent the characters of virtually every written language.

In an ideal world, all encoding is done on the server. In this scenario, the server administrator sets the proper encoding via the HTTP headers. This can be done with any kind of web server—check with your systems administrator and find out whether encoding is set. If so, you're good to go and do not need to add any additional information about encoding.

However, if your server does not have encoding set, there are two alternative means of adding encoding information. You can add the encoding via a meta element, as follows:

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

If you're using XHTML and want to include the XML declaration, you can include your encoding there, as well:

<?xml version="1.0" encoding="UTF-8" ?>

Of course, there are problems related to using this declaration (also referred to as the XML prolog). If IE 6.0 encounters anything ahead of the DOCTYPE declaration, the prolog included, it won't flip the DOCTYPE switch and falls back to its non-standard rendering mode. (If you're not familiar with DOCTYPE switching, read on.) Other problems with the prolog include rendering issues in many older browsers, which either do not recognize the XML syntax or attempt to display the document as an XML tree rather than render the page itself.

Another problem that's showing up a lot now that the W3C has upgraded its markup validator is document encoding mismatches. This problem comes from instances where the encoding is, as recommended, set on the server, but the document author includes a conflicting character set.

So, to effectively manage these problems, follow these steps:

  1. Check with your system administrator and ascertain which encoding type is set for your server. If it's set properly, do not add a meta element within your documents. Be careful because many tools automatically add the encoding for you. In those cases, strip out the meta element to avoid conflict and unnecessary markup.

  2. If document encoding isn't set on the server, you may have to resort to using the meta element solution. It's not ideal, but it will work and will allow you to validate your document without causing warnings to occur.
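
As a minimal sketch of step 2, here is where the meta element would sit in an XHTML 1.0 Strict document when the server sends no encoding. The utf-8 value, the title, and the body text are placeholders chosen for illustration; the declared character set has to match the encoding the file was actually saved in.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Encoding first, so the browser knows it before reading the title. -->
  <!-- Assumption: the file is saved as UTF-8 and the server sets no charset. -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>Example page</title>
</head>
<body>
  <p>Placeholder content.</p>
</body>
</html>
```

Placing the meta element first inside head lets the browser learn the encoding before it reads any other text in the document.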

NOTE

For more information on this issue, please see the article, "WaSP Asks the W3C: Specifying Character Encoding."

2. Missing DOCTYPE Declarations

The lack of DOCTYPE declarations in HTML and XHTML documents is very likely the most deadly of deadly markup sins. First, it's important to understand that a DOCTYPE declaration is a required component of your HTML or XHTML document. Without it, you won't validate, and that's that.

DOCTYPE declarations are a bit of SGML placed at the top of a document to declare which language and language version the document is supposed to conform to. In the past, the declaration was passive: the browser ignored it, and it mattered only when you validated the document, at which point the validator used the DOCTYPE declaration to compare your document with the declared Document Type Definition (DTD). I like to describe DTDs as laundry lists of allowed elements and attributes for a given language and language version. In order to have a conforming document, authors must use only the elements and attributes allowed by the declared DTD.

But a technology known as DOCTYPE switching has made using the correct DOCTYPE declaration not only important, but imperative. DOCTYPE switching is a technology in many contemporary browsers such as IE 6.0 that will flip a switch within the browser upon finding a correct DOCTYPE in the document, allowing it to operate in standards mode. In the case of IE 6.0, which has a non-standard implementation of the Box Model (a very important browser concern when working with CSS), the DOCTYPE switch allows IE 6.0 to operate with a standards-compliant Box Model.

The simple advice here is to understand how important DOCTYPE declarations are, what makes up a correct declaration, and (most importantly) to include the correct DOCTYPE declaration for your document needs in every document. No exceptions, ever.
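
For reference, here is a sketch of three commonly used declarations as the W3C publishes them. Only one of them belongs in a given document, on its very first line. The comments are labels for this listing only and should not be copied into a page, because anything placed ahead of the DOCTYPE can keep IE 6.0 from switching modes.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

Which one to pick depends on the markup you actually write: the Strict DTDs leave out the deprecated presentational elements and attributes discussed in the sections that follow, while Transitional still allows them.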

NOTE

To better understand the role of DOCTYPEs and DOCTYPE switching, please see my InformIT article "CSS: Beyond the Retrofit."

3. Use of language Attribute in script Elements

If you use JavaScript with any frequency, surely you've seen syntax like this:

<script language="JavaScript1.1"> . . . </script>

The language attribute was formally deprecated in HTML 4.01. As a result, it cannot be used in HTML 4.01 Strict, XHTML 1.0 Strict, or XHTML 1.1. The only problems that might possibly be encountered by removing this attribute influence browsers that are so old that some readers weren't born when they were in use. Okay, I'm exaggerating, but you get the point. In almost every contemporary case, eliminate the use of the language attribute.

However, the type attribute is required. So, instead of your script elements looking as they do in the prior code sample, your script element should look like this:

<script type="text/javascript"> . . . </script> 

4. Missing type Attribute in style Elements

Similarly, the type attribute must be included along with the style element when using embedded style in conforming documents. So, if you're doing this:

<style>
h1 {
       color: red;
}
</style>

Stop doing that and make sure that you add the type attribute along with a value of text/css:

<style type="text/css">
h1 {
       color: red;
}
</style>

Of course, in an ideal situation, style information should be placed externally and linked via the link element. However, it's perfectly valid to use embedded style. Just make sure that you have the type attribute there and you're good to go.

5. img Element Errors

Two of the most common problems found when validating Web documents relate to the img element. The first of the two is a really disturbing issue: the lack of an alt attribute. The importance of alt attributes for accessibility and usability has been a known factor since before accessibility and usability became subsets of web design. Why, oh why, then, is it still one of the most commonly made errors? Fix it simply by ensuring that all your images have the alt attribute in place.
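
A minimal sketch of both cases you're likely to meet, assuming XHTML syntax; the file names, dimensions, and alternative text are hypothetical. An image that carries information gets a short text equivalent, and a purely decorative image gets an empty alt value so the attribute is still present.

```html
<!-- Informative image: alt describes what the image conveys. -->
<img src="images/sales-chart.gif" width="300" height="200"
     alt="Bar chart: sales doubled between 2002 and 2003" />

<!-- Decorative image: alt is present but empty. -->
<img src="images/corner.gif" width="10" height="10" alt="" />
```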

The second common problem with img elements is the use of the presentational border attribute. This attribute is deprecated in favor of style sheets. You can turn off all borders on images by simply adding this rule to your style sheet:

img {
       border: 0;
}

Of course, you may have more complex needs and require borders on some images and not others. Using a range of CSS selector types or creating special classes, you can customize border properties for images as you see fit. Just leave the border attribute out of the img element and you'll be good to go.
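
One way this could look, as a sketch only: the class name framed, the border values, and the file name are made up for illustration. All images lose their borders by default, and a class switches a border back on where you want one.

```html
<style type="text/css">
/* Default: no borders on any image, including linked ones. */
img {
       border: 0;
}
/* Hypothetical class for images that should keep a border. */
img.framed {
       border: 1px solid #333;
}
</style>

<img src="images/portrait.jpg" alt="Portrait of the author" class="framed" />
```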

6. Unnecessary Use of Proprietary Attributes

Back in the days before CSS became so available to us, we had to rely on lots of browser-specific, proprietary elements and attributes to create visual output. One such example is the use of proprietary margin attributes on the body element:

<body leftmargin="10" topmargin="0">

But CSS has come to the rescue, and in almost all cases in contemporary web design, we can rely on CSS to achieve the results we're after. On the few occasions when a consistent look is required for older browsers there may be a need to fall back on such proprietary attributes, but it's rare. Even if you had to accommodate such a situation, you could do so by using a very streamlined table for layout along with CSS and reduce the use of these attributes, or (better yet) remove them altogether.
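
As a rough CSS equivalent of the body example above (a sketch that assumes pixel units, which is how browsers treated those attribute values), the two margins move into the style sheet and the body element is left clean:

```html
<style type="text/css">
body {
       margin-top: 0;      /* replaces topmargin="0" */
       margin-left: 10px;  /* replaces leftmargin="10" */
}
</style>

<body>
```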

7. Un-escaped Entities in XHTML

XHTML requires that certain characters be escaped as entities. Un-escaped ampersands (&s) in links and embedded JavaScript are the primary problems encountered. So, if you have a link such as the following:

http://www.mysite.com/blah&story.html

you must escape the ampersand to be compliant:

http://www.mysite.com/blah&amp;story.html

This is also true of any ampersands that appear in scripts within your document. If the script is external to the document, there is no need to escape them; but if the script is embedded in the document, make sure that you do.

Unfortunately, this is one of the problems most often introduced by dynamic content from CMSs and ad servers. Some server-side scripts introduce these ampersands, too. Ask your developers to ensure that any dynamic markup being delivered to the browser properly escapes the ampersand. Note that escaping ampersands in links does not affect link integrity: the browser converts &amp; back to & before it requests the URL.
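
Here is a sketch of the typical case, a dynamically generated link with several query-string parameters, followed by an embedded script. The URL, parameter names, and variables are hypothetical. The CDATA wrapper in the script is a commonly used alternative to escaping each ampersand by hand, offered here as an assumption about what may suit your pages rather than as the only fix.

```html
<!-- Each & that separates query-string parameters becomes &amp; -->
<a href="http://www.mysite.com/story.php?id=12&amp;page=2&amp;lang=en">Page 2</a>

<script type="text/javascript">
//<![CDATA[
// Inside the CDATA section, && can stay as written.
var width = 3, height = 4; // hypothetical values
if (width > 0 && height > 0) {
  document.title = "Area: " + (width * height);
}
//]]>
</script>
```

If the script grows beyond a few lines, moving it to an external file sidesteps the question entirely.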

Finding Redemption

It's fortunate that as developers and designers become more aware of web standards and using HTML and XHTML in their specified forms, many of these problems are being repaired. But we are still in transitional times, and confusion in terms of what's allowed or not allowed in a given document still reigns.

To find redemption, simply repair your deadly sins. To find which of these sins you might be guilty of, try this novel idea on for size: Validate your documents. You can do this by using the W3C's validator, now improved with helpful information describing errors, reasons for errors, and how to repair errors. There are other validators as well; you can find them in many development tools.

My last bit of advice: If you're going to sin, know that you're sinning and know the reason why. Sometimes, it simply is necessary to use a non-standard approach to a specific problem. If you do find yourself in a situation like this, better to know the rules before you break them. That way, you can at least find solace in knowing you did your best.
