
Replace Imaginary Entity References

Make sure all entity references used in the document are defined.

&copyright; 2007 TIC Corp.

becomes

&copy; 2007 TIC Corp.

Motivation

Occasionally, authors begin to use entity references that simply don't exist. Sometimes it's a simple typo, such as &apm; instead of &amp;. Sometimes it's misremembered code, such as &tm; instead of &trade; or &copyright; instead of &copy;. Either way, this causes display problems for all browsers and should be fixed.

Potential Trade-offs

None. This is only good.

Mechanics

The hardest problem is finding these imaginary entity references, because there's not necessarily any rhyme or reason to them. Often, the first time you realize there's a problem is while browsing your site. If you're lucky it will appear in the plain text like this:

&copyright; 2007 TIC Corp.

If not, the browser will just drop it out completely:

2007 TIC Corp.

The same mistakes do tend to repeat themselves, so once you've noticed a problem, a straight search and replace will usually find and fix all other occurrences.
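A search and replace of this kind can be sketched in a few lines of Python. This is only an illustration: the FIXES table and the function name fix_known_mistakes are hypothetical, and the table holds just the three mistakes mentioned earlier, so you would extend it with whatever you actually find on your own site.

```python
import re

# Hypothetical table: imaginary name -> the defined name the author meant.
FIXES = {"copyright": "copy", "tm": "trade", "apm": "amp"}

def fix_known_mistakes(text, fixes=FIXES):
    """Replace each known imaginary entity reference with the intended one."""
    if not fixes:
        return text
    # Build one pattern such as &(copyright|tm|apm); so a single pass suffices.
    names = "|".join(re.escape(name) for name in fixes)
    pattern = re.compile("&(" + names + ");")
    return pattern.sub(lambda m: "&" + fixes[m.group(1)] + ";", text)

print(fix_known_mistakes("&copyright; 2007 TIC Corp."))
```

Because the pattern requires the exact name followed by a semicolon, correct references such as &copy; are left alone.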

Otherwise, validation (or at least well-formedness checking) is necessary to identify these issues. Once a validator finds such imaginary entity references, you can fix them by hand if they aren't too numerous, or with a targeted search and replace if they are.
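If a full validator is not at hand, a rough scan can at least list the suspicious names. The sketch below is not a substitute for validation. It assumes that Python's html.entities.name2codepoint table (the HTML 4 entity names) is the right list of defined names for your documents, and it adds apos by hand because XML and XHTML predefine it. The function name find_undefined_entities is made up for this example.

```python
import re
from html.entities import name2codepoint

# Names treated as defined: the HTML 4 table plus apos, which XML predefines.
KNOWN_NAMES = set(name2codepoint) | {"apos"}

# Matches named references such as &copy; but not numeric ones such as &#169;.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def find_undefined_entities(text):
    """Return (line_number, name) pairs for references not in KNOWN_NAMES."""
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in ENTITY_RE.finditer(line):
            name = match.group(1)
            if name not in KNOWN_NAMES:
                problems.append((line_number, name))
    return problems

sample = "&copyright; 2007 TIC Corp.\nAT&amp;T &tm; &copy;"
for line_number, name in find_undefined_entities(sample):
    print(line_number, "&" + name + ";")
```

A regular expression does not understand comments, scripts, or CDATA sections, so treat the output as a list of candidates to check rather than a definitive report.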

Occasionally, you'll find someone has invented an entity reference that perhaps should exist but doesn't, such as a made-up name for the yen sign ¥ or &bet; for the Hebrew letter bet (ב). Although it's theoretically possible to define new entity references such as these in the internal DTD subset or external DTD, I do not recommend this. XML parsers can handle this, but browsers cannot. Either replace the references with the actual characters (especially if you have already reencoded the document in UTF-8) or use a numeric character reference such as &#xA5; or &#x5D1;.
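Both replacements can be automated once you know which character each invented name was meant to stand for. The following is a minimal sketch under that assumption: the invented table is supplied by you, and the names numeric_reference and replace_invented are hypothetical.

```python
import re

def numeric_reference(char):
    """Return the hexadecimal numeric character reference for one character."""
    return "&#x{:X};".format(ord(char))

def replace_invented(text, invented, use_numeric=True):
    """Replace invented references with numeric references or raw characters.

    invented maps a made-up name (without & and ;) to the intended character.
    """
    def substitute(match):
        name = match.group(1)
        if name not in invented:
            return match.group(0)  # leave every other reference untouched
        char = invented[name]
        return numeric_reference(char) if use_numeric else char
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", substitute, text)

print(replace_invented("&bet; &copy;", {"bet": "\u05d1"}))
```

Pass use_numeric=False to insert the characters themselves, which is only safe if the document is saved and served in an encoding such as UTF-8 that can represent them.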
