Working with the Tidy Options

Using redirection is a pretty cumbersome way of doing things. When you run the Tidy program, you should specify at least one option on the command line. Tidy's options tell the program exactly what to do and how to manipulate your HTML files.

Basic Options

Tidy has more than 30 basic options. These options control how the program processes HTML files as well as what character set it uses to encode the cleaned files. You'll probably use only a handful of these options. At most, I use a half dozen of Tidy's basic options.

NOTE

To get a complete list of Tidy's basic options, check the software's documentation or type tidy -help at the command line.

Using the Tidy options is simple. Type the command tidy followed by the option(s) that you want to use. For example, to force Tidy to clean an HTML file and write the results to the same file, type this:

tidy -m myFile.html

The -m option tells Tidy to modify the file in place. For many purposes, the -m option is probably enough. However, you might want to "prettyprint" your HTML documents. Prettyprinting formats the code to make it easier to read and understand. Tidy does this by indenting the code. To tell Tidy to indent your code, use the -i option:

tidy -m -i index.html
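Because -m overwrites the original file, it can be worth keeping a copy before you experiment. The following is a minimal sketch, assuming a POSIX shell; the file names sample.html and sample.html.bak are hypothetical, and the Tidy call is skipped if the program isn't on your path.

```shell
# Create a small, deliberately sloppy HTML file to experiment on
# (sample.html is a hypothetical name, not a file from the article).
cat > sample.html <<'EOF'
<html>
<head><title>Sample</title></head>
<body>
<p>First paragraph
<p>Second paragraph
</body>
</html>
EOF

# Keep a copy, because -m writes the cleaned markup back to the same file.
cp sample.html sample.html.bak

# Run Tidy only if it is installed. Tidy exits with status 1 when it
# reports warnings and 2 when it reports errors, so a nonzero status
# does not necessarily mean the run failed.
if command -v tidy >/dev/null 2>&1; then
    tidy -m -i sample.html
    echo "tidy exit status: $?"
fi
```

Afterward, comparing sample.html with sample.html.bak (for example, with diff) shows exactly what Tidy changed.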

Two other popular options are -asxhtml and -c. The -asxhtml option converts an HTML file to XHTML. To do this, Tidy adds the XHTML doctype to the file and converts standalone tags (such as <img> and <br>) to their self-closing XHTML equivalents (<img /> and <br />).

The -c option replaces presentational tags such as <font>, <nobr>, and <center> with cascading style sheets (CSS) rules. This option does a good job of replacing nonstandard markup, but the CSS it generates probably won't match the style sheets you already use. The CSS that Tidy adds to a file looks something like this:

<style type="text/css">
 li.c10 {list-style: none}
 p.c9 {font-family: Arial; font-weight: bold}
 b.c8 {font-family: Arial}
 div.c7 {margin-left: 2em}
 p.c6 {font-family: Arial; font-size: 120%; font-weight: bold}
 b.c5 {font-family: Arial; font-size: 120%}
 p.c4 {font-size: 80%}
 span.c3 {font-size: 80%}
</style>

You'll need to do some manual editing to make the tidied files compatible with the CSS that you use.

Advanced Options

Tidy's basic options are fine for most purposes. But sometimes you might need to do a little more. That's where Tidy's 70+ advanced options come in. They give Tidy its power, enabling you to do things like this:

  • Change the tags and attributes to uppercase or lowercase.

  • Specify "alt" text to use with images.

  • Strip out the so-called HTML produced by Microsoft Word.

  • Get rid of empty paragraph tags.

  • Write any errors that Tidy encounters to a file.

NOTE

To get a complete list of Tidy's advanced options, check the software's documentation or type tidy -help-config at the command line.

As with the basic options, you'll probably use only a handful of the advanced options.

The advanced options are specified differently from the basic options on the command line. Each advanced option is preceded by two hyphens (--) and followed by a value. Depending on the option, that value could be yes or no, a number, or a text string.

For example, to clean up an HTML file that was exported from Microsoft Word 2000 and force all tags to lowercase, type this:

tidy -m --uppercase-tags no --word-2000 yes article.html
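Several of the tasks listed earlier can be combined in one command. The sketch below assumes a POSIX shell and uses two hypothetical file names, draft.html and errors.txt. It forces tags to lowercase, drops empty paragraphs, and uses the -f option to send Tidy's warnings and errors to a file instead of the screen.

```shell
# A small test document with uppercase tags and an empty paragraph
# (draft.html is a hypothetical name).
cat > draft.html <<'EOF'
<HTML>
<HEAD><TITLE>Draft</TITLE></HEAD>
<BODY>
<P>Some text</P>
<P></P>
</BODY>
</HTML>
EOF

# Run Tidy only if it is installed.
#   -m                        modify draft.html in place
#   -f errors.txt             write warnings and errors to errors.txt
#   --uppercase-tags no       force tag names to lowercase
#   --drop-empty-paras yes    discard empty paragraphs
if command -v tidy >/dev/null 2>&1; then
    tidy -m -f errors.txt --uppercase-tags no --drop-empty-paras yes draft.html
    echo "tidy exit status: $?"
fi
```

If Tidy ran, errors.txt holds its report, which is handy when you clean many files and want to review the problems later.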

While the advanced options are useful on the command line, they're best suited to a configuration file.

Using Tidy with a Configuration File

A configuration file is simply a text file containing the advanced options that you want Tidy to use when you run it. When you run the program, it reads the configuration file and cleans a web document based on the options contained in the file. The advantage of using a configuration file is that you don't need to memorize and type a long string of options at the command line each time you run Tidy.

Here's a sample configuration file:

uppercase-tags: no
uppercase-attributes: no
word-2000: yes
clean: yes
logical-emphasis: yes
drop-empty-paras: yes
indent: yes
output-xhtml: yes
show-errors: 0

To use a configuration file, first create a file named tidy.cfg containing the options that you want to use. Save the file in the directory containing the Tidy software. When you run Tidy, it will look for tidy.cfg and use the options that it finds in the file.
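If you prefer to create the file from the command line, a here-document works. This is a minimal sketch, assuming a POSIX shell; it writes tidy.cfg to the current directory (move it to wherever your copy of Tidy expects it) and then runs a rough sanity check that every line has the "option: value" shape. The check is only illustrative and doesn't confirm that Tidy recognizes each option name.

```shell
# Write the sample configuration from this article to tidy.cfg
# in the current directory.
cat > tidy.cfg <<'EOF'
uppercase-tags: no
uppercase-attributes: no
word-2000: yes
clean: yes
logical-emphasis: yes
drop-empty-paras: yes
indent: yes
output-xhtml: yes
show-errors: 0
EOF

# Print (with line numbers) any line that is not in "option: value" form.
# grep exits with a nonzero status when it finds no such lines, so the
# message after || appears only when every line looks right.
grep -vnE '^[a-z0-9-]+:[[:space:]]*[^[:space:]]+' tidy.cfg \
    || echo "all lines look like option: value pairs"
```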

Of course, one configuration file may not meet all your needs. For example, I create web sites and web content in HTML, and also convert documentation from such formats as XML and LaTeX to HTML. If you're in a similar boat, consider maintaining multiple configuration files, one for each type of HTML document that you need to convert or clean up with Tidy. I have one configuration file for cleaning up web pages, and another for cleaning up documentation.

Working with multiple configuration files is easy. As you did with the file tidy.cfg, create your configuration files and save them in the directory containing the Tidy executable. Then, run Tidy using the -config option. For example:

tidy -config website.txt contact.html

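Tidy reads the options in the configuration file website.txt and uses them to fix any errors it finds. Because this command omits -m, Tidy writes the cleaned markup to standard output and leaves contact.html unchanged, unless website.txt itself sets write-back: yes.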
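When you juggle several configuration files, a short script can pick the right one for each document. The sketch below assumes a POSIX shell. The second configuration file (docs.txt), the docs/ directory, the file docs/manual.html, and the helper function choose_config are all hypothetical names used for illustration, and the option sets are only examples.

```shell
# Two example configuration files, one per kind of document.
cat > website.txt <<'EOF'
indent: yes
output-xhtml: yes
clean: yes
EOF

cat > docs.txt <<'EOF'
indent: yes
drop-empty-paras: yes
logical-emphasis: yes
EOF

# Hypothetical helper: choose a configuration file based on where
# the HTML file lives. Files under docs/ get docs.txt; everything
# else gets website.txt.
choose_config() {
    case "$1" in
        docs/*) echo "docs.txt" ;;
        *)      echo "website.txt" ;;
    esac
}

for page in contact.html docs/manual.html; do
    cfg=$(choose_config "$page")
    echo "$page -> $cfg"
    # Clean the file in place only if Tidy is installed and the file exists.
    if command -v tidy >/dev/null 2>&1 && [ -f "$page" ]; then
        tidy -config "$cfg" -m "$page"
    fi
done
```

Keeping the selection logic in one function means that adding a third document type only requires a new configuration file and one more case pattern.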
