Web Publishing: An Introduction to HTML
After finishing up the discussions about the World Wide Web and getting organized, with a large amount of text to read and concepts to digest, you're probably wondering when you're actually going to get to write a Web page. That is, after all, why you bought the book. Wait no longer! Today, you'll get to create your very first (albeit brief) Web page, learn about HTML (the language for writing Web pages), and learn about the following:
What HTML is and why you have to use it
What you can and cannot do when you design HTML pages
HTML tags: what they are and how to use them
What HTML Is—And What It Isn't
Take note of just one more thing before you dive into actually writing Web pages. You should know what HTML is, what it can do, and most importantly what it can't do.
HTML stands for Hypertext Markup Language. HTML is based on the Standard Generalized Markup Language (SGML), a much larger document-processing system. To write HTML pages, you won't need to know a whole lot about SGML. However, knowing that one of the main features of SGML is that it describes the general structure of the content inside documents—rather than its actual appearance on the page or onscreen—does help. This concept might be a bit foreign to you if you're used to working with WYSIWYG (What You See Is What You Get) editors, so let's go over the information carefully.
HTML Describes the Structure of a Page
HTML, by virtue of its SGML heritage, is a language for describing the structure of a document, not its actual presentation. The idea here is that most documents have common elements—for example, titles, paragraphs, or lists. Before you start writing, therefore, you can identify and define the set of elements in that document and give them appropriate names (see Figure 3.1).
Figure 3.1 Document elements.
If you've worked with word processing programs that use style sheets (such as Microsoft Word) or paragraph catalogs (such as FrameMaker), you've done something similar; each section of text conforms to one of a set of styles that are predefined before you start working.
HTML defines a set of common styles for Web pages: headings, paragraphs, lists, and tables. It also defines character styles such as boldface and code examples. Each element has a name and is contained in what's called a tag. When you write a Web page in HTML, you label the different elements of your page with these tags that say "this is a heading" or "this is a list item."
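As a rough sketch of what that labeling looks like, here is a small fragment marked up with a few common tags. The text content is invented for illustration, and the tags themselves (h1, p, ul, and li) are introduced properly later in the book.

```html
<!-- "This is a heading" -->
<h1>Growing Tomatoes</h1>

<!-- "This is a paragraph" -->
<p>Tomatoes need plenty of sun and regular watering.</p>

<!-- "This is a list", with each entry labeled as a list item -->
<ul>
  <li>Choose a sunny spot</li>
  <li>Water every morning</li>
</ul>
```

The tags only name each element. Nothing in the fragment says how large the heading should be or what the list bullets look like.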
HTML Does Not Describe Page Layout
When you're working with a word processor or page layout program, styles are not just named elements of a page—they also include formatting information such as the font size and style, indentation, underlining, and so on. So when you write some text that's supposed to be a heading, you can apply the Heading style to it, and the program automatically formats that paragraph for you in the correct style.
HTML doesn't go this far. For the most part, HTML doesn't say anything about how a page looks when it's viewed. HTML tags just indicate that an element is a heading or a list; they say nothing about how that heading or list is to be formatted. As in the magazine example, where a layout person formats your article, it's someone else's job to decide how big the heading should be and what font it should be in. The only thing you have to worry about is marking which section is supposed to be a heading.
Although HTML doesn't say much about how a page looks when it's viewed, cascading style sheets (CSS) enable you to apply advanced formatting to HTML tags. Many changes in HTML 4.0 favor the use of CSS. After you've learned about the basic HTML tags, you'll begin to learn more about CSS in Day 4, "Begin with the Basics," and Day 12, "XHTML and Style Sheets."
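To make that division of labor concrete, here is a minimal sketch of a page with a style sheet attached. The font, size, and color are arbitrary choices made for illustration, not recommendations, and the page content is invented. The details of CSS come later in the book.

```html
<html>
<head>
<title>Style Sheet Sketch</title>
<style type="text/css">
  /* Presentation lives here: how every h1 heading should look */
  h1 { font-family: Arial, sans-serif; font-size: 24pt; color: navy; }
</style>
</head>
<body>
<!-- Structure lives here: the tag only says "this is a heading" -->
<h1>Growing Tomatoes</h1>
<p>Tomatoes need plenty of sun and regular watering.</p>
</body>
</html>
```

A browser that doesn't support style sheets ignores the rule and falls back on its own idea of what a heading looks like. The heading is still a heading either way.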
Web browsers, in addition to providing the networking functions to retrieve pages from the Web, double as HTML formatters. When you read an HTML page into a browser such as Netscape or Internet Explorer, the browser interprets, or parses, the HTML tags and formats the text and images on the screen. The browser has mappings between the names of page elements and actual styles on the screen; for example, headings might be in a larger font than the text on the rest of the page. The browser also wraps all the text so that it fits into the current width of the window.
Different browsers running on diverse platforms might have various style mappings for each page element. Some browsers might use different font styles than others. For example, a browser on a desktop computer might display italics as italics, whereas a handheld device or mobile phone might use reverse text or underlining on systems that don't have italic fonts. Or it might put a heading in all capital letters instead of a larger font.
What this means to you as a Web page designer is that the pages you create with HTML might look radically different from system to system and from browser to browser. The actual information and links inside those pages will still be there, but the onscreen appearance will change. You can design a Web page so that it looks perfect on your computer system, but when someone else reads it on a different system, it might look entirely different (and it might very well be entirely unreadable).
Why It Works This Way
If you're used to writing and designing documents that will wind up printed on paper, this concept might seem almost perverse. No control over the layout of a page? The whole design can vary depending on where the page is viewed? This is awful! Why on earth would a system work like this?
Remember in Day 1, "The World of the World Wide Web," when I mentioned that one of the cool things about the Web is that it is cross-platform and that Web pages can be viewed on any computer system, on any size screen, with any graphics display? If the final goal of Web publishing is for your pages to be readable by anyone in the world, you can't count on your readers having the same computer systems, the same size screens, the same number of colors, or the same fonts that you have. The Web takes into account all these differences and allows all browsers and all computer systems to be on equal ground.
The Web, as a design medium, is not a new form of paper. The Web is an entirely different medium, with its own constraints and goals that are very different from working with paper. The most important rules of Web page design, as I'll keep harping on throughout this book, are the following:
Do design your pages so they work in most browsers.
Don't design your pages based on what they look like on your computer system and on your browser.
Do focus on clear, well-structured content that is easy to read and understand.
Throughout this book, I'll show you examples of HTML code and what they look like when displayed. In examples where browsers display code very differently, I'll give you a comparison of how a snippet of code looks in two very different browsers. Through these examples, you'll get an idea of how different the same page can look from browser to browser.
Although this rule of designing by structure and not by appearance is the way to produce good HTML, when you surf the Web, you might be surprised that the vast majority of Web sites seem to have been designed with appearance in mind—usually appearance in a particular browser such as Netscape Navigator or Microsoft Internet Explorer. Don't be swayed by these designs. If you stick to the rules I suggest, in the end, your Web pages and Web sites will be even more successful simply because more people can easily read and use them.
HTML Is a Markup Language
HTML is a markup language. Writing in a markup language means that you start with the text of your page and add special tags around words and paragraphs. The tags indicate the different parts of the page and produce different effects in the browser. You'll learn more about tags and how they're used in the next section.
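As a small illustration of that process, the sketch below starts from a plain sentence and wraps tags around the paragraph and around individual words. The sentence is invented for the example, and the tags used here (p, em, and strong) are explained in detail later.

```html
<!-- The plain text you start with:
     Water the seedlings daily, but never in full sun. -->

<!-- The same text after adding tags around the paragraph and two words -->
<p>Water the seedlings <em>daily</em>, but <strong>never</strong> in full sun.</p>
```

The words themselves don't change. The tags mark one word as emphasized and another as strongly emphasized, and each browser decides how to show that emphasis.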
HTML has a defined set of tags you can use. You can't make up your own tags to create new appearances or features. And, just to make sure that things are really confusing, various browsers support different sets of tags. To further explain this, take a brief look at the history of HTML.
A Brief History of HTML Tags
The base set of HTML tags, the lowest common denominator, is referred to as HTML 2.0. HTML 2.0 is the old standard for HTML (a written specification for it is developed and maintained by the W3C) and the set of tags that all browsers must support. In the next few days, you'll primarily learn to use tags that were first introduced in HTML 2.0.
The HTML 3.2 specification was developed in early 1996. Several software vendors, including IBM, Microsoft, Netscape Communications Corporation, Novell, SoftQuad, Spyglass, and Sun Microsystems, joined with the W3C to develop this specification. Some of the primary additions to HTML 3.2 included features such as tables, applets, and text flow around images. HTML 3.2 also provided full backward-compatibility with the existing HTML 2.0 standard.
The enhancements introduced in HTML 3.2 are covered later in this book. You'll learn more about tables in Day 10, "Tables." Day 13, "Multimedia: Adding Sounds, Videos, and More," tells you how to use Java applets.
HTML 4.0, first introduced in 1997, incorporated many new features that gave you greater control than HTML 2.0 and 3.2 over how you designed your pages. As with HTML 2.0 and 3.2, the W3C maintains the HTML 4.0 standard. Although both Internet Explorer 4 and Netscape Navigator 4 support most HTML 4.0 features, users with browsers older than that won't be able to view HTML 4.0 features such as cascading style sheets and dynamic HTML.
Cascading style sheets and dynamic HTML are separate Web technologies that work in conjunction with HTML to give you more control over the appearance of your Web pages. Style sheets are discussed further in Day 12, "XHTML and Style Sheets." See Day 15, "Using Dynamic HTML," for an introduction to the capabilities of Dynamic HTML.
Framesets (originally introduced in Netscape 2.0) and floating frames (originally introduced in Internet Explorer 3.0) became an official part of the HTML 4.0 specification. Framesets are discussed in more detail in Day 11, "Frames and Linked Windows." HTML 4.0 also improved table formatting and rendering. By far, however, the most important change in HTML 4.0 was its increased integration with style sheets.
If you're interested in how HTML development is working and just exactly what's going on at the W3C, check out the pages for HTML at the Consortium's site at http://www.w3.org/MarkUp/.
In addition to the tags defined by the various levels of HTML, individual browser companies also implement browser-specific extensions to HTML. Netscape and Microsoft are particularly guilty of creating extensions, and they offer many new features unique to their browsers.
Confused yet? You're not alone. Even Web designers with years of experience and hundreds of pages under their belts have to struggle with the problem of which set of tags to choose. They have to strike a balance between wide support for a design (using HTML 3.2- and 2.0-level tags) and more flexibility in layout with less consistency across browsers (using HTML 4.0 or specific browser extensions). Throughout this book, as I introduce each tag, I'll let you know which version of HTML the tag belongs to, how widely supported it is, and how to use it to best effect in a wide variety of browsers.