InformIT

Tower of eBabel: Why e-Books Are Such a Mess and How They Can Improve

Date: Feb 17, 2006


The digital publishing industry is in chaos: competing e-book formats have produced a shortage of content that plagues the whole market. Mark Carey explains how this impasse came about and suggests what can be done to fix the problem.

Introduction

Global e-book sales in 2005 were projected to reach about $15 million, according to the International Digital Publishing Forum (IDPF), a major trade group. E-book sales for the first half of 2005 were up 72% over the same period in 2004. While that growth is impressive, the actual figure is a speck compared to the $23.7 billion in sales of paper books. The wasteful and confusing proliferation of formats brings to mind a Biblical story: the tale of the Tower of Babel, whose builders failed for want of a common tongue.

"The Tower of eBabel," says David Rothman, OpenReader Consortium cofounder and coiner of the phrase in an e-book context, "may have cost the industry tens of millions in sales over the years—maybe even hundreds of millions or more. Too many non-geeks have found it a chamber of horrors."

In this article, I’ll explore the major e-book challenges, what industry leaders are doing to address these challenges, and what outcomes you can expect as a user of e-books.

e-Book Challenges

Devices dedicated to reading electronic books have been available since 1991, when Sony introduced the Bookman. The Rocket eBook followed in 1998, along with other devices that have come and gone. Early e-book reading hardware largely failed because of technology limitations such as short battery life; poor search capabilities; excessive weight; and small, low-resolution screens.

Adobe’s Acrobat Reader was introduced in 1994 and Microsoft Reader in 1999. As other proprietary formats have come into use, a format war has dragged on for years, with more than 20 major and minor formats battling to become the standard and millions in potential sales of e-books and other digital texts squandered along the way. This squabbling frustrates consumers and costs publishers and authors real opportunities. The biggest challenges, which I’ll discuss in detail shortly, are standardization, creation and conversion tools, reading software, hardware, and digital rights management (DRM).

Standardization

Incompatible formats are among the biggest consumer peeves concerning e-books. The following comment was pulled (with permission) from the Yahoo! group called "The e-Book community":

The thing that causes me day to day annoyance when I read e-books is that I have to have three different reader programs on my PDA (eReader, iSilo, Embiid reader) to read all the e-books in my collection. That’s taking up space that could be used more effectively for something else.

There are four major proprietary e-text formats, from Adobe, eReader (formerly Palm), Microsoft, and Mobipocket, plus at least 16 minor formats, both proprietary and open. There’s even one called Starbuck! Because e-book distributors don’t carry every available format, the books open to any one reader are limited to those sold in the format that reader supports. Breaking the ties between formats and content would dramatically expand the number of e-books available to everyone.

Eventually, one standard will prevail. I believe it will be based on XML, which is managed by the World Wide Web Consortium (W3C). XML is a non-proprietary, unencumbered, cross-platform, vendor-neutral, international standard. It provides a rich, standardized way to mark up textual content and, when properly used, cleanly separates content from presentation, among other benefits. XML meets the needs of the widest range of groups, including publishers, retailers, librarians and archivists. It already plays an important role in data exchange over the Internet and in business-to-business applications, and it is widely used for web pages (as XHTML). Pairing XHTML with editable Cascading Style Sheets (CSS) also makes for a better user experience, because the end user can change the "look and feel" of the output. This feature is particularly important for visually impaired readers.
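To make the separation of content and presentation concrete, here is a minimal Python sketch. The chapter text and the two stylesheets (`default` and `large-print`) are hypothetical examples, not taken from OEBPS, OpenReader, or any real reading program. The chapter markup stays identical in both renderings; only the reader-selected CSS changes.

```python
# Minimal sketch: one piece of XHTML content, two reader-selectable stylesheets.
# The chapter text and stylesheet names are hypothetical examples.

CHAPTER = "<h1>Chapter 1</h1>\n<p>It was a dark and stormy night.</p>"

# Presentation lives here, kept separate from the content above.
STYLESHEETS = {
    "default": "body { font-size: 1em; } h1 { font-size: 1.5em; }",
    "large-print": "body { font-size: 1.6em; line-height: 1.5; } h1 { font-size: 2.2em; }",
}


def render(content: str, style_name: str) -> str:
    """Wrap unchanged content in an XHTML page that uses the chosen stylesheet."""
    css = STYLESHEETS[style_name]
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        "<head>\n<title>Sample</title>\n"
        f'<style type="text/css">{css}</style>\n'
        "</head>\n"
        f"<body>\n{content}\n</body>\n"
        "</html>"
    )


if __name__ == "__main__":
    # A visually impaired reader switches stylesheets; the text itself is untouched.
    print(render(CHAPTER, "large-print"))
```

A real reader application would let the user edit or swap the stylesheet directly; the point of the sketch is only that the content never has to be re-authored to change its appearance.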

But core standards by themselves are not enough; they must be employed properly within an e-book framework. The nonproprietary Open eBook Publication Standard (OEBPS) was first developed in 1999 by the Open eBook Forum, now the International Digital Publishing Forum (IDPF). But the OEBPS "standard" is more than six years old, and its current version, 1.2, doesn’t address many consumer and publisher requirements that have been recognized more recently. A next-generation e-book framework called OpenReader, first propounded by Dr. Jon Noring, builds on OEBPS and shows great promise thanks to a superior architecture. Quoting Dr. Noring, "The OpenReader format is intended to fulfill the promise of the revolutionary, XML-based OEBPS framework. It is a next-generation, open standards e-book and digital publication framework, intended to fix the various deficiencies in the [...] OEBPS framework, adding new features of interest to publishers, consumers, librarians/archivists, and accessibility advocates."

Creation/Conversion Tools

Creation and conversion tools add yet another layer of complexity to bringing e-books to market. Many print publishers use layout programs such as Adobe FrameMaker and then convert the result to Acrobat PDF for printing. In this typical workflow, a rough draft moves through editing, layout, and other stages to the finished product. But authoring applications such as FrameMaker support properly tagged and formatted XML poorly, if at all. Producing an XML-based "master" format (rather than a page-based layout format) would allow a more efficient workflow and easier reuse, resulting in lower costs and shorter production times.

Think of it this way: publishers convert their books into PDF as an "end of the road" format. But because typical PDF documents are not tagged with headings, paragraphs, tables, figures, and so on, you can’t easily convert them into other usable formats without going through the document line by line, page by page, and manually retagging all the text. Creating a document in XML from the start allows the publisher to convert the work into PDF, XHTML, and other electronic formats with little or no effort.
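The single-source idea can be sketched in a few lines of Python using the standard library’s `xml.etree.ElementTree`. The element names in this hypothetical master (`book`, `chapter`, `title`, `para`) are illustrative assumptions, not taken from OEBPS, OpenReader, or any publisher’s schema. Because headings and paragraphs are tagged once in the master, each output format is just a different mapping of those tags.

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal XML "master" for one chapter. The element names
# (book, chapter, title, para) are illustrative, not from any e-book standard.
MASTER = """<book>
  <chapter>
    <title>Getting Started</title>
    <para>Structure is recorded once, in the master.</para>
    <para>Every output format is derived from it.</para>
  </chapter>
</book>"""


def to_xhtml(root: ET.Element) -> str:
    """Map the master's structural tags onto XHTML headings and paragraphs."""
    html = ET.Element("html", xmlns="http://www.w3.org/1999/xhtml")
    body = ET.SubElement(html, "body")
    for chapter in root.iter("chapter"):
        ET.SubElement(body, "h1").text = chapter.findtext("title")
        for para in chapter.iter("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")


def to_plain_text(root: ET.Element) -> str:
    """Map the same tags onto plain text: upper-case titles, one line per paragraph."""
    lines = []
    for chapter in root.iter("chapter"):
        lines.append(chapter.findtext("title").upper())
        lines.append("")
        for para in chapter.iter("para"):
            lines.append(para.text)
    return "\n".join(lines)


root = ET.fromstring(MASTER)

if __name__ == "__main__":
    print(to_xhtml(root))
    print(to_plain_text(root))
```

Adding another target format would mean writing one more mapping function over the same tags, rather than retagging the text by hand as the PDF-first workflow requires.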

Because PDF is an Adobe standard, Adobe should take a leadership role in updating its programs to be more fully XML-compliant. A standard XML-based "master" format would simplify the workflow for publishers, increasing their profits while allowing lower prices for consumers.

In addition to creation tools, conversion tools will play an important role in the acceptance of an XML format. Don’t expect reliable PDF-to-XML conversion anytime soon, though. PDF is almost always unstructured and by its nature is difficult to auto-convert into high-quality XML. I expect Adobe to partially remedy this problem over time by enabling users of its products to build structured PDF documents that can reflow, to a degree, on small screens such as PDAs. But I hear no mention of Adobe opening new doors for software publishers to create high-quality programs that convert PDF into XML. (At least, not explicitly.)

Reading Software

As noted earlier, OEBPS was first released in 1999, and yet it has never been implemented in a viable, general-purpose commercial reader application! The first OEBPS/OpenReader-enabled reader is expected in the first half of 2006. By embracing OEBPS/OpenReader, software companies will be able to focus on the quality of the software used to present digital publications, and worry less about inventing and promoting their own formats. This will save them money and shorten time-to-market, so they can compete instead on software features such as ease of use and the appearance of e-publications on the screen.

Hardware

E-reading hardware has evolved slowly over the years, but if you’re skeptical about the future, just consider the past. A typical handheld PDA packs more computing power than the computers aboard the Apollo spaceflights to the moon. Keep an eye out for Linux operating systems spreading to new devices: hardware margins are so thin that manufacturers may not be able to pay Microsoft’s OEM fees and still remain competitive. Consumers are looking for e-reading devices with good battery life and better screens at a cost below $100. MIT’s $100 laptop shows promise as a commercial product in the U.S. and could help level the playing field across all economic strata.

Digital Rights Management (DRM)

Digital Rights Management (DRM) can best be described as a four-letter word. Consumers hate it, but publishers and authors require it to maintain their businesses. If it weren’t for DRM, we wouldn’t be reading contemporary works electronically until they entered the public domain, decades after the deaths of their authors. Without this technology, many best-sellers, such as Dan Brown’s The Da Vinci Code, would never appear in e-book format.

All the DRM in the world won’t stop piracy, however, not when people can simply scan paper copies and post them to the web in a matter of hours. The best countermeasure is to release affordable electronic editions that use either convenient DRM technology or none at all.

Existing forms of proprietary DRM are inconvenient, too restrictive, and cost too much. Have you ever tried to move an e-book from one device to another? How about forgetting your password? Right now, as much as 10–15% of an e-book’s price may go toward DRM and related format services. This is far too much, especially in a generally low-margin industry like book publishing.

The Value of e-Books

What is the value of an e-book? The common perception is that an e-book costs less to produce than the same title in print and should therefore cost less to buy. This is true only to a point: while publishers avoid printing and physical distribution costs for e-books, behind the scenes there are still countless hours of editing, validating, page layout, cover design, and a myriad of other tasks.

One of the largest such tasks is ensuring that content is correct and useful to readers. That involves coaching authors on writing technique at one end of the spectrum and checking the manuscript’s electronic formatting at the other, while also upholding general editorial standards of style and accuracy. All of this work is organized and paid for by the publisher. Publishing a book needs to be viewed holistically, as a business with total costs and total sales. While e-books contribute to the bottom line, they also take away from sales of print books. Publishers still have to print and ship paper editions, so the cost of maintaining those operations doesn’t go away. They simply have to give e-books a "fair" price so that readers don’t balk at choosing that format.

The Future of e-Books

I envision a future in which all e-books use one common, non-proprietary format that can be displayed on any hardware device built to comply with it. You’ll be able to purchase an e-book confidently, knowing it can be read now or 100 years from now. Instead of having multiple "readers" on your desktop, you’ll have one that you selected on its merits as reading software, not for the content it can open. And when you upgrade to a new computing device, you won’t have to buy the same book again in a different format. A book or article that fits on a small PDA screen will display just as well, in full typographical detail, on a 27-inch monitor.

While we await this e-book nirvana, those of us with an interest in owning and reading e-books should be urging publishers of software, hardware, and content to develop common, sharable standards that will simplify and improve the e-book industry for everyone.
