Approaches for Speech-enabling Your Web Site
To speech-enable your Web site, you can follow any of three approaches:
1. Use a speech synthesis engine to read the contents of an existing Web page to the caller. However, listening to a Web page designed to be viewed on a screen can be time-consuming, tedious, and boring. Even if the caller can "fast forward" and "skip backward," the experience is like hunting for a segment of a show recorded on videotape. Not fun.

2. Add speech to an existing Web page, enabling the page to speak to the user and listen to the user speak. This approach is called multimodal because the user can interact with a visual display as well as speak and listen. PCs can support multimodal user interfaces, but current telephones and most cell phones cannot. Not yet. New devices that integrate the functions of both cell phones and PDAs are starting to appear. Article 4 of this series will address multimodal user interfaces and their advantages and disadvantages.

3. Develop a speech-only user interface to your Web site. Users can call your Web site from their existing telephones and cell phones. VoiceXML, a language implemented by more than 50 software and platform vendors, is designed specifically for developing voice interfaces to Web sites. A newer collection of tags, Speech Application Language Tags (SALT), is available to implement either speech-only or multimodal applications. Article 5 of this series will address the advantages and disadvantages of the VoiceXML and SALT approaches for developing speech applications.
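To give a flavor of the third approach, here is a minimal sketch of a VoiceXML 2.0 document that simply greets a caller. The greeting text and form name are invented for the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A form is roughly the voice equivalent of a Web page -->
  <form id="welcome">
    <block>
      <!-- Text inside <prompt> is spoken to the caller by the
           speech synthesis engine -->
      <prompt>Welcome. You have reached our Web site by telephone.</prompt>
    </block>
  </form>
</vxml>
```

A VoiceXML browser running on a telephony platform fetches this document from an ordinary Web server and renders it as speech, just as a visual browser fetches and renders HTML.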
A speech interface is very different from a graphical user interface. New skills are required to specify prompts that encourage callers to speak, design grammars that describe how the caller may respond, and specify event handlers in case the caller fails to respond to prompts appropriately. New hardware for connecting the telephone system to your server, as well as new software for speaking, listening, and managing dialogs between callers and applications, is also needed.
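In VoiceXML, for example, all three of these elements (prompt, grammar, and event handlers) appear directly in the markup. The following sketch, with an invented drink-ordering dialog, shows how they fit together:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="drink">
      <!-- The prompt encourages the caller to speak -->
      <prompt>Would you like coffee or tea?</prompt>
      <!-- Each <option> adds a phrase to the grammar of
           responses the recognizer will accept -->
      <option>coffee</option>
      <option>tea</option>
      <!-- Event handlers: silence and unrecognized speech -->
      <noinput>I didn't hear you. <reprompt/></noinput>
      <nomatch>Please say coffee or tea. <reprompt/></nomatch>
    </field>
  </form>
</vxml>
```

If the caller says nothing, the noinput handler fires; if the caller says something outside the grammar, the nomatch handler fires. In both cases the reprompt element replays the original question.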
Articles 2 and 3 of this series will deal with designing, building, and testing voice interfaces.
Speaking and listening are not the only ways that callers access the Internet using a phone. Many businesses already use interactive voice response (IVR) systems that replay files of prerecorded questions to which callers respond by pressing the keys on their touchtone phones. These systems were the first to enable callers to interact with a computer by using the telephone without being placed on hold. However, these systems can be awkward: the caller must repeatedly move the handset between ear and mouth, listening to options in a verbal menu, translating the chosen option to a digit, and pressing the corresponding key on the touchtone phone, a cognitive overload for many callers. Because human short-term memory holds only about 7±2 chunks of information, each menu must be kept short, so these applications use long, narrow menu hierarchies rather than short, fat menus, and callers sometimes get lost traversing them. Callers also find it difficult to spell character strings by pressing the 12 buttons on a typical telephone keypad. Most importantly, the dialog typically is rigidly structured and does not enable the caller to reach the needed information quickly.
Speech overcomes many of these disadvantages of touchtone systems. A caller can hold the handset next to the ear without having to move it. Short, fat menus frequently replace long, narrow menu hierarchies. Users can speak words and phrases instead of trying to spell character strings.
Speech is often worth the time, effort, and expense if it serves your customers better. There are 10 times as many telephones as connected PCs in the world, and the number of cell phones is increasing dramatically. Nearly everyone has access to a phone, and therefore the ability to reach your Web site quickly.