Speech Recognition: Writing Effective Prompts
- The Language of Asking Questions
- The Art of Writing Perfect Prompts
- Writing Prompts for Elegance, Speed, and Value
- Getting Callers to Focus on the Essentials
- Some Subtleties of Prompt Writing
- Top Five Good Tenets for Writing Prompts
- Top Five Mistakes When Writing Prompts
- Where We've Been—Where We're Going
A writer is somebody for whom writing is more difficult than it is for other people.
Writing is easy. All you do is stare at a blank sheet of paper until drops of blood form on your forehead.
The difference between an adequate speech-recognition system and a great speech-recognition system lies in how the system asks questions and conveys complex information. Great systems do it with an elegance worthy of a haiku; their meaning and impact are clear and immediate, and not a single word is wasted. The more elegant a system is, the more intuitively—and quickly—a caller can use it, and the greater value it offers to both clients and callers.
This chapter does not discuss topics such as "the best way to ask a 'yes/no' question." Why not? Because, for one thing, there is no single "best way." Rather, there are many different ways to properly ask a "yes/no" question, depending on the situation. I've avoided absolute rules—and words like "always," "never," "best," and "worst"—because the state of the art of design is constantly changing. Any absolute rules I could offer would soon be outdated. In fact, I prefer that people understand and extract the underlying ideas about the design of successful speech systems rather than blindly follow a set of rules.
The Language of Asking Questions
At its most basic level, the design of speech-recognition system prompts is composed of two elements:
Questions the system asks
Statements the system makes
The questions need to be clear and hard to misinterpret, and the statements need to provide useful information in a valuable format. Most of the time, a single context mixes statements that inform the caller with questions through which the system gleans new information. This gets tricky when the conversation between the computer and the caller becomes complex. Most people don't realize how much error correction they do in real conversations. How often do our conversations with real people sound anything like the carefully scripted conversations of a TV drama? On TV and in movies, characters never say "Uh, what? I didn't get that last thing you just said."
In real life, real people do lots of error checking and correcting. And when we're asked to clarify something we've said, what do we do? We usually rephrase it or provide further information. If the people to whom we're talking need further clarification, we might even explain why we're asking the question.
Speech-recognition systems need to emulate this behavior, because—just as in real conversations—mistakes and misunderstandings do occur. Think of all the things that could possibly go wrong when a system asks a caller a question.
The caller may not respond.
The caller may respond with the wrong answer, one that the recognizer can't understand.
The caller may respond with the right answer, but the recognizer isn't sure enough to accept the response without confirming it first.
To handle all of these possible situations, speech-recognition systems often use these six types of prompts:
Initial prompts
Retry prompts
Timeout prompts
Help prompts
Success prompts
Failure prompts
Let's take a close look at each of these prompt types by walking through a typical call flow.
When a system enters a particular state it may provide some information, but then, almost always, it asks a question such as "When do you plan to arrive?" The prompt played to the caller is called (logically) the initial prompt, since it's the first thing the system says in a particular state or context. The recognizer then listens for the caller's answer.
If the recognizer hears something, but can't match the caller's utterance with the command vocabulary it is using in that context, then it will play a retry prompt. For example, if a caller says, "Uh, I guess I'm not sure" when the system is expecting to hear a date, such as "April 24th," the system would most likely not understand the utterance and then play a retry prompt to aid the caller.
We often use a retry prompt to clarify an idea with the caller, in the hope that by giving the caller another attempt he or she will understand how to answer a question correctly. For example, after hearing "Uh, I'd like to arrive on the last Saturday in April," an appropriate retry prompt might be an explicit statement, "I'm sorry I didn't understand. Please say the date you'd like to arrive. For example, you could say 'January 29th' or 'February 2nd.' Or for more information, say 'Help.'"
This retry prompt would help many callers, but not necessarily those who are saying the right thing but aren't being heard well, possibly (and increasingly) because of bad connections on mobile phones. If the system still doesn't understand what the caller is saying, it might employ a second retry prompt, similar to the first one but with more or different information. For example, it might tell the caller, "I'm sorry, I still didn't understand. Say the date, or enter it, using touchtones. For example, you'd enter January 13th as zero, one, one, three. Otherwise, say 'Help' for more information."
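The touchtone fallback in that second retry prompt implies a two-digit month followed by a two-digit day. Here is a minimal sketch of how a system might interpret those keypresses. The function name `parse_touchtone_date` is hypothetical, and passing the year in as a parameter is an assumption, since the prompt itself does not say how the year is determined.

```python
import datetime
from typing import Optional


def parse_touchtone_date(digits: str, year: int) -> Optional[datetime.date]:
    """Read four keypad digits as MMDD; return None if they are not a real date.

    The year is supplied by the caller of this function (an assumption).
    """
    # Require exactly four plain decimal digits, e.g. "0113" for January 13th.
    if len(digits) != 4 or any(ch not in "0123456789" for ch in digits):
        return None
    month, day = int(digits[:2]), int(digits[2:])
    try:
        return datetime.date(year, month, day)
    except ValueError:
        # Covers impossible entries such as month 13 or February 30th.
        return None
```

An entry that comes back as `None` can be treated like any other unrecognized response and routed to the next retry prompt.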
Currently most speech-recognition engines are designed to reject a response after they've calculated that the caller's response is likely invalid. For example, if the caller answered the question "On which date?" with "The last Saturday in April," most likely the recognizer wouldn't understand that response. However, if the system were somewhat certain—but not certain enough—that it understood the answer, it might ask a confirmation question to verify the response, such as "I think you said April 2nd, is that correct?" This technique saves callers from repeating an answer in a case where the response the recognition engine heard is just below the threshold of acceptance.
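The accept, confirm, or reject decision described above can be sketched as a pair of thresholds on the recognizer's confidence score. This is an illustration only: the threshold values, the 0-to-1 score range, and the names `Action`, `decide`, and `confirmation_prompt` are assumptions, not settings from any particular recognition engine.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # confident enough to move on
    CONFIRM = "confirm"  # just below acceptance: ask the caller to verify
    REJECT = "reject"    # likely invalid: play a retry prompt


# Hypothetical thresholds on a 0.0-1.0 confidence score.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.50


def decide(confidence: float) -> Action:
    """Map a recognition confidence score to the next step in the dialog."""
    if confidence >= ACCEPT_THRESHOLD:
        return Action.ACCEPT
    if confidence >= CONFIRM_THRESHOLD:
        return Action.CONFIRM
    return Action.REJECT


def confirmation_prompt(heard: str) -> str:
    """Build the confirmation question for a borderline result."""
    return f"I think you said {heard}, is that correct?"
```

With these assumed values, a score of 0.6 for "April 2nd" would lead to the confirmation question rather than a full retry, which is the case where the caller is spared from repeating the answer.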
The system needs to determine whether the caller responded. If the recognizer doesn't hear anything, it will usually ask another question that gives the caller more information in order to elicit a response. This is called a timeout prompt.
To design timeout prompts we always need to ask ourselves, "Why wouldn't a person respond to this question?"
Were they distracted and didn't hear or understand the whole question?
Did they not know the appropriate words to use in response to the question?
Did they not know the answer to the question?
Did they not understand why they had to answer the question in the first place?
Or did they not want to answer the question, perhaps because they didn't want to reveal sensitive information?
So, for example, if a system asks, "When do you plan to arrive?" and fails to hear any response, it can provide a timeout prompt, such as "I'm sorry, I didn't hear you. Say the date that you plan to arrive, or for more information say 'Help.'" This new prompt provides a little more information about how to answer the question by instructing the person to say a date (rather than a relative time marker such as "After I get to Portland") and steers them to say "Help" if they need more information about this question.
If the system fails to hear a response twice in the same state, a second timeout prompt can be used, often to instruct callers on how to use touchtones or how to talk to a representative.
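The escalation from first to second retry and timeout prompts can be sketched as two ordered lists per state, each with its own counter. This is one possible arrangement under stated assumptions: the class names, the event labels `"no_match"` and `"no_input"`, and the choice to count retries and timeouts separately are all hypothetical. The wording of the second timeout prompt is invented for illustration, and the failure text is an example failure prompt. The other prompts are the arrival-date examples quoted above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StatePrompts:
    """All prompts for one state (one question) in the call flow."""
    initial: str
    retries: List[str]   # played in order after unrecognized utterances
    timeouts: List[str]  # played in order after silence
    failure: str         # played once a list is exhausted


@dataclass
class StateTracker:
    """Counts misses in one state and picks the next prompt to play."""
    prompts: StatePrompts
    retry_count: int = 0
    timeout_count: int = 0

    def next_prompt(self, event: str) -> str:
        # "no_match": the caller spoke, but nothing in the vocabulary matched.
        # "no_input": the recognizer heard nothing at all.
        if event == "no_match":
            ladder, used = self.prompts.retries, self.retry_count
            self.retry_count += 1
        elif event == "no_input":
            ladder, used = self.prompts.timeouts, self.timeout_count
            self.timeout_count += 1
        else:
            raise ValueError(f"unknown event: {event!r}")
        # Escalate through the list, then fall back to the failure prompt.
        return ladder[used] if used < len(ladder) else self.prompts.failure


ARRIVAL_DATE = StatePrompts(
    initial="When do you plan to arrive?",
    retries=[
        "I'm sorry I didn't understand. Please say the date you'd like to "
        "arrive. For example, you could say 'January 29th' or 'February 2nd.' "
        "Or for more information, say 'Help.'",
        "I'm sorry, I still didn't understand. Say the date, or enter it, "
        "using touchtones. For example, you'd enter January 13th as zero, "
        "one, one, three. Otherwise, say 'Help' for more information.",
    ],
    timeouts=[
        "I'm sorry, I didn't hear you. Say the date that you plan to arrive, "
        "or for more information say 'Help.'",
        # Invented wording, for illustration only.
        "I'm sorry, I still didn't hear you. Say the date, or enter it using "
        "touchtones.",
    ],
    failure="Sorry, but it seems we're having problems. Let me transfer you "
            "to someone who'll be able to help. Hold on.",
)
```

A caller who is misunderstood three times in this state hears the first retry prompt, then the second, then the failure prompt. Whether silence and misrecognition should share one counter is a design decision this sketch does not settle.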
A help prompt, not to be confused with general help statements that explain how to use the system, gives the caller assistance with a particular task in a particular situation. Designing a help prompt can be something of a challenge, because we, as designers, don't always know why callers would need help in a particular situation. So we have to make educated guesses by asking ourselves, "Why would callers not be sure how to proceed from here?"
Here are a few answers to that question.
They didn't mean to be there in the first place.
They changed their minds.
There was something important missing from the original question.
They didn't wait to hear the retry or timeout prompt messages.
We need to write our help prompt to address all of these possible needs.
A success prompt is played when a caller has successfully exited the current state and is entering the next state. You may choose not to use a success prompt every time, but if you do, it can be as simple as "Got it," or more of an informative segue, such as "OK, now let's collect the last piece of information."
A failure prompt is played when callers have tried, and failed, to answer a question several times. They're obviously stuck and need to be either transferred to a "live" representative in a call center, or re-oriented (perhaps by being returned to the main menu). A typical failure prompt goes something like this: "Sorry, but it seems we're having problems. Let me transfer you to someone who'll be able to help. Hold on." In the case of a client without a call center behind the application, it might say "It looks like we're having problems. Let's just take things from the top," then proceed to the main menu.
Often the designer can define this prompt once in the Design Specification and simply call for its use whenever a caller fails at a state. However, the designer should always look for situations in which this prompt may be inappropriate—or inadequate. For example, if the system has already collected information from the caller and can pass it along to a customer service representative, an appropriate prompt might be "Sorry, but it seems we're having problems. Let me transfer you to someone who'll be able to help and I'll pass along the information I've collected so far. Hold on."
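The choice among these failure prompts can be sketched as a small function. The name `failure_prompt`, its parameters, and the use of a dictionary for the collected information are assumptions made for illustration. The prompt wording comes from the examples above.

```python
from typing import Dict


def failure_prompt(has_call_center: bool, collected: Dict[str, str]) -> str:
    """Pick a failure prompt for a caller who is stuck in a state.

    `collected` holds whatever the system has gathered so far (hypothetical).
    """
    if not has_call_center:
        # No representative to transfer to: restart from the main menu.
        return ("It looks like we're having problems. "
                "Let's just take things from the top.")
    if collected:
        # Tell the caller that the information gathered so far won't be lost.
        return ("Sorry, but it seems we're having problems. Let me transfer "
                "you to someone who'll be able to help and I'll pass along "
                "the information I've collected so far. Hold on.")
    return ("Sorry, but it seems we're having problems. Let me transfer you "
            "to someone who'll be able to help. Hold on.")
```

Defining the default once and overriding it only where the context calls for it mirrors the advice to specify this prompt a single time in the Design Specification.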