This chapter is from the book

Some Subtleties of Prompt Writing

Some words and sentence constructs work better than others in speech-recognition systems—even when both are grammatically correct. Contrary to the protests of millions of elementary schoolchildren over the years, there are many reasons why we should speak using proper grammar—even in speech-recognition applications. The two biggest reasons? Precision and clarity. Grammatically correct language (unless it sounds extremely awkward) leaves less opportunity for misunderstanding.

I'm not overly pedantic about correct grammar and usage—it can be taken to ridiculous extremes—but a disregard for language betrays a certain sloppiness or lack of attention that reflects on a person or a company. For example, why do most supermarkets have checkout lines incorrectly labeled "12 items or less," instead of the correct "12 or fewer items?" "Fewer" has only one more letter than "less," so they're apparently not doing it to save space on their signs.

We can imagine a vacation-marketing survey system that asked questions about how people travel, and how they have enjoyed particular vacations booked through this company. If the system asked a series of questions about the person who accompanied the caller on a recent trip, the system might ask, "With whom were you?" This question, though grammatically correct, would confuse many people, and perhaps should be worded more colloquially as "Who were you with?"—to ensure that most people would understand the question.

Here are some rules of thumb that apply to virtually all U.S. applications.

"Want" Not "Wish"

Unless we're talking about systems run by genies or fairy godmothers, it's preferable to use "want" instead of "wish." People want answers; they don't usually wish for them. So instead of "Do you wish to search by date or by price?" use "Do you want to search by date or by price?"

"Say" Not "Speak"

"Enter or say your password" sounds a lot more natural than "Enter or speak your password." "Speak" sounds like a clinical term ("Yes, Doctor, the subject speaks whenever the bell rings. He also salivates.") The word "say" conveys a softer and more natural idea.


Contractions

Designers should use contractions in their prompts. Of course, when some people write text, they often do not use contractions, and it is perfectly correct. But try reading the sentence preceding this one out loud. It sounds as stilted and unnatural as the ending of this sentence, does it not? One of the great advantages of a speech-recognition system is its ability to create an affinity between the company and the caller, and it's harder to establish that sense of familiarity and comfort if the prompts are devoid of contractions, because most people simply don't talk that way. People are more likely to use, and enjoy using, a system if they feel there's a regular person on the other end of the line (even if that person happens to be a machine), and contractions help create that sense.
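The wording rules so far ("want" not "wish," "say" not "speak," and contractions over their spelled-out forms) lend themselves to a simple mechanical check on draft prompts. What follows is only a minimal sketch under stated assumptions: the `PREFERRED_WORDING` table and the `lint_prompt` function are hypothetical names invented for this illustration, and the table covers only the examples mentioned in the text.

```python
import re
from typing import List

# Hypothetical rule table covering only the examples discussed in the text:
# each pattern maps to the wording a prompt writer might prefer.
PREFERRED_WORDING = {
    r"\bwish\b": "want",
    r"\bspeak\b": "say",
    r"\bdo not\b": "don't",
    r"\bdoes it not\b": "doesn't it",
    r"\bit is\b": "it's",
}


def lint_prompt(prompt: str) -> List[str]:
    """Return one suggestion per rule the prompt trips; empty if it is clean."""
    suggestions = []
    for pattern, preferred in PREFERRED_WORDING.items():
        # Case-insensitive search so "It is" and "it is" are both caught.
        match = re.search(pattern, prompt, flags=re.IGNORECASE)
        if match:
            suggestions.append(
                f'Consider "{preferred}" instead of "{match.group(0)}".'
            )
    return suggestions


if __name__ == "__main__":
    for draft in (
        "Do you wish to search by date or by price?",
        "Enter or speak your password.",
        "Do you want to search by date or by price?",
    ):
        print(draft, "->", lint_prompt(draft))
```

A check like this can only flag candidates. Whether a flagged word actually sounds stilted in context is still a judgment for the prompt writer, ideally made by reading the prompt out loud.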

Word Order Matters

Often a sentence can be constructed in several ways, all of them grammatically correct. Which construction should we use? Whichever one more precisely conveys the idea. For example, the following two statements are both grammatically correct, but while the first correctly conveys the idea, the second could cause callers to start forming an incorrect mental model.

"If you don't think I'm going to get it right, say 'Help.'"

"Say 'Help' if you don't think I'm going to get it right."

The first sentence correctly indicates that in a situation where callers don't think the system is getting it right, they can say "Help" to (we would imagine) get a better understanding about the situation. In the second sentence, callers are instructed to say "Help" if—and perhaps only if—they think the computer won't get it right. If the word "Help" can be used in multiple contexts, we don't want to limit its use to only one of them.

The other reason why the first sentence is preferable to the second is that the caller's action—to say "Help"—is revealed after the system describes the circumstances that would prompt the action. This is a more logical sequence, and since callers' memories are short, it's always better to put the most important part of the instruction—to say "Help"—at the end.

Use of the Word "Just"

According to my dictionary, the word "just" has 13 meanings in English—6 as an adjective and 7 as an adverb. The differences in these meanings can be significant. For example, consider these two uses of "just."


Sentence                                   Meaning of "just"
Add just enough salt to give it flavor.    precisely, exactly
To get assistance, just say "Help."        simply, merely

In a speech system, we could have the system say either

"You say the search topic, and I'll look for something that sounds like it."

or

"Just say the search topic, and I'll look for something that sounds like it."

The first sentence indicates that anything the caller says will be considered the search topic. The second sentence is intended to convey that the caller simply needs to say a topic to get the system going. However, a caller could misconstrue "just" to mean "precisely" or "exactly," as it does in the salt example above. Under that meaning, it sounds as if callers are required to know a precise search topic word (apparently from some top-secret list that the system isn't sharing) to get any results.

Use of "Want," "Like," "Can," and "May"

These words are often used interchangeably, but they actually have different meanings. Consider the following questions.

"Do you want me to read that back to you?"

"Would you like me to read that back to you?"

"Can I read that back to you?"

"May I read that back to you?"

The first question can evoke several reactions, depending on how it's said. It could sound as if it's urging the caller to listen, as in "You do want me to read this to you, don't you?" However, it can also be directed to sound like the most neutral way to suggest the idea, as in "Do you want me to do this? Because I really don't care if I do or not."

By using "would," the second question sounds a little more formal and deferential than the first, and it reinforces the personality of the system as someone eager to be helpful.

The third question is grammatically incorrect, substituting "can" for the correct "may." If taken literally, it means "Am I able to read that back to you?"—an irrelevant question. Beyond the grammar problem, it sounds as if the system really wants to read the statement back and is only waiting for the caller to give in and let it.

While "May I read that back to you?" is grammatically correct, it sounds as if the system is pleading with the caller to let it read the statement again. For some reason, "may" sounds too contrived to me. I have visions of a very proper British valet saying "May I draw your bubble bath now, your Lordship?" That's why I avoid it when designing any application.

Natural Language Shortcuts

A natural language shortcut allows callers to "skip" a bunch of steps by allowing them to provide several pieces of information to the application all in one sentence. The use of natural language shortcuts can be very effective, but only if the caller knows how—and where—to use them. For example, take a look at this exchange.


SYSTEM: Where do you want to fly from?
CALLER: [...]
SYSTEM: Where do you want to fly to?
CALLER: San Francisco.
SYSTEM: At what time?
CALLER: 3 P.M.

This exchange could be shortened considerably if the recognizer understood a natural language shortcut that allowed the caller to provide all the information (while still being able to understand just the first piece of information—in this case, the departure city).


SYSTEM: Where do you want to fly from?
CALLER: From Boston to Los Angeles, at 3 P.M.

This is a very convenient, quick, and natural option for people, but it is difficult for some recognizers to handle such a complex task. In fact, recognition can be seriously compromised if the recognizer is expecting to hear only one token (a single piece of information, such as "Boston, Massachusetts") and instead hears a long string of several tokens (such as "Boston, Massachusetts to LAX, at 3 P.M."). People who tune these systems to improve recognition accuracy can do a lot to prevent recognition problems, but issues can still arise when we allow a natural language shortcut that is designed to listen for and process several tokens while still letting the caller provide just one token of information. It's like a person ordering a pizza and just saying, "I'll have a pizza." The cook might expect the person to indicate the size and the toppings (even if it's just a "plain" pizza). People can generally cope with this situation, but even real people get a little confused for a moment when they expect to hear more information than the person provides.
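To make the one-token-versus-several-tokens distinction concrete, here is a minimal sketch of the logic that might sit behind the first question. It works on text that a recognizer has already produced, not on audio, and it is not how any particular recognizer is built. The `SHORTCUT` pattern, the slot names, and the functions `parse_departure_answer` and `next_prompt` are all assumptions made up for this illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the full shortcut:
# "[from] <origin> to <destination>[,] at <time>"
SHORTCUT = re.compile(
    r"^(?:from\s+)?(?P<origin>.+?)\s+to\s+(?P<destination>.+?),?\s+at\s+(?P<time>.+)$",
    re.IGNORECASE,
)

# Follow-up questions taken from the step-by-step exchange in the text.
FOLLOW_UP_PROMPTS = {
    "origin": "Where do you want to fly from?",
    "destination": "Where do you want to fly to?",
    "time": "At what time?",
}


def parse_departure_answer(utterance: str) -> Dict[str, Optional[str]]:
    """Fill as many slots as the caller supplied in answer to the first question."""
    text = utterance.strip()
    match = SHORTCUT.match(text)
    if match:
        # The caller used the shortcut: all three tokens arrived at once.
        return {slot: value.strip() for slot, value in match.groupdict().items()}
    # Otherwise treat the whole answer as a single token: the departure city.
    origin = re.sub(r"^from\s+", "", text, flags=re.IGNORECASE)
    return {"origin": origin, "destination": None, "time": None}


def next_prompt(slots: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the question for the first empty slot, or None when all are filled."""
    for slot, prompt in FOLLOW_UP_PROMPTS.items():
        if not slots.get(slot):
            return prompt
    return None


if __name__ == "__main__":
    for answer in ("Boston", "From Boston to Los Angeles, at 3 P.M."):
        slots = parse_departure_answer(answer)
        print(answer, "->", slots, "| next:", next_prompt(slots))
```

A caller who answers with only the departure city is walked through the remaining questions, while a caller who uses the shortcut skips them. A text pattern like this sidesteps the hard part described above, which is recognizing the longer utterance accurately in the first place.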

But what if a system does allow natural language shortcuts? How does it teach callers to take advantage of them? Is it best to tell all callers that they can "Just say their flight itinerary"? Or should that instruction go only to repeat callers, who have a better chance (and probably a greater need and incentive) to memorize the structure that the system is looking for?

If the system has been programmed to identify callers and track their usage, it can be personalized to work in whichever way is best for each individual. For example, in a stock trading application, the system could have the caller work through a series of steps:


SYSTEM: What kind of trade do you want to make? You can say "Buy," "Sell," "Sell short,"—
CALLER: [...]
SYSTEM: How many shares do you want to buy?
CALLER: [...]
SYSTEM: Of which security?
CALLER: Chemex Coffee Corporation.
SYSTEM: At what price?
CALLER: A limit price.
SYSTEM: Of what?
CALLER: [...]
SYSTEM: OK, let me confirm that with you ….

After the system has led the user through this process several times, it can be programmed to say the following (after the trade is completed).

"Here's a hint: Next time, you can say the whole trade when I ask you the first question. So, for example, you could say, 'Buy 100 shares of Chemex Coffee Corporation at a limit price of 88'—all in one breath."

It's usually better to use the caller's most recent trade as the example, because doing so personalizes the experience and makes it more concrete and relevant to the caller, all of which will help the caller remember the shortcut on the next call.
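The personalization described above might be sketched as follows: count each caller's completed step-by-step trades and, once a threshold is reached, build the hint from the trade just finished. This is an illustration only. The `Trade` and `CallerProfile` types, the `hint_after_trade` function, and the threshold of three guided trades are assumptions (the text says only "several times").

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed value: the text says only that the hint comes after "several" trades.
HINT_THRESHOLD = 3


@dataclass
class Trade:
    action: str        # e.g. "buy", "sell", "sell short"
    shares: int
    security: str
    price_type: str    # e.g. "limit"
    price: float


@dataclass
class CallerProfile:
    guided_trades: List[Trade] = field(default_factory=list)
    hint_given: bool = False


def hint_after_trade(profile: CallerProfile, trade: Trade) -> Optional[str]:
    """Record a completed step-by-step trade; return the hint once, when due."""
    profile.guided_trades.append(trade)
    if profile.hint_given or len(profile.guided_trades) < HINT_THRESHOLD:
        return None
    profile.hint_given = True
    # Build the example from the trade the caller has just completed.
    example = (
        f"{trade.action.capitalize()} {trade.shares} shares of {trade.security} "
        f"at a {trade.price_type} price of {trade.price:g}"
    )
    return (
        "Here's a hint: Next time, you can say the whole trade when I ask you "
        f"the first question. So, for example, you could say, '{example}', "
        "all in one breath."
    )


if __name__ == "__main__":
    profile = CallerProfile()
    trade = Trade("buy", 100, "Chemex Coffee Corporation", "limit", 88)
    for _ in range(4):
        print(hint_after_trade(profile, trade))
```

Giving the hint only once, and only after the trade is completed, keeps it from interrupting the task or nagging callers who prefer the step-by-step route.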
