Creating a Speech Application

The SASDK provides a template for creating new speech applications with Visual Studio .NET. It also provides visual editors for building the prompts (words spoken to the user) and grammars (words spoken by the user). This section will examine the basics of creating a speech application with the SASDK.

To use the template provided with the SASDK, open Visual Studio .NET and follow these steps:

  1. Click File, New, and Project. From the New Project dialog box, select the desired Project Type and click the Speech Web Template icon in the Templates window. This template was created when you installed the SASDK. Change the value in the location dropdown box to the desired project name and click OK.

  2. You can either accept the setting defaults and click Finish, or select Application Settings and Application Resources to specify custom settings.

  3. The default application mode is voice-only, so if you want to create a multimodal application, you can change the mode from the Application Settings tab.

  4. The Application Resources tab allows you to specify whether a default grammar library file is created and what it is named. From here you can also specify whether a new prompt project is created and what it is named.

  5. Click Finish at any time to build the new project.

If you choose to build a voice-only application, the project will include a Web page named Default.aspx. This page contains two speech controls, AnswerCall and SemanticMap. These are basic controls used in every voice-only application. Their specific functions will be covered in the section titled "Using Speech Controls." The default project will also include a folder named Grammars that contains two grammar files, Library.grxml and SpeechWebApplication1.grxml. For voice-only applications the prompt project and Grammars folder are included by default.

If you choose to build a multimodal application, the Default.aspx page is included, but it will contain no controls. There will be a Grammars folder, but no prompt project will be created.

By default, the Manifest.xml file is included for both project types. It is an XML-based file that contains references to the resources used by the project. References include grammar files and prompt projects. Speech Server will preload and cache these resources to help improve performance.

The Prompt Editor

Microsoft recommends that you prerecord static prompts because voice recordings sound more natural than the output of the text-to-speech engine. The prompt editor (see Figure 2.4) is a tool that allows you to specify potential prompts and record the wave files associated with each prompt.

Figure 2.4 Screenshot of the prompt editor in the prompt database project. The prompt editor is used to record the wave files associated with each prompt. The screenshot includes four different prompts.

The utterance "Welcome to my speech application" represents a single prompt. For voice-only applications, you need to make sure you include a wide range of prompts. Since the user relies on these prompts to understand how the application works, they need to be clear and meaningful.

The Prompt Database

An application built with the Speech SDK wizard includes a prompt database project by default. To add another prompt project, click File, Add Project, and then New Project (see Figure 2.5), and base the new project on the Prompt Project template. Once the project is added, you can add a new prompt database by right-clicking the prompt project and selecting Add and then Add New Item. The Prompt Database item opens a data-grid-style screen that allows you to specify all the potential prompts.

Figure 2.5 Screenshot of the dialog used to add a new prompt project to your speech application. This dialog is accessed by clicking Add Project from the File menu and then clicking New Project.

The prompt database contains all the prerecorded utterances used to communicate with the user. An application can reference more than one prompt database. One reason for doing this is ease of maintenance. Prompts that change often can be placed in a separate prompt database. By restricting the size of the prompt database, the amount of time needed to recompile is minimized.

If you followed the instructions in the last section to create a new speech project, you can now open the default prompt database by double-clicking the prompts file from Solution Explorer.

Transcriptions and Extractions

Figure 2.6 is a screenshot of the recording pane in the prompt project database. There are two grids in a prompt project. The top one contains transcriptions, and the bottom one extractions. Transcriptions are the individual pieces of speech that relate to a single utterance. Extractions combine transcription elements to form phrases. Extractions are formed when you place square brackets around the transcription elements.

Figure 2.6 Contents of the Recording pane in the prompt database project. Transcriptions are the individual pieces of speech that can be prerecorded. No utterances have been recorded for prompts with a red X in the Has Wave column.

A prompt can involve more than one transcription element, such as "I heard you say Sara Rea." In this case, the two elements are "I heard you say" and "Sara Rea." In some cases employee names may also be prerecorded in the prompt database. This adds a burden, because every time a new employee is added to the database, someone needs to record the employee’s name. However, doing so keeps the speech engine from falling back on text-to-speech (TTS) to render the name, and recordings produce a more natural-sounding prompt.

Prompts are controlled from prompt functions. These functions programmatically indicate what phrases are spoken to the user. When the speech engine is passed a phrase from the function, it first searches the prompt database to see if any prerecorded utterances are present. It searches the entire database for matches and will string together as many transcription elements as necessary to retrieve the entire phrase.

Because the speech engine strings transcription elements together to form phrases, you can break phrases up to avoid redundancy. For instance, the phrase "Sorry, I am having trouble hearing you. If you need help, say help" may be spoken when an application encounters silence. The phrase "Sorry, I am having trouble understanding you. If you need help, say help" is used whenever the speech engine does not recognize the user’s response. The subphrase "If you need help, say help" can therefore be stored as a separate entry in the prompt database, so it only has to be recorded once. This also keeps the prompt database smaller.
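To make the matching idea concrete, here is a minimal Python sketch of how a phrase might be split into prerecorded segments, with any unmatched words left for text-to-speech. It is an illustration only, not the algorithm Speech Server actually uses: the function names `normalize` and `plan_prompt`, the greedy longest-match strategy, and the word-level comparison are all assumptions made for this example.

```python
import re

def normalize(text):
    """Lowercase the text, drop punctuation, and split it into words."""
    return re.sub(r"[^a-z0-9' ]+", "", text.lower()).split()

def plan_prompt(phrase, recorded):
    """Split a phrase into recorded segments and text-to-speech segments.

    `recorded` is a list of transcriptions assumed to have wave files.
    Returns (source, text) pairs, where source is "wave" or "tts".
    """
    words = normalize(phrase)
    known = {tuple(normalize(r)) for r in recorded}
    longest = max((len(k) for k in known), default=0)
    plan, i = [], 0
    while i < len(words):
        # Try the longest recorded run of words that starts at position i.
        for size in range(min(longest, len(words) - i), 0, -1):
            candidate = tuple(words[i:i + size])
            if candidate in known:
                plan.append(("wave", " ".join(candidate)))
                i += size
                break
        else:
            # No recording starts here, so this word is left for TTS.
            if plan and plan[-1][0] == "tts":
                plan[-1] = ("tts", plan[-1][1] + " " + words[i])
            else:
                plan.append(("tts", words[i]))
            i += 1
    return plan

# Hypothetical prompt database contents, taken from the examples in the text.
recorded = [
    "Sorry, I am having trouble hearing you",
    "Sorry, I am having trouble understanding you",
    "If you need help, say help",
    "I heard you say",
]

print(plan_prompt("Sorry, I am having trouble hearing you. "
                  "If you need help, say help", recorded))
print(plan_prompt("I heard you say Sara Rea", recorded))
```

In this sketch the first phrase is covered entirely by two recordings, while the second falls back to TTS for "sara rea" because no recording of that name exists in the list.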

The Recording Tool

The Recording Tool can be accessed by clicking the red circle icon above the Transcription pane or by clicking Prompt and then Record All. The text from the transcription item selected is displayed in the Display Text textbox (see Figure 2.7). After clicking Record, the person making the recording should speak clearly into the microphone. Click Stop as soon as the entire phrase is spoken. Try to select a recording location where background noise is minimized.

Figure 2.7 The Recording tool allows you to directly record each prompt associated with a transcription. Prompts can also be recorded by professional voice talent in a studio, made into wave files, and imported.

In some cases, you may want to utilize professional voice talent to make recordings. There are third-party vendors, such as ScanSoft (see the "ScanSoft" profile box), that can provide professional voice talent and assistance with recordings. Wave files created in a recording studio can be associated with a specific transcription element by clicking Import and browsing to the file’s location.

If the speech engine is unable to find a match in any of the prompt databases, it uses TTS. The result is a machine-like voice that may work against the natural interface you are trying to create. Speech Server comes bundled with ScanSoft’s Speechify TTS engine (see the "ScanSoft" profile box), but at present the results from a text-to-speech engine are not as natural-sounding as a recorded human voice. On the other hand, it will not always be possible or manageable to prerecord all utterances. You will have to weigh these options when designing your speech application.

ScanSoft and the ScanSoft logo are registered trademarks of ScanSoft, Inc.

The recording of prompts is a major consideration when designing a speech-enabled application. If professional talent is used, you will want to try to minimize the need for multiple recording sessions. If the application requires the utilization of text-to-speech for most prompts, you may want to consider purchasing a third-party TTS add-in.

The Grammar Editor

Grammar, the reverse of prompts, represents what the user says to the application. This is a key element of voice-only applications because they rely completely on accurate understanding of the user’s commands. The grammar editor builds Extensible Markup Language (XML) files that are used by the speech-recognition engine to understand the user’s speech. What is nice about the grammar editor is that you drag-and-drop controls to build the XML instead of having to type it in directly. This helps to reduce the time spent building grammars.

A grammar is stored as an XML file with a .grxml extension. Each Question/Answer (QA) control in the application, representing an interaction with the user, is associated with one or more grammars. A single grammar file will contain one or more rules that the application uses to interpret the user’s response.

To add a grammar file, click Add New Item from the Project menu, select the category Grammar File, and name the file accordingly. Existing grammars can be viewed by expanding the Grammars folder within Solution Explorer. By default, two grammar files are added when you create a voice-only or multimodal application. The first file, named Library.grxml, contains common grammar rules you may need. For instance, it includes a rule for collecting yes/no responses (see Figure 2.8). It also includes rules for handling numbers, dates, and even credit card information. Rules embedded within the library grammar file can be referenced in other grammar files through the RuleRef control.

The second grammar file is named the same as the project file by default. This is where you will place the grammar rules associated with your application. Although you could store all the rules in a single file, you may want to consider adding subfolders within the main Grammars folder. You can then create multiple grammar files to group similar types of grammar rules. This helps to organize code and makes referencing grammar rules easier.

Grammar rules are built by dragging elements onto the page. Controls are available in the Grammar tab of the toolbox. Figure 2.9 is a screenshot of these grammar controls. Most rules will consist of one or more of the following:

  • Phrase—represents the actual phrase spoken by the user.

  • List—contains multiple phrase elements that all relate to the same thing. For instance, a yes response could be spoken as "yeah," "ok," or "yes please." A list control allows you to indicate that all these responses are the same as yes.

  • RuleRef—used to reference other rules through the URI property. This is useful when you have multiple grammar files and want to reuse the logic in existing rules.

  • Group—used to group related elements. It can contain any element, such as a List, Phrase, or RuleRef.

  • Wildcard—used to specify which words in a phrase can be ignored.

  • Halt—used to stop the recognition path.

  • Skip—used to indicate that a recognition path is optional.

  • Script Tag—used to get semantic information from the grammar.
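As a rough illustration of what the editor produces, the Python sketch below embeds a small hand-written grammar and lists the phrases and rule references in each rule. It assumes the file follows the W3C Speech Recognition Grammar Specification XML form, with the List, Phrase, and RuleRef controls corresponding to the `one-of`, `item`, and `ruleref` elements. The rule ids, the phrases, and the helper names `phrases_by_rule` and `references_by_rule` are invented for this example and are not taken from Library.grxml.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical grammar for illustration; rule ids and phrases are invented.
GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="Confirm">
  <rule id="Confirm" scope="public">
    <one-of>
      <item><ruleref uri="#Yes"/></item>
      <item><ruleref uri="#No"/></item>
    </one-of>
  </rule>
  <rule id="Yes">
    <one-of>
      <item>yes</item>
      <item>yeah</item>
      <item>ok</item>
      <item>yes please</item>
    </one-of>
  </rule>
  <rule id="No">
    <one-of>
      <item>no</item>
      <item>no thanks</item>
    </one-of>
  </rule>
</grammar>
"""

def qualified(name):
    """Return an element name qualified with the SRGS namespace."""
    return f"{{{SRGS_NS}}}{name}"

def phrases_by_rule(grammar_xml):
    """Map each rule id to the literal phrases listed inside that rule."""
    root = ET.fromstring(grammar_xml.strip())
    result = {}
    for rule in root.findall(qualified("rule")):
        texts = [(item.text or "").strip() for item in rule.iter(qualified("item"))]
        result[rule.get("id")] = [t for t in texts if t]
    return result

def references_by_rule(grammar_xml):
    """Map each rule id to the URIs of the rules it references."""
    root = ET.fromstring(grammar_xml.strip())
    return {
        rule.get("id"): [ref.get("uri") for ref in rule.iter(qualified("ruleref"))]
        for rule in root.findall(qualified("rule"))
    }

print(phrases_by_rule(GRAMMAR))
print(references_by_rule(GRAMMAR))
```

Here the `Confirm` rule holds no phrases of its own; it only points at the `Yes` and `No` rules, which is the kind of reuse the RuleRef element is meant for.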

The grammar editor (see Figure 2.8) contains a textbox called Recognition String. When dealing with complex rules, it can be used to test the rule without actually running the application. This is very useful when you are building the initial grammar set. To use this feature, just enter text that you would expect the user to say and click Check. The output window will display the Semantic Markup Language (SML), which is the XML generated by the speech engine and sent to the application. If the text was recognized, you will see "Check Path test successfully complete" at the bottom of the output window.
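Because SML is plain XML, an application can read it with any XML parser. The following Python sketch parses a hand-written SML-like result and pulls out the recognized text and the semantic values. The sample document is hypothetical: the `Answer` element, the `text` and `confidence` attributes, and the helper name `read_sml` are assumptions about the general shape of such a result, not output captured from the speech engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical SML result; element and attribute names are assumptions.
SAMPLE_SML = """
<SML text="yes please" confidence="0.85">
  <Answer text="yes please" confidence="0.85">Yes</Answer>
</SML>
"""

def read_sml(sml_text):
    """Return the recognized text, its confidence, and the semantic values."""
    root = ET.fromstring(sml_text.strip())
    return {
        "text": root.get("text"),
        "confidence": float(root.get("confidence", "0")),
        # Each child element is treated as one semantic item and its value.
        "items": {child.tag: (child.text or "").strip() for child in root},
    }

result = read_sml(SAMPLE_SML)
print(result["text"], result["items"])
```

Comparing the values returned by a helper like this against what you typed into the Recognition String textbox is one way to confirm that a rule assigns the semantic value you expect.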

Figure 2.8 Screenshot displaying the yes/no rule inside the grammar editor. This is one of several rules included by default with the Library.grxml file.

Figure 2.9 Screenshot of the Grammar tab, available in the toolbox when creating a new grammar. The elements you will use most often are the List, Phrase, RuleRef, and Script Tag elements.

The Script Tag element is used to assign the user’s response to a semantic item. The properties for a script tag include an ellipsis button that opens the Semantic Script Editor. This editor helps you to create an assignment so that the correct SML result is returned. You can also switch to the Script tab and edit the script directly. Figure 2.10 is a screenshot of the Semantic Script Editor.

Figure 2.10 Screenshot of the Semantic Script Editor that is available when you use a Script Tag element. The Script Tag is used whenever you need to assign the user’s response to a semantic item.

When building grammars, you will probably not anticipate every response on the first pass, so grammars need fine-tuning to make the application as efficient and accurate as possible. Tuning is easier while grammar files remain uncompiled XML files that you can edit directly. For this reason, do not compile grammar files until the application has been thoroughly tested and is ready to deploy.

Using Speech Controls

A voice-only application has no visible interface. It runs on IIS as a Web page and is accessed with a telephone. When developing and debugging the application, it is executed within the Web browser, and the Speech Debugging Console is used to provide the developer with information about the application dialog. The user will never see the page created, so it is not important what is placed on it visually. Therefore, the only elements on the page will be speech controls, and they will be seen only by the developer.

The Speech Application SDK includes several speech controls that are visible from the Speech tab in the Toolbox. These controls will be dragged onto the startup form as the application is built. Figure 2.11 is a screenshot of the speech controls available in the speech tab of the toolbox. Speech controls are the basic units for computer-to-human interaction, and the SASDK contains two varieties of controls: dialog and application speech controls.

Figure 2.11 Screenshot of all the speech controls available in the speech tab of the toolbox. The QA control is the most basic unit and is utilized in every interaction with the user. SmexMessage, AnswerCall, TransferCall, MakeCall, RecordSound, and DisconnectCall are only applicable for telephony applications.

Dialog Speech Controls

Table 2.1 is a listing of the dialog speech controls used for controlling the conversational flow with the user. A QA control, the most commonly used control, represents a single interaction with the user in the form of a prompt and a response.

Table 2.1 Dialog Speech Controls are used for controlling the conversational flow with the user.

| Control Name | Description |
| --- | --- |
| SemanticMap | Collection of SemanticItem controls, where each SemanticItem control represents a single piece of information collected from the user, such as a last name. |
| QA | Question/Answer control. Represents one interaction with the user in the form of a question and then a response. |
| Command | Often used to navigate the application with unprompted commands such as Help or Main Menu. |
| SpeechControlSettings | Specifies common settings for a group of controls. |
| SmexMessage | Sends messages to and receives messages from a telephony application that complies with the Computer-Supported Telecommunications Applications (CSTA) standard of the European Computer Manufacturers Association (ECMA). |
| AnswerCall | Answers calls from a telephony device. Used for inbound telephony applications. |
| TransferCall | Transfers a call. |
| MakeCall | Initiates a new call. Used for outbound telephony applications. |
| DisconnectCall | Ends a call. |
| CompareValidator | Compares what the user says with some value. |
| CustomValidator | Validates data with client-side script. |
| RecordSound | Records what the user says and copies it to the Web server so it can be played back later. |
| Listen | Represents the listen element from the SALT specification. Considered a basic speech control. |
| Prompt | Represents the prompt element from the SALT specification. Considered a basic speech control. |

Speech Application Controls

Speech Application Controls are extensions of the basic speech controls. They are used to anticipate common user interaction scenarios. Refer to Table 2.2 for a listing of the application controls included with the SASDK. For instance, the Date control is a speech application control that expands on the basic QA control. It is used to retrieve a date and allows for a wide range of input possibilities. Application controls can reduce development time because much of the user interaction is built directly into them.

Table 2.2 Speech Application Controls available in the Speech tab of the toolbox. These controls can reduce development time by building in typical user interactions.

| Control Name | Description |
| --- | --- |
| ListSelector | Databound control that presents the user with a list of items and asks user to select one. |
| DataTableNavigator | Databound control that the user navigates with commands such as Next, Previous, and Read. |
| AlphaDigit | Collects an alphanumeric string. |
| CreditCardDate | Collects a credit card expiration date (month and year); does not ensure that it is a future date. |
| CreditCardNumber | Collects a credit card number and type. Although it does not validate the number, it ensures that the number matches the format for the particular type of credit card. |
| Currency | Collects an amount in U.S. dollars that falls within a specified range. |
| Date | Used to collect either a complete date or one broken out into month, day, and year. |
| NaturalNumber | Collects a natural number that falls within a specified range. |
| Phone | Collects a U.S. phone number where area code is three numeric digits, number is seven numeric digits, and extension is zero to five numeric digits. |
| SocialSecurityNumber | Collects a U.S. Social Security number. |
| YesNo | Collects a yes or no answer. |
| ZipCode | Collects a U.S. zip code where the zip code is five numeric digits and the extension is four numeric digits. |
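The formats described for the Phone and ZipCode controls can be expressed as simple digit patterns. The Python sketch below is only a way to visualize those formats; it is not how the controls are implemented. The helper names `parse_phone` and `parse_zip` are hypothetical, and treating the four-digit zip extension as optional is an assumption, since the table does not say whether it is required.

```python
import re

# Area code (3 digits), number (7 digits), extension (0 to 5 digits).
PHONE = re.compile(r"(\d{3})(\d{7})(\d{0,5})")

# Zip code (5 digits) and extension (4 digits, assumed optional here).
ZIP = re.compile(r"(\d{5})(\d{4})?")

def parse_phone(digits):
    """Split a digit string into phone number parts, or return None."""
    match = PHONE.fullmatch(digits)
    if match is None:
        return None
    area_code, number, extension = match.groups()
    return {"area_code": area_code, "number": number, "extension": extension}

def parse_zip(digits):
    """Split a digit string into zip code parts, or return None."""
    match = ZIP.fullmatch(digits)
    if match is None:
        return None
    zip_code, extension = match.groups()
    return {"zip": zip_code, "extension": extension}

print(parse_phone("4255550100"))
print(parse_phone("425555010012345"))
print(parse_zip("980521234"))
print(parse_zip("9805"))
```

A digit string that does not fit the described lengths, such as the four-digit zip code in the last call, is rejected by returning `None`.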

Creating Custom Controls

If no control does everything you need, you have the option of creating a custom control. Custom controls allow you to expand on the functionality already available with the built-in speech controls. Utilizing the concept of inheritance, custom controls are created using the ApplicationControl class and the IDtmf interface. The developer will create a project file that is compiled into a separate DLL for each custom control.

The Samples solution file, installed with the SASDK, includes a project titled ColorChooserControl. The ColorChooserControl project by itself is installed by default in the C:\Program Files\Microsoft Speech Application SDK 1.0\Applications\Samples\ColorChooserControl directory. This project can serve as a template for any custom control you wish to create. The Color Chooser control is a complex control that consists of child QA controls used to prompt the user for a color and then confirm their selection. The grammar and prompts associated with the control are built directly in. This particular control supports voice-only mode.

The ColorChooserControl is a custom control used to control the dialog flow with the user. It demonstrates what considerations must be made when building these types of controls. It is an excellent starting point for anyone wanting to create custom controls.
