Speech Technologies

Last updated Jan 1, 2004.

.NET Speech Technologies

Speech recognition is the capability of a computer to recognize the spoken word, for the purpose of receiving command and data input from the speaker. Most speech-recognition engines translate incoming audio data into phonemes, which are then interpreted into text that your application can use. (A phoneme is the smallest structural unit of sound that can be used to distinguish one utterance from another in a spoken language.)

In contrast, text-to-speech technology enables computers to translate text information into synthetic speech output.

Both speech recognition and text-to-speech rely on engines: software components whose purpose is to recognize speech or to render text as audio.

Here, though, we'll look specifically at text-to-speech technology in .NET. There are two types: synthesized text-to-speech and concatenated text-to-speech.

In synthesized speech, the engine examines the words and produces their phonetic pronunciations. The phonemes are then passed through a complex algorithm that imitates the human vocal tract and produces the sound.

Concatenated text-to-speech engines also analyze the text provided, but instead of generating the sound they assemble it: digital audio recordings of words and phrases are pulled from a prerecorded library and concatenated (glued together) to form the output.
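To make the concatenation idea concrete, here is a toy sketch in C#. It is not how any particular engine works; it only shows the lookup-and-glue step. The library contents, the clip file names, and the `PlanClips` helper are all made up for illustration, and a real engine would store audio data and smooth the joins between clips.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration only: real engines work with audio data, not file names.
public static class ConcatenativeSketch
{
    // Hypothetical prerecorded library: word or phrase -> recorded clip.
    private static readonly Dictionary<string, string> Library =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "your balance is", "balance_phrase.wav" },
            { "twenty", "twenty.wav" },
            { "dollars", "dollars.wav" }
        };

    // Greedily matches the longest recorded phrase at each position and
    // returns the clips in the order they would be played back.
    public static List<string> PlanClips(string text)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var clips = new List<string>();
        int i = 0;
        while (i < words.Length)
        {
            bool matched = false;
            for (int len = words.Length - i; len >= 1; len--)
            {
                string candidate = string.Join(" ", words, i, len);
                string clip;
                if (Library.TryGetValue(candidate, out clip))
                {
                    clips.Add(clip);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched)
            {
                // A concatenated engine can only say what has been recorded.
                throw new KeyNotFoundException("No recording for: " + words[i]);
            }
        }
        return clips;
    }

    public static void Main()
    {
        List<string> clips = PlanClips("Your balance is twenty dollars");
        Console.WriteLine(string.Join(" + ", clips));
    }
}
```

The exception on an unknown word reflects the main limitation of this approach: the vocabulary is fixed by the recordings, whereas a synthesized engine can attempt any text.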

SAPI 5 Overview

Microsoft's Speech Application Programming Interface (SAPI) considerably reduces the code you need to write for an application that uses speech recognition and text-to-speech. Using SAPI makes speech technology more accessible and robust for a broad range of applications.

SAPI provides a high-level interface between the application and the speech engines. It implements the low-level details needed to control and manage the real-time operations of various speech engines.

Corresponding to the two kinds of speech technology, SAPI has two fundamental types of engines: text-to-speech (TTS) systems and speech recognizers.

Earlier versions of the API required you to use C or C++. SAPI 5.1 supports OLE Automation, so you can develop applications in any other language that also supports OLE Automation. Among them are Visual Basic, C#, and JScript.
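As a rough illustration of what Automation support means in practice, the following C# sketch creates the SAPI voice object by its ProgID and calls it through late binding, with no generated wrapper assembly. It assumes a Windows machine with SAPI 5.1 installed; the class name `LateBoundSpeech` and the spoken text are placeholders.

```csharp
using System;
using System.Reflection;

public static class LateBoundSpeech
{
    public static void Main()
    {
        // "SAPI.SpVoice" is the ProgID of the SAPI 5.1 Automation voice object.
        Type voiceType = Type.GetTypeFromProgID("SAPI.SpVoice");
        if (voiceType == null)
        {
            Console.WriteLine("SAPI does not appear to be installed.");
            return;
        }

        object voice = Activator.CreateInstance(voiceType);

        // Speak(text, flags): a flags value of 0 is the default, synchronous mode,
        // so the call returns only after the text has been spoken.
        voiceType.InvokeMember(
            "Speak",
            BindingFlags.InvokeMethod,
            null,
            voice,
            new object[] { "Hello from OLE Automation", 0 });
    }
}
```

Late binding like this is handy for a quick test, but it gives up compile-time checking, which is why an early-bound wrapper is usually preferred for real applications.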

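To call SAPI from managed code with early binding, we can use the Type Library Importer tool (TlbImp.exe) to generate the SpeechLib.dll interop assembly from the SAPI type library, and then reference that assembly from the project.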

This TTS (Text To Speech) application demonstrates how to create an SpVoice object and how to use it to speak text and save it to a .wav file.

namespace TTS
{
    using System;
    using System.Drawing;
    using System.Threading;
    using System.Windows.Forms;
    using SpeechLib;  // interop assembly generated from the SAPI type library

    public class Tts : Form
    {
        private Button button1;
        private CheckBox checkBox1;
        private Button SPEAK;
        private TextBox textBox1;

        public Tts()
        {
            InitializeComponent();
        }

        private void InitializeComponent()
        {
            this.checkBox1 = new CheckBox();
            this.SPEAK = new Button();
            this.button1 = new Button();
            this.textBox1 = new TextBox();

            // "Save To Wave" check box
            checkBox1.Location = new Point(88, 280);
            checkBox1.Text = "Save To Wave.";
            checkBox1.Size = new Size(112, 24);
            checkBox1.TabIndex = 2;
            checkBox1.BackColor = Color.DodgerBlue;

            // SPEAK button
            SPEAK.Location = new Point(96, 96);
            SPEAK.BackColor = Color.DodgerBlue;
            SPEAK.Size = new Size(96, 24);
            SPEAK.TabIndex = 1;
            SPEAK.Text = "SPEAK";
            SPEAK.Click += new EventHandler(SPEAK_Click);

            // EXIT button
            button1.Location = new Point(88, 72);
            button1.BackColor = Color.DodgerBlue;
            button1.Size = new Size(112, 24);
            button1.TabIndex = 3;
            button1.Text = "EXIT";
            button1.Click += new EventHandler(button1_Click);

            // Text box holding the text to speak
            textBox1.Location = new Point(40, 120);
            textBox1.Text = " ";
            textBox1.Multiline = true;
            textBox1.ForeColor = Color.FromArgb(192, 0, 0);
            textBox1.TabIndex = 0;
            textBox1.Size = new Size(216, 160);
            textBox1.BackColor = Color.FromArgb(192, 192, 255);

            // The form itself
            this.Text = "InformIT";
            this.AutoScaleBaseSize = new Size(5, 13);
            this.BackColor = Color.FromArgb(128, 128, 255);
            this.ClientSize = new Size(312, 349);
            this.Controls.Add(button1);
            this.Controls.Add(checkBox1);
            this.Controls.Add(SPEAK);
            this.Controls.Add(textBox1);
        }

        // The common dialogs require a single-threaded apartment.
        [STAThread]
        static void Main()
        {
            Application.Run(new Tts());
        }

        private void SPEAK_Click(object sender, EventArgs e)
        {
            try
            {
                SpeechVoiceSpeakFlags SpFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
                SpVoice speech = new SpVoice();
                if (checkBox1.Checked)
                {
                    // Ask where to save the .wav file.
                    SaveFileDialog sfd = new SaveFileDialog();
                    sfd.Filter = "All files (*.*)|*.*|wav files (*.wav)|*.wav";
                    sfd.Title = "Save to a wave file";
                    sfd.FilterIndex = 2;
                    sfd.RestoreDirectory = true;
                    if (sfd.ShowDialog() == DialogResult.OK)
                    {
                        SpeechStreamFileMode SpFileMode =
                            SpeechStreamFileMode.SSFMCreateForWrite;
                        SpFileStream SpFileStream = new SpFileStream();
                        SpFileStream.Open(sfd.FileName, SpFileMode, false);
                        // Redirect the voice output to the file, speak, and
                        // wait for the asynchronous call before closing.
                        speech.AudioOutputStream = SpFileStream;
                        speech.Speak(textBox1.Text, SpFlags);
                        speech.WaitUntilDone(Timeout.Infinite);
                        SpFileStream.Close();
                    }
                }
                else
                {
                    // Speak through the default audio output.
                    speech.Speak(textBox1.Text, SpFlags);
                }
            }
            catch
            {
                MessageBox.Show("Speak error");
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            this.Close();
        }
    }
}

Here's how we save the speech to a .wav file:

  • Declare the SpVoice object: SpVoice speech = new SpVoice();

  • Create a wave stream: SpFileStream SpFileStream = new SpFileStream();

  • Create a new .wav file for writing: SpFileStream.Open(sfd.FileName, SpFileMode, false);

  • Set the .wav file stream as the output for the voice object: speech.AudioOutputStream = SpFileStream;

  • Call the Speak method, which sends the output to the .wav file: speech.Speak(textBox1.Text, SpFlags);

  • Close the file: SpFileStream.Close();
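Stripped of the form code, the steps above can be sketched as a small console program. This is a minimal sketch rather than a replacement for the listing: it assumes a reference to the SpeechLib interop assembly, and the class name `SpeakToWave`, the default file name `hello.wav`, and the spoken text are placeholders.

```csharp
using System;
using System.Threading;
using SpeechLib; // assumed: interop assembly generated from the SAPI type library

public static class SpeakToWave
{
    [STAThread]
    public static void Main(string[] args)
    {
        // Hypothetical default output path; pass another as the first argument.
        string path = args.Length > 0 ? args[0] : "hello.wav";

        SpVoice speech = new SpVoice();
        SpFileStream stream = new SpFileStream();

        // Create the .wav file for writing.
        stream.Open(path, SpeechStreamFileMode.SSFMCreateForWrite, false);
        try
        {
            // Route the voice output to the file instead of the speakers.
            speech.AudioOutputStream = stream;

            // The call is asynchronous, so wait until rendering has finished.
            speech.Speak("Hello from SAPI", SpeechVoiceSpeakFlags.SVSFlagsAsync);
            speech.WaitUntilDone(Timeout.Infinite);
        }
        finally
        {
            // Close the file even if speaking fails.
            stream.Close();
        }
    }
}
```

Closing the stream in a finally block is a small addition to the steps listed; it keeps a failed Speak call from leaving the file open.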

Microsoft Speech Application SDK Version 1.0 Beta 3 (SASDK)

At the time of writing, the .NET Speech SDK is in beta (Speech Application SDK Version 1.0 Beta 3). SASDK provides authoring tools, ASP.NET speech controls, a speech add-in for Internet Explorer, a speech add-in for Microsoft Pocket Internet Explorer, a rich grammar library, speech debugging tools, an application event-logging mechanism, samples, and documentation.

Speech Application Language Tags (SALT) Specification

The SASDK development tools are based on the emerging Speech Application Language Tags (SALT) specification. SALT is a speech interface markup language that enables developers to write speech interfaces for both voice-only (e.g., telephony) and multimodal applications. SALT is a lightweight set of extensions to existing markup languages, especially HTML and XHTML, to write speech interfaces for web pages.

Multimodal access enables clients to interact with an application in multiple ways: input with speech, keyboard, keypad, mouse, and/or stylus, and output as synthesized speech, audio, plain text, motion video, and/or graphics. With SALT, developers can add speech recognition, synthesis, and telephony capabilities to HTML- or XHTML-based applications. SALT also facilitates access to applications and information from telephones as well as from GUI-based devices such as PCs, tablet PCs, and wireless personal digital assistants (PDAs).

With the help of SASDK, ASP.NET web developers can develop, debug, and deploy speech-enabled ASP.NET telephony web and multimodal applications that run on rich clients such as a desktop PC, a tablet PC, or a Pocket PC. Because the development environment is fully integrated with Visual Studio .NET 2003, developers can use the Visual Studio .NET 2003 development environment toolboxes and graphical controls.

Key Elements in the SASDK

These are the key elements in the SASDK:

  • A set of ASP.NET speech controls facilitates speech input and output in ASP.NET applications by generating HTML and SALT markup for telephone (voice-only) and multimodal browsers.

  • Four Visual Studio .NET 2003 add-on tools are included:

    • Tool for creating and editing speech controls

    • Speech Grammar Editor for creating and editing speech-recognition grammars

    • Speech Prompt Editor for creating and editing prerecorded voice output

    • Speech debugging console

  • The Speech Add-in for Internet Explorer is used for running and testing speech-enabled ASP.NET web applications.

  • The tutorial provides instructions and sample applications.

ASP.NET Speech Controls

During execution, the ASP.NET server verifies client capabilities and transforms each speech control tag into client-side speech tags along with the required script. Let's briefly explore these controls.

  • QA control. This is the fundamental element of the speech controls. It defines a single interaction consisting of prompting the user, recognizing the user's response, and binding that response to a specific control on the page. The QA control is composed of several subordinate elements, including the following:

    • Prompt. Used for speech output directed to the user.

    • Reco. Indicates speech-input resources and features.

    • Answer. Defines how to handle the recognition results.

  • CompareValidator and CustomValidator controls. Because no GUI is available in voice-only applications, these validators provide an audio error message when input data is invalid.

  • Command control. With the help of this control, developers can define global speech-handling options such as Help, Repeat, or Cancel.

  • SemanticItem controls. These controls link the QA answers to the actual input controls. The recognized text is placed in the target SemanticItem for confirmation and binding.

  • Telephony controls. These controls address call control and messaging for telephony applications.

Speech Tools for Visual Studio .NET

The .NET Speech SDK includes four tools for building and debugging grammars, prompts, and speech-enabled web pages:

  • Grammar Editor. For creating and editing speech-recognition grammars.

  • Prompt Editor. For creating and editing prerecorded voice output.

  • Speech Debugging Console. Debugger for .NET speech applications.

  • Speech Control Editor. Designer for speech-enabled web pages.

Sample Application

Following is a sample application using only SALT markup without Visual Studio .NET:

<html xmlns:SALT="http://www.saltforum.org/2002/SALT">
<head>
<!--The SALT Add-in to Internet Explorer object -->
<object id="SpeechTags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC"
 VIEWASTEXT>
</object>
</head>
<!--Importing the namespace from the implementation -->
<?import namespace="SALT" implementation="#SpeechTags" />
<body bgcolor = "cyan">
<!--SALT text-to-speech object -->
<SALT:prompt id="Demo"> </SALT:prompt>
<p>Click the <b>Button!</b></p>
<p> <input type="button" value="Text to Speech: SALT Demo" 
onClick="Startdemo()"> </p>
<script id="script1" language="jscript">
   <!--
   function Startdemo()
   {
     try
     {
      // The string literal must stay on a single line to be valid JScript.
      Demo.Start("SALT DEMO. Your .NET Reference Guide host is Mr. G. Gnana Arun Ganesh");
     }
     catch(e)
     {
      alert("Voice error");
     }
   }
   -->
</script>
</body>
</html>

The output is shown in the following figure. If you click the button, you can hear the voice.

Figure 52
