Speech Technologies
Last updated Jan 1, 2004.
.NET Speech Technologies
Speech recognition is the capability of a computer to recognize the spoken word so that it can accept commands and data from the speaker. Most speech-recognition engines translate incoming audio data into phonemes, which are then mapped to text that your application can use. (A phoneme is the smallest structural unit of sound that can be used to distinguish one utterance from another in a spoken language.)
In contrast, text-to-speech technology enables computers to translate text information into synthetic speech output.
Both speech recognition and text-to-speech rely on engines: software whose purpose is to recognize speech or to speak text.
Here, though, we'll look specifically at text-to-speech technology in .NET. There are two types: synthesized text-to-speech and concatenated text-to-speech.
In synthesized speech, the engine examines the words and produces their phonetic pronunciations. The resulting phonemes are then processed by a complex algorithm that imitates the human vocal tract to produce the sound.
Concatenated text-to-speech engines also analyze the text provided, but they build the output by concatenating (gluing together) digital audio recordings of words and phrases taken from a prerecorded library.
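The first stage of synthesized speech, turning words into phonemes, can be illustrated with a toy C# sketch that looks each word up in a pronunciation dictionary. The two-word lexicon, its ARPAbet-style symbols (stress marks omitted), and the PhonemeLookup class are illustrative assumptions only. A real engine typically combines a large lexicon with letter-to-sound rules, and it still has to generate audio from the phonemes afterward.

```csharp
using System;
using System.Collections.Generic;

// Toy front end for a synthesized TTS engine: maps each word to phonemes
// using a tiny, hypothetical lexicon (ARPAbet-style symbols, no stress marks).
public static class PhonemeLookup
{
    private static readonly Dictionary<string, string[]> Lexicon =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            { "hello", new[] { "HH", "AH", "L", "OW" } },
            { "world", new[] { "W", "ER", "L", "D" } },
        };

    // Returns the phoneme sequence for the text. Unknown words become "?";
    // a real engine would fall back to letter-to-sound rules instead.
    public static List<string> ToPhonemes(string text)
    {
        var result = new List<string>();
        string[] words = text.Split(new[] { ' ', ',', '.', '!', '?' },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            string[] phonemes;
            if (Lexicon.TryGetValue(word, out phonemes))
                result.AddRange(phonemes);
            else
                result.Add("?");
        }
        return result;
    }
}
```

For example, joining the result of PhonemeLookup.ToPhonemes("Hello, world!") with spaces gives HH AH L OW W ER L D.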
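The concatenative approach can be sketched in the same way. For each phrase or word, look up a prerecorded clip and append its samples to the output. In this toy, hypothetical C# version, the "library" is a dictionary that maps lowercase text keys to arrays of 16-bit PCM samples. The longest matching phrase wins, and a word with no recording is replaced by a short stretch of silence. Real engines typically also smooth the joins between recordings, and this sketch does not attempt that.

```csharp
using System;
using System.Collections.Generic;

// Toy concatenative synthesizer. The library keys are lowercase words or
// phrases; the values stand in for recorded 16-bit PCM audio.
public static class ConcatenativeTts
{
    public static short[] Synthesize(string text,
                                     IDictionary<string, short[]> library,
                                     int maxPhraseWords,
                                     int silenceSamples)
    {
        string[] words = text.ToLowerInvariant().Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var output = new List<short>();
        int i = 0;
        while (i < words.Length)
        {
            // Prefer the longest prerecorded phrase that starts at word i.
            int matched = 0;
            for (int n = Math.Min(maxPhraseWords, words.Length - i); n >= 1; n--)
            {
                string key = string.Join(" ", words, i, n);
                short[] clip;
                if (library.TryGetValue(key, out clip))
                {
                    output.AddRange(clip);   // glue the recording onto the output
                    matched = n;
                    break;
                }
            }
            if (matched == 0)
            {
                // No recording for this word: insert silence (zero samples).
                output.AddRange(new short[silenceSamples]);
                matched = 1;
            }
            i += matched;
        }
        return output.ToArray();
    }
}
```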
SAPI 5 Overview
Microsoft's Speech Application Programming Interface (SAPI) considerably reduces the code you need to write for an application that uses speech recognition and text-to-speech. Using SAPI makes speech technology more accessible and robust for a broad range of applications.
SAPI provides a high-level interface between the application and the speech engines. It handles the low-level details of controlling and managing the real-time operations of the various speech engines.
Corresponding to the two speech technologies, SAPI supports two fundamental types of engines: text-to-speech (TTS) systems and speech recognizers.
Earlier versions of the API required you to use C or C++. SAPI 5.1 supports OLE Automation, so you can develop applications in any language that also supports OLE Automation, among them Visual Basic, C#, and JScript.
To use SAPI from C#, you can run the Type Library Importer (TlbImp.exe) on the SAPI type library to generate the SpeechLib.dll interop assembly, and then reference that assembly from your project.
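Because SAPI 5.1 exposes its objects through OLE Automation, a C# program can even call SpVoice through late binding (IDispatch), without any interop assembly. The following minimal sketch assumes that SAPI is registered under the ProgID SAPI.SpVoice. The SapiAutomation class and its TrySpeak method are hypothetical names, and the method returns false wherever SAPI or COM is unavailable.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical helper: calls SpVoice.Speak through OLE Automation
// (late binding), so no TlbImp-generated interop assembly is needed.
public static class SapiAutomation
{
    // Returns true if the text was spoken, false if SAPI is unavailable.
    public static bool TrySpeak(string text)
    {
        try
        {
            // Assumes SAPI 5.x is registered under this ProgID.
            Type voiceType = Type.GetTypeFromProgID("SAPI.SpVoice");
            if (voiceType == null)
                return false;                     // SAPI is not installed

            object voice = Activator.CreateInstance(voiceType);
            try
            {
                // Speak(Text, Flags): flags 0 (SVSFDefault) speaks synchronously.
                voiceType.InvokeMember("Speak", BindingFlags.InvokeMethod,
                                       null, voice, new object[] { text, 0 });
                return true;
            }
            finally
            {
                Marshal.ReleaseComObject(voice);  // release the COM object promptly
            }
        }
        catch (Exception)
        {
            // COM is unsupported on this platform, object creation failed,
            // or the Speak call itself failed.
            return false;
        }
    }
}
```

Late binding avoids the generated assembly, but it gives up compile-time type checking. For that reason, the sample application below references SpeechLib.dll instead.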
This TTS (text-to-speech) application shows how to create an SpVoice object and use it either to speak text aloud or to save the speech to a .wav file.
namespace TTS
{
    using System;
    using System.Drawing;
    using System.Threading;
    using System.Windows.Forms;
    using SpeechLib;   // interop assembly generated from the SAPI type library with TlbImp.exe

    public class Tts : Form
    {
        private Button button1;
        private CheckBox checkBox1;
        private Button SPEAK;
        private TextBox textBox1;
        // Keeps the most recent SpVoice reachable so that its wrapper is not
        // garbage-collected (releasing the COM object) while async speech plays.
        private SpVoice currentVoice;

        public Tts()
        {
            InitializeComponent();
        }

        private void InitializeComponent()
        {
            this.checkBox1 = new CheckBox();
            this.SPEAK = new Button();
            this.button1 = new Button();
            this.textBox1 = new TextBox();

            checkBox1.Location = new Point(88, 280);
            checkBox1.Text = "Save To Wave.";
            checkBox1.Size = new Size(112, 24);
            checkBox1.TabIndex = 2;
            checkBox1.BackColor = Color.DodgerBlue;

            SPEAK.Location = new Point(96, 96);
            SPEAK.BackColor = Color.DodgerBlue;
            SPEAK.Size = new Size(96, 24);
            SPEAK.TabIndex = 1;
            SPEAK.Text = "SPEAK";
            SPEAK.Click += new EventHandler(SPEAK_Click);

            button1.Location = new Point(88, 72);
            button1.BackColor = Color.DodgerBlue;
            button1.Size = new Size(112, 24);
            button1.TabIndex = 3;
            button1.Text = "EXIT";
            button1.Click += new EventHandler(button1_Click);

            textBox1.Location = new Point(40, 120);
            textBox1.Text = "";
            textBox1.Multiline = true;
            textBox1.ForeColor = Color.FromArgb(192, 0, 0);
            textBox1.TabIndex = 0;
            textBox1.Size = new Size(216, 160);
            textBox1.BackColor = Color.FromArgb(192, 192, 255);

            this.Text = "InformIT";
            this.AutoScaleBaseSize = new Size(5, 13);
            this.BackColor = Color.FromArgb(128, 128, 255);
            this.ClientSize = new Size(312, 349);
            this.Controls.Add(button1);
            this.Controls.Add(checkBox1);
            this.Controls.Add(SPEAK);
            this.Controls.Add(textBox1);
        }

        // The common file dialogs require the UI thread to be a single-threaded apartment.
        [STAThread]
        static void Main()
        {
            Application.Run(new Tts());
        }

        private void SPEAK_Click(object sender, EventArgs e)
        {
            try
            {
                SpeechVoiceSpeakFlags SpFlags = SpeechVoiceSpeakFlags.SVSFlagsAsync;
                SpVoice speech = new SpVoice();
                currentVoice = speech;
                if (checkBox1.Checked)
                {
                    SaveFileDialog sfd = new SaveFileDialog();
                    sfd.Filter = "All files (*.*)|*.*|wav files (*.wav)|*.wav";
                    sfd.Title = "Save to a wave file";
                    sfd.FilterIndex = 2;
                    sfd.DefaultExt = "wav";
                    sfd.RestoreDirectory = true;
                    if (sfd.ShowDialog() == DialogResult.OK)
                    {
                        SpeechStreamFileMode SpFileMode =
                            SpeechStreamFileMode.SSFMCreateForWrite;
                        SpFileStream SpFileStream = new SpFileStream();
                        SpFileStream.Open(sfd.FileName, SpFileMode, false);
                        try
                        {
                            // Route the voice's output into the .wav file instead of the speakers.
                            speech.AudioOutputStream = SpFileStream;
                            speech.Speak(textBox1.Text, SpFlags);
                            // Speak is asynchronous; wait for it to finish before closing the file.
                            speech.WaitUntilDone(Timeout.Infinite);
                        }
                        finally
                        {
                            SpFileStream.Close();
                        }
                    }
                }
                else
                {
                    // Asynchronous speech keeps the user interface responsive.
                    speech.Speak(textBox1.Text, SpFlags);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show("Speak error: " + ex.Message);
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            this.Close();
        }
    }
}
Saving the spoken text to a .wav file takes these steps:
Create the SpVoice object: SpVoice speech = new SpVoice();
Create a file stream: SpFileStream SpFileStream = new SpFileStream();
Open a new .wav file for writing (SpFileMode is SSFMCreateForWrite): SpFileStream.Open(sfd.FileName, SpFileMode, false);
Set the file stream as the output for the voice object: speech.AudioOutputStream = SpFileStream;
Call the Speak method, which sends the output to the .wav file: speech.Speak(textBox1.Text, SpFlags);
Because SpFlags requests asynchronous speech, wait for Speak to finish: speech.WaitUntilDone(Timeout.Infinite);
Close the file: SpFileStream.Close();
Microsoft Speech Application SDK Version 1.0 Beta 3 (SASDK)
At the time of writing, the .NET Speech SDK is in beta (Speech Application SDK Version 1.0 Beta 3). The SASDK provides authoring tools, ASP.NET speech controls, a speech add-in for Internet Explorer, a speech add-in for Microsoft Pocket Internet Explorer, a rich grammar library, speech debugging tools, an application event-logging mechanism, samples, and documentation.
Speech Application Language Tags (SALT) Specification
The SASDK development tools are based on the emerging Speech Application Language Tags (SALT) specification. SALT is a speech interface markup language that lets developers write speech interfaces for both voice-only (e.g., telephony) and multimodal applications. It is a lightweight set of extensions to existing markup languages, especially HTML and XHTML, that adds speech interfaces to web pages.
Multimodal access lets clients interact with an application in multiple ways: input through speech, keyboard, keypad, mouse, and/or stylus, and output as synthesized speech, audio, plain text, motion video, and/or graphics. With SALT, developers can add speech recognition, synthesis, and telephony capabilities to HTML- or XHTML-based applications. SALT also makes applications and information accessible from telephones and from GUI-based devices such as PCs, tablet PCs, and wireless personal digital assistants (PDAs).
With the SASDK, ASP.NET web developers can develop, debug, and deploy speech-enabled ASP.NET applications: telephony (voice-only) applications as well as multimodal applications that run on rich clients such as a desktop PC, a tablet PC, or a Pocket PC. Because the SASDK is fully integrated with Visual Studio .NET 2003, developers can work with that environment's toolboxes and graphical controls.
Key Elements in the SASDK
These are the key elements in the SASDK:
A set of ASP.NET speech controls facilitates speech input and output in ASP.NET applications by generating HTML and SALT markup for telephone (voice-only) and multimodal browsers.
Four Visual Studio .NET 2003 add-on tools are included:
Speech Control Editor, for creating and editing speech controls
Speech Grammar Editor, for creating and editing speech-recognition grammars
Speech Prompt Editor, for creating and editing prerecorded voice output
Speech Debugging Console, for debugging speech applications
The Speech Add-in for Internet Explorer is used for running and testing speech-enabled ASP.NET web applications.
A tutorial provides instructions and sample applications.
ASP.NET Speech Controls
At run time, the ASP.NET server checks the client's capabilities and transforms each speech control tag into client-side SALT tags, along with the required script. Let's briefly explore these controls.
QA control. This is the fundamental speech control. It defines a single interaction: prompting the user, recognizing the user's response, and binding that response to a specific control on the page. The QA control is composed of several subordinate elements, exposed as properties such as the following:
Prompt. Used for speech output directed to the user.
Reco. Indicates speech-input resources and features.
Answer. Defines how to handle the recognition results.
CompareValidator and CustomValidator controls. Because no GUI is available in voice-only applications, these validators provide an audio error message when input data is invalid.
Command control. This control lets developers define global speech commands, such as Help, Repeat, or Cancel, that are available throughout the application.
SemanticItem controls. These controls link the QA answers to the actual input controls. The recognized text is placed in the target SemanticItem for confirmation and binding.
Telephony controls. These controls address call control and messaging for telephony applications.
Speech Tools for Visual Studio .NET
The .NET Speech SDK includes four tools for building and debugging grammars, prompts, and speech-enabled web pages:
Grammar Editor. For creating and editing speech-recognition grammars.
Prompt Editor. For creating and editing prerecorded voice output.
Speech Debugging Console. Debugger for .NET speech applications.
Speech Control Editor. Designer for speech-enabled web pages.
Sample Application
The following sample application uses only SALT markup, without Visual Studio .NET. It runs in Internet Explorer with the SALT speech add-in installed:
<html xmlns:SALT="http://www.saltforum.org/2002/SALT">
<head>
<!--The SALT Add-in to Internet Explorer object -->
<object id="SpeechTags" CLASSID="clsid:DCF68E5B-84A1-4047-98A4-0A72276D19CC" VIEWASTEXT>
</object>
</head>
<!--Importing the namespace from the implementation -->
<?import namespace="SALT" implementation="#SpeechTags" />
<body bgcolor = "cyan">
<!--SALT text-to-speech object -->
<SALT:prompt id="Demo"> </SALT:prompt>
<p>Click the <b>Button!</b></p>
<p> <input type="button" value="Text to Speech: SALT Demo"
onClick="Startdemo()"> </p>
<script id="script1" language="jscript">
<!--
function Startdemo()
{
    try
    {
        // The string literal must stay on one line; a line break inside it is a syntax error.
        Demo.Start("SALT DEMO. Your .NET Reference Guide host is Mr. G. Gnana Arun Ganesh");
    }
    catch(e)
    {
        alert("Voice error");
    }
}
-->
</script>
</body>
</html>
The output is shown in the following figure. If you click the button, you can hear the voice.

Figure 52
