Speech-Enable Your Java Software

Stephen B. Morris
Sep 1, 2006

Speech-enabling your software is easy, says Stephen Morris. If speech is added in a sympathetic fashion, it can raise the standard of your user interface in subtle but powerful ways. It can also open up new markets for your software products, for example by reaching visually impaired users. Developments in web standards also suggest that speech-enabled software is becoming a commodity item. Read on to find out more.

I was paying for parking recently when I noticed that the ticket machine was speech-enabled. After I inserted my ticket, the machine told me in a tinny voice the amount to pay and then said (a tad impolitely), "Get your ticket." They say that 50% of communication is nonverbal, so the programmers of the parking machine might need to add some of this nonverbal content into the prompts. Still, it’s pretty impressive!

This article presents a very basic speech-enabled payment application. I discuss coding and design issues related to speech technology, and my examples employ speech synthesis. My focus is primarily on the practical elements (above and beyond "Hello World"), rather than theory. As you’ll see, all this technology has some interesting elements.

Speaking and Hearing: Speech Synthesis and Speech Recognition

Voice capabilities consist of two core speech technologies:

  • Speech synthesis produces synthetic speech from text generated by an application, an applet, or a user. Speech synthesis is often referred to as text-to-speech technology.
  • Speech recognition provides computers with the ability to listen to spoken language and to determine what has been said. In other words, recognition processes audio input containing speech by converting it to text.

Many organizations run limited speech recognition systems on their customer phone-support channels. This usage is a means of reducing staffing levels and possibly of making the host organization seem more technically advanced. Other services also exist in which text messages can be sent from mobile phones to landlines. The landline service then uses text-to-speech to play the message to the user as a voicemail message. Some landline phones also allow for sending text messages, which in a sense runs the service in the opposite direction.

Just as podcasting is now a mainstream technology, we can expect to hear (pardon the pun!) a lot more about speech-enabled solutions. One area similar to podcasting is that of listening to audio versions of documents; for example, when traveling.

Speech recognition offers even more profound benefits to end users than does speech synthesis. For example, consider situations in which users are physically limited—such as doing tasks that require both hands (surgery, do-it-yourself projects, etc.) while trying to operate some kind of hardware.

Interestingly, each of the three speech recognition software packages I’ve tried was either very complex to set up or produced useless results. In both cases, I didn’t have much success. This experience suggests that speech recognition technology is not at the same level of market maturity as speech synthesis. You might have to spend a significant amount of money for a decent speech recognition solution.

Emerging Standards

There is a broad web context for speech-enabled software. Emerging standards, such as the Device Independent Authoring Language (DIAL), indicate that the audience for web content is growing fast. This growth is occurring in terms of the following:

  • Device types (mobile phones, PDAs, laptops, and even children’s toys)
  • Accessibility requirements
  • Time (people want access to the same web pages at work and at home)

DIAL has some generic requirements that may affect the way in which speech technology is used. Let’s consider this issue briefly.

DIAL is a standard for how web pages should be designed and written to accommodate developments in web access, delivery networks, and device technology. It has as its major goal the production of web content that is available any time, any way, and anywhere. To make this pithy requirement set more concrete, let’s say that someone with a mobile phone is traveling home from work on a train and wants to see the value of his or her portfolio of shares. DIAL facilitates mechanisms to allow the web site to present the required data in a format that suits the needs of the user, the target device, and the delivery network. So, in this case, the content might be presented in an audio format or in a tightly summarized textual fashion because of the small screen.
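
The train example can be pictured as a small decision made before the content is delivered. The following Java sketch is illustrative only: it is not DIAL markup and uses nothing from the DIAL specification. The class name, the Presentation values, the parameters, and the 480-pixel threshold are all assumptions chosen to show the idea of matching content to the user, the device, and the screen.

```java
// Hypothetical sketch: these names and rules are illustrative assumptions,
// not part of DIAL or of any real library.
class ContentAdaptationDemo {

    // The three ways this sketch can present the same share-portfolio data.
    enum Presentation { AUDIO, SUMMARY_TEXT, FULL_PAGE }

    // Picks a presentation from what is known about the device and the user.
    // The 480-pixel threshold for a "small screen" is an arbitrary assumption.
    static Presentation choose(boolean canPlayAudio,
                               boolean userPrefersAudio,
                               int screenWidthPixels) {
        if (canPlayAudio && userPrefersAudio) {
            return Presentation.AUDIO;
        }
        if (screenWidthPixels < 480) {
            return Presentation.SUMMARY_TEXT;
        }
        return Presentation.FULL_PAGE;
    }

    public static void main(String[] args) {
        // A phone user on a train who wants to listen rather than read.
        System.out.println(choose(true, true, 240));
        // The same phone with no audio preference: tightly summarized text.
        System.out.println(choose(true, false, 240));
        // A desktop browser at work: the full page.
        System.out.println(choose(false, false, 1280));
    }
}
```

In a DIAL-based site, a decision of this kind would be driven by the authored content and the delivery context rather than by hand-written rules; the sketch only shows the shape of the choice.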

DIAL provides for a sympathetic way of producing, conveying, and rendering web content. It’s entirely likely that DIAL will make special use of speech synthesis and recognition technologies (and other media, such as video).

Listing 1 shows an XHTML2 object definition:

Listing 1 An XHTML2 object.

<object src="http://www.example.com/stocks.mp3" srctype="audio/mpeg">
 An audio file representing stocks.
</object>

The object in Listing 1 is allowed by DIAL and could be downloaded to a device equipped with an audio/MPEG player. In turn, the player could incorporate a speech synthesizer. The important point here is that there’s an emerging nexus between web content, small devices, and speech synthesis technology. It’s only a matter of time before speech recognition is added to the mix to make the user experience even richer.

Writing Java-Based Voice Software

Overall, Java-based voice synthesis and recognition software isn’t particularly difficult to write. Free toolkits are available that provide pretty impressive results (for synthesis, at least) in a very short time.

The Java Speech API (JSAPI) is a definition of a standard, easy-to-use, cross-platform software interface to state-of-the-art speech technology, providing capabilities for both speech synthesis and speech recognition. The API is decoupled from implementations in order to provide the conditions for a vibrant market for speech technology. In this way, the industry can enjoy the use of a standard, well-researched specification and API, while still adding differentiating product features.
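
To see why decoupling the API from its implementations matters, consider the minimal sketch below. It does not use JSAPI: the SpeechOutput interface, the two implementing classes, and the payment prompt wording are all hypothetical names invented for illustration. The point is only that application code written against an interface can swap one speech engine for another without changing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types belong to JSAPI. They only
// illustrate application code written against an interface.
class PaymentPromptDemo {

    // Builds the prompt for an amount in cents and hands it to whichever
    // engine is supplied. Returns the prompt text for convenience.
    static String announceAmountDue(SpeechOutput engine, int cents) {
        String prompt = "Please pay " + (cents / 100) + " dollars and "
                + (cents % 100) + " cents.";
        engine.speak(prompt);
        return prompt;
    }

    public static void main(String[] args) {
        // A real application would plug in a synthesizer-backed engine here.
        announceAmountDue(new ConsoleSpeechOutput(), 450);
    }
}

// The only thing the application knows about a speech engine.
interface SpeechOutput {
    void speak(String text);
}

// Stand-in engine that prints the text instead of synthesizing audio.
class ConsoleSpeechOutput implements SpeechOutput {
    @Override
    public void speak(String text) {
        System.out.println("[speaking] " + text);
    }
}

// Stand-in engine that records prompts, useful for testing without audio.
class RecordingSpeechOutput implements SpeechOutput {
    final List<String> spoken = new ArrayList<>();

    @Override
    public void speak(String text) {
        spoken.add(text);
    }
}
```

Swapping ConsoleSpeechOutput for RecordingSpeechOutput, or for a class that wraps a real synthesizer, leaves announceAmountDue untouched. That is the same kind of freedom a standard API aims to give vendors and application writers.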

Without further ado, let’s get your system set up to run the examples.

 
