1.2 What is VoiceXML?
VoiceXML is an open standard markup language based on XML (Extensible Markup Language).
VoiceXML is used to define voice dialogs; it is to voice dialogs what HTML is to graphical dialogs. The specification is essential to making Internet content and information accessible via voice and phone.
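To make the HTML comparison concrete, the following is a minimal sketch of what a voice dialog can look like and how a program might read it. It assumes VoiceXML 2.0 syntax and its W3C namespace; the form id, the prompt text, and the helper name list_prompts are made up for illustration, and Python's standard XML parser stands in for a real voice browser.

```python
import xml.etree.ElementTree as ET

# Illustrative VoiceXML 2.0-style document. The form id and the prompt
# wording are assumptions made for this sketch, not taken from the text.
VXML_DOC = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <block>
      <prompt>Hello, and welcome to the voice portal.</prompt>
    </block>
  </form>
</vxml>
"""

# Namespace map used to address VoiceXML elements in path expressions.
VXML_NS = {"v": "http://www.w3.org/2001/vxml"}


def list_prompts(document):
    """Return the text of every <prompt> element in a VoiceXML document."""
    # Parse from bytes so the XML declaration's encoding is honored.
    root = ET.fromstring(document.encode("utf-8"))
    return [
        prompt.text.strip()
        for prompt in root.iterfind(".//v:prompt", VXML_NS)
        if prompt.text
    ]


if __name__ == "__main__":
    for text in list_prompts(VXML_DOC):
        print(text)
```

Just as a Web browser renders an HTML page visually, a voice browser would speak the prompt to the caller; here the script only extracts the prompt text to show the document structure.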
VoiceXML represents the combined contributions of AT&T, Lucent, Motorola, and IBM. AT&T (and Lucent) started work on Phone Markup Language, PML, back in 1995. Motorola had another markup language, VoxML, which was first released in October 1998. In February 1999, IBM joined the other three companies and contributed its SpeechML technology. Following this, in March 1999 the VoiceXML Forum was formed.
The first public release, VoiceXML 0.9, was in August 1999. VoiceXML 1.0 was released seven months later in March 2000. In May 2000, the Forum submitted the specification to the W3C (World Wide Web Consortium).
1.2.2 W3C organization
One of the principal standards bodies for technologies that make the Web a robust, scalable, and adaptive infrastructure is the World Wide Web Consortium, also known as the W3C. The W3C was created in October 1994 to lead the technical evolution of the Web by promoting interoperability and encouraging an open discussion through the development of specifications, guidelines, software, and tools.
The W3C manages XML and the many industry-specific XML applications. VoiceXML is one of the applications that fall under the W3C.
For voice applications, there are a number of adjunct XML document types that deal with specialized problems in voice development. These markup languages are also managed by the W3C. Each of these can be used with VoiceXML to fully specify a voice application:
Grammar XML (GRXML)
GRXML provides a standard language for representing speech grammars used by speech recognition engines. Grammars tell the speech recognizer what words to listen for and in what order they may appear. Grammars and GRXML are covered in detail in Chapter 2, "VoiceXML essentials," on page 24.
Speech Synthesis Markup Language (SSML)
SSML provides a standardized way of specifying how text is rendered as speech. This includes tags for controlling the pronunciation, tone, inflection, and other characteristics of spoken words.
Call Control XML (CCXML)
CCXML provides a language for controlling telephony and switching equipment. CCXML applications can perform tasks such as setting up conference calls, transferring calls, answering incoming calls, and creating outbound calls.
XML Events
XML Events is a standard that provides an interoperable way of associating behaviors with document-level markup.
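As a rough illustration of two of the adjunct languages listed above, the sketch below embeds a small speech grammar and a small SSML fragment and inspects them with Python's standard XML parser. The element names and namespaces follow the W3C grammar and speech synthesis specifications; the rule id, the phrases, the spoken text, and the helper names are invented for this example.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Illustrative grammar: tells the recognizer to listen for one of three phrases.
GRAMMAR_DOC = """<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="drink" mode="voice">
  <rule id="drink" scope="public">
    <one-of>
      <item>coffee</item>
      <item>tea</item>
      <item>orange juice</item>
    </one-of>
  </rule>
</grammar>"""

# Illustrative SSML: controls emphasis, pausing, and speaking rate.
SSML_DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Your order is <emphasis level="strong">ready</emphasis>.
  <break time="500ms"/>
  <prosody rate="slow">Thank you for calling.</prosody>
</speak>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def grammar_phrases(document):
    """Return the phrases the grammar allows, in document order."""
    root = ET.fromstring(document)
    items = root.iterfind(".//g:item", {"g": SRGS_NS})
    return [item.text.strip() for item in items if item.text]


def ssml_tags(document):
    """Return the names of the SSML elements used inside <speak>."""
    root = ET.fromstring(document)
    return [local_name(el.tag) for el in root.iter() if el is not root]


if __name__ == "__main__":
    print(grammar_phrases(GRAMMAR_DOC))
    print(ssml_tags(SSML_DOC))
```

In a real application, a VoiceXML document would reference the grammar from a field that collects caller input and would place the SSML elements inside its prompts; the parsing here only shows how each document type is structured.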
Since the markup languages mentioned throughout the book are managed by the W3C, it is useful to summarize the document process that the W3C has formalized. Every technical report on the Recommendation track is edited by one or more editors appointed by a Working Group Chair. It is the responsibility of these editors to ensure that the decisions of the group are correctly reflected in subsequent drafts of the technical report.
The W3C "Recommendation track" is the process that W3C follows to build consensus around a Web technology, both within W3C and in the Web community as a whole. W3C turns a technical report into a Recommendation by following this process. The five stages that describe the increasing levels of maturity and consensus along the Recommendation track are:
Working Draft (WD)
A Working Draft begins with the submission of a technical report to the W3C. A Working Draft is a chartered work item of a Working Group and generally represents work in progress and the commitment by the W3C to pursue work in a particular area.
Last Call Working Draft (LC)
A Last Call Working Draft is a special instance of a Working Draft that is considered by the Working Group to fulfill the relevant requirements of its charter and any accompanying requirements documents. A Last Call Working Draft is a public technical report for which the Working Group seeks technical review from other W3C groups, W3C Members, and the public.
Candidate Recommendation (CR)
A Candidate Recommendation is believed to meet the relevant requirements of the Working Group's charter and any accompanying requirements documents, and has been published in order to gather implementation experience and feedback. Advancement of a technical report to Candidate Recommendation is an explicit call for implementation experience to those outside of the related Working Groups or the W3C itself.
Proposed Recommendation (PR)
A Proposed Recommendation is believed to meet the relevant requirements of the Working Group's charter and any accompanying requirements documents, to represent sufficient implementation experience, and to adequately address dependencies from the W3C technical community and comments from previous reviewers. A Proposed Recommendation is a technical report that the Director of the W3C has sent to the W3C Advisory Committee for review.
W3C Recommendation (REC)
A W3C Recommendation is a technical report that is the end result of extensive consensus-building inside and outside of W3C about a particular technology or policy. W3C considers that the ideas or technology specified by a Recommendation are appropriate for widespread deployment and that they promote W3C's mission.
1.2.3 The XML standard
The parent of VoiceXML is XML, the Extensible Markup Language. XML is the universal format for structured documents and data on the Web. The base specifications are XML 1.0, a W3C Recommendation published in February 1998, and Namespaces in XML, published in January 1999. XML is a non-proprietary public standard. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification.
XML is called extensible because it is not a fixed format like HTML (which is a single, predefined markup language). Instead, XML is a "meta-language", that is, a language for describing other languages, which allows you to design your own customized markup languages for many different types of documents.
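The idea of designing your own markup language can be shown with a small sketch. The menu vocabulary below (the menu and dish elements and the price attribute) is entirely hypothetical and exists only for this example; the point is that a generic XML parser can read it without knowing the vocabulary in advance.

```python
import xml.etree.ElementTree as ET

# A made-up markup language for restaurant menus. None of these element or
# attribute names come from a standard; they are defined by this example.
MENU_DOC = """<menu restaurant="Example Cafe">
  <dish price="4.50">Soup of the day</dish>
  <dish price="9.00">Pasta</dish>
</menu>"""


def dish_prices(document):
    """Map each dish name to its price, as declared in the custom markup."""
    root = ET.fromstring(document)
    return {
        dish.text.strip(): float(dish.get("price"))
        for dish in root.iter("dish")
    }


if __name__ == "__main__":
    for name, price in dish_prices(MENU_DOC).items():
        print(f"{name}: {price:.2f}")
```

VoiceXML is built the same way: it is an XML vocabulary whose elements happen to describe voice dialogs rather than menus.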