This chapter is from the book

3.42 prosody

Element type

Attributes: contour | duration | pitch | range | rate | volume

Parents: audio | choice | emphasis | enumerate | paragraph | prompt | prosody | sentence | voice

Children: PCDATA | audio | break | emphasis | enumerate | mark | paragraph | phoneme | prosody | say-as | sentence | value | voice

Description: Associates text-to-speech rendering parameters with the contained TTS content.


<!ELEMENT prosody (%allowed-within-sentence; | %structure;)* >
<!ATTLIST prosody
  pitch    CDATA #IMPLIED
  contour  CDATA #IMPLIED
  range    CDATA #IMPLIED
  rate     CDATA #IMPLIED
  duration CDATA #IMPLIED
  volume   CDATA #IMPLIED >
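
As a rough illustration of this content model, the fragment below nests a structural element, an inline element, and a second prosody element inside an outer prosody. It is a sketch rather than an example from the book: the prompt text is invented, and the fragment assumes it sits inside a block of a form.

```xml
<!-- Hypothetical fragment; the wording of the prompt is made up. -->
<prompt>
  <prosody rate="slow">
    <!-- A structural child (sentence) containing an inline child (emphasis) -->
    <sentence>Your flight departs at <emphasis>nine</emphasis> tonight.</sentence>
    <!-- prosody may itself be a child of prosody -->
    <prosody volume="soft">Thank you for calling.</prosody>
  </prosody>
</prompt>
```

Every attribute is #IMPLIED, so each prosody element sets only the parameters it needs.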

Language model


  • contour : CDATA

    Indicates a transform function for the pitch variation. It can be used to "deaden" or "brighten" the pitch range more precisely than the single constant set by the range attribute. The value is a set of (input percentage, relative output percentage) pairs, for example: "(0%,+20)(10%,+30%)(40%,+10)".

  • duration : CDATA

    The time interval in which the contained TTS content must be spoken. It is specified as an integer followed by a unit abbreviation: s (seconds) or ms (milliseconds).

  • pitch : CDATA

    The baseline pitch for the speech, specified as an integer in Hertz, as a relative value (e.g., +10, +5%, or +5st, where st means “semitone”), or as one of the following symbols: high, medium, low, default.

  • range : CDATA

    The variability of pitch, specified in Hertz, as a relative value (e.g., +10, +5%, or +5st, where st means “semitone”), or as one of the following symbols: high, medium, low, default.

  • rate : CDATA

    The speaking rate for the text, specified either as a relative value (e.g., +10 or +5%) or as one of the following symbols: fast, medium, slow, default.

  • volume : CDATA

    The volume at which the contained TTS content should be played, specified as an integer in the range [0, 100] or as one of the following values: silent, soft, medium, loud, default.
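
The value forms listed above can be sketched in a single prompt. This fragment is illustrative only: the prompt text is invented, the fragment assumes it sits inside a block of a form, and how closely a given speech synthesizer honors each setting may vary by platform.

```xml
<!-- Hypothetical fragment; attribute values follow the forms described above. -->
<prompt>
  <!-- Symbolic values -->
  <prosody rate="slow" volume="loud">Please listen carefully.</prosody>
  <!-- Relative values: pitch raised by five semitones, rate raised by 5% -->
  <prosody pitch="+5st" rate="+5%">Your order has shipped.</prosody>
  <!-- Absolute values: volume on the 0 to 100 scale, duration in seconds -->
  <prosody volume="80" duration="2s">Goodbye.</prosody>
  <!-- A contour given as (input percentage, relative output percentage) pairs -->
  <prosody contour="(0%,+20)(10%,+30%)(40%,+10)">Is that really so?</prosody>
</prompt>
```

Symbolic values leave the exact rendering to the platform, while numeric values such as a pitch in Hertz or a duration in milliseconds request a specific result.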


  • TTS and audible content

    To be spoken using the given prosody parameters.


Example 3-53 A dialog that sings Twinkle Twinkle Little Star

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.0">
  <form id="audiotest">
    <block>
      <!-- Each syllable is held for a fixed duration at a pitch given in Hertz. -->
      <prompt xml:lang="en-US">
        <prosody duration="500ms" pitch="440">Twin</prosody>
        <prosody duration="500ms" pitch="440">cull</prosody>

        <prosody duration="500ms" pitch="659">Twin</prosody>
        <prosody duration="500ms" pitch="659">cull</prosody>

        <prosody duration="500ms" pitch="740">Lit</prosody>
        <prosody duration="500ms" pitch="740">tull</prosody>

        <prosody duration="1000ms" pitch="659">star</prosody>

        <prosody duration="500ms" pitch="587">How</prosody>
        <prosody duration="500ms" pitch="587">I</prosody>

        <prosody duration="500ms" pitch="554">Won</prosody>
        <prosody duration="500ms" pitch="554">der</prosody>

        <prosody duration="500ms" pitch="494">what</prosody>
        <prosody duration="500ms" pitch="494">you</prosody>

        <prosody duration="1000ms" pitch="440">are</prosody>
      </prompt>
    </block>
  </form>
</vxml>
