IBM Research

IBM Text-to-Speech Research

Text-to-speech (TTS) is the generation of synthesized speech from text. Our goal is to make synthesized speech as intelligible, natural and pleasant to listen to as human speech and have it communicate just as meaningfully.

We have developed a novel TTS system, built on IBM's successful work in data-driven methodologies for speech recognition. Our system obtains its parameters through completely automated training on a few hours of speech data, which is acquired by recording a specially prepared script. During synthesis, very small segments of recorded human speech are concatenated to produce the synthesized speech.
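
The concatenation step can be illustrated with a minimal sketch. It rests on assumptions not stated on this page: the short linear cross-fade at each seam is only one common way to smooth joins, the function names `crossfade_concat` and `tone` are hypothetical, and sine tones stand in for recorded speech segments. It does not show how the system selects its segments or how it actually joins them.

```python
import math

def crossfade_concat(segments, overlap):
    """Join waveform segments end to end, linearly cross-fading
    up to `overlap` samples at each seam (an assumed smoothing choice)."""
    if not segments:
        return []
    out = list(segments[0])
    for seg in segments[1:]:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        for i in range(n):
            w = (i + 1) / (n + 1)      # weight of the incoming segment
            j = len(out) - n + i       # matching position in the output tail
            out[j] = (1.0 - w) * out[j] + w * seg[i]
        out.extend(seg[n:])            # append the non-overlapping remainder
    return out

def tone(freq_hz, n_samples, rate_hz=22050):
    """Stand-in for a recorded speech segment: a plain sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz)
            for t in range(n_samples)]

# Three placeholder "units" of different lengths, joined with 64-sample seams.
units = [tone(220, 400), tone(330, 300), tone(440, 500)]
speech = crossfade_concat(units, overlap=64)
```

Each seam consumes `overlap` samples from the incoming segment, so the output is shorter than the sum of the segment lengths by `overlap` samples per join.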

Expressive Samples

Most speech synthesis has a neutral, one-size-fits-all expression, regardless of what it's saying. The new IBM expressive speech synthesizer has a range of expressions, so you can tune the speech to fit its content. Here are some examples.

Good news statement    Unexpressive  Expressive
Bad news statement     Unexpressive  Expressive
Yes-no question        Unexpressive  Expressive
Contrastive emphasis (response to question about changing in Atlanta)    Unexpressive  Expressive

Language Samples

Here are some samples from other languages that we are working on. IBM Research labs all over the world have developed these examples of synthesized speech in their languages.

  Discovered in 1922 by Howard Carter, the tomb of Tutankhamun contained the most extensive royal treasure of ancient Egypt. The collection consisted of over 3850 artifacts including everything from toys and games for the young king to furniture, weapons, chariots, a golden mask and a golden sarcophagus. Many statues and symbols of deities to protect and help the king in the afterlife were also found in the tomb.
  • Male 22 kHz (wav)
  • Female 22 kHz (wav)

    Welcome to the Mandarin text-to-speech system developed by the International Business Machines Corporation.
  • Simplified-Male 22 kHz (wav)
  • Simplified-Female 22 kHz (wav)

    Welcome to the Cantonese text-to-speech system developed by the International Business Machines Corporation.
  • Cantonese-Male 22 kHz (wav)
  • Cantonese-Female 22 kHz (wav)

    Welcome to the Mandarin text-to-speech system developed by the International Business Machines Corporation.
  • Taiwanese 22 kHz (wav)

    Bienvenue chez IBM. Taper 1 pour commander. Taper 2 pour annuler. Taper 3 pour confirmer. Puis, veuillez composer les 10 chiffres de votre numéro et terminer en appuyant sur la touche dièse.
    Welcome to IBM. Press 1 to order. Press 2 to cancel. Press 3 to confirm. Then dial the 10 digits of your number and finish by pressing the pound key.

    • Male 8 kHz (wav)

    Date : Le 21 juin 2002, 12 heures 30. Adresse : 2 avenue Gambetta La Défense. Téléphone : 01 49 05 43 67. Votre compte est créditeur de 3481 euros.
    Date: June 21, 2002, 12:30. Address: 2 avenue Gambetta, La Défense. Telephone: 01 49 05 43 67. Your account has a credit balance of 3481 euros.
    • Female 8 kHz (wav)

    Ich habe für Sie ein Zimmer mit Blick auf das Meer für den Zeitraum vom 28. Juli bis 5. August reserviert.
    I have reserved a room with an ocean view for you for the period from July 28 to August 5.

    • Male 8 kHz (wav)

    Ihr aktuelles Guthaben beträgt 19 Euro und 25 Cent, Ihr nächster Aufladetermin ist der 25.01.2003.
    Your current credit is 19 euros and 25 cents; your next recharge date is January 25, 2003.
    • Female 8 kHz (wav)


    IBM's commercial TTS site provides further links to available IBM products and product information.


    News about our research
      · The Quest for the Digital Chatterbox
