Embedded Concatenative Text to Speech (eCTTS)

Speech Technologies


eCTTS is a small-footprint concatenative text-to-speech system designed for embedded platforms. The main demand for this technology comes from the automotive market, where it is an essential part of speech-enabled car navigation and control systems. eCTTS is an integral part of IBM Embedded ViaVoice, the flagship speech recognition and text-to-speech product for embedded applications, which has been deployed in cars since 2004.

In concatenative text-to-speech, small speech segments are selected from a database of speech recordings (known as a "voice"), then manipulated and concatenated to form the synthesized sentence. The main research challenge of eCTTS is to reduce the size of the IBM server-based TTS voice by two orders of magnitude, from 500-1000 MB to 5-10 MB, with minimal degradation of speech quality. This is achieved by combining specialized low-bit-rate speech compression with a deep preselection process in which 90% of the original recordings are discarded. To preserve a high level of quality, effective speech modeling and processing methods have been developed.
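
The size figures above imply a rough budget for the compression stage. The following is a minimal back-of-the-envelope sketch, not a description of the actual eCTTS pipeline: it assumes that the voice size is dominated by the recorded audio and that the savings from preselection and compression simply multiply. The function name and parameters are hypothetical and chosen only for illustration.

```python
def required_compression_ratio(server_mb, embedded_mb, kept_percent):
    """Compression factor still needed after preselection.

    Assumption (not from the source): voice size scales linearly with the
    amount of audio kept, and the two reductions multiply.
    """
    after_preselection_mb = server_mb * kept_percent / 100
    return after_preselection_mb / embedded_mb


# Endpoints of the ranges quoted in the text: 500-1000 MB down to 5-10 MB,
# with 90% of the recordings discarded (10% kept).
for server_mb, embedded_mb in [(500, 5), (1000, 10)]:
    ratio = required_compression_ratio(server_mb, embedded_mb, kept_percent=10)
    print(f"{server_mb} MB -> {embedded_mb} MB: compression must supply about {ratio:.0f}x")
```

Under these assumptions, preselection accounts for one of the two orders of magnitude, and the low-bit-rate compression has to supply roughly the other.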

Research and development work on the eCTTS project is carried out at the IBM Haifa Research Lab.


