IBM Research
  IBM Text-to-Speech: Publications

Here is a sampling of our recent publications in Text-to-Speech. Some of the papers are available for download.

Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., and Pitrelli, J., "A Corpus-Based Approach to <AHEM/> Expressive Speech Synthesis", Proceedings of the 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, USA, June 14-16, 2004. (abs)

Pitrelli, John F., "ToBI Prosodic Analysis of a Professional Speaker of American English", Proceedings of Speech Prosody 2004, Nara, Japan, March 23-26, 2004. (abs)

Pitrelli, John F., and Eide, Ellen M., "Expressive Speech Synthesis using American English ToBI: Questions and Contrastive Emphasis", Proceedings of IEEE ASRU 2003: Automatic Speech Recognition and Understanding Workshop, St. Thomas, U.S. Virgin Islands, December 1-4, 2003. (abs)

Eide, E., Bakis, R., Hamza, W., Pitrelli, J., "Multi-layered Extensions to the Speech Synthesis Markup Language for Describing Expressiveness", Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech), Geneva, Switzerland, September 1-4, 2003. (abs)

Eide, E., Aaron, A., Bakis, R., Cohen, P., Donovan, R., Hamza, W., Mathes, T., Picheny, M., Polkosky, M., Smith, M., and Viswanathan, M. (2003) Recent Improvements to the IBM Trainable Speech Synthesis System, Proc. ICASSP'03, Hong Kong, China. (abs)

Aaron, A., Eide, E., and Pitrelli, J. F., "Making Computers Talk", Scientific American, on-line publication, March 17, 2003. (article)

Hamza, W. and Donovan, R.E. (2002) Data-Driven Segment Preselection in the IBM Trainable Speech Synthesis System, Proc. ICSLP'02, Denver, CO, USA. (pdf)

Saito, T. and Sakamoto, M. (2002) Applying a Hybrid Intonational Model to a Seamless Speech Synthesizer, Proc. ICSLP'02, Denver, CO, USA. (pdf)

Saito, T. and Sakamoto, M. (2002) Speaker Recognizability Evaluation of a Voicefont-based Text-to-Speech, Proc. ICSLP'02, Denver, CO, USA. (pdf)

Eide, E. (2002) Preservation, Identification, and Use of Emotion in a Text-to-Speech System, Proc. IEEE 2002 Workshop on Speech Synthesis, Santa Monica, CA, USA. (abs)

Viswanathan, M. and Viswanathan, M. (2002) Comparison of Measurement of Speech Quality for Listening Tests of Text-to-Speech Systems, Proc. IEEE 2002 Workshop on Speech Synthesis, Santa Monica, CA, USA. (abs)

Shi, Q., Shen, L.Q., and Chai, H.X. (2002) Automatic New Word Extraction Method, Proc. ICASSP 2002, Orlando, FL, USA. (abs)

Donovan, R.E. (2001) A Component by Component Listening Test Analysis of the IBM Trainable Speech Synthesis System, Proc. Eurospeech'01, Aalborg, Denmark. (pdf)

Donovan, R.E., Ittycheriah, A., Franz, M., Ramabhadran, B., Eide, E., Viswanathan, M., Bakis, R., Hamza, W., Picheny, M., Gleason, P., Rutherfoord, T., Cox, P., Green, D., Janke, E., Revelin, S., Waast, C., Zeller, B., Guenther, C. & Kunzmann, J. (2001) Current Status of the IBM Trainable Speech Synthesis System, Proc. 4th ESCA Tutorial and Research Workshop on Speech Synthesis, Atholl Palace Hotel, Scotland, UK. (pdf)

Donovan, R.E. (2001) A New Distance Measure for Costing Spectral Discontinuities in Concatenative Speech Synthesisers, Proc. 4th ESCA Tutorial and Research Workshop on Speech Synthesis, Atholl Palace Hotel, Scotland, UK. (pdf)

Hamza, W., Rashwan, M., and Afifi, M. (2001) A Quantitative Method for Modeling Context in Concatenative Speech Synthesis system Using Large Speech Database, Proc. ICASSP 2001, Salt Lake City, UT, USA. (abs)

Luke, K.K., Chen, F., Lee, W., and Shen, L.Q. (2001) A Phonetic Study of the Prosodic Properties of Bisyllabic Compounds in Hong Kong Cantonese, Proc. 5th National Conference on Modern Phonetics, Beijing, China. (pdf)

Saito, T. (2001) Generating F0 Contours by Statistical Manipulation of Natural F0 Shapes, Proc. Eurospeech'01, Aalborg, Denmark. (pdf)

Zhang, W., Shen, L.Q., and Tang, D. (2001) Voice Conversion Based on Acoustic Feature Transformation, Proc. 6th National Conference on Man-Machine Speech Communications, ShenZhen, China. (pdf)

Donovan, R.E. (2000) Segment Pre-selection in Decision-Tree Based Speech Synthesis Systems, Proc. ICASSP'00, Istanbul, Turkey. (abs)

Niu, X.C., Shen, L.Q., Zhu, W.B., and Shi, Q. (2000) Modelling and Decision Tree Based Prediction of Pitch Contour in IBM's Mandarin Speech Synthesis System, Proc. International Symposium on Chinese Spoken Language Processing, Beijing, China. (pdf)

Saito, T., and Sakamoto, M. (2000) A Method of Creating a New Speaker's VoiceFont in a Text-to-Speech System, Proc. ICSLP'00, Beijing, China. (pdf)

Sakamoto, M., and Saito, T. (2000) An Automatic Pitch Marking Method Using Wavelet Transform, Proc. ICSLP'00, Beijing, China. (pdf)

Zhu, W.B., Shen, L.Q., and Niu, X.C. (2000) Duration Modeling for Chinese Synthesis from C-ToBI Labeled Corpus, Proc. ICSLP'00, Beijing, China. (pdf)

Donovan, R.E., Franz, M., Sorensen, J.S. & Roukos, S. (1999) Phrase Splicing and Variable Substitution using the IBM Trainable Speech Synthesis System, Proc. ICASSP'99, Phoenix, AZ, USA. (abs)

Donovan, R.E., and Eide, E.M. (1998) The IBM Trainable Speech Synthesis System, Proc. ICSLP'98, Sydney, Australia. (pdf)

Saito, T. (1998) Use of F0 Features in Automatic Segmentation for Speech Synthesis, Proc. ICSLP'98, Sydney, Australia. (pdf)


  
 

  
