Overview
The Speech Technologies group focuses on voice technologies and their use in advanced services and applications. We create solutions, standards, frameworks, and services that enhance the experience and capabilities offered to users and enterprises. In the embedded area, our activities focus on advanced voice-based and multimodal user interface technologies and on advanced speech-enabled services. We develop state-of-the-art technology for high-quality, natural-sounding embedded text-to-speech that can be used to deliver information, entertainment, and convenience to mobile users and car drivers. For contact centers, we develop solutions and middleware for speech analytics, including transcription, audio search and retrieval, and emotion detection.
The group's expertise covers a wide spectrum of technologies for speech/audio coding and processing, speech recognition and synthesis, speech enhancement, multimedia processing, and web services and applications.
Activities
Publications
- R. Hoory, Z. Kons and A. Sorin, "The future of text-to-speech on mobile clients", ACM workshop on Speech in Mobile and Pervasive Environments, Sep. 2006, Espoo, Finland.
- Z. Shuang, R. Bakis, S. Shechtman, D. Chazan and Y. Qin, "Frequency warping based on mapping formant parameters", in Proc. ICSLP, Sep. 2006, Pittsburgh PA, USA.
- S. Ben-David, A. Roytman, R. Hoory and Z. Sivan, "Using voice servers for speech analytics", International Conference on Digital Telecommunications (ICDT), Aug. 2006, Cap Esterel, France.
- J. Mamou, D. Carmel and R. Hoory, "Spoken document retrieval from call-center conversations", in Proc. SIGIR, Aug. 2006, Seattle WA, USA.
- D. Chazan, R. Hoory, A. Sagi, S. Shechtman, A. Sorin, Z. Shuang and R. Bakis, "High quality sinusoidal modeling of wideband speech for the purpose of speech synthesis and modification", in Proc. ICASSP, May 2006, Toulouse, France.
- G. Mishne, D. Carmel, A. Roytman and A. Soffer, "Automatic analysis of call-center conversations", in Proc. 14th ACM International Conference on Information and Knowledge Management (CIKM), Oct. 2005, Bremen, Germany.
- D. Chazan, R. Hoory, Z. Kons, A. Sagi, S. Shechtman and A. Sorin, "Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling", in Proc. Eurospeech, Sep. 2005, Lisbon, Portugal.
- S. Basson, A. Faisman, R. Hoory, D. Kanevsky, M. Picheny, A. Roytman, Z. Sivan and A. Sorin, "Accessibility, Speech Technology, and Human Interventions", AVIOS/SpeechTek 2005.
- A. Sorin, T. Ramabadran, D. Chazan, R. Hoory, M. McLaughlin, D. Pearce, F. Wang and Y. Zhang, "The ETSI Extended Distributed Speech Recognition Standards: Client Side Processing and Tonal Language Recognition Evaluation", in Proc. ICASSP, May 2004, Montreal, Canada.
- T. Ramabadran, A. Sorin, M. McLaughlin, D. Chazan, D. Pearce and R. Hoory, "The ETSI Extended Distributed Speech Recognition Standards: Server Side Speech Reconstruction", in Proc. ICASSP, May 2004, Montreal, Canada.
- K. Y. Kupeev and Z. Sivan, "Selective Enhancement of Contrast Blocks for MPEG/JPEG Image Compression", Visual Communications and Image Processing (VCIP) 2003, Lugano, Switzerland, pp. 1382-1389.
- D. Chazan, R. Hoory, Z. Kons, D. Silberstein and A. Sorin, "Reducing the footprint of the IBM trainable synthesis system", in Proc. 7th Int. Conf. Spoken Language Processing (ICSLP 2002), Sep. 2002, Denver, USA.
- K. Y. Kupeev and Z. Sivan, "New shape representation and similarity measure for fast shape indexing", Proceedings of SPIE, "Storage and Retrieval for Media Databases 2002", Vol. 4676, pp. 116-125, San Jose, USA, 2002.
- D. Cohen-Or, Y. Noimark and T. Zvi, "A Server-based Interactive Remote Walkthrough", in Proceedings of EGMM 2001.
- D. Chazan, M. Zibulski, R. Hoory and G. Cohen, "Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals", in Proceedings of EUROSPEECH 2001.
- K. Y. Kupeev and Z. Sivan, "An algorithm for efficient segmentation and selection of representative frames in video sequences", Proceedings of SPIE, "Storage and Retrieval for Media Databases 2001", Vol. 4315, pp. 253-261, Jan. 2001, San Jose, USA.
- S. H. Maes, G. Cohen, R. Hoory and D. Chazan, "Conversational networking: conversational protocols for transport, coding and control", in Proc. 6th Int. Conf. Spoken Language Processing (ICSLP 2000), Oct. 2000, Beijing, China.
- D. Chazan, G. Cohen, R. Hoory and M. Zibulski, "Low bit rate speech compression for playback in speech recognition systems", in Proceedings of EUSIPCO, Sep. 2000.
- D. Chazan, G. Cohen, R. Hoory and M. Zibulski, "Speech reconstruction from mel-frequency cepstral coefficients and pitch frequency", in Proceedings of ICASSP, June 2000.
- Z. Sivan, D. Chazan, G. Cohen, R. Hoory and A. Sorin, "Voice in Pervasive Devices - Serving both Human Listeners and Machine Recognizers", PvCC 2000, Yorktown Heights, USA.
- A. Amir, D. Ponceleon, B. Blanchard, D. Petkovic, S. Srinivasan and G. Cohen, "Using Audio Time Scale Modification for Video Browsing", in collaboration with IBM Almaden, in Proceedings of HICSS 2000. Received the best paper award in the digital documents track.
- Z. Sivan, E. D. Karnin, D. Ramm and R. Cohen, "Performance of a Software-Only H.263 Video Encoder on the PowerPC processor", 19th IEEE Convention in Israel, Jerusalem, Israel, Nov. 1996, pp. 395-398.
- R. Hoory, N. Shaked and D. Chazan, "Building a speech database for the purpose of speaker specific speech synthesis", in Proceedings of ICSP 1996, pp. 741-744.
- R. Hoory and D. Chazan, "Speech Synthesis for a specific speaker based on a labeled speech database", in Proceedings of ICPR 1994, pp. C146-148.
- Y. Medan, E. Yair and D. Chazan, "Super resolution pitch determination of speech signals", IEEE Trans. Acoust., Speech, and Signal Processing, vol. 39, pp. 40-48, Jan. 1991.