Speech Technologies
Publications
- S. Tiomkin, D. Malah, Z. Kons and S. Shechtman, "A Hybrid Text-to-Speech System that Combines Concatenative and Statistical Synthesis Units", to be published in IEEE Transactions on Audio, Speech and Language Processing, Oct. 2010
- S. Tiomkin, D. Malah and S. Shechtman, "Statistical Text-To-Speech Synthesis based on Segment-wise Representation with a Norm Constraint", IEEE Transactions on Audio, Speech and Language Processing, July 2010
- S. Shechtman and A. Sorin, "Sinusoidal model parameterization for HMM-based TTS system", Interspeech 2010, Makuhari, Japan
- H. Aronowitz, "Unsupervised Compensation of Intra-Session Intra-Speaker Variability for Speaker Diarization", Odyssey 2010, Brno, Czech Republic
- H. Aronowitz and V. Aronowitz, "Efficient score normalization for speaker recognition", ICASSP 2010, Dallas, USA
- T. Shoham, D. Malah and S. Shechtman, "Footprint Reduction of Concatenative Text-To-Speech Synthesizers using Polynomial Temporal Decomposition", ISCCSP 2010, Limassol, Cyprus
- A. Sorin and R. Hoory, "Automatic Speech Transcription in AAL Solutions", AMI Workshop 2009, Salzburg, Austria
- Y. Solewicz, H. Aronowitz, "Two-Wire Nuisance Attribute Projection", Interspeech 2009, Brighton, UK.
- J. Huerta, C. Wu, A. Sakrajda, S. Caskey, E. Jan, A. Faisman, S. Ben-David, W. Liu, U. Stewart, M. Frissora, D. Lubensky, A. Lee, "RTTS: Towards Enterprise-level Real-Time Speech Transcription and Translation Services", Interspeech 2009, Brighton, UK.
- A. Kaplan, J. Mamou, F. Gallo, and B. Sznajder, "Multimedia Feature Extraction in the SAPIR Project", UIMA Workshop at GSCL 2009, Potsdam, Germany
- J. Mamou, Y. Mass, M. Shmueli-Scheuer, B. Sznajder, "A Unified Inverted Index for an efficient Image and Text Retrieval", SIGIR 2009, Boston, USA
- B. Ramabhadran, A. Sethy, J. Mamou, B. Kingsbury, U. Chaudhari, "Fast Decoding for Open Vocabulary Spoken Term Detection", NAACL-HLT 2009, Boulder, USA
- S. Shechtman and R. Tachibana, "Efficient Gradient F0 Tree Model for Prosody Modeling and Unit-selection, Applied for the Embedded American English Concatenative TTS", ICASSP 2009, Taipei, Taiwan
- R. Fernandez, Z. Kons, S. Shechtman, Z. Shuang, R. Hoory, B. Ramabhadran and Y. Qin, "The IBM Submission to the 2008 Text-to-Speech Blizzard Challenge", Blizzard Workshop, Sep. 2008, Brisbane, Australia.
- S. Tiomkin and D. Malah, "Statistical Text-to-Speech Synthesis with Improved Dynamics", Interspeech, Sep. 2008, Brisbane, Australia.
- H. Aronowitz, "Online Vocabulary Adaptation Using Contextual Information and Information Retrieval", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- H. Aronowitz and Y. Solewicz, "Speaker Recognition in Two Wire Test Sessions", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- J. Mamou, B. Ramabhadran, "Phonetic Query Expansion for Spoken Document Retrieval", in Proc. Interspeech, Sep. 2008, Brisbane, Australia.
- J. Mamou, Y. Mass, B. Ramabhadran, B. Sznajder, "Combination of Multiple Speech Transcription Methods for Vocabulary Independent Search", Search in Spontaneous Conversational Speech Workshop, SIGIR 2008, Singapore
- A. Geven, M. Tscheligi, A. Sorin and H. Aronowitz, "Presenting a speech-based mobile reminder system", SiMPE 2008, Sept. 2008, Amsterdam, Netherlands.
- V. Mylonakis, J. Soldatos, A. Pnevmatikakis, L. Polymenakos, A. Sorin and H. Aronowitz, "Using Robust Audio and Video Processing Technologies to Alleviate the Elderly Cognitive Decline", PETRA 2008, July 2008, Athens, Greece.
- B. Sznajder, J. Mamou, Y. Mass, and M. Shmueli-Scheuer, "Metric inverted - an efficient inverted indexing method for metric spaces", in Proc. Efficiency Issues in Information Retrieval Workshop, ECIR 2008
- W. Allasia, F. Falchi, F. Gallo, M. Kacimi, A. Kaplan, J. Mamou, Y. Mass and N. Orio, "Audio-visual content analysis in P2P networks: the SAPIR approach", 1st Workshop on Automated Information Extraction in Media Production, AIEMPro'08
- S. Chu, H. Kuo, L. Mangu, Y. Liu, S. Qin, Q. Shi, S. Zhang, H. Aronowitz, "Recent advances in the IBM GALE Mandarin transcription system", in Proc. ICASSP, Apr. 2008, Las Vegas, USA
- H. Aronowitz and D. Burshtein, "Efficient Speaker Recognition Using Approximated Cross Entropy (ACE)", in IEEE Trans. on Audio, Speech & Language Processing, pp. 2033-2043, September 2007
- S. Shechtman, "Maximum Likelihood Dynamic Intonation Model for Concatenative Text-to-Speech Systems", in Proc. 6th ISCA Workshop on Speech Synthesis, Aug. 2007, Bonn, Germany
- J. Mamou, B. Ramabhadran, O. Siohan, "Vocabulary Independent Spoken Term Detection", in Proc. SIGIR, July 2007, Amsterdam, Netherlands
- J. Mamou, Y. Mass, M. Shmueli-Scheuer, B. Sznajder, "A Query Language for Multimedia Content", in Proc. SIGIR 2007 Multimedia workshop, July 2007, Amsterdam, Netherlands
- R. Hoory, Z. Kons and A. Sorin, "The future of text-to-speech on mobile clients", ACM workshop on Speech in Mobile and Pervasive Environments, Sep. 2006, Espoo, Finland.
- Z. Shuang, R. Bakis, S. Shechtman, D. Chazan and Y. Qin, "Frequency warping based on mapping formant parameters", in Proc. ICSLP, Sep. 2006, Pittsburgh PA, USA.
- S. Ben-David, A. Roytman, R. Hoory and Z. Sivan, "Using voice servers for speech analytics", International Conference on Digital Telecommunications (ICDT), Aug. 2006, Cap Esterel, France.
- J. Mamou, D. Carmel and R. Hoory, "Spoken document retrieval from call-center conversations", in Proc. SIGIR, Aug. 2006, Seattle WA, USA.
- D. Chazan, R. Hoory, A. Sagi, S. Shechtman, A. Sorin, Z. Shuang and R. Bakis, "High quality sinusoidal modeling of wideband speech for the purpose of speech synthesis and modification", in Proc. ICASSP, May 2006, Toulouse, France.
- G. Mishne, D. Carmel, A. Roytman and A. Soffer, "Automatic analysis of call-center conversations", in Proc. 14th ACM International Conference on Information and Knowledge Management (CIKM), Oct. 2005, Bremen, Germany.
- D. Chazan, R. Hoory, Z. Kons, A. Sagi, S. Shechtman and A. Sorin, "Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling", in Proc. Eurospeech, Sep. 2005, Lisbon, Portugal.
- S. Basson, A. Faisman, R. Hoory, D. Kanevsky, M. Picheny, A. Roytman, Z. Sivan and A. Sorin, "Accessibility, Speech Technology, and Human Interventions", AVIOS/SpeechTek 2005.
- A. Sorin, T. Ramabadran, D. Chazan, R. Hoory, M. McLaughlin, D. Pearce, F. Wang, Y. Zhang, "The ETSI Extended Distributed Speech Recognition Standards: Client Side Processing and Tonal Language Recognition Evaluation", in Proc. ICASSP, May 2004, Montreal, Canada.
- T. Ramabadran, A. Sorin, M. McLaughlin, D. Chazan, D. Pearce, R. Hoory, "The ETSI Extended Distributed Speech Recognition Standards: Server Side Speech Reconstruction", in Proc. ICASSP, May 2004, Montreal, Canada.
- K. Y. Kupeev and Z. Sivan, "Selective Enhancement of Contrast Blocks for MPEG/JPEG Image Compression", Visual Communications and Image Processing (VCIP) 2003, Lugano, Switzerland, pp. 1382-1389.
- D. Chazan, R. Hoory, Z. Kons, D. Silberstein and A. Sorin, "Reducing the footprint of the IBM trainable synthesis system", in Proc. 7th Int. Conf. Spoken Language Processing, Sep. 2002, Denver, USA (ICSLP2002).
- K. Y. Kupeev and Z. Sivan, "New shape representation and similarity measure for fast shape indexing", Proceedings of SPIE, "Storage and Retrieval for Media Databases 2002", Vol. 4676, pp. 116-125, San Jose, USA, 2002.
- D. Cohen-Or, Y. Noimark and T. Zvi, "A Server-based Interactive Remote Walkthrough", proceedings of EGMM2001.
- D. Chazan, M. Zibulski, R. Hoory and G. Cohen, "Efficient periodicity extraction based on sine-wave representation and its application to pitch determination of speech signals", in proceedings of EUROSPEECH2001.
- K. Y. Kupeev and Z. Sivan, "An algorithm for efficient segmentation and selection of representative frames in video sequences", Proceedings of SPIE, "Storage and Retrieval for Media Databases 2001", Vol. 4315, pp. 253-261, Jan. 2001, San Jose, USA.
- S. H. Maes, G. Cohen, R. Hoory and D. Chazan, "Conversational networking: conversational protocols for transport, coding and control", in Proc. 6th Int. Conf. Spoken Language Processing, Beijing, China, Oct. 2000 (ICSLP-2000).
- D. Chazan, G. Cohen, R. Hoory and M. Zibulski, "Low bit rate speech compression for playback in speech recognition systems", in proceedings of EUSIPCO, Sept. 2000.
- D. Chazan, G. Cohen, R. Hoory and M. Zibulski, "Speech reconstruction from mel-frequency cepstral coefficients and pitch frequency", in proceedings of ICASSP, June 2000.
- Z. Sivan, D. Chazan, G. Cohen, R. Hoory, A. Sorin, "Voice in Pervasive Devices - Serving both Human Listeners and Machine Recognizers", PvCC 2000, Yorktown Heights, USA.
- A. Amir, D. Ponceleon, B. Blanchard, D. Petkovic, S. Srinivasan and G. Cohen, "Using Audio Time Scale Modification for Video Browsing", in collaboration with IBM Almaden, in Proceedings of HICSS2000. Received best paper award in the digital documents track.
- Z. Sivan, E. D. Karnin, D. Ramm and R. Cohen, "Performance of a Software-Only H.263 Video Encoder on the PowerPC processor", 19th IEEE Convention in Israel, Jerusalem, Israel, November 1996, pp. 395-398.
- R. Hoory, N. Shaked and D. Chazan, "Building a speech database for the purpose of speaker specific speech synthesis", in Proceedings of ICSP 1996, pp. 741-744.
- R. Hoory and D. Chazan, "Speech Synthesis for a specific speaker based on a labeled speech database", In Proceedings of ICPR 1994, pp. C146-148.
- Y. Medan, E. Yair and D. Chazan, "Super resolution pitch determination of speech signals", IEEE Trans. Acoust., Speech and Signal Processing, vol. 39, pp. 40-48, Jan. 1991.