Audio Visual Speech Technologies

Chalapathy Neti - Publications

 

Joint processing of audio and visual information 

The key thrust of this work is to understand how visual information could be exploited to improve audio-based processing of speech and speaker in adverse acoustic conditions. This is a first step towards understanding general principles of combining/fusing multiple sources of information for robust interpretation of human identity, activity and intent (Perceptual computing).

  1. HJ Nock, G Iyengar, C Neti, Speaker Localisation using Audio-Visual Synchrony: An Empirical Study, To appear in CIVR 2003
  2. G Iyengar, HJ Nock, C Neti, Audio-Visual Synchrony for Detection of Monologues in Video Archives, Proc ICASSP 2003 (Presented at ICME 2003)
  3. HJ Nock, G Iyengar, C Neti, Issues in Speech-based Retrieval of Video, Proc ISCA Tutorial Workshop (Multilingual Spoken Document Retrieval) 2003
  4. HJ Nock, W Adams, G Iyengar, C-Y Lin, M Naphade, A Natsev, C Neti, JR Smith, B Tseng, User-trainable Video Annotation Using Multimodal Cues, To appear in Proc SIGIR 2003
  5. D.M. Russell, P.P. Maglio, R. Dordick, C. Neti, Dealing with Ghosts: Managing the User Experience of Autonomic Computing, IBM Systems Journal, Vol. 42, No. 1, pp. 177-188, 2003.
  6. W.H. Adams, G. Iyengar, C-Y Lin, M.R. Naphade, C. Neti, H.J. Nock, J.R. Smith, Semantic Indexing of Multimedia Content Using Visual, Audio and Text Cues, Eurasip Journal on Applied Signal Processing Vol 2003, No 2, Feb 2003
  7. Sabine Deligne, Gerasimos Potamianos, Chalapathy Neti, Audio-Visual Speech Enhancement With AVCDCN (Audio-Visual Codebook Dependent Cepstral Normalization), IEEE workshop on Sensor Array and Multichannel Signal Processing in August 2002, Washington DC and ICSLP 2002, Denver.
  8. G. Iyengar, H. Nock, C. Neti, M. Franz, Semantic Indexing of Multimedia using Audio, Text and Visual Cues, Proceedings of ICME2002, Lausanne, Switzerland, 2002
  9. G. Gravier, G. Potamianos, and C. Neti, Asynchrony modeling for audio-visual speech recognition, Proc. Human Language Technology Conference, San Diego, 2002.
  10. G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR, Proc. Int. Conf. Acoust. Speech Signal Process., Orlando, 2002.
  11. R. Goecke, G. Potamianos, and C. Neti, Noisy audio feature enhancement using audio-visual speech data, Proc. Int. Conf. Acoust. Speech Signal Process., Orlando, 2002.
  12. G. Iyengar, C. Neti. A vision-based microphone switch for speech intent detection, Recognition, Analysis and Tracking of Face and Gestures in Real Time Systems (RATFG-RTS) Workshop at ICCV 2001 in Vancouver, 13th July 2001.
  13. G. Potamianos, C. Neti, G. Iyengar, and E. Helmuth, Large-vocabulary audio-visual speech recognition by machines and humans, Proc. Eurospeech, Aalborg, 2001.
  14. G. Potamianos and C. Neti, Automatic speechreading of impaired speech, Proc. Work. Audio-Visual Speech Process., Scheelsminde, 2001.
  15. G. Potamianos and C. Neti, Improved ROI and within frame discriminant features for lipreading, Proc. Int. Conf. Image Process., Thessaloniki, 2001.
  16. C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop, Proc. IEEE Work. Multimedia Signal Process., Cannes, 2001.
  17. G. Iyengar and C. Neti, Detection of faces under shadows and lighting variations, Cannes, 2001.
  18. G. Iyengar, G. Potamianos, C. Neti, T. Faruquie, and A. Verma, Robust detection of visual ROI for automatic speechreading, Proc. IEEE Work. Multimedia Signal Process., Cannes, 2001.
  19. I. Matthews, G. Potamianos, C. Neti, and J. Luettin, A comparison of model and transform-based visual features for audio-visual LVCSR, Proc. IEEE Int. Conf. Multimedia Expo., Tokyo, 2001.
  20. G. Potamianos, C. Neti, G. Iyengar, A.W. Senior, and A. Verma, A cascade visual front end for speaker independent automatic speechreading, Int. J. Speech Technology, Vol. 4, pp. 193-208, 2001.
  21. G. Potamianos, J. Luettin, C. Neti. Hierarchical discriminant features for audio-visual LVCSR, ICASSP, Salt Lake City, May 2001.
  22. J. Luettin, G. Potamianos, C. Neti. Asynchronous stream modeling for large-vocabulary audio-visual speech recognition, ICASSP, Salt Lake City, May 2001.
  23. H. Glotin, D. Vergyri, C. Neti, G. Potamianos, J. Luettin. Weighting schemes for audio-visual fusion in speech recognition, ICASSP, Salt Lake City, May 2001.
  24. G. Potamianos, C. Neti. Stream confidence estimation for audio-visual speech recognition, ICSLP, vol III, pp. 746-749, Beijing, October 2000
  25. C. Neti, G. Iyengar, G. Potamianos, A. Senior, B. Maison. Perceptual interfaces for information interaction: Joint processing of audio and visual information for human-computer interaction, ICSLP, vol III, pp. 11-14, Beijing, October 2000.
  26. C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A.Mashari, and J. Zhou, Audio-Visual Speech Recognition, Final Workshop 2000 Report, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD (Oct. 12, 2000).
  27. G. Potamianos, A. Verma, C. Neti, G. Iyengar. A cascade image transform for speaker independent automatic speechreading. International Conference on Multimedia and Expo (ICME00), New York, July 2000.
  28. C. Neti, P. deCuetos, A. Senior. Audio-visual intent-to-speak detection for human-computer interaction, ICASSP June 5-9 2000, Istanbul, Turkey.
  29. C. Neti, B. Maison, A. Senior, G. Iyengar, P. deCuetos, S. Basu, A. Verma. Joint processing of audio and visual information for multimedia indexing and human-computer interaction, RIAO April 12-14 2000, Paris, France.
  30. G.Iyengar, C.Neti. Speaker change detection using joint audio-visual statistics, RIAO 12-14 April 2000, Paris, France, Dec. 20, 1999.
  31. Ashish Verma, Tanveer Faruquie, C. Neti, Sankar Basu, Andrew Senior. Late Integration in Continuous Audio-Visual Speech Recognition, ASRU, Colorado, 1999.
  32. Benoit Maison, Chalapathy Neti, Andrew Senior. Audio-Visual speaker recognition for video broadcast news: some fusion techniques, IEEE Multimedia Signal Processing (MMSP99), Denmark, Sept, 1999.
  33. S. Basu, C. Neti, N. Rajput, A. Senior, L. Subramaniam, A. Verma. Audio-Visual large-vocabulary continuous speech recognition in the broadcast news domain, IEEE Multimedia Signal Processing Conference (MMSP99), Denmark, Sept, 1999.
  34. Andrew Senior, Chalapathy Neti, Benoit Maison. On the use of visual information for improving audio-based speaker recognition, Audio-Visual Speech processing conference (AVSP99), Santa Cruz, CA, Aug, 1999.
  35. Chalapathy Neti, Andrew Senior. Audio-Visual speaker recognition for video broadcast news, DARPA HUB4 Workshop, Washington D.C., March 1999.

Position Papers

  1. Chalapathy Neti, Stephane Maes, Mark Lucente and Dragutin Petkovic. Knowledge/Smart Spaces, 1999 DARPA/NSF/NIST Workshop on Research issues in Smart Computing Environments, July 1999.
  2. S. Basu, E. E. Jan, Mark Lucente and Chalapathy Neti. Beyond Audio-based speech recognition, 1998 NIST/DARPA Workshop on SmartSpaces, Gaithersburg, MD, 1998.

Conversational (Spoken language) systems

The thrust of these papers is to develop an understanding of the design of conversational systems that include speech recognition, natural language understanding and dialog. The first paper is the basis of a prototype for conversational interaction with personal information, and the second is the basis for the bilingual (English and French) ATIS prototype demonstration. Both demonstrations are widely used by the Human Language technology group for customer visits.

  1. G. Ramaswamy, J. Kleindienst, D. Coffman, P. Gopalakrishnan and C. Neti. A Pervasive Conversational Interface for Information Interaction. Eurospeech99, Budapest, Hungary, 1999.
  2. T.Ward, S. Roukos, C. Neti, M. Epstein, S. Dharanipragada. Towards speech understanding across multiple languages, Proceedings of the International conference on spoken language processing (ICSLP98),  Sydney, Australia, 1998. 

Speech Recognition

The thrust of these papers is to improve spontaneous, speaker- and language-independent speech recognition performance. These papers cover algorithms for confidence estimation, accent and language independence, and noise-robust speech representations based on the mammalian auditory system.

  1. C. Neti, S. Dharanipragada and Salim Roukos. Towards a universal speech recognizer for multiple languages. Automatic Speech Recognition and Understanding Workshop (ASRU97), Santa Barbara, CA, 1997.
  2. C. Neti and Salim Roukos. Phone-specific gender-dependent models for continuous speech recognition, Automatic Speech Recognition and Understanding Workshop (ASRU97), Santa Barbara, CA, 1997. 
  3. C. Neti, E. Eide and Salim Roukos. Word-based confidence measures as a guide for stack search in continuous speech recognition. International Conference on Acoustics Speech and Signal Processing (ICASSP97), Munich, Germany, 1997.
  4. R. Bakis, P.S. Gopalakrishnan, R. Gopinath, F.H. Liu, S. Maes, M. Monkowski, C. Neti, H. Printz, P.S. Rao. Confidence Measures. Proceedings of the Large Vocabulary Continuous Speech Recognition Workshop. April 29, 1996.
  5. Nagendra Kumar, Chalapathy Neti and Andreas Andreou. Application of Discriminant Analysis to Speech Recognition with Auditory Model Based Features. Proceedings of the Speech Research Symposium XV, Johns Hopkins University, Baltimore, MD, 1995. 
  6. Chalapathy Neti. Neuromorphic Speech processing for noisy environments. Proceedings of the IEEE International conference on Neural Networks, Orlando, FL, pp 4425-4430, 1994.

Computational Neuroscience and Biology

The thrust of these papers is to develop computational models of human sensory processing and systemic aspects. In particular, I focussed on models of sound localization that are structurally similar to the underlying brain pathways. The concept of fault tolerance and the model of sound localization developed in this work are the first of their kind and are cited by all subsequent work.

  1. Chalapathy Neti, Michael Schneider and Eric Young. Maximally fault-tolerant Neural Networks. IEEE transactions on Neural Networks, vol 3, no 1, pp 14-23, 1992.
  2. Chalapathy Neti, Eric Young and Michael Schneider. Neural network models of Sound Localization based on Directional filtering by the Pinna. J. Acoust. Soc. America, Vol 92, No 6, pp 3140-3156, 1992.
  3. Michael Schneider, Kristen Farrow and Chalapathy Neti. Low Storage Second-Order learning algorithms. Proceedings of IJCNN, Seattle, pp A-954, 1991.
  4. Chalapathy Neti, Michael Schneider and Eric Young. Maximally fault-tolerant Neural Networks and nonlinear programming. Proceedings of the IJCNN, San Diego, CA, 1990.
  5. Chalapathy Neti and Eric Young.  A neural network model of sound localization based on spectral cues.  Neuroscience Abstracts, St. Louis, MO, 1990.
  6. K. Campbell, J. Ringo, C. Neti and J. Alexander. Informational analysis of Left Ventricle/Systemic-Arterial interaction. Annals of Biomedical Engineering, pp 209-231, 1984.

VLSI design tools

The main focus was to develop sophisticated design tools for function, timing and fault simulation of VLSI chips. 

  1. Chalapathy Neti and David Coelho. Timing-Verification using a General behavioural simulator. Proceedings of International conference on Computer Design, Portchester, New York, 1984.

Patents

  1. D. Coffman, P. Gopalakrishnan, G. Ramaswamy, J. Kleindienst, C. Neti. Method and system for multi-client access to a dialog system, US Patent Number 6,377,913, April 2002.
  2. S. Basu, T. Faruquie, C. Neti, N. Rajput, L. V. Subramaniam and A. Verma. Speech driven lip synthesis using viseme based hidden Markov models, US Patent Number 6,366,885, April 2002.
  3. Chalapathy Neti and Salim Roukos. Speech recognition models combining gender-dependent and gender-independent phone states using phonetic-context-dependence, US Patent Number 5,953,701. Issued Sept 14, 1999.
  4. Chalapathy Neti. Method and System for noise-robust speech processing with cochlear filters in an auditory model, US Patent Number 5,768,474, 1998.
  5. Chalapathy Neti. Method and System for adapter configuration in a data processing system, US Patent Number 5,619,701, 1997.

Email: cneti@watson.ibm.com