

Audio Visual Speech Technology Group

Humans use a variety of senses to recognize people and understand their communications. Recently we have begun exploring the use of visual information to improve the performance of audio-based technologies such as speech recognition, speaker recognition, speech event detection, and speaker change detection. This work is a collaboration, led by Chalapathy Neti, between the Human Language Technologies Group (HLT), the Computer Vision Group, and the India Research Lab (ISRC).

Applications of this work include, but are not limited to:

  • Accurate audio transcription for efficient search and retrieval of multimedia content
  • Improved human/computer interfaces that use multiple modes for robust recognition of human activity (speech, gesture, etc.) in realistic environments such as automobiles and public information kiosks, where background noise is a serious problem for recognition technologies based only on acoustics

The navigation bar contains more detailed information about the different areas we are exploring.

People:

  • Chalapathy Neti (HLT)
  • Giridharan Iyengar (HLT)
  • Gerasimos Potamianos (HLT)
  • Sankar Basu (HLT)
  • Andrew Senior (Exploratory Computer Vision)
  • Eric Helmuth (Data Collection, Webmaster)
  • Multimedia Speech Recognition and Synthesis Group at India Research Lab

Past Contributors:

  • Benoit Maison (HLT)
  • Mahesh Vishwanathan (HLT)
  • Fereydoun Mali (HLT 1999)
  • Philippe De Cuetos (HLT 1999)