

Audio Visual Speech Technology Group
Humans use a variety of senses to recognize people and understand their
communications. Recently we have begun exploring the use of visual information
to improve the performance of audio-based technologies such as speech
recognition, speaker recognition, speech event detection and speaker change
detection. This work is a collaboration, led by Chalapathy Neti, between
the Human Language Technologies Group (HLT), Computer Vision Group and
the India Research Lab (ISRC).
The applications for this work include (but are not limited to) accurate
audio transcription for efficient search and retrieval of multimedia content,
and improved human/computer interfaces that use multiple modes for robust
recognition of human activity (speech, gesture, etc.) in realistic environments
like automobiles and public information kiosks, where background noise
is a serious problem for recognition technologies based only on acoustics.
People:
Chalapathy Neti (HLT)
Giridharan Iyengar (HLT)
Gerasimos Potamianos (HLT)
Sankar Basu (HLT)
Andrew Senior (Exploratory Computer Vision)
Eric Helmuth (Data Collection, Webmaster)
Multimedia Speech Recognition and Synthesis Group at India Research Lab
Past Contributors:
Benoit Maison (HLT)
Mahesh Vishwanathan (HLT)
Fereydoun Mali (HLT 1999)
Philippe De Cuetos (HLT 1999)