

Audio Visual Speech Technology Group
Humans use a variety of senses to recognize people and understand their
communications. Recently we have begun exploring the use of visual information
to improve the performance of audio-based technologies such as speech
recognition, speaker recognition, speech event detection and speaker change
detection. This work is a collaboration, led by Chalapathy Neti, between
the Human Language Technologies Group (HLT), the Computer Vision Group, and
the India Research Lab (ISRC).
The applications for this work include (but are not limited to) accurate
audio transcription for efficient search and retrieval of multimedia content,
and improved human/computer interfaces that use multiple modes for robust
recognition of human activity (speech, gesture, etc.) in realistic environments
like automobiles and public information kiosks, where background noise
is a serious problem for recognition technologies based only on acoustics.
The navigation bar contains more detailed information about the different
areas we are exploring.
Related Websites:
Johns Hopkins Audio Visual Speech Recognition (Workshop 2000)
Audio-Visual Speech Processing (AVSP99)
1999 International Workshop on Multimedia Signal Processing (MMSP99)
Inter-Agency Workshop on Smart Computing Environments
IEEE International Conference on Multimedia and Expo (ICME2000)