Audio Visual Speech Technologies
General Overview

Our ability to organize and intelligently combine sensory data derived from multiple sensors, modulated by perceptual relevance and sensory confidence, is crucial for building a robust model of the objects and events in our environment despite dramatically varying perceptual conditions. Our group's goal is to exploit this human perceptual principle of sensory integration (currently, the joint use of audio and visual information) to improve the recognition of human activity (e.g., speech recognition, speech activity, speaker change), intent (e.g., speech intent), and identity (e.g., speaker recognition), particularly in the presence of acoustic degradation due to noise and channel effects, and to improve the analysis and mining of multimedia content. Applications of this work include, but are not limited to, accurate transcription of human activity for improved human information interfaces, multimedia content mining, and meeting transcription. The links provided in the left side menu contain more detailed information about the different areas we are exploring.
The project is currently being managed by Roberto Sicconi.

Members:
Collaborators:
Summer Interns:
Past Contributors:
UPDATES: Final Workshop 2000 Report, JHU