Audio Visual Speech Technologies
Audio Visual Speech Event Detection
Speech recognition systems have opened the way toward intuitive and natural human-computer interaction (HCI). However, current HCI systems that use speech recognition require users to signal their intent to speak explicitly, for example by turning on the microphone with the keyboard or mouse. A key aspect of natural spoken communication is the human ability to detect an intent to speak without such explicit signals. Humans do this by combining visual and auditory cues; visual cues include physical proximity, eye contact and lip movement.

For open-microphone solutions, speech onset can be detected automatically using silence/speech detection. However, purely audio-based techniques are sensitive to background noise. We are exploring the combination of visual and audio cues to provide robust indicators of speech intent and of speech onset and offset. Our current approach uses the following visual cues: the user's proximity to the computer, the frontality of the user's pose, and visual speech activity. These cues will be combined with audio cues based on speech/silence detection.
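The page does not say how the audio and visual cues are combined. The following is a minimal sketch under stated assumptions, not the project's method: 16 kHz mono audio, frame log-energy thresholding as the speech/silence detector, and hypothetical upstream vision modules that report, per audio frame, the user's distance, whether the pose is frontal, and a lip-activity score. The `VisualCues` fields, the thresholds and the AND-fusion rule are all illustrative assumptions. Speech onsets and offsets are then read off the fused frame sequence.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def frame_log_energy(samples: Sequence[float], frame_len: int = 400,
                     hop: int = 160) -> List[float]:
    """Per-frame log energy in dB (400/160 samples = 25 ms/10 ms at 16 kHz)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log(0)
    return energies


@dataclass
class VisualCues:
    """Hypothetical per-frame outputs of upstream vision modules (assumed)."""
    distance_m: float    # user proximity to the computer
    frontal: bool        # frontality of pose: is the user facing the screen?
    lip_activity: float  # visual speech activity score in [0, 1]


def visual_intent(cues: VisualCues, max_distance_m: float = 1.0,
                  min_lip_activity: float = 0.5) -> bool:
    """Rule-based intent: close to the screen, facing it, and moving the lips."""
    return (cues.distance_m <= max_distance_m and cues.frontal
            and cues.lip_activity >= min_lip_activity)


def fuse(energies_db: Sequence[float], visual: Sequence[VisualCues],
         threshold_db: float = -35.0) -> List[bool]:
    """AND-fusion: a frame counts as speech only if audio and vision agree."""
    return [e > threshold_db and visual_intent(v)
            for e, v in zip(energies_db, visual)]


def speech_segments(flags: Sequence[bool],
                    min_frames: int = 3) -> List[Tuple[int, int]]:
    """(onset, offset) frame indices of speech runs of at least min_frames.

    The offset index is exclusive; runs shorter than min_frames are dropped.
    """
    segments = []
    onset = None
    for i, is_speech in enumerate(list(flags) + [False]):  # sentinel closes a final run
        if is_speech and onset is None:
            onset = i
        elif not is_speech and onset is not None:
            if i - onset >= min_frames:
                segments.append((onset, i))
            onset = None
    return segments


if __name__ == "__main__":
    rate = 16000
    # 0.2 s silence, 0.5 s of a 200 Hz tone standing in for speech, 0.3 s silence.
    audio = ([0.0] * int(0.2 * rate)
             + [0.3 * math.sin(2 * math.pi * 200 * n / rate)
                for n in range(int(0.5 * rate))]
             + [0.0] * int(0.3 * rate))
    energies = frame_log_energy(audio)
    facing = VisualCues(distance_m=0.6, frontal=True, lip_activity=0.8)
    print(speech_segments(fuse(energies, [facing] * len(energies))))
    # prints [(18, 70)]
```

AND-gating is the simplest possible fusion rule: background noise that raises the audio energy is not reported as speech unless the visual cues also indicate that the user is close, facing the computer and moving their lips. A weighted or learned combination of the cue scores would be a natural alternative when any single cue is unreliable.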