
On the use of visual information for improving audio-based speaker recognition
A. Senior, C. Neti and B. Maison
In Proceedings of Audio Visual Speech Processing, Santa Cruz, California, 7-9 August 1999.

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions, due to either channel or noise. In this paper, we explore various techniques to fuse video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video to achieve the best combination. Experiments on video broadcast news data suggest that significant improvements can be achieved by the combination in acoustically degraded conditions.
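
The idea of weighting independent audio and video decisions can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not specify. It assumes each modality yields a per-speaker score (for example a log-likelihood), that the fused score is a convex combination with audio weight `w`, and that `w` is chosen by a simple grid search on held-out data. The names `fuse` and `pick_weight` and all scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: speaker id -> score (higher is better).
Scores = Dict[str, float]


def fuse(audio: Scores, video: Scores, w: float) -> str:
    """Return the speaker maximizing w * audio + (1 - w) * video."""
    return max(audio, key=lambda s: w * audio[s] + (1.0 - w) * video[s])


def pick_weight(heldout: List[Tuple[Scores, Scores, str]], steps: int = 10) -> float:
    """Grid-search the audio weight in [0, 1] that maximizes held-out accuracy.

    Assumption: a plain grid search stands in for whatever weight
    estimation technique is actually used. Ties keep the smallest weight.
    """
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        correct = sum(fuse(a, v, w) == truth for a, v, truth in heldout)
        acc = correct / len(heldout)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w


# Made-up scores: audio favors speaker "A", video favors speaker "B".
audio = {"A": -1.0, "B": -2.0}
video = {"A": -3.0, "B": -1.0}
```

With these made-up scores, `fuse(audio, video, 1.0)` relies on audio alone and `fuse(audio, video, 0.0)` on video alone. In acoustically degraded conditions, one would expect a weight chosen on matching held-out data to shift toward video.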


