Audio Visual Speech Technologies
Audio-Visual Speech Synthesis
Human-computer interaction will achieve the naturalness of human-to-human communication when today's impersonal computer interface is replaced by natural input interfaces, using speech and gesture, combined with visual agents that deliver the information supplied by the computer. A computer should not only understand a person's natural language but also respond in the same natural way. Visual speech synthesis can also be used to compensate for the lack of auditory information for hearing-impaired users, and for movie dubbing, virtual avatars, distance learning and low-bandwidth conferencing.

Researchers have tried various approaches to converting acoustic speech to visual speech, including mapping phonemes to visemes, vector quantization, direct estimation techniques and HMMs. It is still a challenge to design facial animation models that allow easy control of facial expression, gesture and emotion. Researchers have taken two different approaches. One approach in the literature is based on 3-D wire-frame models with detailed descriptions of the motion of facial muscles and of articulators such as the teeth and tongue. The other relies on image-based techniques such as key framing and morphing.

Presently we are exploring the feasibility of animating a face given an incoming audio stream and pictures of a speaker speaking different visemes and showing different expressions. The set of visemes and expressions is fixed in advance. First, the poses of the faces are aligned to correct for small rotational and translational discrepancies between them. The optical flow is then computed for every transition between the images. For an incoming audio signal, the corresponding viseme is identified, and the transition from the preceding viseme to the next one follows the optical flow that was previously computed and stored.

Key component technologies:
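The image-based pipeline described above (identify the viseme for the incoming audio, then move from the previous viseme image to the next one along a stored optical flow) can be illustrated with a minimal sketch. It is not the project's implementation. The phoneme-to-viseme table, the viseme names, and the functions `visemes_for`, `warp_along_flow`, `transition_frames` and `animate` are all hypothetical, and the flow fields are assumed to have been computed and stored beforehand. The warp is a crude nearest-pixel forward splat that leaves holes unfilled, which a real system would need to handle.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]                # grayscale image as rows of pixels
Flow = List[List[Tuple[float, float]]]   # per-pixel (dx, dy) displacement

# Hypothetical phoneme-to-viseme table (illustrative only).
PHONEME_TO_VISEME: Dict[str, str] = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
    "sil": "neutral",
}


def visemes_for(phonemes: List[str]) -> List[str]:
    """Map phonemes to visemes, collapsing consecutive repeats."""
    out: List[str] = []
    for ph in phonemes:
        v = PHONEME_TO_VISEME.get(ph, "neutral")  # unknown -> neutral (assumption)
        if not out or out[-1] != v:
            out.append(v)
    return out


def warp_along_flow(src: Image, flow: Flow, t: float) -> Image:
    """Move each source pixel a fraction t (0..1) of the way along its flow vector.

    Nearest-pixel forward splat: colliding pixels are resolved by last write,
    and destination pixels that nothing maps to stay at 0.0.
    """
    h, w = len(src), len(src[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            nx, ny = round(x + t * dx), round(y + t * dy)
            if 0 <= nx < w and 0 <= ny < h:
                out[ny][nx] = src[y][x]
    return out


def transition_frames(src: Image, flow: Flow, steps: int) -> List[Image]:
    """Intermediate frames at t = 1/steps, 2/steps, ..., 1."""
    return [warp_along_flow(src, flow, k / steps) for k in range(1, steps + 1)]


def animate(visemes: List[str],
            key_images: Dict[str, Image],
            flows: Dict[Tuple[str, str], Flow],
            steps: int = 2) -> List[Image]:
    """Chain stored transitions for a viseme sequence into one frame list."""
    frames = [key_images[visemes[0]]]
    for a, b in zip(visemes, visemes[1:]):
        frames.extend(transition_frames(key_images[a], flows[(a, b)], steps))
    return frames
```

In this sketch, `flows` is keyed by (previous viseme, next viseme), mirroring the idea that the flow for every transition is computed once and looked up at synthesis time. Only the lookup and the warp run per incoming viseme.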
Team:
Publications:
Tanveer A. Faruquie, Chalapathy Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, "Translingual visual speech synthesis," International Conference on Multimedia and Expo, vol. II, pp. 1089-1092, New York, July-August 2000.