Audio Visual Speech Technologies
Publications
Group Publications
Jing Huang, Etienne Marcheret, Karthik Visweswariah, Rapid Feature Space Speaker Adaptation For Multi-Stream HMM-Based Audio-Visual Speech Recognition, Proc. International Conference on Multimedia and Expo, Amsterdam, The Netherlands, 2005.
G. Potamianos, C. Neti, J. Luettin, and I. Matthews, Audio-Visual Automatic Speech Recognition: An Overview. In: Issues in Visual and Audio-Visual Speech Processing, G. Bailly, E. Vatikiotis-Bateson, and P. Perrier (Eds.), MIT Press (In Press), 2004.
B Ramabhadran, J Huang, U Chaudhari, G Iyengar, HJ Nock, Guess Who's Speaking: Audio Segmentation for the Automated Transcription of Large Spoken Archives, to appear in Eurospeech 2003.
HJ Nock, G Iyengar, C Neti, Speaker Localisation using Audio-Visual Synchrony: An Empirical Study, to appear in CIVR 2003.
G Iyengar, HJ Nock, C Neti, Audio-Visual Synchrony for Detection of Monologues in Video Archives, Proc ICASSP 2003 (Presented at ICME 2003).
HJ Nock, G Iyengar, C Neti, Issues in Speech-based Retrieval of Video, Proc ISCA Tutorial Workshop (Multilingual Spoken Document Retrieval) 2003.
HJ Nock, W Adams, G Iyengar, C-Y Lin, M Naphade, A Natsev, C Neti, JR Smith, B Tseng, User-trainable Video Annotation Using Multimodal Cues, to appear in Proc SIGIR 2003.
J.H. Connell, N. Haas, E. Marcheret, C. Neti, G. Potamianos, and S. Velipasalar, A real-time prototype for small-vocabulary audio-visual ASR, Proc. Int. Conf. Multimedia Expo., vol. II, pp. 469-472, Baltimore, July 2003.
U.V. Chaudhari, G.N. Ramaswamy, G. Potamianos, and C. Neti, Information fusion and decision cascading for audio-visual speaker recognition based on time varying stream reliability prediction, Proc. Int. Conf. Multimedia Expo., vol. III, pp. 9-12, Baltimore, July 2003.
A. Garg, G. Potamianos, C. Neti, and T.S. Huang, Frame-dependent multi-stream reliability indicators for audio-visual speech recognition, Proc. Int. Conf. Acoust. Speech Signal Process., vol. I, pp. 24-27, Hong Kong, Apr. 2003.
U.V. Chaudhari, G.N. Ramaswamy, G. Potamianos, and C. Neti, Audio-visual speaker recognition using time-varying stream reliability prediction, Proc. Int. Conf. Acoust. Speech Signal Process., vol. V, pp. 712-715, Hong Kong, Apr. 2003.
D.M. Russell, P.P. Maglio, R. Dordick, C. Neti, Dealing with Ghosts: Managing the User Experience of Autonomic Computing, IBM Systems Journal, Vol. 42, No. 1, pp. 177-188, 2003.
W.H. Adams, G. Iyengar, C-Y Lin, M.R. Naphade, C. Neti, H.J. Nock, J.R. Smith, Semantic Indexing of Multimedia Content Using Visual, Audio and Text Cues, Eurasip Journal on Applied Signal Processing, Vol. 2003, No. 2, Feb. 2003.
G. Potamianos, C. Neti, G. Gravier, and A. Garg, Automatic Recognition of audio-visual speech: Recent progress and challenges, Proceedings of the IEEE, vol. 91, no. 9, Sep. 2003.
G. Potamianos, C. Neti, and S. Deligne, Joint audio-visual speech processing for recognition and enhancement, Proc. Work. Audio-Visual Speech Process., pp. 95-104, St. Jorioz, France, Sep. 2003.
J. Huang, G. Potamianos, and C. Neti, Improving audio-visual speech recognition with an infrared headset, Proc. Work. Audio-Visual Speech Process., pp. 175-178, St. Jorioz, France, Sep. 2003.
G. Potamianos and C. Neti, Audio-visual speech recognition in challenging environments, Proc. Eur. Conf. Speech Comm. Tech., pp. 1293-1296, Geneva, Sep. 2003.
Sabine Deligne, Gerasimos Potamianos, Chalapathy Neti, Audio-Visual Speech Enhancement With AVCDCN (Audio-Visual Codebook Dependent Cepstral Normalization), IEEE Workshop on Sensor Array and Multichannel Signal Processing, Washington DC, August 2002, and ICSLP 2002, Denver.
G. Iyengar, H. Nock, C. Neti, M. Franz, Semantic Indexing of Multimedia using Audio, Text and Visual Cues, Proceedings of ICME2002, Lausanne, Switzerland, 2002.
G. Gravier, G. Potamianos, and C. Neti, Asynchrony modeling for audio-visual speech recognition, Proc. Human Language Technology Conference, San Diego, 2002.
G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR, Proc. Int. Conf. Acoust. Speech Signal Process., Orlando, 2002.
R. Goecke, G. Potamianos, and C. Neti, Noisy audio feature enhancement using audio-visual speech data, Proc. Int. Conf. Acoust. Speech Signal Process., Orlando, 2002.
G. Iyengar, C. Neti, A vision-based microphone switch for speech intent detection, Recognition, Analysis and Tracking of Face and Gestures in Real Time Systems (RATFG-RTS) Workshop at ICCV 2001, Vancouver, 13 July 2001.
G. Potamianos, C. Neti, G. Iyengar, and E. Helmuth, Large-vocabulary audio-visual speech recognition by machines and humans, Proc. Eurospeech, Aalborg, 2001.
G. Potamianos and C. Neti, Automatic speechreading of impaired speech, Proc. Work. Audio-Visual Speech Process., Scheelsminde, 2001.
G. Potamianos and C. Neti, Improved ROI and within frame discriminant features for lipreading, Proc. Int. Conf. Image Process., Thessaloniki, 2001.
C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins Summer 2000 Workshop, Proc. IEEE Work. Multimedia Signal Process., Cannes, 2001.
G. Iyengar and C. Neti, Detection of faces under shadows and lighting variations, Cannes, 2001.
G. Iyengar, G. Potamianos, C. Neti, T. Faruquie, and A. Verma, Robust detection of visual ROI for automatic speechreading, Proc. IEEE Work. Multimedia Signal Process., Cannes, 2001.
I. Matthews, G. Potamianos, C. Neti, and J. Luettin, A comparison of model and transform-based visual features for audio-visual LVCSR, Proc. IEEE Int. Conf. Multimedia Expo., Tokyo, 2001.
G. Potamianos, C. Neti, G. Iyengar, A.W. Senior, and A. Verma, A cascade visual front end for speaker independent automatic speechreading, Int. J. Speech Technology, Vol. 4, pp. 193-208, 2001.
G. Potamianos, J. Luettin, C. Neti. Hierarchical discriminant features for audio-visual LVCSR, ICASSP, Salt Lake City, May 2001.
J. Luettin, G. Potamianos, C. Neti. Asynchronous stream modeling for large-vocabulary audio-visual speech recognition, ICASSP, Salt Lake City, May 2001.
H. Glotin, D. Vergyri, C. Neti, G. Potamianos, J. Luettin. Weighting schemes for audio-visual fusion in speech recognition, ICASSP, Salt Lake City, May 2001.
G. Potamianos, C. Neti. Stream confidence estimation for audio-visual speech recognition, ICSLP, vol III, pp. 746-749, Beijing, October 2000.
C. Neti, G. Iyengar, G. Potamianos, A. Senior, B. Maison. Perceptual interfaces for information interaction: Joint processing of audio and visual information for human-computer interaction, ICSLP, vol III, pp. 11-14, Beijing, October 2000.
C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A.Mashari, and J. Zhou, Audio-Visual Speech Recognition, Final Workshop 2000 Report, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD (Oct. 12, 2000).
G. Potamianos, A. Verma, C. Neti, G. Iyengar, S. Basu. A cascade image transform for speaker independent automatic speechreading, International Conference on Multimedia and Expo, vol. II, pp. 1097-1100, New York, July-August 2000.
P. de Cuetos, C. Neti, A.W. Senior. Audio-visual intent-to-speak detection for human-computer interaction, ICASSP June 5-9 2000, pp. 1325-1328, Istanbul, Turkey.
G.Iyengar, C.Neti. Speaker change detection using joint audio-visual statistics, RIAO 12-14 April 2000, Paris, France, Dec. 20, 1999.
C.Neti, B.Maison, A.Senior, G.Iyengar, P.deCuetos, S.Basu, A.Verma. Joint processing of audio and visual information for multimedia indexing and human-computer interaction, RIAO April 12-14 2000, Paris, France.
Benoit Maison, Chalapathy Neti, Andrew Senior. Audio-visual speaker recognition for video broadcast news: some fusion techniques, IEEE Multimedia Signal Processing (MMSP99), Denmark, Sept, 1999.
S. Basu, C. Neti, N. Rajput, A. Senior, L. Subramaniam, A. Verma. Audio-visual large-vocabulary continuous speech recognition in the broadcast news domain, IEEE Multimedia Signal Processing Conference (MMSP99), Denmark, Sept, 1999.
Andrew Senior, Chalapathy Neti, Benoit Maison. On the use of visual information for improving audio-based speaker recognition, Audio-Visual Speech processing conference (AVSP99), Santa Cruz, CA, Aug, 1999.
Chalapathy Neti, Stephane Maes, Mark Lucente and Dragutin Petkovic. Knowledge/Smart Spaces, 1999 DARPA/NSF/NIST Workshop on Research issues in Smart Computing Environments, July 1999.
Chalapathy Neti, Andrew Senior. Audio-visual speaker recognition for video broadcast news, DARPA HUB4 Workshop, Washington D.C., March 1999.
Ashish Verma, Tanveer Faruquie, C. Neti, Sankar Basu, Andrew Senior. Late integration in continuous audio-visual speech recognition, ASRU, Colorado, 1999.
A.W.Senior. Recognizing faces in broadcast video. IEEE International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems. ICCV 1999.
Jianbo Ma, Chalapathy Neti, Andrew Senior. Pose compensation for bimodal speech recognition. Automatic speech recognition and understanding workshop (ASRU99), Keystone Resort, Colorado, 1999.
S. Basu, E. E. Jan, Mark Lucente and Chalapathy Neti. Beyond Audio-based speech recognition, 1998 NIST/DARPA Workshop on SmartSpaces, Gaithersburg, MD, 1998.
Tanveer A. Faruquie, Chalapathy Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma, Translingual visual speech synthesis, International Conference on Multimedia and Expo, vol II, pp. 1089-1092, New York, July-August 2000.
Related Publications
E. Cosatto, G. Potamianos, H.P. Graf. Audio-visual unit selection for the synthesis of photo-realistic talking-heads, International Conference on Multimedia and Expo, vol. II, pp. 619-622, New York, July-August 2000.
A.W.Senior. Face and Feature Finding for a Face Recognition System. Audio and Video based Biometric Person Authentication '99. Washington D.C. March 22-24, 1999.
Eli Yamamoto, Satoshi Nakamura, Kiyohiro. Speech to lip movement synthesis by HMM, AVSP, Rhodes, Greece, 1999.
G.Potamianos, A.Potamianos. Speaker adaptation for audio-visual automatic speech recognition. Eurospeech, Budapest, vol. 3, pp. 1291-1294, 1999.
G.Potamianos, H.P.Graf. Linear discriminant analysis for speechreading. IEEE Work. Multimedia Signal Process. Los Angeles, pp. 221-226, 1998.
G.Potamianos, H.P.Graf, E.Cosatto. An image transform approach for HMM based automatic lipreading. Int. Conf. Image Process. Chicago, vol. III, pp. 173-177, 1998.
G.Potamianos, H.P.Graf. Discriminative training of HMM stream exponents for audio-visual speech recognition. Int. Conf. Acoust. Speech Signal Process. Seattle, vol. 6, pp. 3733-3736, 1998.
H.P.Graf, E.Cosatto, G.Potamianos. Machine vision of faces and facial features. R.I.E.C. Int. Symp. Design Archit. Inform. Process. Systems Based Brain Inform. Princ. Sendai, pp. 48-53, 1998.
G.Potamianos, E.Cosatto, H.P.Graf, D.B.Roe. Speaker independent audio-visual database for bimodal ASR. Europ. Tutorial Research Work. Audio-Visual Process. Rhodes, pp. 65-68, 1997.
H.P.Graf, E.Cosatto, G.Potamianos. Robust recognition of faces and facial features with a multi-modal system. Int. Conf. Systems Man Cybern. Orlando, pp. 2034-2939, 1997.
Tsuhan Chen, Ram R. Rao. Audio-Visual integration in multimodal communication, IEEE, 5, May.