Visual Acoustic Speech Recognition (Computer Lipreading)
In this project we investigate whether visual information
(lip movements) can improve speech recognition.
This research was initiated by the
Neural Net Speech Group (UKA and CMU), where we showed
significant recognition improvement using a
visual acoustic MS-TDNN architecture.
At ICSI
we specifically focus on conditions where state-of-the-art recognition
systems perform poorly, for example in car environments with background noise,
or office environments with cross-talk.
Using a visual acoustic MLP/HMM architecture developed at the ICSI
Speech Recognition Group
we showed significant improvement over purely acoustic performance.
Currently we are extending the interactive spontaneous speech system
"BeRP"
to make use of the additional visual speech modality.
This research is funded as massively parallel computer vision under the
CNS-1
project (a collaboration of the
Computer Science Department
of the
University of California at Berkeley and the
International Computer
Science Institute) by the
Advanced Research Projects Agency
(ARPA), contract N0000 1493 C0249.
More Info:
Lipreading Publications:
- C.Bregler, H.Hild, S.Manke, and A.Waibel,
Improving Connected Letter Recognition by Lipreading,
in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing,
Minneapolis, 1993.
(gzip-postscript)
- C.Bregler, S.Manke, H.Hild, and A.Waibel,
Bimodal Sensor Integration on the Example of "Speechreading",
Proc. of IEEE Int. Conf. on Neural Networks, San Francisco, 1993.
- C.Bregler, S.Omohundro,
Surface Learning with Applications to Lipreading,
in Cowan, J.D., Tesauro, G., and Alspector, J. (eds.),
Advances in Neural Information Processing Systems 6. San Francisco,
CA: Morgan Kaufmann Publishers, 1994.
(gzip-postscript)
- C.Bregler, Y.Konig,
"Eigenlips" for Robust Speech Recognition,
in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing,
Adelaide, Australia, 1994.
(gzip-postscript)
- C.Bregler, S.Omohundro, Y.Konig,
A Hybrid Approach to Bimodal Speech Recognition,
(invited talk) in
Proc. of 28th Annual Asilomar Conf. on Signals, Systems,
and Computers, Pacific Grove, CA, 1994.
- C.Bregler, S.Omohundro,
Nonlinear Image Interpolation using Manifold Learning,
in Advances in Neural Information Processing Systems 7, 1995.
- C.Bregler, S.Omohundro,
Nonlinear Manifold Learning for Visual Speech Recognition,
to appear at Int. Conf. Computer Vision, M.I.T., 1995.
other publications.
Chris Bregler (bregler@cs.berkeley.edu)