Visual Acoustic Speech Recognition (Computer Lipreading)

In this project we investigate whether visual information (lip movements) can improve speech recognition.

This research was initiated by the Neural Net Speech Group (UKA and CMU), where we showed significant recognition improvement using a visual-acoustic MS-TDNN architecture.

At ICSI we focus specifically on conditions where state-of-the-art recognition systems perform poorly, for example car environments with background noise or office environments with cross-talk. Using a visual-acoustic MLP/HMM architecture developed at the ICSI Speech Recognition Group, we showed significant improvement over purely acoustic recognition.
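As a rough illustration of what a visual-acoustic MLP front end can look like, the sketch below concatenates one acoustic and one visual feature vector per frame (early fusion) and passes the result through a small MLP that outputs phone posteriors, as a hybrid MLP/HMM system would use for its emission scores. This is not the ICSI system: the fusion strategy, the feature dimensions, the random weights, and all function names are assumptions made for illustration only.

```python
import math
import random


def fuse(acoustic, visual):
    """Early fusion (assumed here): concatenate the two per-frame feature vectors."""
    return list(acoustic) + list(visual)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_posteriors(x, w_hidden, w_out):
    """One tanh hidden layer followed by a softmax over phone classes."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(scores)


def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real system would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes, chosen only to keep the example small.
N_ACOUSTIC, N_VISUAL, N_HIDDEN, N_PHONES = 9, 4, 6, 3

rng = random.Random(0)
w_hidden = random_matrix(N_HIDDEN, N_ACOUSTIC + N_VISUAL, rng)
w_out = random_matrix(N_PHONES, N_HIDDEN, rng)

# Stand-ins for one frame of acoustic features and one frame of lip features.
acoustic_frame = [rng.gauss(0, 1) for _ in range(N_ACOUSTIC)]
visual_frame = [rng.gauss(0, 1) for _ in range(N_VISUAL)]

posteriors = mlp_posteriors(fuse(acoustic_frame, visual_frame), w_hidden, w_out)
print(posteriors)
```

In a hybrid recognizer, per-frame posteriors of this kind would be passed to the HMM decoder in place of conventional emission likelihoods; that decoding step is omitted here.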

Currently we are extending the interactive spontaneous speech system "BeRP" to make use of the additional visual speech modality.

This research is funded as massively parallel computer vision under the CNS-1 project (a collaboration of the Computer Science Department of the University of California at Berkeley and the International Computer Science Institute) by the Advanced Research Projects Agency (ARPA), contract N0000 1493 C0249.


More Info:


Lipreading Publications: other publications.


Chris Bregler (bregler@cs.berkeley.edu)