Towards a Robust Speechreading Dialog System

Bregler, Christoph; Omohundro, Stephen M.; Shi, Jianbo; Konig, Yochai

doi:10.1007/978-3-662-13015-5_31

Part of the book series: NATO ASI Series (NATO ASI F, volume 150)

Abstract

We describe a hybrid speechreading system based on a Manifold Learning technique, Neural Networks, and Hidden Markov Models. Manifold Learning is a technique for representing and learning smooth nonlinear low-dimensional surfaces embedded in high-dimensional abstract feature spaces. The technique can determine the structure of the surface and find the closest manifold point to a given query point. We use this technique to learn the “space of lips”. The learned manifold is used for tracking and extracting the lips, for interpolating between frames in an image sequence, and for providing features for recognition. The hybrid speechreading system, which combines this learned lip manifold, a connectionist phone classifier, and Hidden Markov Models, significantly improves the performance of acoustic speech recognizers in degraded environments. We also present preliminary results on a purely visual lip reader and work in progress on a new face-finding technique.
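The abstract does not spell out how the manifold is represented, so the following is only a minimal sketch of the "closest manifold point" query under stated assumptions. It assumes the surface is approximated by local linear patches (k-means clustering followed by PCA within each cluster) and that a query is projected onto the patch whose center is nearest. The function names `fit_local_patches` and `project_to_manifold`, the parameters, and the toy circle data are all hypothetical and are not taken from the chapter.

```python
import numpy as np


def fit_local_patches(points, n_patches, patch_dim, n_iters=20, seed=0):
    """Approximate a manifold with local linear patches (k-means + local PCA).

    Returns a list of (center, basis) pairs; each basis has shape (patch_dim, D).
    This is an illustrative assumption, not the chapter's stated algorithm.
    """
    rng = np.random.default_rng(seed)
    # Initialise the cluster centers on randomly chosen training points.
    start = rng.choice(len(points), size=n_patches, replace=False)
    centers = points[start].astype(float)

    def assign(current_centers):
        # Index of the nearest center for every training point.
        diffs = points[:, None, :] - current_centers[None, :, :]
        return np.linalg.norm(diffs, axis=2).argmin(axis=1)

    for _ in range(n_iters):
        labels = assign(centers)
        for k in range(n_patches):
            members = points[labels == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    labels = assign(centers)

    patches = []
    for k in range(n_patches):
        members = points[labels == k]
        if len(members) <= patch_dim:
            continue  # too few points to estimate a tangent basis
        center = members.mean(axis=0)
        # Rows of vt are the principal directions of the centered members.
        _, _, vt = np.linalg.svd(members - center, full_matrices=False)
        patches.append((center, vt[:patch_dim]))
    return patches


def project_to_manifold(query, patches):
    """Project a query onto the linear patch whose center is nearest."""
    center, basis = min(patches, key=lambda p: np.linalg.norm(query - p[0]))
    return center + basis.T @ (basis @ (query - center))


# Toy data: a unit circle, i.e. a 1-D manifold embedded in 2-D.
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
patches = fit_local_patches(circle, n_patches=12, patch_dim=1)
print(project_to_manifold(np.array([1.3, 0.2]), patches))
```

Selecting the patch by nearest center, rather than by smallest projection residual, keeps a distant patch's tangent plane from capturing the query. A smoother variant would blend the projections of several nearby patches, at the cost of extra code.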

Author information

Authors and Affiliations

Computer Science Division, University of California, Berkeley, CA, 94720, USA
Christoph Bregler, Jianbo Shi & Yochai Konig
International Computer Science Institute, 1947 Center St, Suite 600, Berkeley, CA, 94704, USA
Christoph Bregler & Yochai Konig
NEC Research Institute, Inc., 4 Independence Way, Princeton, NJ, 08540, USA
Stephen M. Omohundro

Editor information

Editors and Affiliations

Ricoh California Research Center, 2882 Sand Hill Road #115, Menlo Park, CA, 94025-7022, USA
David G. Stork & Marcus E. Hennecke
Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
David G. Stork & Marcus E. Hennecke

About this chapter

Cite this chapter

Bregler, C., Omohundro, S.M., Shi, J., Konig, Y. (1996). Towards a Robust Speechreading Dialog System. In: Stork, D.G., Hennecke, M.E. (eds) Speechreading by Humans and Machines. NATO ASI Series, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-13015-5_31

DOI: https://doi.org/10.1007/978-3-662-13015-5_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-08252-8
Online ISBN: 978-3-662-13015-5
eBook Packages: Springer Book Archive
