Classifying Visemes for Automatic Lipreading
Automatic lipreading is automatic speech recognition that uses only visual information. The relevant data in the video signal is isolated and features are extracted from it. From the sequence of feature vectors, where every vector represents one video frame, a sequence of higher-level semantic elements is formed. These semantic elements are "visemes", the visual equivalent of "phonemes". The developed prototype uses a Time-Delay Neural Network (TDNN) to classify the visemes.
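The pipeline described above can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' implementation): each video frame yields a feature vector, and a time-delay layer computes each output from a short window of consecutive frames, so temporal context is built in. All dimensions, layer sizes, and weights here are invented for the example.

```python
import numpy as np

def tdnn_layer(frames, weights, bias):
    """One time-delay layer: each output depends on the current frame
    plus the preceding `ctx - 1` frames (a 1-D convolution over time).
    frames: (T, d_in); weights: (ctx, d_in, d_out); bias: (d_out,)."""
    T, d_in = frames.shape
    ctx, _, d_out = weights.shape
    out = np.empty((T - ctx + 1, d_out))
    for t in range(T - ctx + 1):
        window = frames[t:t + ctx]  # (ctx, d_in) slice of frames
        out[t] = np.tanh(np.einsum('td,tdo->o', window, weights) + bias)
    return out

def classify_visemes(frames, w1, b1, w2, b2):
    """Two stacked time-delay layers, then per-step argmax over classes."""
    hidden = tdnn_layer(frames, w1, b1)
    scores = tdnn_layer(hidden, w2, b2)
    return scores.argmax(axis=1)  # one viseme label per output step

# Toy run: 10 frames of 8 features, context of 3 frames per layer,
# 16 hidden units, 5 hypothetical viseme classes.
rng = np.random.default_rng(0)
frames = rng.normal(size=(10, 8))
w1 = rng.normal(size=(3, 8, 16)); b1 = np.zeros(16)
w2 = rng.normal(size=(3, 16, 5)); b2 = np.zeros(5)
labels = classify_visemes(frames, w1, b1, w2, b2)
print(labels.shape)  # each layer shrinks the sequence by (ctx - 1) steps
```

The shrinking output sequence reflects the TDNN's design: a label at step `t` summarizes a fixed temporal window of frames, rather than a single image in isolation.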