Abstract
Model-based image coding has recently attracted much attention as a basis for the next generation of communication services. This article proposes a model-based image coding scheme for the mouth region, aimed at capturing the visual information related to speech so that the decoded video sequence is suitable for lip-reading. Such a coding system consists essentially of an analysis process on the transmitting side and a synthesis process on the receiving side. On the transmitting side, an encoding technique based on a deformable template of the lips is introduced, which represents the shape of the mouth in a very compact way. On the receiving side, a decoding technique for lip-movement synthesis is proposed, which animates the lips, starting from a reference image, by applying warping techniques driven by the model.
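The paper's body is not available in this preview, so the template's exact parameterization is not shown here. Purely as an illustrative sketch (not the authors' actual model), a deformable lip template of the kind described in the abstract can be reduced to a handful of shape parameters — for example mouth width and upper/lower lip heights, with the contour approximated by two parabolic arcs meeting at the mouth corners. All names and the parabolic form below are assumptions for illustration:

```python
def lip_template(width, h_up, h_low, n=20):
    """Sample a toy deformable lip template: two parabolic arcs
    (upper and lower lip) sharing the mouth corners at (+-width/2, 0).
    Returns two lists of n (x, y) contour points."""
    pts_up, pts_low = [], []
    for i in range(n):
        # x runs from the left corner to the right corner.
        x = -width / 2 + width * i / (n - 1)
        # Parabolic profile: zero at the corners, maximal at the centre.
        shape = 1.0 - (2.0 * x / width) ** 2
        pts_up.append((x, h_up * shape))
        pts_low.append((x, -h_low * shape))
    return pts_up, pts_low

# One frame of lip shape reduces to three numbers (plus mouth position):
upper, lower = lip_template(width=40.0, h_up=8.0, h_low=12.0)
```

Under this kind of scheme, encoding a frame amounts to transmitting the few parameters that best fit the detected lip contour, and the decoder regenerates the contour from them before warping the reference image accordingly.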
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Coianiz, T., Torresani, L. (1997). Analysis and encoding of lip movements. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015979
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1