Abstract
Model-based image coding has recently attracted much attention as a basis for the next generation of communication services. This article proposes a model-based image coding scheme for the mouth region, aimed at capturing the visual information related to speech so that the decoded video sequence is suitable for lip-reading. Such a coding system consists essentially of an analysis process on the transmitting side and a synthesis process on the receiving side. On the transmitting side, an encoding technique based on a deformable template of the lips is introduced, which represents the shape of the mouth in a very compact way. On the receiving side, a decoding technique for lip-movement synthesis is proposed, which animates the lips, starting from a reference image, by applying warping techniques driven by the model.
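The paper's body is not available in this preview, so the template's exact parameterization is not shown here. Purely as an illustrative sketch (not the authors' actual model), a deformable lip template of the kind described in the abstract can be reduced to a handful of shape parameters — for example mouth width and upper/lower lip heights, with the contour approximated by two parabolic arcs meeting at the mouth corners. All names and the parabolic form below are assumptions for illustration:

```python
def lip_template(width, h_up, h_low, n=20):
    """Sample a toy deformable lip template: two parabolic arcs
    (upper and lower lip) sharing the mouth corners at (+-width/2, 0).
    Returns two lists of n (x, y) contour points."""
    pts_up, pts_low = [], []
    for i in range(n):
        # x runs from the left corner to the right corner.
        x = -width / 2 + width * i / (n - 1)
        # Parabolic profile: zero at the corners, maximal at the centre.
        shape = 1.0 - (2.0 * x / width) ** 2
        pts_up.append((x, h_up * shape))
        pts_low.append((x, -h_low * shape))
    return pts_up, pts_low

# One frame of lip shape reduces to three numbers (plus mouth position):
upper, lower = lip_template(width=40.0, h_up=8.0, h_low=12.0)
```

Under this kind of scheme, encoding a frame amounts to transmitting the few parameters that best fit the detected lip contour, and the decoder regenerates the contour from them before warping the reference image accordingly.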
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Coianiz, T., Torresani, L. (1997). Analysis and encoding of lip movements. In: Bigün, J., Chollet, G., Borgefors, G. (eds) Audio- and Video-based Biometric Person Authentication. AVBPA 1997. Lecture Notes in Computer Science, vol 1206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0015979
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62660-2
Online ISBN: 978-3-540-68425-1