Machine Vision and Applications, Volume 24, Issue 1, pp 1–18

Lip contour segmentation and tracking compliant with lip-reading application constraints

  • Sébastien Stillittano
  • Vincent Girondel
  • Alice Caplier
Original Paper

Abstract

We propose to combine active contours and parametric models for lip contour extraction and tracking. In the first image, jumping snakes detect key points on the outer and inner lip contours. These points initialize a parametric lip model composed of several cubic curves suited to mouth deformations. Guided by a combined luminance and chrominance gradient, the initial model is optimized and precisely locked onto the lip contours. In subsequent images, segmentation relies on tracking the mouth bounding box and the key points. Quantitative and qualitative evaluations show the effectiveness of the algorithm for lip-reading applications.
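The pipeline summarized above — key points feeding a cubic-curve parametric lip model, refined by a combined luminance/chrominance gradient — can be sketched minimally as follows. The key-point coordinates, the `fit_cubic` and `combined_gradient` helpers, the pseudo-hue chrominance channel, and the weighting `lam` are illustrative assumptions for this sketch, not the paper's exact formulas.

```python
import numpy as np

# Hypothetical outer upper-lip key points (x, y) in image coordinates:
# left corner, Cupid's-bow peaks, right corner. Values are illustrative.
upper_keys = np.array([[10.0, 50.0], [25.0, 40.0], [40.0, 42.0],
                       [55.0, 40.0], [70.0, 50.0]])

def fit_cubic(points):
    """Least-squares cubic y(x) through key points -- one of the several
    cubic curves a parametric lip model could be assembled from."""
    coeffs = np.polyfit(points[:, 0], points[:, 1], 3)
    return np.poly1d(coeffs)

def combined_gradient(rgb, lam=0.5):
    """|grad(luminance)| + lam * |grad(pseudo-hue)|: a stand-in for a
    combined luminance/chrominance edge map used to lock the model
    onto the lip contour (the paper's exact formula is not reproduced)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = 0.299 * r + 0.587 * g + 0.114 * b   # standard luminance weights
    hue = r / (r + g + 1e-6)                  # pseudo-hue: high on lips vs. skin
    gy_l, gx_l = np.gradient(lum)
    gy_h, gx_h = np.gradient(hue)
    return np.hypot(gx_l, gy_l) + lam * np.hypot(gx_h, gy_h)

# Sample the fitted curve between the mouth corners to get a contour polyline.
curve = fit_cubic(upper_keys)
xs = np.linspace(10.0, 70.0, 61)
contour = np.column_stack([xs, curve(xs)])
```

In a full system, the contour samples would then be moved along the combined-gradient field (as a snake's external energy) to refine the initial model before tracking it frame to frame.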

Keywords

Lip contour · Segmentation · Tracking · Jumping snakes · Parametric models · Lip-reading



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Sébastien Stillittano (1)
  • Vincent Girondel (2)
  • Alice Caplier (2)

  1. Vesalis, Clermont-Ferrand, France
  2. Département Images et Signal (DIS), GIPSA-Lab, Domaine Universitaire, Grenoble Cedex, France
