Mouth Region Localization Method Based on Gaussian Mixture Model

  • Kenichi Kumatani
  • Rainer Stiefelhagen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4153)


This paper presents a new mouth region localization method based on a Gaussian mixture model (GMM) of feature vectors extracted from mouth region images. Feature vectors based on the discrete cosine transform (DCT) and principal component analysis (PCA) are evaluated in mouth localization experiments. The method is suitable for audio-visual speech recognition. The paper also introduces a new database for audio-visual processing. Experimental results show that the proposed system localizes the mouth region with high accuracy (more than 95%) even when tracking results from preceding frames are unavailable.
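The pipeline described in the abstract, extracting DCT features from candidate image patches and scoring them under a GMM, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the diagonal covariances, and the assumption of already-trained GMM parameters are ours.

```python
import numpy as np

def dct2(patch):
    """2D DCT-II of a square patch via the orthonormal DCT matrix."""
    n = patch.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ patch @ c.T

def dct_features(patch, keep=4):
    """Low-frequency DCT coefficients (top-left keep x keep block) as a feature vector."""
    return dct2(patch)[:keep, :keep].ravel()

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    d = x.shape[0]
    diff = x[None, :] - means                                       # (K, d)
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = np.log(weights) + log_norm - 0.5 * np.sum(diff**2 / variances, axis=1)
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))                 # log-sum-exp

def localize_mouth(candidates, weights, means, variances, keep=4):
    """Pick the candidate patch whose DCT features score highest under the mouth GMM."""
    scores = [gmm_log_likelihood(dct_features(p, keep), weights, means, variances)
              for p in candidates]
    return int(np.argmax(scores))
```

In use, the GMM parameters would be trained offline on features of labeled mouth patches; at test time each candidate region in a frame is scored independently, which is why the method does not depend on tracking results from preceding frames.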


Keywords: Feature Vector, Discrete Cosine Transformation, Gaussian Mixture Model, Automatic Speech Recognition, Mouth Region





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Kenichi Kumatani (1)
  • Rainer Stiefelhagen (1)
  1. Interactive Systems Labs, Universitaet Karlsruhe (TH), Karlsruhe, Germany
