Abstract
This paper presents a new mouth region localization method that uses a Gaussian mixture model (GMM) of feature vectors extracted from mouth region images. Feature vectors based on the discrete cosine transform (DCT) and principal component analysis (PCA) are evaluated in mouth localization experiments. The new method is suitable for audio-visual speech recognition. This paper also introduces a new database that is available for audio-visual processing. The experimental results show that the proposed system localizes the mouth region with high accuracy (more than 95%) even if the tracking results of preceding frames are unavailable.
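The abstract does not specify the window search, the number of mixture components, or the feature dimensionality, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It fits a diagonal-covariance GMM to low-frequency 2-D DCT coefficients of example mouth patches and then scores every candidate window of a single frame independently, without using tracking results from earlier frames. All function names, the 8x8 patch size, the 4x4 coefficient block, the stride, and the synthetic "mouth" pattern are hypothetical choices made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def dct_features(patch, keep=4):
    """2-D DCT of a square patch; keep the low-frequency keep x keep block."""
    d = dct_matrix(patch.shape[0])
    coeffs = d @ patch @ d.T
    return coeffs[:keep, :keep].ravel()

def component_log_probs(x, weights, means, var):
    """Log of (weight * diagonal Gaussian density), shape (n_samples, k)."""
    diff = x[:, None, :] - means[None, :, :]
    return (np.log(weights)
            - 0.5 * np.sum(np.log(2.0 * np.pi * var), axis=1)
            - 0.5 * np.sum(diff ** 2 / var, axis=2))

def log_likelihood(x, gmm):
    """Per-sample GMM log-likelihood via a stable log-sum-exp."""
    lp = component_log_probs(x, *gmm)
    peak = lp.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(lp - peak).sum(axis=1, keepdims=True))).ravel()

def fit_diag_gmm(x, k=2, iters=50, seed=0, floor=1e-4):
    """Plain EM for a diagonal-covariance GMM (assumed settings)."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, k, replace=False)]
    var = np.tile(x.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample.
        lp = component_log_probs(x, weights, means, var)
        lp -= lp.max(axis=1, keepdims=True)
        resp = np.exp(lp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, floor)
    return weights, means, var

def localize(image, gmm, size=8, stride=2):
    """Return the top-left (row, col) of the highest-scoring window."""
    best_score, best_pos = -np.inf, None
    for r in range(0, image.shape[0] - size + 1, stride):
        for c in range(0, image.shape[1] - size + 1, stride):
            feat = dct_features(image[r:r + size, c:c + size])
            score = log_likelihood(feat[None, :], gmm)[0]
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

def synthetic_mouth_patch(rng, size=8):
    """Toy stand-in for a mouth image: a bright horizontal bar plus noise."""
    patch = np.zeros((size, size))
    patch[3:5, :] = 1.0
    return patch + rng.normal(0.0, 0.05, size=(size, size))
```

In this toy setting, a model trained on a few hundred calls to `synthetic_mouth_patch` can be passed to `localize` together with a noisy frame that contains one such patch. PCA-based features could be substituted for `dct_features` without changing the scoring step.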
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumatani, K., Stiefelhagen, R. (2006). Mouth Region Localization Method Based on Gaussian Mixture Model. In: Zheng, N., Jiang, X., Lan, X. (eds) Advances in Machine Vision, Image Processing, and Pattern Analysis. IWICPAS 2006. Lecture Notes in Computer Science, vol 4153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11821045_12
DOI: https://doi.org/10.1007/11821045_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37597-5
Online ISBN: 978-3-540-37598-2
eBook Packages: Computer Science, Computer Science (R0)