An Improvement of Basic Mouth Shape Detection Rate from Japanese Utterance Image Sequence Using Optical Flow
In this paper, we describe an improvement to our method for detecting distinctive mouth shapes in Japanese utterance image sequences. Previously, we proposed a method that detects these mouth shapes using template matching. Two kinds of mouth shapes are formed when a Japanese phone is pronounced: one at the beginning of the utterance and one at the end. The former is called the “Beginning Mouth Shape” (BeMS) and the latter the “End Mouth Shape” (EMS). The previous method was able to detect these mouth shapes, but it misdetected them in some cases because BeMS is held only for a short time. We therefore expected that a high-speed camera would be able to capture BeMS. Experiments showed that it could, but another problem arose: a mouth shape in the middle of changing into another shape was detected as BeMS. To prevent such misdetections, we adopt optical flow. Optical flow is used to detect the intervals in which the mouth is deforming, and mouth shapes within those intervals are excluded from detection. We propose a method for detecting BeMS and EMS in Japanese utterance image sequences using template matching and optical flow.
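The detection scheme described above (template matching, with detections suppressed while the mouth is deforming) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. It assumes the mouth region of each frame has already been cropped and resized to the template size, that each template carries a hypothetical label (`shape_a`, `shape_b`), and that a per-frame mean optical-flow magnitude is already available, for example by applying the hypothetical helper `mean_flow_magnitude` to a dense flow field such as the one returned by OpenCV's `cv2.calcOpticalFlowFarneback`. Frames whose flow magnitude exceeds `flow_threshold` are treated as deforming and skipped; the remaining frames are compared with each template using zero-mean normalized cross-correlation. Both thresholds are hypothetical parameters.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# A grayscale mouth-region image as rows of pixel intensities.
# Assumption: frames are already cropped and resized to the template size.
Image = List[List[float]]
# A dense flow field as rows of (dx, dy) displacement vectors.
Flow = List[List[Tuple[float, float]]]


def mean_flow_magnitude(flow: Flow) -> float:
    """Average length of the (non-empty) flow field's vectors over the mouth region."""
    mags = [math.hypot(dx, dy) for row in flow for dx, dy in row]
    return sum(mags) / len(mags)


def ncc(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two same-sized images, in [-1, 1]."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("image and template must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0  # flat image: treat as uncorrelated


def detect_shapes(
    frames: Sequence[Image],
    templates: Dict[str, Image],
    flow_mag: Sequence[float],
    flow_threshold: float,
    match_threshold: float,
) -> List[Tuple[int, str]]:
    """Return (frame index, template label) for stable frames that match a template.

    Hypothetical rule: frames with flow_mag > flow_threshold are considered to be
    deforming and are skipped; a match must score strictly above match_threshold.
    """
    detections: List[Tuple[int, str]] = []
    for t, frame in enumerate(frames):
        if flow_mag[t] > flow_threshold:
            continue  # mouth is changing shape: suppress detection in this frame
        best_label: Optional[str] = None
        best_score = match_threshold
        for label, template in templates.items():
            score = ncc(frame, template)
            if score > best_score:
                best_label, best_score = label, score
        if best_label is not None:
            detections.append((t, best_label))
    return detections


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[0.0, 1.0], [1.0, 0.0]]
    frames = [a, b, a]
    flow_mag = [0.1, 2.0, 0.2]  # frame 1 is "moving"
    print(detect_shapes(frames, {"shape_a": a, "shape_b": b}, flow_mag, 1.0, 0.5))
    # prints [(0, 'shape_a'), (2, 'shape_a')]
```

In the usage example the middle frame matches `shape_b` perfectly but is still discarded because its flow magnitude marks it as a transition frame, which is the kind of misdetection the optical-flow step is meant to prevent. Smoothing the flow signal over time or widening the suppressed interval are possible refinements not covered by this sketch.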