Abstract
A critical assumption of all current visual speech recognition systems is that there exist visual speech units, called visemes, which can be mapped to the units of acoustic speech, the phonemes. Although a number of phoneme-to-viseme maps have been published, their effectiveness is rarely tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We present a method for devising maps from the phoneme confusions of an automated lip-reading system, and we report new mappings that improve performance for individual talkers.
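The abstract describes deriving phoneme-to-viseme maps from the phoneme confusions of an automated lip-reader. One simple way to realise such a data-driven map is to treat the confusion matrix as a similarity measure and agglomeratively merge the most-confused phoneme classes until a target number of visemes remains. The sketch below is a minimal illustration of that general idea, not the authors' exact algorithm; the function name, the dict-based confusion representation, and the greedy merge criterion are all assumptions made for this example.

```python
from itertools import combinations

def cluster_phonemes(confusions, num_visemes):
    """Greedily merge the phoneme classes that are most often confused
    with one another until `num_visemes` clusters (visemes) remain.

    confusions: dict mapping (phoneme_a, phoneme_b) -> confusion count.
    Missing pairs count as zero; both orderings of a pair are summed.
    """
    phonemes = sorted({p for pair in confusions for p in pair})
    clusters = [{p} for p in phonemes]  # start with one cluster per phoneme

    def score(c1, c2):
        # Total confusion mass between the two candidate clusters.
        return sum(confusions.get((a, b), 0) + confusions.get((b, a), 0)
                   for a in c1 for b in c2)

    while len(clusters) > num_visemes:
        # Find the pair of clusters with the highest mutual confusion...
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: score(clusters[ij[0]], clusters[ij[1]]))
        # ...and merge them (i < j, so deleting index j is safe).
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Toy example: bilabials /p b m/ confuse among themselves,
# labiodentals /f v/ confuse with each other.
toy = {("p", "b"): 10, ("b", "m"): 8, ("p", "m"): 6,
       ("f", "v"): 9, ("p", "f"): 1}
visemes = cluster_phonemes(toy, 2)
print(visemes)  # → [{'b', 'm', 'p'}, {'f', 'v'}]
```

With the toy counts above, the two recovered clusters match the classic lip-shape groupings (bilabial vs. labiodental), which is the behaviour a confusion-driven map is meant to capture.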
© 2014 Springer International Publishing Switzerland
Bear, H.L., Harvey, R.W., Theobald, BJ., Lan, Y. (2014). Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2014. Lecture Notes in Computer Science, vol 8888. Springer, Cham. https://doi.org/10.1007/978-3-319-14364-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14363-7
Online ISBN: 978-3-319-14364-4
eBook Packages: Computer Science (R0)