Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading?

  • Conference paper

Advances in Visual Computing (ISVC 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8888)

Included in the conference series: International Symposium on Visual Computing (ISVC)

Abstract

A critical assumption of all current visual speech recognition systems is that there are visual speech units, called visemes, which can be mapped to the units of acoustic speech, the phonemes. Although a number of phoneme-to-viseme maps have been published, their effectiveness is rarely tested, particularly for visual-only lip-reading (many studies use audio-visual speech). Here we examine 120 mappings and consider whether any are stable across talkers. We show a method for devising maps based on the phoneme confusions of an automated lip-reading system, and we present new mappings that show improvements for individual talkers.
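
The method the abstract describes, deriving a phoneme-to-viseme map from the phoneme confusions of an automated lip-reader, amounts to clustering together the phonemes the recogniser cannot distinguish visually. The following Python sketch is illustrative only: the greedy symmetrised-confusion merge rule, the function name, and the toy confusion counts are all assumptions, not the authors' implementation.

    import numpy as np

    def confusions_to_visemes(confusion, phonemes, n_visemes):
        """Greedily merge the most mutually confused phoneme classes.

        `confusion` is a square count matrix (rows = true phonemes,
        columns = recognised phonemes) from a phoneme-level lip-reading
        pass; the surviving clusters are the viseme classes.
        """
        clusters = [{p} for p in range(len(phonemes))]

        def mass(a, b):
            # Total confusion flowing between two clusters, both directions.
            return sum(confusion[i, j] + confusion[j, i] for i in a for j in b)

        while len(clusters) > n_visemes:
            # Merge the pair of clusters with the largest mutual confusion.
            a, b = max(
                ((i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))),
                key=lambda ij: mass(clusters[ij[0]], clusters[ij[1]]),
            )
            clusters[a] |= clusters.pop(b)  # j > i, so popping b keeps index a valid

        return [sorted(phonemes[i] for i in c) for c in clusters]

    # Toy counts: /p b m/ are visually similar on the lips, as are /f v/.
    phonemes = ["p", "b", "m", "f", "v"]
    confusion = np.array([
        [10, 7,  6,  1,  0],
        [ 8, 9,  5,  0,  1],
        [ 6, 5, 11,  1,  0],
        [ 1, 0,  1, 12,  8],
        [ 0, 1,  0,  9, 10],
    ])
    print(confusions_to_visemes(confusion, phonemes, n_visemes=2))
    # -> [['b', 'm', 'p'], ['f', 'v']]

Because the clustering is driven by each talker's own confusion matrix, the resulting map is talker-specific, which is consistent with the paper's finding that the new mappings improve results for individual talkers.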

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bear, H.L., Harvey, R.W., Theobald, B.-J., Lan, Y. (2014). Which Phoneme-to-Viseme Maps Best Improve Visual-Only Computer Lip-Reading? In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2014. Lecture Notes in Computer Science, vol 8888. Springer, Cham. https://doi.org/10.1007/978-3-319-14364-4_22

  • DOI: https://doi.org/10.1007/978-3-319-14364-4_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14363-7

  • Online ISBN: 978-3-319-14364-4

  • eBook Packages: Computer Science, Computer Science (R0)
