The Importance of Visual Context Clues in Multimedia Translation

  • Christopher G. Harris
  • Tao Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6941)


As video-sharing websites such as YouTube proliferate, the ability to rapidly translate video clips into multiple languages has become essential for enhancing their global reach and impact. Moreover, providing closed captioning in a variety of languages is paramount to reaching a wider variety of viewers. We investigate the importance of visual context clues by comparing transcripts of multimedia clips (which allow transcriptionists to make use of visual context clues in their translations) with their corresponding written transcripts (which do not). Additionally, we contrast translations produced by crowdsourcing workers with those made by professional translators in terms of cost and quality. Finally, we evaluate several genres of multimedia to examine the effects of visual context clues on each, and we present the results as heat maps.


Machine Translation; Professional Translation; Statistical Machine Translation; Music Video; Visual Context
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Christopher G. Harris (1)
  • Tao Xu (2)
  1. Informatics Program, The University of Iowa, Iowa City, USA
  2. School of Foreign Languages, Tongji University, Shanghai, China
