A Multi-modal Data-Set for Systematic Analyses of Linguistic Ambiguities in Situated Contexts

  • Özge Alaçam
  • Tobias Staron
  • Wolfgang Menzel
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 795)


Human situated language processing involves the interaction of linguistic and visual processing, and this cross-modal integration helps listeners resolve ambiguities and predict what will be revealed next in an unfolding sentence during spoken communication. However, most state-of-the-art parsing approaches rely solely on the language modality. This paper introduces a multi-modal data-set addressing challenging linguistic structures and visual complexities that state-of-the-art parsers should be able to deal with. It also briefly describes the multi-modal parsing approach and a proof-of-concept study showing the contribution of visual information to disambiguation.
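As a toy illustration of how visual context can resolve an attachment ambiguity that text alone leaves open, the sketch below reranks two readings of the hypothetical sentence "The woman sees the man with the binoculars." It is not the authors' multi-modal parser: the triple-based scene format, the text scores, the additive weighting, and the names `Reading`, `visual_support` and `disambiguate` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical scene format (an assumption, not the paper's representation):
# a set of (subject, relation, object) triples describing the visual context.
Scene = set[tuple[str, str, str]]


@dataclass(frozen=True)
class Reading:
    """One candidate attachment for the PP 'with the binoculars'."""
    label: str          # which head the PP attaches to
    holder: str         # entity the reading implies is holding the instrument
    text_score: float   # hypothetical score from a language-only parser


def visual_support(reading: Reading, scene: Scene, instrument: str) -> float:
    """Return 1.0 if the scene contains the holder-instrument relation the reading implies."""
    return 1.0 if (reading.holder, "holds", instrument) in scene else 0.0


def disambiguate(readings: list[Reading], scene: Scene, instrument: str,
                 weight: float = 1.0) -> Reading:
    """Pick the reading with the highest text score plus weighted visual support."""
    return max(readings,
               key=lambda r: r.text_score + weight * visual_support(r, scene, instrument))


# "The woman sees the man with the binoculars."
readings = [
    Reading("verb-attachment", holder="woman", text_score=0.55),  # she uses the binoculars
    Reading("noun-attachment", holder="man", text_score=0.45),    # he has the binoculars
]

print(disambiguate(readings, set(), "binoculars").label)  # verb-attachment (text only)
print(disambiguate(readings, {("man", "holds", "binoculars")}, "binoculars").label)  # noun-attachment
```

Without a scene, the language-only preference wins; once the scene shows the man holding the binoculars, the visually supported reading is selected. A simple additive score is used here only to keep the example small; other ways of fusing the modalities are equally plausible.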


This research was funded by the German Research Foundation (DFG) in project ‘Crossmodal Learning’, TRR-169.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Informatics, University of Hamburg, Hamburg, Germany
