Abstract

Recognizing textual entailment (RTE) is a well-defined task in semantic analysis. It is evaluated against a manually annotated collection of hypothesis–text (h–t) pairs. A pair is annotated true if the text entails the hypothesis and false otherwise. Such a collection can be used for training or testing an RTE application only if it is large enough.

We present a game whose purpose is to collect h–t pairs. It follows the narrative pattern of a detective story: a brilliant detective and his slower assistant discuss the riddle in order to reveal the solution to the reader. In the game, the detective (the human player) provides a short story. The assistant (the application) proposes hypotheses, which the detective judges to be true, false, or nonsense.
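The records such a game collects can be pictured with a minimal sketch. The names `Judgement`, `AnnotatedPair`, `play_round` and `rte_pairs` are hypothetical and do not come from the paper. Dropping nonsense hypotheses when exporting RTE data is likewise an assumption made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Judgement(Enum):
    TRUE = "true"          # the text entails the hypothesis
    FALSE = "false"        # the text does not entail the hypothesis
    NONSENSE = "nonsense"  # the hypothesis is not a meaningful sentence


@dataclass(frozen=True)
class AnnotatedPair:
    text: str              # the short story provided by the detective
    hypothesis: str        # a hypothesis proposed by the assistant
    judgement: Judgement   # the detective's verdict


def play_round(text, hypotheses, judge):
    """Ask the judge (the human player) about every proposed hypothesis."""
    return [AnnotatedPair(text, h, judge(text, h)) for h in hypotheses]


def rte_pairs(pairs):
    """Map judged pairs to (text, hypothesis, entailed) triples.

    Assumption: nonsense hypotheses are excluded from the RTE collection.
    """
    return [
        (p.text, p.hypothesis, p.judgement is Judgement.TRUE)
        for p in pairs
        if p.judgement is not Judgement.NONSENSE
    ]
```

Here `judge` stands in for the human player: any callable that takes the text and a hypothesis and returns a `Judgement`.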

Hypothesis generation is rule-based, but a language model determines which of the generated hypotheses are most likely and are therefore offered for annotation. During generation, individual sentence constituents are rearranged to produce syntactically correct sentences.
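The generate-then-rank idea can be illustrated with a toy sketch. It is not the paper's method: exhaustive permutation of constituents stands in for the rule-based generator and does not check syntactic correctness, and an add-one smoothed bigram model stands in for the language model. The function names and the English example data are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import permutations


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(set(unigrams) | {"</s>"})
    return unigrams, bigrams, vocab_size


def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def rank_hypotheses(constituents, model, top_n=3):
    """Rearrange constituents and keep the top_n most likely orderings."""
    candidates = {" ".join(order) for order in permutations(constituents)}
    return sorted(candidates, key=lambda s: log_prob(s, model), reverse=True)[:top_n]


corpus = [
    "the detective solved the case",
    "the assistant asked a question",
    "the detective asked the assistant",
]
model = train_bigram_model(corpus)
best = rank_hypotheses(["solved", "the detective", "the case"], model)
```

With this toy corpus, the ordering ranked first is "the detective solved the case", because every bigram in it was observed in training. A real system would replace both the generator and the model with language-specific components.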

The game is intended to collect data in Czech; however, the idea can be applied to other languages. The paper concentrates on describing the modules that are most interesting from a language-independent point of view, as well as the game elements.

Keywords

Noun Phrase; Natural Language Processing; Computational Linguistics; Human Player; Syntactic Pattern
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Zuzana Nevěřilová (1)
  1. Natural Language Processing Centre, Faculty of Informatics, Masaryk University, Brno, Czech Republic
