
Topic–Focus Articulation: A Third Pillar of Automatic Evaluation of Text Coherence

  • Michal Novák
  • Jiří Mírovský
  • Kateřina Rysová
  • Magdaléna Rysová
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11289)

Abstract

We present a feature-rich system for automatic evaluation of surface text coherence in Czech essays written by native and non-native speakers. In addition to basic features covering spelling, vocabulary, morphology and syntax, the EVALD system stands on two main pillars, feature sets closely related to surface coherence: discourse relations and coreference. We now add a third pillar: features targeting topic–focus articulation (sentence information structure). To this end, we propose and implement a procedure that reveals topic–focus articulation by automatically marking contextual boundness in the text. The experiments show that EVALD enriched with topic–focus articulation features outperforms the original system. Further experiments show that the system for essays written by non-native speakers behaves differently from the system for native speakers with respect to the importance of individual feature sets and the size of the training data.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Michal Novák (1)
  • Jiří Mírovský (1)
  • Kateřina Rysová (1)
  • Magdaléna Rysová (1)
  1. Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University, Prague 1, Czech Republic
