Automatic Mining of Discourse Connectives for Russian

  • Svetlana Toldova
  • Maria Kobozeva
  • Dina Pisarevskaya
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 930)

Abstract

The identification of discourse connectives plays an important role in many discourse processing approaches. Connectives include both function words usually listed in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) and non-grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause of this’). Both types signal certain relations between discourse units. However, no ready-made lists of the second type exist. We propose a method for expanding a seed list of connectives with candidate non-grammaticalized connectives for Russian, based on their vector representations. First, we compile a list of patterns for this type of connective. The patterns rest on two heuristics. Such connectives are often used with anaphoric expressions that substitute for discourse units, so some patterns include special anaphoric elements. They also occur more frequently at the beginning of a sentence or after a comma. Second, we build multi-word tokens based on these patterns. Third, we build vector representations for the multi-word tokens that match these patterns. Our experiments based on distributional semantics yield a reasonable list of candidate connectives.
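The three steps above (pattern matching, multi-word tokenization, vector comparison against a seed list) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The anaphor set `ANAPHORS`, the single pattern "an n-gram at sentence start or after a comma that ends in an anaphor", every helper name, and the toy transliterated corpus are all hypothetical. Sparse co-occurrence counts compared by cosine similarity stand in for the trained word embeddings the paper uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical anaphoric elements (transliterated Russian) that stand in for a
# discourse unit, e.g. "etogo" 'of this'. Not the authors' actual pattern list.
ANAPHORS = {"eto", "etogo", "etomu", "etim"}


def tokenize(sentence):
    # Lowercased word tokens; commas are kept because they mark candidate positions.
    return re.findall(r"\w+|,", sentence.lower())


def find_candidates(sentence, max_len=3):
    """Return n-grams (2..max_len tokens) that start the sentence or directly
    follow a comma and end in an anaphoric element (a hypothetical pattern)."""
    tokens = tokenize(sentence)
    starts = [0] + [i + 1 for i, t in enumerate(tokens) if t == ","]
    found = []
    for s in starts:
        for n in range(2, max_len + 1):
            gram = tokens[s:s + n]
            if len(gram) == n and "," not in gram and gram[-1] in ANAPHORS:
                found.append(tuple(gram))
    return found


def merge_tokens(sentence, multiwords):
    """Rewrite a sentence so every matched n-gram becomes one '_'-joined token
    (greedy longest match)."""
    tokens = tokenize(sentence)
    longest = max((len(m) for m in multiwords), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(longest, 1, -1):
            if tuple(tokens[i:i + n]) in multiwords:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no multi-word match starts at position i
            out.append(tokens[i])
            i += 1
    return out


def context_vectors(token_lists, window=2):
    # Co-occurrence counts within +/- window tokens: a count-based stand-in
    # for trained word embeddings.
    vecs = defaultdict(Counter)
    for toks in token_lists:
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def expand_seeds(seeds, vecs, candidates, top_k=5):
    # Rank each candidate by its highest cosine similarity to any seed connective.
    scored = []
    for c in candidates:
        if c in seeds or c not in vecs:
            continue
        score = max((cosine(vecs[c], vecs[s]) for s in seeds if s in vecs), default=0.0)
        scored.append((score, c))
    scored.sort(reverse=True)
    return [c for _, c in scored[:top_k]]


# Toy transliterated corpus (illustrative only).
corpus = [
    "Shel dozhd, poetomu my ostalis doma.",
    "Shel dozhd, vsledstvie etogo my ostalis doma.",
    "Shel dozhd, v rezultate etogo my ostalis doma.",
    "Krome etogo, my kupili hleb.",
    "Prichina etogo prosta: shel dozhd.",
]
multiwords = {g for s in corpus for g in find_candidates(s)}
merged = [merge_tokens(s, multiwords) for s in corpus]
vecs = context_vectors(merged)
candidates = sorted("_".join(m) for m in multiwords)
result = expand_seeds({"poetomu"}, vecs, candidates, top_k=3)
print(result)  # ['vsledstvie_etogo', 'v_rezultate_etogo', 'krome_etogo']
```

On this toy corpus, vsledstvie etogo ‘as a consequence of this’ and v rezultate etogo ‘as a result of this’ share the contexts of the hypothetical seed poetomu ‘therefore’ and rank highest. Krome etogo ‘besides this’ also surfaces because it fills a similar clause-initial slot. Prichina etogo scores zero because its few contexts differ from the seed's, which shows why reliable vectors need a large corpus.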

Keywords

Rhetorical Structure Theory; Discourse connectives; Word embeddings

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Svetlana Toldova (1)
  • Maria Kobozeva (2)
  • Dina Pisarevskaya (2)
  1. National Research University “Higher School of Economics”, Moscow, Russia
  2. Institute for Systems Analysis FRC CSC RAS, Moscow, Russia
