Summarizing Short Texts Through a Discourse-Centered Approach in a Multilingual Context

  • Daniel Alexandru Anechitei
  • Dan Cristea
  • Ioannidis Dimosthenis
  • Eugen Ignat
  • Diman Karagiozov
  • Svetla Koeva
  • Mateusz Kopeć
  • Cristina Vertan
Chapter

Abstract

The chapter presents the architecture of a system that produces summaries of short texts in six languages. A summary comprises clauses and sentences extracted from the original text, and at its core lie the structure of the discourse and its relationship with the coreferential links. The design is uniform across all languages, while language specificity is confined to the resources that fuel the component modules. It includes a number of feedback loops that fine-tune the parameters by comparing the output of the modules against annotated corpora. The accuracy of each monolingual system is evaluated against “average” summaries computed from several human-produced ones. The study also presents quantitative data on the corpora used, compares the languages, and reports results that mostly prove to be above the state of the art.

Keywords

Entropy Coherence Assure Production Line Arena 

Notes

Acknowledgements

The work described in this chapter was supported by ATLAS (Applied Technology for Language-Aided CMS)—a project funded by the European Commission under the ICT Policy Support Programme, Grant Agreement 250467, and, partly, by the METANET4U ICT-PSP project, Grant Agreement 270893. Our thanks go to the following people: Anelia Belogay—for the impeccable leadership of the ATLAS project; Angel Genov—for mastering the Bulgarian chain and preparing data for Bulgarian; Walther von Hahn—for providing the linguistic support for German; Maciej Ogrodniczuk and Adam Przepiorkowski—for mastering the Polish chain; Polivios Raxis—for coordinating the evaluation processes; and Sabina Deliu—for preparing corpora and organizing the evaluation activity for Romanian.


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Daniel Alexandru Anechitei (1)
  • Dan Cristea (1, 2)
  • Ioannidis Dimosthenis (3)
  • Eugen Ignat (1)
  • Diman Karagiozov (4)
  • Svetla Koeva (5)
  • Mateusz Kopeć (6)
  • Cristina Vertan (7)
  1. Department of Computer Science, “Alexandru Ioan Cuza” University of Iaşi, Iaşi, Romania
  2. Institute for Computer Science, Romanian Academy, Iaşi Branch, Iaşi, Romania
  3. Atlantis Consulting SA, Thessaloniki, Greece
  4. Tetracom Interactive Solutions Ltd., Sofia, Bulgaria
  5. Institute for Bulgarian Language, Bulgarian Academy of Sciences, Sofia, Bulgaria
  6. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
  7. Department of Linguistics, University of Hamburg, Hamburg, Germany
