Graph Ranking on Maximal Frequent Sequences for Single Extractive Text Summarization

  • Yulia Ledeneva
  • René Arnulfo García-Hernández
  • Alexander Gelbukh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8404)

Abstract

We propose a new method for extractive text summarization based on graph-based ranking algorithms. The main idea is to rank Maximal Frequent Sequences (MFS) in order to identify the most important information in a text. MFS are treated as the nodes of a graph in the term selection step and are then ranked in the term weighting step using a graph-based algorithm. We show that the proposed method produces results superior to state-of-the-art methods; in addition, the best sentences were found with this method. We also show that MFS perform better than other terms and that the longer the MFS, the better the results. If stop-words are excluded, the MFS lose their sense and the results are worse. Another important aspect of this method is that it requires neither deep linguistic knowledge nor domain- or language-specific annotated corpora, which makes it highly portable to other domains, genres, and languages.
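
The two steps named in the abstract (MFS as graph nodes, then graph-based ranking) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not the authors' implementation: the abstract does not say how edges are defined or which ranking algorithm is used, so the sketch links two MFS when they occur in the same sentence and ranks them with a TextRank-style iteration. The function names, the frequency threshold min_freq, the length cap max_len, and the sentence scoring rule are likewise hypothetical.

```python
from collections import defaultdict


def frequent_ngrams(sentences, min_freq=2, max_len=4):
    """Map each contiguous word n-gram occurring >= min_freq times to its sentence indices."""
    occurrences = defaultdict(list)  # n-gram -> sentence index of every occurrence
    for idx, words in enumerate(sentences):
        for n in range(1, max_len + 1):
            for start in range(len(words) - n + 1):
                occurrences[tuple(words[start:start + n])].append(idx)
    return {g: set(ids) for g, ids in occurrences.items() if len(ids) >= min_freq}


def contains(longer, shorter):
    """True if `shorter` appears as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def maximal_frequent_sequences(sentences, min_freq=2, max_len=4):
    """Keep only frequent n-grams not contained in a longer frequent n-gram.

    Assumption: sequences are capped at max_len words to keep the sketch simple.
    """
    frequent = frequent_ngrams(sentences, min_freq, max_len)
    return {g: where for g, where in frequent.items()
            if not any(len(h) > len(g) and contains(h, g) for h in frequent)}


def rank_nodes(neighbours, damping=0.85, iterations=50):
    """TextRank-style scores on an undirected graph given as {node: set of neighbours}."""
    score = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        score = {node: (1 - damping) + damping * sum(score[m] / len(neighbours[m])
                                                     for m in neighbours[node])
                 for node in neighbours}
    return score


def summarize(text_sentences, top_k=1, min_freq=2, max_len=4):
    """Return the top_k sentences (in original order) by summed MFS score."""
    tokenized = [s.lower().split() for s in text_sentences]
    mfs = maximal_frequent_sequences(tokenized, min_freq, max_len)
    # Assumed edge rule: two MFS are linked if they share at least one sentence.
    neighbours = {g: {h for h in mfs if h != g and mfs[g] & mfs[h]} for g in mfs}
    score = rank_nodes(neighbours)
    # Assumed sentence weight: sum of the scores of the MFS the sentence contains.
    weight = [sum(score[g] for g in mfs if idx in mfs[g]) for idx in range(len(tokenized))]
    best = sorted(range(len(tokenized)), key=lambda i: -weight[i])[:top_k]
    return [text_sentences[i] for i in sorted(best)]


example = [
    "graph ranking selects important sentences",
    "maximal frequent sequences are ranked with graph ranking",
    "maximal frequent sequences keep stop words",
    "the weather is nice",
]
print(summarize(example, top_k=1))
```

In this toy input the second sentence is selected because it contains both maximal frequent sequences ("graph ranking" and "maximal frequent sequences"). Stop-words are deliberately left in the token stream, in line with the abstract's observation that removing them hurts the results.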

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Yulia Ledeneva (1)
  • René Arnulfo García-Hernández (1)
  • Alexander Gelbukh (2)
  1. Unidad Académica Profesional Tianguistenco, Universidad Autónoma del Estado de México, Toluca, Estado de México
  2. Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico DF, Mexico