Abstractive Multi-Document Text Summarization Using a Genetic Algorithm

  • Verónica Neri Mendoza
  • Yulia Ledeneva
  • René Arnulfo García-Hernández
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11524)


Multi-Document Text Summarization (MDTS) consists of generating an abstract from a group of two or more documents that retains only the most important information they contain. Generally, the objective is to capture the main idea of several documents on the same topic. In this paper, we propose a new MDTS method based on a Genetic Algorithm (GA). The fitness function is calculated from two text features: sentence position and coverage. We propose a binary coding representation together with selection, crossover, and mutation operators to improve on the state-of-the-art results. We test the proposed method on the DUC02 data set, specifically on the Abstractive Multi-Document Text Summarization (AMDTS) task. Four different tasks are tested for each of the 59 document collections (567 documents in total). In addition, we test different configurations of the most widely used methodology for generating AMDTS summaries. Moreover, several heuristics are calculated for comparison: topline, baseline, baseline-random, and lead baseline. The proposed AMDTS method improves on both the state-of-the-art methods and these heuristics.
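
The abstract names the ingredients of the method (binary coding, a fitness built from sentence position and coverage, and selection, crossover, and mutation operators) but not their exact definitions. The following Python sketch shows one way those pieces could fit together. Everything specific in it is an assumption made for illustration and is not taken from the paper: the linear position score, the vocabulary-based coverage score, the equal 0.5/0.5 weighting, the hard word budget, tournament selection, one-point crossover, bit-flip mutation, and elitism. For simplicity it also treats the collection as one flat list of sentences, so position is measured across the list rather than within each source document.

```python
import random


def fitness(mask, sentences, max_words):
    """Score a binary mask (1 = sentence selected). All formulas here are assumed."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return 0.0
    length = sum(len(sentences[i].split()) for i in chosen)
    if length > max_words:
        return 0.0  # assumed hard length budget: over-long summaries score zero
    n = len(sentences)
    # Position (assumed form): earlier sentences score higher, averaged over the selection.
    position = sum((n - i) / n for i in chosen) / len(chosen)
    # Coverage (assumed form): share of the collection's vocabulary present in the selection.
    vocab = {w.lower() for s in sentences for w in s.split()}
    covered = {w.lower() for i in chosen for w in sentences[i].split()}
    coverage = len(covered) / len(vocab)
    return 0.5 * position + 0.5 * coverage  # assumed equal weighting


def evolve(sentences, max_words, pop_size=30, generations=50, p_mut=0.05, seed=0):
    """Run a small GA over binary masks; requires at least two sentences."""
    rng = random.Random(seed)
    n = len(sentences)

    def score(mask):
        return fitness(mask, sentences, max_words)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=score)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of size 2 for each parent.
            p1 = max(rng.sample(population, 2), key=score)
            p2 = max(rng.sample(population, 2), key=score)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=score)
    return best
```

Calling `evolve(sentences, max_words=100)` returns a list of 0/1 flags, one per sentence; the summary is the selected sentences in their original order. Because the sketch uses a seeded `random.Random`, repeated runs with the same `seed` give the same mask.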


Keywords: Multi-Document Text Summarization (MDTS); Language-independent methods; MDTS methodology; Genetic algorithm; Heuristics



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Autonomous University of the State of Mexico, Toluca, Mexico
