An Experimental Analysis of Feature Selection and Similarity Assessment for Textual Summarization

  • Ana Maria Schwendler Ramos
  • Vinicius Woloszyn
  • Leandro Krug Wives
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 735)


Since access to information increases every day, and knowledge can be quickly acquired from many sources such as news websites, blogs, and social networks, processing all this information becomes increasingly difficult. Tools are therefore needed to automatically extract the most relevant sentences, reducing a text to a shorter version. One way to do this while preserving the core information content is Automatic Text Summarization. A relevant issue in this context is the presence of typos, synonyms, and other orthographic variations, since some extractive techniques are not prepared to handle them. This work presents an evaluation of different similarity approaches to minimize these problems when selecting the most appropriate sentences to represent a document in an automatically generated summary.
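To illustrate why orthographic variation matters for extractive summarization, the sketch below compares words by character trigrams, so that a typo still counts as a near match. This abstract does not say which similarity measures the paper evaluates. The trigram Jaccard measure, the 0.5 threshold, and the function names `char_ngrams`, `word_similarity`, `sentence_similarity`, and `rank_sentences` are all assumptions made for illustration only.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word, padded with '#'."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + n] for i in range(max(1, len(padded) - n + 1))}


def word_similarity(a, b, n=3):
    """Jaccard overlap of character n-grams (assumed measure, not the paper's)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def sentence_similarity(s1, s2, threshold=0.5):
    """Fraction of words in both sentences that have a fuzzy match in the other."""
    w1, w2 = s1.split(), s2.split()
    if not w1 or not w2:
        return 0.0
    hits1 = sum(1 for a in w1 if any(word_similarity(a, b) >= threshold for b in w2))
    hits2 = sum(1 for b in w2 if any(word_similarity(a, b) >= threshold for a in w1))
    return (hits1 + hits2) / (len(w1) + len(w2))


def rank_sentences(sentences, k=1):
    """Pick the k sentences most similar to the rest, kept in document order."""
    scores = [
        sum(sentence_similarity(s, t) for j, t in enumerate(sentences) if j != i)
        for i, s in enumerate(sentences)
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

With exact token matching, "summarization" and the misspelled "summarizaton" would be treated as unrelated words. Under the trigram overlap assumed here they share most of their trigrams and are counted as a match, so a sentence containing the typo can still contribute to the score of its neighbours.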



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ana Maria Schwendler Ramos
    • 1
  • Vinicius Woloszyn
    • 1
  • Leandro Krug Wives
    • 1
  1. PPGC, Instituto de Informática, UFRGS, Porto Alegre, Brazil
