Automatic summarization of scientific publications using a feature selection approach

  • Hazem Al Saied
  • Nicolas Dugué
  • Jean-Charles Lamirel

Abstract

Feature Maximization is a feature selection method that deals efficiently with textual data: it makes it possible to design systems that are language-agnostic and parameter-free, and that do not require additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment with this approach in a single-document summarization context. We evaluate it on the DUC AQUAINT corpus and show that, despite the unstructured nature of the corpus, our system scores above the baseline and produces encouraging results. We also observe that the produced summaries seem robust to redundancy. Next, we evaluate our method in the more appropriate context of the SciSumm challenge, which is dedicated to the summarization of research publications. These publications are structured in sections, and our class-based approach is thus relevant. We focus more specifically on the task that aims to summarize papers using the papers that cite them. We consider and evaluate several systems that use our approach with specific bags of words. Furthermore, in these systems, we also evaluate cosine and graph-based distances for sentence weighting and comparison. We show that our Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, outperforming the results reported so far and obtaining high recall. We thus demonstrate the flexibility and the relevance of Feature Maximization in this context.
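To make the idea of cosine-based sentence comparison concrete, here is a minimal sketch in Python. It is not the authors' system: the whitespace tokenizer, the raw term counts, and the function names `bag_of_words`, `cosine_similarity` and `rank_sentences` are all assumptions introduced for illustration. The abstract does not specify the weighting scheme or the preprocessing actually used.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Term counts from a naive lowercase, whitespace tokenizer (illustrative only)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def rank_sentences(sentences, reference):
    """Sort candidate sentences by decreasing similarity to a reference text."""
    ref = bag_of_words(reference)
    scored = [(cosine_similarity(bag_of_words(s), ref), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a citation-based setting, `reference` could for instance be a sentence from a citing paper and `sentences` the sentences of the cited paper; that pairing is again an assumption, used only to show how such a score might drive sentence selection.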

Keywords

Text summarization, Feature selection, Feature Maximization

Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  • Hazem Al Saied (1)
  • Nicolas Dugué (2)
  • Jean-Charles Lamirel (3)
  1. ATILF, Nancy, France
  2. LIUM, Université du Maine, Le Mans, France
  3. LORIA, SYNALP, Nancy, France
