Automatic summarization of scientific publications using a feature selection approach

Al Saied, Hazem; Dugué, Nicolas; Lamirel, Jean-Charles

doi:10.1007/s00799-017-0214-x

Published: 13 April 2017

Volume 19, pages 203–215, (2018)

International Journal on Digital Libraries

Hazem Al Saied¹,
Nicolas Dugué² &
Jean-Charles Lamirel³

Abstract

Feature Maximization is a feature selection method that deals efficiently with textual data. It makes it possible to design systems that are language-agnostic, parameter-free and do not require additional corpora to function. We propose to evaluate its use in text summarization, in particular in cases where documents are structured. We first experiment with this approach in a single-document summarization context. We evaluate it on the DUC AQUAINT corpus and show that, despite the unstructured nature of the corpus, our system performs above the baseline and produces encouraging results. We also observe that the produced summaries seem robust to redundancy. Next, we evaluate our method in the more appropriate context of the SciSumm challenge, which is dedicated to the summarization of research publications. These publications are structured in sections, so our class-based approach is relevant. More specifically, we focus on the task that aims to summarize papers using the papers that refer to them. We consider and evaluate several systems that use our approach with specific bags of words. Furthermore, in these systems, we also evaluate cosine and graph-based distances for sentence weighting and comparison. We show that our Feature Maximization based approach performs very well in the SciSumm 2016 context for the considered task, outperforming the results reported so far and obtaining high recall. We thus demonstrate the flexibility and the relevance of Feature Maximization in this context.

Notes

The 2nd Computational Linguistics Scientific Document Summarization Shared Task (CL-SciSumm 2016), http://wing.comp.nus.edu.sg/cl-scisumm2016/
In this paper, we always consider only one reference summary, although there may be several, created for example by distinct human annotators.
The choice of the weighting scheme is not really constrained by the approach, apart from the requirement that it produce positive values. Such a scheme is supposed to capture the significance (i.e., semantics and importance) of the feature for the data. Feature recall is a scale-independent measure, but feature predominance is not. We have, however, shown experimentally that the F-measure, which is a combination of these two measures, is only weakly influenced by feature scaling. Nevertheless, to guarantee fully scale-independent behavior for this measure, data may be standardized.
The Document Understanding Conference.
Query-focused summarization.
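
The scale behavior of feature recall and feature predominance described in the notes can be illustrated with a small sketch. It assumes the definitions commonly used in the feature maximization literature, which this page does not spell out: recall is the weight of a feature inside a class divided by its weight over all classes, predominance is that same weight divided by the total weight of all features inside the class, and the F-measure is their harmonic mean. The function name, the toy matrix and the class labels are hypothetical.

```python
from typing import List, Sequence, Tuple


def feature_f_measure(
    weights: Sequence[Sequence[float]],
    labels: Sequence[str],
    feature: int,
    cls: str,
) -> Tuple[float, float, float]:
    """Return (recall, predominance, F-measure) of one feature for one class.

    Assumed definitions (not stated on this page):
      recall       = weight of the feature in the class / weight of the feature overall
      predominance = weight of the feature in the class / weight of all features in the class
      F-measure    = harmonic mean of recall and predominance
    """
    in_class = [row for row, lab in zip(weights, labels) if lab == cls]
    class_feature_mass = sum(row[feature] for row in in_class)
    total_feature_mass = sum(row[feature] for row in weights)
    class_mass = sum(sum(row) for row in in_class)

    recall = class_feature_mass / total_feature_mass if total_feature_mass else 0.0
    predominance = class_feature_mass / class_mass if class_mass else 0.0
    denom = recall + predominance
    f_measure = 2 * recall * predominance / denom if denom else 0.0
    return recall, predominance, f_measure


# Hypothetical toy data: three sentences, two features, two section classes.
weights: List[List[float]] = [[2.0, 1.0], [1.0, 1.0], [1.0, 2.0]]
labels = ["intro", "intro", "method"]

# Same data with feature 0 multiplied by 10, to probe scale dependence.
scaled = [[row[0] * 10.0, row[1]] for row in weights]

base = feature_f_measure(weights, labels, feature=0, cls="intro")
rescaled = feature_f_measure(scaled, labels, feature=0, cls="intro")
print("original:", base)
print("rescaled:", rescaled)
```

On this toy matrix, rescaling one feature leaves its recall unchanged while its predominance moves, which matches the distinction drawn in the notes. The sketch says nothing about how strongly the F-measure is affected on real data.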

Author information

Authors and Affiliations

ATILF, Nancy, France
Hazem Al Saied
LIUM, Université du Maine, Le Mans, France
Nicolas Dugué
LORIA, SYNALP, Nancy, France
Jean-Charles Lamirel

Corresponding author

Correspondence to Nicolas Dugué.

About this article

Cite this article

Al Saied, H., Dugué, N. & Lamirel, JC. Automatic summarization of scientific publications using a feature selection approach. Int J Digit Libr 19, 203–215 (2018). https://doi.org/10.1007/s00799-017-0214-x

Received: 27 November 2016
Revised: 24 March 2017
Accepted: 27 March 2017
Published: 13 April 2017
Issue Date: September 2018
DOI: https://doi.org/10.1007/s00799-017-0214-x
