A Text Feature Based Automatic Keyword Extraction Method for Single Documents

Campos, Ricardo; Mangaravite, Vítor; Pasquali, Arian; Jorge, Alípio Mário; Nunes, Célia; Jatowt, Adam

doi:10.1007/978-3-319-76941-7_63

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

In this work, we propose a lightweight, unsupervised approach for extracting and ranking the most important keywords of a single document. To understand the merits of our proposal, we compare it against RAKE, TextRank and SingleRank (three well-known unsupervised approaches) and against the TF.IDF baseline, over four different collections to illustrate the generality of our approach. The experimental results suggest that our method extracts keywords more effectively than these similar approaches.
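For readers unfamiliar with the TF.IDF baseline named in the abstract, the following is a minimal sketch of TF-IDF keyword ranking. The tokenizer, the smoothed IDF formula, the function names and the toy collection are all illustrative assumptions, not the configuration used in the paper. Note that this baseline needs a reference collection to compute document frequencies, whereas the proposed method works on a single document.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(doc, collection, top=3):
    """Rank the words of `doc` by TF-IDF against a reference `collection`.

    Assumption: raw term frequency and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1. Both are illustrative choices.
    """
    tf = Counter(tokenize(doc))
    n_docs = len(collection)
    # Document frequency: number of collection documents containing each term.
    df = Counter()
    for other in collection:
        df.update(set(tokenize(other)))
    scores = {
        term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Hypothetical toy collection, for illustration only.
collection = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "keyword extraction ranks the terms of a document",
]
for term, score in tfidf_keywords(
    "keyword extraction ranks keyword candidates", collection
):
    print(f"{term}\t{score:.3f}")
```

In this sketch a term scores highly when it is frequent in the document and rare in the collection, so the ranking depends on which collection is supplied.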

Notes

1. Implementation available at http://www.hlt.utdallas.edu/~saidul/code.html.
2. Implementation available at https://github.com/zelandiya/RAKE-tutorial.
3. Implementation available at https://pypi.python.org/pypi/yake.

References

Aquino, G., Lanzarini, L.: Keyword identification in Spanish documents using neural networks. J. Comput. Sci. Technol. 15(2), 55–60 (2015)
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A.: YAKE! collection-independent automatic keyword extractor. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018, LNCS, vol. 10772, pp. 806–810. Springer, Cham (2018)
Kim, S., Medelyan, O., Kan, M.-Y., Baldwin, T.: SemEval-2010 task 5: automatic keyphrase extraction from scientific articles. In: SemEval 2010, Sweden, pp. 21–26 (2010)
Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)
Marujo, L., Viveiros, M., Neto, J.: Keyphrase cloud generation of broadcast news. In: arXiv (2013)
Matsuo, Y., Ishizuka, M.: Keyword extraction from a single document using word co-occurrence statistical information. J. Artif. Intell. Tools 13(1), 157–169 (2004)
Mihalcea, R., Tarau, P.: TextRank: bringing order into texts. In: EMNLP 2004, pp. 404–411 (2004)
Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic Keyword Extraction from Individual Documents. Text Mining: Theory and Applications. Wiley, Chichester (2010)
Schutz, A.T.: Keyphrase extraction from single documents in the open domain exploiting linguistic and statistical methods. Master thesis, National University of Ireland (2008)
Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: AAAI 2008, 13–17 July, pp. 855–860 (2008)
Witten, I., Paynter, G., Frank, E., Gutwin, C., Nevill-Manning, C.: KEA: practical automatic keyphrase extraction. In: Proceedings of the JCDL 2004, 7–11 June, pp. 254–255 (1999)

Acknowledgements

This work is partially funded by the ERDF through the COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961, and by National Funds through the FCT as part of project UID/EEA/50014/2013 and of project UID/MAT/00212/2013. It was also financed by MIC SCOPE (171507010) and by Project “TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020” which is financed by the NORTE 2020, under the PORTUGAL 2020, and through the ERDF.

Author information

Authors and Affiliations

Polytechnic Institute of Tomar, Tomar, Portugal
Ricardo Campos
LIAAD – INESC TEC, Porto, Portugal
Ricardo Campos, Vítor Mangaravite, Arian Pasquali & Alípio Mário Jorge
DCC – FCUP, University of Porto, Porto, Portugal
Alípio Mário Jorge
University of Beira Interior, Covilhã, Portugal
Célia Nunes
Kyoto University, Kyoto, Japan
Adam Jatowt

Corresponding author

Correspondence to Ricardo Campos.

Editor information

Editors and Affiliations

Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
Gabriella Pasi
LIP6 – UPMC/CNRS, University Pierre et Marie Curie, Paris, France
Benjamin Piwowarski
University of Glasgow, Glasgow, United Kingdom
Leif Azzopardi
Technical University of Vienna, Vienna, Austria
Allan Hanbury

Cite this paper

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.M., Nunes, C., Jatowt, A. (2018). A Text Feature Based Automatic Keyword Extraction Method for Single Documents. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_63

DOI: https://doi.org/10.1007/978-3-319-76941-7_63
Published: 01 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)