A Text Feature Based Automatic Keyword Extraction Method for Single Documents

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)


In this work, we propose a lightweight, unsupervised approach for extracting and ranking keywords that selects the most important keywords of a single document. To assess the merits of our proposal, we compare it against three well-known unsupervised methods (RAKE, TextRank and SingleRank) and the TF.IDF baseline on four different collections, illustrating the generality of our approach. The experimental results suggest that our method extracts keywords more effectively than similar approaches.
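The TF.IDF baseline named in the abstract can be sketched roughly as follows. This is only an illustration of that baseline, not the method proposed in the paper. The function name `tfidf_keywords`, the regex tokenizer, the optional stopword set, the restriction to single-word candidates and the smoothed IDF variant are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_keywords(doc, collection, top_n=5, stopwords=frozenset()):
    """Hypothetical TF.IDF baseline: rank single-word terms of `doc`.

    `collection` is a list of document strings used only for IDF statistics.
    IDF uses the smoothed form log((1 + N) / (1 + df)) + 1 (an assumption;
    other TF.IDF variants exist).
    """
    doc_tokens = [t for t in tokenize(doc) if t not in stopwords]
    if not doc_tokens:
        return []
    tf = Counter(doc_tokens)
    n_docs = len(collection)
    # Document frequency: how many collection documents contain each term.
    df = Counter()
    for other in collection:
        df.update(set(tokenize(other)))
    scores = {}
    for term, count in tf.items():
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1
        scores[term] = (count / len(doc_tokens)) * idf
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


if __name__ == "__main__":
    collection = [
        "keyword extraction selects important words from a document",
        "neural networks learn representations from data",
        "graph ranking methods such as textrank rank words",
    ]
    doc = "keyword extraction from a single document ranks keyword candidates"
    print(tfidf_keywords(doc, collection, top_n=3, stopwords={"a", "from"}))
    # → ['keyword', 'candidates', 'ranks']
```

Because TF.IDF needs document-frequency statistics from a collection, a baseline of this kind depends on corpus statistics, whereas the abstract describes the proposed method as operating on a single document.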


Keywords: Keyword extraction, Information extraction, Feature extraction



This work is partially funded by the ERDF through the COMPETE 2020 Programme within project POCI-01-0145-FEDER-006961, and by National Funds through the FCT as part of project UID/EEA/50014/2013 and of project UID/MAT/00212/2013. It was also financed by MIC SCOPE (171507010) and by Project “TEC4Growth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact/NORTE-01-0145-FEDER-000020” which is financed by the NORTE 2020, under the PORTUGAL 2020, and through the ERDF.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Polytechnic Institute of Tomar, Tomar, Portugal
  2. LIAAD – INESC TEC, Porto, Portugal
  3. DCC – FCUP, University of Porto, Porto, Portugal
  4. University of Beira Interior, Covilhã, Portugal
  5. Kyoto University, Kyoto, Japan
