MFSRank: An Unsupervised Method to Extract Keyphrases Using Semantic Information

  • Roque Enrique López
  • Dennis Barreda
  • Javier Tejada
  • Ernesto Cuadros
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7094)

Abstract

This paper presents an unsupervised graph-based method for extracting keyphrases using semantic information. The proposed method has two stages. In the first, we extract MFS (Maximal Frequent Sequences) and use them as the nodes of a graph. The weight of the connection between two nodes is established according to common statistical information and semantic relatedness. In the second stage, we rank the MFS with the traditional PageRank algorithm, extended with ConceptNet; this external resource adds an extra weight value between two MFS. The experimental results are competitive with traditional approaches developed in this area. MFSRank outperforms the baseline for the top 5 keyphrases in precision, recall and F-score.
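
The ranking stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear `combine` function, the `alpha` parameter, the toy phrases, and all weight values are assumptions made purely for illustration, and the semantic weights here are hard-coded rather than obtained from ConceptNet.

```python
from collections import defaultdict


def combine(stat_weight, semantic_weight, alpha=0.5):
    """Hypothetical way to merge the two edge signals into one weight."""
    return alpha * stat_weight + (1 - alpha) * semantic_weight


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """PageRank-style iteration over an undirected weighted graph.

    edges maps (node_a, node_b) to a positive weight.
    """
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    nodes = list(neighbors)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    total_weight = {v: sum(neighbors[v].values()) for v in nodes}

    for _ in range(iterations):
        new_score = {}
        for v in nodes:
            # Each neighbor u passes on its score in proportion to w(u, v).
            incoming = sum(
                score[u] * w / total_weight[u]
                for u, w in neighbors[v].items()
            )
            new_score[v] = (1 - damping) / n + damping * incoming
        score = new_score
    return score


# Toy candidate phrases standing in for extracted MFS (assumed values).
statistical = {
    ("keyphrase extraction", "maximal frequent sequences"): 0.6,
    ("keyphrase extraction", "semantic graphs"): 0.4,
    ("keyphrase extraction", "pagerank"): 0.5,
}
semantic = {
    ("keyphrase extraction", "semantic graphs"): 0.8,
    ("keyphrase extraction", "pagerank"): 0.2,
}

edges = {
    pair: combine(stat, semantic.get(pair, 0.0))
    for pair, stat in statistical.items()
}
scores = weighted_pagerank(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, `ranked` lists the candidate phrases from highest to lowest score; taking its first five entries would correspond to a top 5 selection.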

Keywords

Keyphrase Extraction, Maximal Frequent Sequences, Semantic Graphs

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Roque Enrique López (1)
  • Dennis Barreda (2)
  • Javier Tejada (2)
  • Ernesto Cuadros (2)
  1. School of System Engineering, San Agustin National University, Perú
  2. School of Computer Science, San Pablo Catholic University, Perú
