SemCluster: Unsupervised Automatic Keyphrase Extraction Using Affinity Propagation

  • Hassan H. Alrehamy
  • Coral Walker
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 650)


Keyphrases provide important semantic metadata for organizing and managing free-text documents. As data volumes grow exponentially, there is a pressing demand for efficient automatic keyphrase extraction methods. In this paper, we introduce SemCluster, a clustering-based unsupervised keyphrase extraction method. By integrating an internal ontology (WordNet) with external knowledge sources, SemCluster identifies and extracts semantically important terms from a given document and clusters those terms. Using the clustering results as heuristics, it then identifies the most representative phrases and singles them out as keyphrases. SemCluster is evaluated against two unsupervised baseline methods, TextRank and KeyCluster, on the Inspec dataset using the F1-measure. The evaluation results show that SemCluster outperforms both methods.
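The abstract outlines the clustering step only briefly. The sketch below is a rough illustration, not SemCluster itself. It implements standard affinity propagation (Frey and Dueck's message-passing algorithm, named in the title) in plain Python, treats each cluster exemplar as a representative term, and scores the result with an exact-match F1. Several pieces are assumptions made for the example: the candidate terms in `TERMS`, the hand-set similarities in the hypothetical `toy_similarity`, the preference of -2, the damping factor, and the lowercase exact-match rule in `f1_score`. SemCluster's WordNet and external-knowledge similarity measures, and its heuristics for turning clusters into keyphrases, are not reproduced here.

```python
from typing import List, Sequence


def affinity_propagation(S: List[List[float]], damping: float = 0.5,
                         max_iter: int = 200) -> List[int]:
    """Cluster items by message passing (Frey & Dueck, 2007).

    S[i][k] is the similarity of item i to candidate exemplar k. The diagonal
    S[k][k] is the "preference": larger values tend to yield more exemplars.
    Returns, for every item, the index of the exemplar it is assigned to.
    """
    n = len(S)
    if n == 1:
        return [0]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            support = sum(pos) - pos[k]
            for i in range(n):
                new = support if i == k else min(0.0, R[k][k] + support - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # no item claimed itself: fall back to the strongest one
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Exemplars label themselves; every other item joins its most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


# Hypothetical candidate terms for two topics (not taken from the paper).
TERMS = ["learning", "machine learning", "machine",
         "protein", "protein folding", "folding"]


def toy_similarity(preference: float = -2.0) -> List[List[float]]:
    """Hand-set similarities standing in for SemCluster's semantic measures.

    Within a topic, the compound phrase is close (-1) to each of its parts,
    the parts are farther apart (-4), and cross-topic pairs score -100.
    """
    within = [[0.0, -1.0, -4.0], [-1.0, 0.0, -1.0], [-4.0, -1.0, 0.0]]
    n = len(TERMS)
    S = [[-100.0] * n for _ in range(n)]
    for start in (0, 3):  # each topic occupies three consecutive indices
        for a in range(3):
            for b in range(3):
                S[start + a][start + b] = within[a][b]
    for k in range(n):
        S[k][k] = preference  # shared preference for becoming an exemplar
    return S


def f1_score(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Exact-match F1 after lowercasing (a simplifying assumption)."""
    pred = {p.lower() for p in predicted}
    ref = {g.lower() for g in gold}
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = affinity_propagation(toy_similarity())
    keyphrases = sorted({TERMS[e] for e in labels})
    print(keyphrases)
    # Hypothetical gold standard, for illustration only.
    print(f1_score(keyphrases, ["machine learning", "protein structure",
                                "neural networks"]))
```

With these toy values, each compound phrase ends up as the exemplar of its topic. Affinity propagation needs no preset number of clusters, which is one reason it suits per-document term clustering. Raising the shared preference toward 0 tends to produce more exemplars, and lowering it tends to produce fewer.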


Keywords: Keyphrase extraction, Clustering-based AKE, Unsupervised AKE



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. School of Computer Science and Informatics, Cardiff University, Cardiff, UK
