Extracting Keyphrase Set with High Diversity and Coverage Using Structural SVM

  • Weijian Ni
  • Tong Liu
  • Qingtian Zeng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7235)


Keyphrase extraction plays an important role in automatic document understanding. To obtain concise and comprehensive information about the content of a document, the keyphrases extracted from it should meet two requirements. First, the keyphrases should be diverse with respect to one another, so as to avoid carrying duplicated information. Second, the keyphrases should collectively cover the various aspects of the topics in the document, so as to avoid unnecessary information loss. In this paper, we address the problem of automatic keyphrase extraction with an emphasis on the diversity and coverage of the extracted keyphrase set, which most conventional keyphrase extraction approaches generally ignore. Specifically, the problem is formulated as a subset learning problem in the framework of structural learning, and a structural SVM is employed to perform the task. Experiments on a scientific literature dataset show that our approach outperforms several state-of-the-art keyphrase extraction approaches, which demonstrates the benefits of explicitly enhancing diversity and coverage.
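To make the subset formulation concrete, the Python sketch below scores a candidate keyphrase set S with a linear function w · Ψ(x, S) and builds the set greedily. Everything specific here is an assumption for illustration, not the authors' method: the two-dimensional feature map Ψ (weighted aspect coverage and negative pairwise word overlap), the hypothetical aspect labels and weights, and the hand-set weight vector `w`, which in the paper's setting would instead be learned by structural SVM training. Exact maximization over all subsets is combinatorial, so the sketch uses greedy marginal-gain selection; the paper's own inference procedure may differ.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Word-overlap similarity between two phrases (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def joint_features(subset, aspects, aspect_weight):
    """Hypothetical joint feature map Psi(x, S) = [coverage, -redundancy]."""
    # Coverage: total weight of the distinct aspects touched by any phrase in S.
    covered = set()
    for phrase in subset:
        covered |= aspects[phrase]
    coverage = sum(aspect_weight.get(t, 0.0) for t in covered)
    # Redundancy: summed pairwise word overlap; negated so that diversity is rewarded.
    redundancy = sum(
        jaccard(frozenset(p.split()), frozenset(q.split()))
        for p, q in combinations(subset, 2)
    )
    return [coverage, -redundancy]


def score(w, subset, aspects, aspect_weight):
    """Linear structured score w . Psi(x, S)."""
    return sum(wi * fi for wi, fi in zip(w, joint_features(subset, aspects, aspect_weight)))


def greedy_select(candidates, k, w, aspects, aspect_weight):
    """Greedily add the candidate with the largest marginal gain in w . Psi, up to k phrases."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        current = score(w, selected, aspects, aspect_weight)
        best = max(
            remaining,
            key=lambda c: score(w, selected + [c], aspects, aspect_weight) - current,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, made-up data: each candidate phrase maps to the topic aspects it covers.
candidates = ["structural svm", "structural svm training", "keyphrase extraction", "subset diversity"]
aspects = {
    "structural svm": {"learning"},
    "structural svm training": {"learning", "optimization"},
    "keyphrase extraction": {"task"},
    "subset diversity": {"diversity"},
}
aspect_weight = {"learning": 1.0, "optimization": 0.3, "task": 1.0, "diversity": 0.8}
w = [1.0, 2.0]  # hand-set here; a structural SVM would learn these weights

print(greedy_select(candidates, 3, w, aspects, aspect_weight))
# → ['structural svm training', 'keyphrase extraction', 'subset diversity']
```

In the toy data, `"structural svm"` adds no new aspect and overlaps heavily with `"structural svm training"`, so both the coverage term and the diversity term steer the greedy step toward `"subset diversity"` instead.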







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Weijian Ni (1)
  • Tong Liu (1)
  • Qingtian Zeng (1)
  1. Shandong University of Science and Technology, Qingdao, P.R. China
