Short text keyphrase extraction with hypergraphs

  • Regular Paper
  • Progress in Artificial Intelligence

Abstract

Graph-based ranking has become an important approach to keyphrase extraction because it measures the saliency of words in their context. Words are modeled as vertices and co-occurrence relations between words as edges, so the importance of each word is computed from the whole graph. However, an ordinary graph can capture only pairwise relations between vertices, so it is unclear whether it can represent high-order relations among more than two words. In this paper, we propose using a hypergraph to capture the high-order relations that appear in short documents, and using this information to infer a better ranking of words. We also incorporate the temporal and social attributes of short documents, together with discriminative weights of words, into the hypergraph as weights, which lets the model capture recent and topical keyphrases. To rank vertices in the proposed hypergraph, we propose a probabilistic random walk that takes into account the weights of both vertices and hyperedges. Extensive experiments on two different data sets demonstrate the effectiveness and robustness of the proposed approach.
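To make the idea of a random walk over weighted vertices and hyperedges concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes that each short document is one hyperedge over its words, that a walker at a word first picks an incident hyperedge with probability proportional to the hyperedge weight, and then picks a word inside that hyperedge with probability proportional to the vertex weight. The function name hypergraph_rank, its parameters, and the PageRank-style damping factor are all illustrative assumptions.

```python
from typing import Dict, List, Sequence


def hypergraph_rank(
    hyperedges: Sequence[Sequence[str]],
    edge_weights: Sequence[float],
    vertex_weights: Dict[str, float],
    damping: float = 0.85,      # assumed teleport parameter, as in PageRank
    iterations: int = 100,
    tol: float = 1e-10,
) -> Dict[str, float]:
    """Rank words by a two-step random walk on a weighted hypergraph (sketch)."""
    vertices: List[str] = sorted({v for edge in hyperedges for v in edge})
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)

    # For every vertex, record which hyperedges contain it.
    incident: Dict[str, List[int]] = {v: [] for v in vertices}
    for j, edge in enumerate(hyperedges):
        for v in set(edge):
            incident[v].append(j)

    # Build the row-stochastic transition matrix P[u][v].
    P = [[0.0] * n for _ in range(n)]
    for u in vertices:
        total_edge_w = sum(edge_weights[j] for j in incident[u])
        for j in incident[u]:
            p_edge = edge_weights[j] / total_edge_w   # step 1: pick a hyperedge
            members = set(hyperedges[j])
            total_vertex_w = sum(vertex_weights[x] for x in members)
            for v in members:                          # step 2: pick a vertex in it
                P[index[u]][index[v]] += p_edge * vertex_weights[v] / total_vertex_w

    # Power iteration with uniform teleportation.
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [
            (1.0 - damping) / n
            + damping * sum(rank[u] * P[u][v] for u in range(n))
            for v in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return dict(zip(vertices, rank))


if __name__ == "__main__":
    # Two toy "documents"; all weights are made-up placeholder values.
    docs = [["hypergraph", "ranking", "keyphrase"], ["hypergraph", "tweet"]]
    scores = hypergraph_rank(docs, [1.0, 1.0], {w: 1.0 for d in docs for w in d})
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:.4f}")
```

In a setting like the one described above, the hyperedge weights would be the place to encode temporal and social attributes of a document, and the vertex weights the place to encode discriminative word weights; how those values are actually computed is not specified here.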

Notes

  1. www.twitter.com.

  2. https://blog.twitter.com/2013/new-tweets-per-second-record-and-how.

  3. http://kavita-ganesan.com/opinosis-opinion-dataset.

  4. www.mturk.com.

  5. http://www.noslang.com/dictionary/full/.

  6. http://www.ark.cs.cmu.edu/TweetNLP/.

  7. http://tartarus.org/martin/PorterStemmer/.

References

  1. Agarwal, A., Chakrabarti, S.: Learning random walks to rank nodes in graphs. In: Proceedings of the 24th international conference on machine learning, pp. 9–16. ACM, New York, NY (2007)

  2. Agarwal, A., Chakrabarti, S., Aggarwal, S.: Learning to rank networked entities. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 14–23. ACM, New York, NY (2006)

  3. Agarwal, S., Branson, K., Belongie, S.: Higher order learning with graphs. In: Proceedings of the 23rd international conference on machine learning, pp. 17–24. ACM, New York, NY (2006)

  4. Aldous, D., Fill, J.: Reversible Markov chains and random walks on graphs (2002)

  5. Avin, C., Lando, Y., Lotker, Z.: Radio cover time in hyper-graphs. In: Proceedings of the 6th international workshop on foundations of mobile computing, pp. 3–12. ACM, New York, NY (2010)

  6. Backstrom, L., Leskovec, J.: Supervised random walks: predicting and recommending links in social networks. In: Proceedings of the fourth ACM international conference on web search and data mining, pp. 635–644. ACM, New York, NY (2011)

  7. Bellaachia, A., Al-Dhelaan, M.: Learning from Twitter hashtags: leveraging proximate tags to enhance graph-based keyphrase extraction. In: Proceedings of the 2012 IEEE international conference on green computing and communications, pp. 348–357. IEEE Comput. Soc., Washington, DC (2012)

  8. Bellaachia, A., Al-Dhelaan, M.: Ne-rank: a novel graph-based keyphrase extraction in Twitter. In: Proceedings of the 2012 IEEE/WIC/ACM international joint conferences on web intelligence and intelligent agent technology, vol. 01, pp. 372–379. IEEE Comput. Soc. (2012)

  9. Bellaachia, A., Al-Dhelaan, M.: Random walks in hypergraph. In: Proceedings of the 2013 international conference on applied mathematics and computational method, pp. 187–194. Europment (2013)

  10. Bellaachia, A., Al-Dhelaan, M.: HG-Rank: A hypergraph-based keyphrase extraction for short documents in dynamic genre. In: 4th workshop on making sense of microposts (#Microposts2014), pp. 42–49 (2014)

  11. Berge, C.: Hypergraphs: combinatorics of finite sets. North-Holland (1984)

  12. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

  13. Bougouin, A., Boudin, F., Daille, B.: Topicrank: Graph-based topic ranking for keyphrase extraction. In: Proceedings of the sixth international joint conference on natural language processing, pp. 543–551. Asian federation of natural language processing (2013)

  14. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the seventh international conference on world wide web 7, pp. 107–117. Elsevier B.V., Amsterdam, The Netherlands (1998)

  15. Bu, J., Tan, S., Chen, C., Wang, C., Wu, H., Zhang, L., He, X.: Music recommendation by unified hypergraph: combining social media information and music content. In: Proceedings of the international conference on multimedia, pp. 391–400. ACM, New York, NY (2010)

  16. Buckley, C., Voorhees, E.M.: Retrieval evaluation with incomplete information. In: Proceedings of the 27th annual international ACM SIGIR conference on research and development in information retrieval, pp. 25–32. ACM, New York, NY (2004)

  17. Cooper, C., Frieze, A., Radzik, T.: The cover times of random walks on random uniform hypergraphs. Theoretical Computer Science, vol. 6796, pp. 210–221. Springer, Berlin, Heidelberg (2013)

  18. Diligenti, M., Gori, M., Maggini, M.: Learning web page scores by error back-propagation. In: Proceedings of the 19th international joint conference on artificial intelligence, pp. 684–689. Morgan Kaufmann Publishers Inc., San Francisco, CA (2005)

  19. Eisenstein, J.: What to do about bad language on the internet. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 359–369. Association for computational linguistics, Atlanta (2013)

  20. Erkan, G., Radev, D.R.: Lexrank: graph-based lexical centrality as salience in summarization. J. Artif. Intell. Res. 22(1), 457–479 (2004)

  21. Ganesan, K., Zhai, C., Han, J.: Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. In: Proceedings of the 23rd international conference on computational linguistics, pp. 340–348. Association for computational linguistics, Stroudsburg, PA (2010)

  22. Gao, Y., Liu, J., Ma, P.: The hot keyphrase extraction based on tf pdf. In: The 2011 IEEE 10th international conference on trust, security and privacy in computing and communications (TrustCom), pp. 1524–1528 (2011)

  23. Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Collins, M., Steedman, M. (eds.) Proceedings of the 2003 conference on empirical methods in natural language processing, pp. 216–223 (2003)

  24. Jabeur, L., Tamine, L., Boughanem, M.: Featured tweet search: modeling time and social influence for microblog retrieval. In: Proceedings of the 2012 IEEE/WIC/ACM international joint conferences on web intelligence and intelligent agent technology, pp. 166–173 (2012)

  25. Jamali, M., Ester, M.: Trustwalker: a random walk model for combining trust-based and item-based recommendation. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 397–406. ACM, New York, NY (2009)

  26. Jarvis, J., Shier, D.R.: Graph-theoretic analysis of finite Markov chains. Applied mathematical modeling: a multidisciplinary approach (1999)

  27. Lee, J., Cho, M., Lee, K.M.: Hyper-graph matching via reweighted random walks. In: Proceedings of the 2011 IEEE conference on computer vision and pattern recognition, pp. 1633–1640. IEEE, Washington, DC (2011)

  28. Li, D., Li, S.: Hypergraph-based inductive learning for generating implicit key phrases. In: Proceedings of the 20th international conference companion on world wide web, pp. 77–78. ACM, New York, NY (2011)

  29. Li, D., Li, S., Li, W., Wang, W., Qu, W.: A semi-supervised key phrase extraction approach: learning from title phrases through a document semantic network. In: Proceedings of the ACL 2010 conference short papers, pp. 296–300. Association for computational linguistics, Stroudsburg, PA (2010)

  30. Li, L., Li, T.: News recommendation via hypergraph learning: encapsulation of user behavior and news content. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp. 305–314. ACM, New York, NY (2013)

  31. Li, X., Liu, B., Yu, P.: Time sensitive ranking with application to publication search. In: Eighth IEEE International conference on data mining, pp. 893–898 (2008)

  32. Li, X., Su, X., Wang, M.: Social network-based recommendation: a graph random walk kernel approach. In: Proceedings of the 12th ACM/IEEE-CS joint conference on digital libraries, pp. 409–410. ACM, New York, NY (2012)

  33. Liu, H., Le Pendu, P., Jin, R., Dou, D.: A hypergraph-based method for discovering semantically associated itemsets. In: Proceedings of the 11th IEEE international conference on data mining, pp. 398–406. IEEE Computer Society, Washington, DC (2011)

  34. Liu, X., Li, Y., Wei, F., Zhou, M.: Graph-based multi-tweet summarization using social signals. In: Proceedings of COLING 2012, pp. 1699–1714. The COLING 2012 organizing committee (2012)

  35. Liu, Z., Huang, W., Zheng, Y., Sun, M.: Automatic keyphrase extraction via topic decomposition. In: Proceedings of the 2010 conference on empirical methods in natural language processing, pp. 366–376. Association for computational linguistics, Atlanta (2010)

  36. Lovász, L.: Random walks on graphs: a survey. Comb. Paul Erdős Eighty 2(1), 1–46 (1993)

  37. Lu, L., Peng, X.: High-ordered random walks and generalized laplacians on hypergraphs. In: Proceedings of the 8th international conference on algorithms and models for the web graph, pp. 14–25. Springer, Berlin (2011)

  38. Medelyan, O., Witten, I.H.: Thesaurus based automatic keyphrase indexing. In: Proceedings of the 6th ACM/IEEE-CS Joint conference on digital libraries, pp. 296–297. ACM, New York, NY (2006)

  39. Mehrotra, R., Sanner, S., Buntine, W., Xie, L.: Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, pp. 889–892. ACM, New York, NY (2013)

  40. Mihalcea, R., Tarau, P.: Textrank: bringing order into texts. In: Lin, D., Wu, D. (eds.) Proceedings of the 2004 conference on empirical methods in natural language processing, pp. 404–411. Association for computational linguistics (2004)

  41. Minkov, E., Cohen, W.W.: Learning to rank typed graph walks: local and global approaches. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on web mining and social network analysis, pp. 1–8. ACM, New York, NY (2007)

  42. O’Connor, B., Krieger, M., Ahn, D.: Tweetmotif: exploratory search and topic summarization for Twitter. In: Proceedings of the fourth international conference on weblogs and social media (2010)

  43. Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A.: Improved part-of-speech tagging for online conversational text with word clusters. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 380–390. Association for computational linguistics (2013)

  44. Parikh, R., Karlapalem, K.: Et: events from tweets. In: Proceedings of the 22nd international conference on world wide web companion, pp. 613–620. International world wide web conferences steering committee, Republic and Canton of Geneva (2013)

  45. Ren, Z., Liang, S., Meij, E., de Rijke, M.: Personalized time-aware tweets summarization. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, pp. 513–522. ACM, New York, NY (2013)

  46. Sipos, R., Swaminathan, A., Shivaswamy, P., Joachims, T.: Temporal corpus summarization using submodular word coverage. In: Proceedings of the 21st ACM international conference on information and knowledge management, pp. 754–763. ACM, New York, NY (2012)

  47. Soulier, L., Jabeur, L.B., Tamine, L., Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model. J. Am. Soc. Inf. Sci. Technol. 64(3), 500–515 (2013)

  48. Tan, H.K., Ngo, C.W., Wu, X.: Modeling video hyperlinks with hypergraph for web video reranking. In: Proceedings of the 16th ACM international conference on multimedia, pp. 659–662. ACM, New York, NY (2008)

  49. Tayebi, M.A., Jamali, M., Ester, M., Glässer, U., Frank, R.: Crimewalker: a recommendation model for suspect investigation. In: Proceedings of the fifth ACM conference on recommender systems, pp. 173–180. ACM, New York, NY (2011)

  50. Vempala, S.: Geometric random walks: a survey. MSRI volume on combinatorial and computational geometry (2005)

  51. Wan, X.: Timedtextrank: adding the temporal dimension to multi-document summarization. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval, pp. 867–868. ACM, New York, NY (2007)

  52. Wan, X., Xiao, J.: Single document keyphrase extraction using neighborhood knowledge. In: Proceedings of the 23rd national conference on artificial intelligence, vol. 2, pp. 855–860. AAAI Press (2008)

  53. Wang, W., Li, S., Li, J., Li, W., Wei, F.: Exploring hypergraph-based semi-supervised ranking for query-oriented summarization. Inf. Sci. 237, 271–286 (2013)

  54. Wang, W., Wei, F., Li, W., Li, S.: Hypersum: hypergraph based semi-supervised sentence ranking for query-oriented summarization. In: Proceedings of the 18th ACM conference on information and knowledge management, pp. 1855–1858. ACM, New York, NY (2009)

  55. Witten, I.H., Paynter, G.W., Frank, E., Gutwin, C., Nevill-Manning, C.G.: Kea: practical automatic keyphrase extraction. In: Proceedings of the fourth ACM conference on digital libraries, pp. 254–255. ACM, New York, NY (1999)

  56. Wu, W., Zhang, B., Ostendorf, M.: Automatic generation of personalized annotation tags for Twitter users. In: Human language technologies: The 2010 annual conference of the North American chapter of the association for computational linguistics, pp. 689–692. Association for computational linguistics, Atlanta (2010)

  57. Yu, P.S., Li, X., Liu, B.: Adding the temporal dimension to search a case study in publication search. In: Proceedings of the 2005 IEEE/WIC/ACM international conference on web intelligence, pp. 543–549. IEEE computer society, Washington, DC (2005)

  58. Zhao, X., Jiang, J., He, J., Song, Y., Achananuparp, P., Lim, E.P., Li, X.: Topical keyphrase extraction from Twitter. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, pp. 379–388. Association for computational linguistics (2011)

  59. Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: clustering, classification, and embedding. Adv. Neural. Inf. Process. Syst. 19, 1601 (2007)

  60. Zhou, D., Orshanskiy, S., Zha, H., Giles, C.: Co-ranking authors and documents in a heterogeneous network. In: Proceedings of the 7th IEEE international conference on data mining, pp. 739–744 (2007)

Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments that helped improve the paper. The second author would like to thank King Saud University, Saudi Arabia, for their scholarship support.

Author information

Corresponding author

Correspondence to Mohammed Al-Dhelaan.

Additional information

This article is an extension of our previous conference and workshop papers [9, 10].

Cite this article

Bellaachia, A., Al-Dhelaan, M. Short text keyphrase extraction with hypergraphs. Prog Artif Intell 3, 73–87 (2015). https://doi.org/10.1007/s13748-014-0058-1

  • DOI: https://doi.org/10.1007/s13748-014-0058-1