Learning from Titles to Recommend Keywords for Academic Papers
With the rapidly growing number of scientific papers, it is difficult for researchers to locate the most relevant and important keywords in the vast body of literature and thereby establish their research focus and the necessary preliminaries. This paper relies on a commonly accepted assumption: the title of a document is carefully composed to reflect its content, so keywords tend to be closely related to the title. On that basis, it presents a method for ranking keywords from paper titles that takes both real-time relevance and authoritativeness into account. We model paper titles as a weighted hypergraph and perform a random walk on it. The walk considers the weights of both hyperedges and vertices, which model the social features of short documents and the discriminative weights of words, respectively. The centrality of each word in the hypergraph is then measured to obtain the recommended keywords. Experimental results demonstrate that the proposed approach is robust for extracting keywords from short texts.
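The abstract does not give the exact weighting formulas. The Python sketch below therefore only illustrates the general idea under stated assumptions. Each title is a hyperedge whose vertices are its words. From a word, the walk picks an incident hyperedge with probability proportional to the hyperedge weight. It then picks a word inside that hyperedge with probability proportional to the vertex weight. A PageRank-style damping factor is our own assumption, added so the walk converges to a unique stationary distribution, and that distribution, computed by power iteration, serves as the word centrality score. The recency-based edge weight and the IDF-style vertex weight are hypothetical stand-ins for the paper's real-time/authoritativeness and discriminative weights. All function names are illustrative.

```python
import math
import re
from collections import defaultdict


def build_hyperedges(titles, stopwords):
    """One hyperedge per title: the set of its non-stopword words (the vertices).

    Empty titles yield empty hyperedges, which keeps indices aligned with the
    input; they are never visited because no vertex is incident to them.
    """
    return [
        {t for t in re.findall(r"[a-z]+", title.lower()) if t not in stopwords}
        for title in titles
    ]


def idf_vertex_weights(edges):
    """Hypothetical discriminative vertex weight: an IDF-style score over titles."""
    df = defaultdict(int)
    for e in edges:
        for v in e:
            df[v] += 1
    return {v: math.log(1 + len(edges) / d) for v, d in df.items()}


def recency_edge_weights(years, current_year, decay=0.1):
    """Hypothetical 'real-time' hyperedge weight: newer titles weigh more."""
    return [math.exp(-decay * (current_year - y)) for y in years]


def hypergraph_rank(edges, edge_weight, vertex_weight,
                    damping=0.85, iters=100, tol=1e-10):
    """Stationary distribution of a damped random walk on a weighted hypergraph."""
    vertices = sorted(set().union(*edges))
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)

    incident = defaultdict(list)  # word -> indices of hyperedges (titles) containing it
    for k, e in enumerate(edges):
        for v in e:
            incident[v].append(k)

    # Row v of the transition matrix: pick an incident hyperedge with probability
    # proportional to its weight, then a word inside it proportional to vertex weight.
    trans = []
    for v in vertices:
        row = defaultdict(float)
        ew_total = sum(edge_weight[k] for k in incident[v])
        for k in incident[v]:
            p_edge = edge_weight[k] / ew_total
            vw_total = sum(vertex_weight[u] for u in edges[k])
            for u in edges[k]:
                row[index[u]] += p_edge * vertex_weight[u] / vw_total
        trans.append(row)

    # Power iteration with uniform teleportation (PageRank-style damping).
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n] * n
        for i, row in enumerate(trans):
            for j, p in row.items():
                new[j] += damping * rank[i] * p
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return sorted(zip(vertices, rank), key=lambda pair: -pair[1])


titles = [
    "Keyword extraction from short texts",
    "Hypergraph random walk for keyword ranking",
    "Random walk models for short text keyword extraction",
]
years = [2014, 2017, 2018]
stopwords = {"a", "and", "for", "from", "of", "the"}

edges = build_hyperedges(titles, stopwords)
scores = hypergraph_rank(edges,
                         recency_edge_weights(years, current_year=2018),
                         idf_vertex_weights(edges))
for word, score in scores[:3]:
    print(f"{word}\t{score:.4f}")
```

Ranking words by their stationary probability and keeping the top few yields the recommended keywords. Other edge or vertex weighting schemes can be tried by changing only the two weighting helpers. The walk itself stays the same.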
Keywords: Extraction, Weighted Hyper-graph, Weighting strategy, Word correlation, Random walk
This work is supported by the National Natural Science Foundation of China (No. 61762078, 61363058, 61862058), Gansu province college students’ innovation and entrepreneurship training program (201610736041), and Guangxi Key Laboratory of Trusted Software (No. kx201705).