Learning from Titles to Recommend Keywords for Academic Papers

  • Huifang Ma
  • Fang Liu
  • Qin Xia
  • Li Yu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11303)

Abstract

With the growing number of scientific papers, it is difficult for researchers to locate the most relevant and important keywords in the vast body of literature and to establish a research focus and the necessary background. Based on the commonly accepted assumption that a document's title is carefully written to reflect its content, so that keywords tend to be closely related to the title, this paper presents a keyword ranking method based on paper titles that accounts for both timeliness and authoritativeness. We model paper titles as a weighted hypergraph and perform a random walk on it: hyperedge weights model the social features of these short documents, and vertex weights capture the discriminative power of individual words. The centrality of each word in the hypergraph is then measured to obtain the recommended keywords. Experimental results demonstrate that the proposed approach is robust for extracting keywords from short texts.
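To make the ranking idea concrete, here is a minimal sketch of a random walk on a weighted hypergraph built from titles. It is not the authors' implementation; it assumes that each title is one hyperedge over its distinct non-stopword words, that hyperedge weights are supplied by the caller (a stand-in for the paper's social features, whose exact definition is not given here), and that vertex weights are an IDF-style score (a hypothetical stand-in for the paper's discriminative weights). From a word u, the walk picks an incident hyperedge with probability proportional to its weight, then a word inside it with probability proportional to that word's weight. A PageRank-style damping factor of 0.85 is also an assumption. The names `STOPWORDS`, `build_hypergraph`, `vertex_weights` and `rank_keywords` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical stopword list; a real system would use a fuller list plus stemming.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "from", "and", "in", "on", "with", "using", "based"}


def build_hypergraph(titles, edge_weights):
    """Turn each title into a hyperedge: the set of its non-stopword tokens."""
    edges, weights = [], []
    for title, w in zip(titles, edge_weights):
        tokens = sorted({t for t in title.lower().split() if t not in STOPWORDS})
        if tokens:  # skip titles that contain only stopwords
            edges.append(tokens)
            weights.append(w)
    return edges, weights


def vertex_weights(edges):
    """Hypothetical discriminative weight: an IDF-style score computed over titles."""
    df = defaultdict(int)
    for e in edges:
        for v in e:
            df[v] += 1
    n = len(edges)
    return {v: math.log(1 + n / c) for v, c in df.items()}  # always > 0


def rank_keywords(titles, edge_weights, damping=0.85, iters=100, tol=1e-10):
    """Rank words by the stationary distribution of a damped hypergraph random walk.

    Assumes every hyperedge weight is positive.
    """
    edges, weights = build_hypergraph(titles, edge_weights)
    vw = vertex_weights(edges)
    vertices = sorted(vw)
    incident = defaultdict(list)  # word -> indices of hyperedges (titles) containing it
    for i, e in enumerate(edges):
        for v in e:
            incident[v].append(i)

    # Transition rows: choose an incident hyperedge in proportion to its weight,
    # then a word inside that hyperedge in proportion to the word's weight.
    trans = {}
    for u in vertices:
        total_w = sum(weights[i] for i in incident[u])
        row = defaultdict(float)
        for i in incident[u]:
            e = edges[i]
            e_vw = sum(vw[x] for x in e)
            for v in e:
                row[v] += (weights[i] / total_w) * (vw[v] / e_vw)
        trans[u] = row

    # Power iteration with uniform teleportation.
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in vertices}
        for u, row in trans.items():
            for v, p in row.items():
                new[v] += damping * rank[u] * p
        delta = sum(abs(new[v] - rank[v]) for v in vertices)
        rank = new
        if delta < tol:
            break
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    titles = [
        "Keyword extraction from short texts",
        "Random walk ranking on hypergraphs",
        "Hypergraph random walk for keyword recommendation",
    ]
    weights = [1.0, 0.5, 2.0]  # hypothetical per-title scores (e.g. recency or authority)
    for word, score in rank_keywords(titles, weights)[:3]:
        print(word, round(score, 3))
```

Because every row of the transition matrix sums to one, the scores form a probability distribution over words. Words that occur in many heavily weighted titles, and that carry high vertex weight, accumulate the most probability mass.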

Keywords

Extraction, Weighted hyper-graph, Weighting strategy, Word correlation, Random walk

Notes

Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 61762078, 61363058, 61862058), the Gansu Province College Students’ Innovation and Entrepreneurship Training Program (No. 201610736041), and the Guangxi Key Laboratory of Trusted Software (No. kx201705).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Computer Science and Engineering, Northwest Normal University, Lanzhou, China
  2. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin, China

Personalised recommendations