Volume 114, Issue 3, pp. 1345–1372

Collective topical PageRank: a model to evaluate the topic-dependent academic impact of scientific papers

  • Yongjun Zhang
  • Jialin Ma
  • Zijian Wang
  • Bolun Chen
  • Yongtao Yu


With the explosive growth of the academic literature, it has become difficult for researchers to find significant papers in their areas of interest. In this paper, we propose a pipeline model, named collective topical PageRank, to evaluate the topic-dependent academic impact of scientific papers. First, we fit a correlated topic model to the textual content of the papers to extract scientific topics and the correlations between them. Then, we present a modified PageRank algorithm that incorporates the venue, the correlations between scientific topics, and the publication year of each paper into a random walk to evaluate each paper's topic-dependent academic impact. Our experiments show that the model can effectively identify significant papers and venues for each scientific topic, recommend papers for further reading or citation, explore the evolution of scientific topics, and calculate the dynamic topic-dependent academic impact of venues.
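The abstract describes the ranking step only at a high level. The sketch below illustrates the general idea of a topic-biased random walk over a citation graph whose teleport vector is weighted by topic relevance, venue, and publication year. It is a minimal illustration under stated assumptions, not the authors' formulation: the smoothing of topic proportions through a topic-correlation matrix, the multiplicative venue prior, the exponential recency weight, and all names (`topical_pagerank`, `venue_weight`, `decay`, `ref_year`) are hypothetical choices. Topic proportions and correlations are hardcoded toy values; in the described pipeline they would come from a fitted correlated topic model.

```python
import math

def topical_pagerank(citations, theta, corr, venue_weight, venues, years,
                     topic, damping=0.85, decay=0.1, ref_year=None,
                     tol=1e-10, max_iter=200):
    """Hypothetical topic-dependent PageRank sketch (not the paper's exact model).

    citations    : dict paper -> list of papers it cites (edges citing -> cited)
    theta        : dict paper -> list of topic proportions (e.g. from a topic model)
    corr         : K x K topic-correlation matrix (list of lists)
    venue_weight : dict venue -> non-negative prior weight (assumed)
    venues, years: dict paper -> venue / publication year
    topic        : index of the topic being ranked
    """
    papers = list(theta)
    if ref_year is None:
        ref_year = max(years.values())
    # Topical relevance: topic proportions smoothed through correlated topics,
    # clipped at 0 because correlations can be negative.
    rel = {p: max(0.0, sum(corr[topic][j] * t for j, t in enumerate(theta[p])))
           for p in papers}
    # Teleport vector: relevance x venue prior x recency weight (assumed form),
    # so recent papers with few citations so far are not ignored.
    prior = {p: rel[p] * venue_weight[venues[p]]
             * math.exp(-decay * (ref_year - years[p])) for p in papers}
    z = sum(prior.values())
    if z == 0:
        raise ValueError("no paper is relevant to this topic")
    prior = {p: w / z for p, w in prior.items()}

    # Power iteration on the topic-biased random walk.
    score = dict(prior)
    for _ in range(max_iter):
        new = {p: (1 - damping) * prior[p] for p in papers}
        for p in papers:
            out = [q for q in citations.get(p, []) if q in rel]
            w = sum(rel[q] for q in out)
            if w > 0:
                # Follow citations, biased toward topically relevant cited papers.
                for q in out:
                    new[q] += damping * score[p] * rel[q] / w
            else:
                # Dangling (or off-topic) paper: redistribute its mass via the prior.
                for q in papers:
                    new[q] += damping * score[p] * prior[q]
        diff = sum(abs(new[p] - score[p]) for p in papers)
        score = new
        if diff < tol:
            break
    return score


# Toy example with two topics and four papers (illustrative data only).
citations = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["C"]}
theta = {"A": [0.9, 0.1], "B": [0.7, 0.3], "C": [0.2, 0.8], "D": [0.5, 0.5]}
corr = [[1.0, 0.3], [0.3, 1.0]]
venues = {"A": "J1", "B": "J2", "C": "J1", "D": "J2"}
venue_weight = {"J1": 1.0, "J2": 0.5}
years = {"A": 2005, "B": 2008, "C": 2010, "D": 2012}

scores = topical_pagerank(citations, theta, corr, venue_weight, venues, years,
                          topic=0)
for paper, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(paper, round(s, 4))
```

Running the walk once per topic yields a separate ranking for each topic; aggregating paper scores by venue (and by year) would give a per-topic venue ranking of the kind the abstract mentions, again only as an illustration of the idea.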


Keywords: Topic model · PageRank · Scientific evaluation



This work was supported by the National Natural Science Foundation of China (Grant Nos. 61602202 and 61603146), the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BK20160427 and BK20160428), the Top-notch Academic Programs Project of Jiangsu Higher Education Institutions, and the Social Key Research and Development Project of Huaian, Jiangsu, China (Grant No. HAS2015020).



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2017

Authors and Affiliations

  1. College of Computer and Information, Hohai University, Nanjing, China
  2. Faculty of Computer and Software Engineering, Huaiyin Institute of Technology, Huai’an, China
