AdaWIRL: A Novel Bayesian Ranking Approach for Personal Big-Hit Paper Prediction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9659)

Abstract

Predicting the most impactful (big-hit) paper among a researcher’s publications, so that it can be disseminated well in advance, not only has a large impact on individual academic success but also provides useful guidance to the research community. In this work, we tackle the following problem: given the corpus of a researcher’s publications from the previous few years, how can we effectively predict which paper will become the big-hit in the future? We explore a series of features that can drive a paper to become the big-hit, and design a novel Bayesian ranking algorithm, AdaWIRL (Adaptive Weighted Impact Ranking Learning), that leverages a weighted training schema and an adaptive timely false correction strategy to predict big-hit papers. Experimental results on the large ArnetMiner dataset, with over 1.7 million authors and 2 million papers, demonstrate the effectiveness of AdaWIRL. Specifically, it correctly predicts over 78.3% of all researchers’ big-hit papers and outperforms the compared regression and ranking algorithms, with average improvements of 5.8% and 2.9%, respectively. Further analysis shows that temporal features are the best indicators of personal big-hit papers, while authorship and social features are less relevant. We also demonstrate that there is a high correlation between the impact of a researcher’s future works and their similarity to the predicted big-hit paper.
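
The abstract does not spell out the AdaWIRL objective, so the following is only a minimal sketch of the general idea of weighted Bayesian pairwise ranking, in the style of BPR [17], applied to one researcher's papers. The linear scoring function, the citation-gap pair weight, the hyperparameters, and the function names `train_weighted_pairwise_ranker` and `predict_big_hit` are all assumptions made for illustration. The adaptive false correction strategy is not modeled here.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_weighted_pairwise_ranker(features, citations, epochs=200,
                                   lr=0.1, reg=0.01, seed=0):
    """Fit a linear scorer so that higher-cited papers score higher.

    features:  (n_papers, n_features) array for one researcher's papers.
    citations: (n_papers,) array of later citation counts (training labels).
    Assumption: each pair is weighted by its normalized citation gap.
    """
    rng = np.random.default_rng(seed)
    n_papers, n_features = features.shape
    w = np.zeros(n_features)

    # Ordered pairs (i, j) in which paper i received more citations than j.
    pairs = [(i, j) for i in range(n_papers) for j in range(n_papers)
             if citations[i] > citations[j]]
    max_gap = max(citations[i] - citations[j] for i, j in pairs)

    for _ in range(epochs):
        for k in rng.permutation(len(pairs)):
            i, j = pairs[k]
            diff = features[i] - features[j]
            weight = (citations[i] - citations[j]) / max_gap
            # Gradient ascent on: weight * ln(sigmoid(w . diff)) - reg/2 * |w|^2
            grad = weight * (1.0 - sigmoid(w @ diff)) * diff - reg * w
            w += lr * grad
    return w


def predict_big_hit(w, features):
    """Return the index of the paper with the highest predicted score."""
    return int(np.argmax(features @ w))


if __name__ == "__main__":
    # Toy data (invented for illustration): 4 papers, 2 features each.
    X = np.array([[0.1, 0.3], [0.9, 0.2], [0.4, 0.5], [0.2, 0.4]])
    y = np.array([2, 40, 10, 5])
    w = train_weighted_pairwise_ranker(X, y)
    print("predicted big-hit index:", predict_big_hit(w, X))
```

Weighting each pair by its citation gap makes mistakes on widely separated papers cost more than mistakes on near-ties, which is one plausible reading of a "weighted training schema"; the paper itself may define the weights differently.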

Keywords

Latent Dirichlet Allocation; Citation Count; Average Citation; Ranking Algorithm; Academic Feature

Notes

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (No. 11305043, No. 61433014), the Zhejiang Provincial Natural Science Foundation of China (No. LY14A050001), the EU FP7 Grant 611272 (project GROWTHCOM), and the Zhejiang Provincial Qianjiang Talents Project (Grant No. QJC1302001). Chuxu Zhang thanks the Computer Science Department of Rutgers University for an assistantship and the IBM Thomas J. Watson Research Center for an internship.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Alibaba Research Centre for Complexity Sciences, Hangzhou Normal University, Hangzhou, China
  2. Department of Computer Science, Rutgers University, New Brunswick, USA
  3. Alibaba Group, Hangzhou, China
  4. IBM Thomas J. Watson Research Center, Yorktown Heights, USA
  5. Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China
