Volume 102, Issue 2, pp 1687–1711

Using machine learning techniques for rising star prediction in co-author network

  • Ali Daud
  • Muhammad Ahmad
  • M. S. I. Malik
  • Dunren Che


Online bibliographic databases are powerful resources for research in data mining and social network analysis, especially for co-author networks. Predicting future rising stars means finding brilliant scholars and researchers in co-author networks. In this paper, we propose a solution for rising star prediction by applying machine learning techniques. For the classification task, discriminative and generative modeling techniques are considered, and two algorithms are chosen for each category. Author, co-authorship and venue based information is incorporated, resulting in eleven features with their mathematical formulations. Extensive experiments are performed to analyze the impact on classification accuracy of individual features, of feature categories, and of their combination. Then, two ranking lists of the top 30 scholars are presented from the predicted rising stars. In addition, the concept is demonstrated by predicting rising stars in the database domain. Data from the DBLP and Arnetminer databases (1996–2000, covering a wide range of disciplines) are used for the experimental analysis of the algorithms.
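As a rough illustration of the classification step described above, the following minimal sketch trains a generative classifier on per-author feature vectors and labels an unseen author as a rising star or not. Everything in it is an assumption made for illustration: Gaussian naive Bayes is used only as a familiar generative model (the abstract does not name the generative algorithms chosen), the three features (paper count, co-author count, venue score) are hypothetical stand-ins for the eleven features of the paper, and the data is synthetic.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant is added so a constant feature never gives zero variance.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature row."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Synthetic toy data, NOT taken from the paper.
# Hypothetical columns: [paper count, co-author count, venue score].
X = [
    [12, 20, 0.8], [10, 18, 0.7], [14, 25, 0.9],  # labelled rising star (1)
    [2, 3, 0.2], [3, 4, 0.3], [1, 2, 0.1],        # labelled not rising (0)
]
y = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X, y)
label = predict(model, [11, 19, 0.75])  # classify an unseen author
```

A discriminative counterpart such as a decision tree would consume the same feature rows and labels; only the model fitted to them changes.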


Group leader, Classification, Prediction, Rising star, MEMM, CART



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2014

Authors and Affiliations

  • Ali Daud (1)
  • Muhammad Ahmad (2)
  • M. S. I. Malik (1)
  • Dunren Che (3)

  1. Department of Computer Science and Software Engineering, International Islamic University, Islamabad, Pakistan
  2. Department of Computer Science, Allama Iqbal Open University, Islamabad, Pakistan
  3. Department of Computer Science, Southern Illinois University, Carbondale, USA
