
SCSMiner: mining social coding sites for software developer recommendation with relevance propagation


With the advent of social coding sites, software development has entered a new era of collaborative work. Social coding sites (e.g., GitHub) integrate social networking and distributed version control in a unified platform to facilitate collaborative development around the world. One unique characteristic of such sites is that the developers’ past development experience recorded on them conveys implicit metrics of their programming capability and expertise, which can be applied in many areas, such as software developer recruitment for IT corporations. Motivated by this intuition, we aim to develop a framework that effectively locates developers with the right coding skills. To achieve this goal, we devise a generative probabilistic expert ranking model, into which consistency among projects is incorporated as graph regularization to enhance the expert ranking; we also interpret the model from the perspective of relevance propagation. For evaluation, StackOverflow is leveraged to complement the ground truth on expertise. Finally, a prototype system, SCSMiner, which provides an expert search service based on a real-world dataset crawled from GitHub, is implemented and demonstrated.
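To make the idea of graph regularization as relevance propagation concrete, here is a minimal sketch in Python. It is not the model from the article: it assumes the standard local-and-global-consistency iteration f ← αSf + (1 − α)y over a project similarity graph, followed by a simple weighted sum per developer. The function names, the toy graph, the developer names and the value of `alpha` are all hypothetical and chosen only for illustration.

```python
import math


def propagate_relevance(initial, weights, alpha=0.5, iterations=50):
    """Smooth per-project relevance scores over a project similarity graph.

    initial: initial relevance score of each project for a query (assumed given).
    weights: symmetric matrix (list of lists) of non-negative edge weights.
    Illustrative assumption, not the article's exact formulation.
    """
    n = len(initial)
    degree = [sum(row) for row in weights]
    # Symmetrically normalised adjacency: S = D^{-1/2} W D^{-1/2}
    norm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if weights[i][j] > 0 and degree[i] > 0 and degree[j] > 0:
                norm[i][j] = weights[i][j] / math.sqrt(degree[i] * degree[j])
    scores = list(initial)
    for _ in range(iterations):
        # Each project keeps part of its own evidence and absorbs its neighbours'.
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial[i]
            for i in range(n)
        ]
    return scores


def rank_developers(project_scores, contributions):
    """Rank developers by the summed, association-weighted scores of their projects.

    contributions: developer -> {project index: association weight} (hypothetical).
    """
    totals = {
        dev: sum(w * project_scores[p] for p, w in assoc.items())
        for dev, assoc in contributions.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


# Toy data: project 0 matches the query, project 1 is similar to project 0,
# and project 2 is unrelated to both.
initial = [1.0, 0.0, 0.0]
weights = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
contributions = {"alice": {0: 1.0}, "bob": {1: 1.0}, "carol": {2: 1.0}}

scores = propagate_relevance(initial, weights)
ranking = rank_developers(scores, contributions)
```

In this toy setting the contributor to project 1 receives a non-zero score only because that project is linked to a relevant one, which is the effect the consistency assumption is meant to capture.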







This work was done during Yao Wan’s visit to the University of Technology Sydney. This research was partially supported by the Natural Science Foundation of China under grants No. 61379119 and No. 61672453, and by the Australian Research Council Linkage Project (LP140100937). We thank Lishui Zhou, who helped us greatly with the implementation of the SCSMiner demo. We also thank Jie Liang, Tianhan Xia and Junqing Luan for sharing their crawler source code with us; their demo system, githuber.info, also gave us some inspiration.

Author information

Correspondence to Guandong Xu.

Additional information

This article belongs to the Topical Collection: Special Issue on Deep Mining Big Social Data

Guest Editors: Xiaofeng Zhu, Gerard Sanroma, Jilian Zhang, and Brent C. Munsell


About this article


Cite this article

Wan, Y., Chen, L., Xu, G. et al. SCSMiner: mining social coding sites for software developer recommendation with relevance propagation. World Wide Web 21, 1523–1543 (2018). https://doi.org/10.1007/s11280-018-0526-9



  • SCSMiner
  • Social coding sites
  • Expert finding
  • Developer recommendation
  • Relevance propagation