What Attracts Newcomers to Onboard on OSS Projects? TL;DR: Popularity

  • Felipe Fronchetti
  • Igor Wiese
  • Gustavo Pinto
  • Igor Steinmacher
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 556)


Voluntary contributions play an important role in keeping Open Source Software (OSS) projects active. New volunteers are drawn to OSS projects by a variety of motivations. In this study, we aim to understand which characteristics of OSS projects might influence the onboarding of new contributors. Using a set of 450 repositories, we investigated a mix of factors, such as the project age, the number of stars, the programming language used, and the presence of text files that aid contributors (e.g., pull-request templates or license files). We used the K-Spectral Centroid (KSC) clustering algorithm to investigate the newcomers' growth rate in the analyzed projects and found three common patterns: logarithmic, exponential, and linear growth. Based on these patterns, we used a Random Forest classifier to understand how well each factor explains the growth rates. We found that the project's popularity (in terms of stars), the time to review pull requests, the project age, and the programming language are the factors that best explain the newcomers' growth patterns.
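KSC groups time series by shape rather than by magnitude. It compares two series x and y under the best rescaling α and time shift q of y, using d(x, y) = min over α, q of ‖x − α·y_(q)‖ / ‖x‖, as formulated by Yang and Leskovec. The following is a minimal sketch of that distance and of the cluster-assignment step. It assumes each project is represented as an equal-length series of cumulative newcomer counts sampled at regular intervals.

Several choices here are illustrative assumptions and not the authors' implementation:
- the helper names (`ksc_distance`, `assign`, `REFERENCE_SHAPES`),
- the zero-padded shift and the `max_shift` bound,
- the three reference shapes.

Full KSC alternates this assignment step with a centroid update (an eigenvector computation), which is omitted here.

```python
import math
from typing import List, Sequence


def _norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def _shift(y: Sequence[float], q: int) -> List[float]:
    """Shift y by q steps (q > 0 moves values right), padding with zeros (assumed convention)."""
    n = len(y)
    out = [0.0] * n
    for i in range(n):
        j = i - q
        if 0 <= j < n:
            out[i] = float(y[j])
    return out


def ksc_distance(x: Sequence[float], y: Sequence[float], max_shift: int = 2) -> float:
    """Scale- and shift-invariant distance min_{alpha, q} ||x - alpha * y_q|| / ||x||."""
    nx = _norm(x)
    if nx == 0:
        raise ValueError("x must not be all zeros")
    best = math.inf
    for q in range(-max_shift, max_shift + 1):
        yq = _shift(y, q)
        denom = sum(b * b for b in yq)
        # For a fixed shift, the optimal scale is alpha = <x, y_q> / ||y_q||^2.
        alpha = sum(a * b for a, b in zip(x, yq)) / denom if denom else 0.0
        resid = [a - alpha * b for a, b in zip(x, yq)]
        best = min(best, _norm(resid) / nx)
    return best


def assign(series: Sequence[Sequence[float]],
           centroids: Sequence[Sequence[float]],
           max_shift: int = 2) -> List[int]:
    """KSC assignment step: index of the closest centroid for each series."""
    return [
        min(range(len(centroids)), key=lambda k: ksc_distance(s, centroids[k], max_shift))
        for s in series
    ]


# Hypothetical reference shapes for the three growth patterns (linear, exponential, logarithmic).
REFERENCE_SHAPES = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, 4.0, 8.0, 16.0],
    [math.log2(t) for t in range(1, 6)],
]

if __name__ == "__main__":
    projects = [[10, 20, 30, 40, 50], [3, 6, 12, 24, 48]]
    print(assign(projects, REFERENCE_SHAPES))  # prints [0, 1]
```

Because α absorbs differences in project size, a large project and a small project with the same growth curve fall into the same cluster. This matches the goal of comparing growth patterns rather than absolute newcomer counts.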


Keywords: Open Source Software · Newcomers · Attractiveness



This work is partially supported by CNPq (#430642/2016-4 and #406308/2016-0), Fundação Araucária and FAPESP (#2015/24527-3).



Copyright information

© IFIP International Federation for Information Processing 2019

Authors and Affiliations

  • Felipe Fronchetti (1)
  • Igor Wiese (2)
  • Gustavo Pinto (3)
  • Igor Steinmacher (4)

  1. University of São Paulo, São Paulo, Brazil
  2. Federal University of Technology, Campo Mourão, Brazil
  3. Federal University of Pará, Belém, Brazil
  4. Northern Arizona University, Flagstaff, USA
