
Analyzing developer contributions using artifact traceability graphs

Empirical Software Engineering

Abstract

Context

In a software project, properly analyzing the contributions of developers can provide valuable insights for decision-makers. A developer's contributions can take many forms, such as committing and reviewing code or opening and resolving issues. Previous approaches mainly consider commit-based contributions, which provide an incomplete picture of developer contributions.

Objective

Unlike traditional commit-based approaches for analyzing developer contributions, we aim to provide a more holistic approach that uses artifact traceability graphs to reflect the rich set of software development activities.
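The following is a minimal sketch of what using an artifact traceability graph can look like. It assumes a toy graph in which developers, issues, commits and files are nodes, and edges record relations such as authoring, reviewing, resolving and modifying. The node prefixes (`dev:`, `commit:`, `issue:`, `file:`), the relation labels, the helper `developer_files` and its 3-hop limit are hypothetical choices for illustration, not the authors' data model. It uses only the Python standard library; a graph library such as NetworkX could serve the same purpose.

```python
from collections import defaultdict, deque

# Hypothetical traceability links: (source, target, relation).
# Node prefixes and relation names are illustrative assumptions.
EDGES = [
    ("dev:alice", "commit:c1", "authored"),
    ("dev:bob", "commit:c1", "reviewed"),
    ("dev:carol", "issue:I-1", "opened"),
    ("issue:I-1", "commit:c1", "resolved_by"),
    ("commit:c1", "file:a.py", "modifies"),
    ("commit:c1", "file:b.py", "modifies"),
    ("dev:bob", "commit:c2", "authored"),
    ("commit:c2", "file:c.py", "modifies"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> set of neighbouring nodes."""
    graph = defaultdict(set)
    for src, dst, _relation in edges:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def developer_files(graph, dev, max_hops=3):
    """Collect the files reachable from `dev` within `max_hops` edges.

    The walk passes through commits and issues but stops at other
    developers and at files (an assumed rule for this sketch), so a
    developer is credited with files reached via commits they authored
    or reviewed and via issues they opened.
    """
    seen = {dev}
    files = set()
    queue = deque([(dev, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[node]:
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt.startswith("file:"):
                files.add(nxt)  # a file ends the path
            elif not nxt.startswith("dev:"):
                queue.append((nxt, depth + 1))  # keep walking through commits/issues
    return files


graph = build_graph(EDGES)
for dev in ("dev:alice", "dev:bob", "dev:carol"):
    print(dev, sorted(developer_files(graph, dev)))
```

In this toy graph, carol is credited with `a.py` and `b.py` even though she made no commit, because the issue she opened was resolved by commit `c1`. A purely commit-based count would give her nothing.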

Method

To analyze developer contributions, we propose a novel categorization of developers in a software project: Jacks, Mavens and Connectors. We introduce a set of algorithms on artifact traceability graphs to identify key developers, recommend replacements for leaving developers, and evaluate knowledge distribution among developers.
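The three categories can be illustrated with a small sketch over a developer-to-files map, such as one derived from a traceability graph. The scoring rules here are assumptions chosen for illustration and may differ from the algorithms in the paper:

- A Jack is scored by file coverage, the share of all files the developer knows.
- A Maven is scored by the number of files that no other developer knows.
- A Connector is scored by betweenness centrality in a hypothetical collaboration graph, where two developers are linked if they share at least one file.

The betweenness computation follows Brandes' algorithm for unweighted graphs. `DEV_FILES` and all function names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical developer -> known files map (e.g. derived from a traceability graph).
DEV_FILES = {
    "alice": {"a.py", "b.py", "c.py", "d.py"},
    "bob": {"d.py", "e.py"},
    "carol": {"e.py", "f.py", "g.py"},
    "dan": {"a.py"},
    "erin": {"c.py"},
}


def jack_scores(dev_files):
    """File coverage: fraction of all project files each developer knows."""
    all_files = set().union(*dev_files.values())
    return {d: len(f) / len(all_files) for d, f in dev_files.items()}


def maven_scores(dev_files):
    """Number of files known by this developer and nobody else."""
    knowers = Counter(f for files in dev_files.values() for f in files)
    return {d: sum(1 for f in files if knowers[f] == 1) for d, files in dev_files.items()}


def collaboration_graph(dev_files):
    """Link two developers when they share at least one file."""
    adj = {d: set() for d in dev_files}
    for u, v in combinations(dev_files, 2):
        if dev_files[u] & dev_files[v]:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def top(scores):
    """Name with the highest score."""
    return max(scores, key=scores.get)


print("Jack:", top(jack_scores(DEV_FILES)))
print("Maven:", top(maven_scores(DEV_FILES)))
print("Connector:", top(betweenness(collaboration_graph(DEV_FILES))))
```

The categories can overlap. In this toy data, alice is both the top Jack and the top Connector. Carol, the only developer who knows `f.py` and `g.py`, is the top Maven.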

Results

We evaluate our proposed algorithms on six open-source projects and demonstrate that the identified key developers match the top commenters up to 98%, the recommended replacements are correct up to 91%, and the identified knowledge distribution labels are, on average, 94% compatible with those of the baseline approaches.
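The abstract does not spell out how the match with the top commenters is computed. One plausible reading, used here purely as an assumption, is a top-k overlap: the share of the k highest-ranked key developers who also appear among the k most active commenters. The scores below are made-up values, not results from the paper, and `top_k` and `top_k_overlap` are hypothetical helpers.

```python
def top_k(scores, k):
    """Names of the k highest-scoring entries; ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def top_k_overlap(predicted, baseline, k):
    """Fraction of the predicted top-k that also appears in the baseline top-k."""
    return len(set(top_k(predicted, k)) & set(top_k(baseline, k))) / k


# Made-up example values, not results from the paper.
key_dev_scores = {"alice": 0.9, "bob": 0.7, "carol": 0.4, "dan": 0.1}
comment_counts = {"alice": 120, "carol": 80, "dan": 40, "bob": 15}

for k in (1, 2, 3):
    print(k, round(top_k_overlap(key_dev_scores, comment_counts, k), 2))
```

With these values, the overlap is 1.0 for k = 1, 0.5 for k = 2 and 0.67 for k = 3. This shows how such a percentage depends on the chosen k.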

Conclusions

The proposed algorithms, which analyze developer contributions using artifact traceability graphs, could be used by software project decision-makers in several scenarios: (1) identifying different types of key developers, (2) finding a replacement developer in large teams, and (3) evaluating the overall knowledge distribution among developers to take early precautions.
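Scenarios (2) and (3) can be sketched on the same kind of developer-to-files map. This is an illustrative assumption, not the paper's algorithm:

- Replacement candidates are ranked by how many of the leaving developer's files they already know.
- Files that nobody else knows are flagged as at risk.
- The share of single-knower files serves as a crude indicator of how concentrated knowledge is.

`DEV_FILES` and every function name are hypothetical.

```python
from collections import Counter

# Hypothetical developer -> known files map.
DEV_FILES = {
    "alice": {"a.py", "b.py", "c.py", "d.py"},
    "bob": {"d.py", "e.py"},
    "carol": {"e.py", "f.py", "g.py"},
    "dan": {"a.py"},
    "erin": {"c.py"},
}


def recommend_replacements(dev_files, leaving):
    """Rank remaining developers by overlap with the leaving developer's files."""
    lost = dev_files[leaving]
    ranked = sorted(
        ((len(files & lost), name) for name, files in dev_files.items() if name != leaving),
        key=lambda t: (-t[0], t[1]),  # larger overlap first, then alphabetical
    )
    return [name for overlap, name in ranked if overlap > 0]


def files_at_risk(dev_files, leaving):
    """Files that no remaining developer knows once `leaving` departs."""
    others = set().union(*(f for name, f in dev_files.items() if name != leaving))
    return dev_files[leaving] - others


def single_knower_share(dev_files):
    """Share of files known by exactly one developer (higher = more concentrated)."""
    knowers = Counter(f for files in dev_files.values() for f in files)
    return sum(1 for c in knowers.values() if c == 1) / len(knowers)


print(recommend_replacements(DEV_FILES, "alice"))
print(sorted(files_at_risk(DEV_FILES, "alice")))
print(round(single_knower_share(DEV_FILES), 2))
```

Ranking by raw overlap favours developers with broad knowledge. A normalised similarity such as the Jaccard index is an alternative, and either choice is an assumption here.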


Figs. 1–17

Notes

  1. https://www.payscale.com/data-packages/employee-loyalty/least-loyal-employees (Accessed on 17 Dec 2021)

  2. https://sourceforge.net/ (Accessed on 17 Dec 2021)

  3. https://www.merriam-webster.com/dictionary/jack-of-all-trades (Accessed on 17 Dec 2021)

  4. https://networkx.github.io/ (Accessed on 17 Dec 2021)

  5. https://bit.ly/2wukCHc (Accessed on 17 Dec 2021)

  6. https://hadoop.apache.org/ (Accessed on 17 Dec 2021)

  7. https://hive.apache.org/ (Accessed on 17 Dec 2021)

  8. https://pig.apache.org/ (Accessed on 17 Dec 2021)

  9. https://hbase.apache.org/ (Accessed on 17 Dec 2021)

  10. http://db.apache.org/derby/ (Accessed on 17 Dec 2021)

  11. https://zookeeper.apache.org/ (Accessed on 17 Dec 2021)

  12. https://git-scm.com/book/en/v2/Git-Basics-Recording-Changes-to-the-Repository (Accessed on 17 Dec 2021)

  13. https://github.com/hacetin/keydev

  14. https://plotly.com/dash/

  15. https://sourceforge.net/

  16. https://wiki.python.org/moin/TimeComplexity

  17. https://www.scipy.org/

  18. https://projects.eclipse.org/projects/modeling.sirius/who (Accessed on 17 Dec 2021)

  19. https://www.openhub.net/p/eclipse_sirius/contributors/summary (Accessed on 24 Sep 2020)

  20. https://www.apache.org/


Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments, which helped to improve the paper.

Author information

Corresponding author

Correspondence to H. Alperen Çetin.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest that could influence the work reported in this paper.

Additional information

Communicated by: Tim Menzies and Mei Nagappan

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Inventing the Next Generation of Software Analytics

About this article

Cite this article

Çetin, H.A., Tüzün, E. Analyzing developer contributions using artifact traceability graphs. Empir Software Eng 27, 77 (2022). https://doi.org/10.1007/s10664-022-10129-2

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10664-022-10129-2

Keywords
