On the assignment of commits to releases

Empirical Software Engineering

Abstract

A release is a ubiquitous concept in software development: it groups multiple independent changes into a deliverable piece of software. Mining releases can help developers understand software evolution at a coarse grain, identify which features were delivered or which bugs were fixed, and pinpoint who contributed to a given release. A typical first step of release mining is to identify which commits compose a given release. We found two main strategies in the literature for this task: time-based and range-based. Some release mining works recognize that these strategies are subject to misclassifications, but they do not quantify the impact of this threat. We analyzed 13,419 releases and 1,414,997 commits from 100 relevant open-source projects hosted on GitHub to assess both strategies in terms of precision and recall. We observed that, in general, the range-based strategy outperforms the time-based strategy. Nevertheless, even when the range-based strategy is used, some releases still show misclassifications. Our paper therefore also discusses situations in which each strategy degrades, which can bias mining results if not adequately known and avoided.
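The difference between the two strategies can be illustrated on a toy commit graph. The sketch below is not the authors' implementation; it assumes that the time-based strategy assigns to a release every commit whose timestamp falls between the previous release and the current one, and that the range-based strategy assigns every commit reachable from the current release but not from the previous one. The commit history, the helper names, and the hand-labeled `expected` set are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    timestamp: int       # toy values; real data would use commit dates
    parents: tuple = ()  # SHAs of the parent commits


def ancestors(head, commits):
    """Return every SHA reachable from `head`, including `head` itself."""
    seen, stack = set(), [head]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(commits[sha].parents)
    return seen


def time_based(prev_tag, cur_tag, commits):
    """Assumed definition: commits dated after prev_tag and up to cur_tag."""
    lo, hi = commits[prev_tag].timestamp, commits[cur_tag].timestamp
    return {c.sha for c in commits.values() if lo < c.timestamp <= hi}


def range_based(prev_tag, cur_tag, commits):
    """Assumed definition: reachable from cur_tag but not from prev_tag."""
    return ancestors(cur_tag, commits) - ancestors(prev_tag, commits)


def precision_recall(predicted, expected):
    """Precision and recall of a predicted commit set against a reference."""
    tp = len(predicted & expected)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical history: "b" is release v1 and "m" is release v2.
history = [
    Commit("a", 10),
    Commit("b", 20, ("a",)),      # tagged v1
    Commit("e", 15, ("a",)),      # feature work written before v1, merged later
    Commit("d", 30, ("b",)),
    Commit("f", 40, ("e",)),
    Commit("x", 45, ("b",)),      # hotfix on a maintenance branch, not in v2
    Commit("m", 50, ("d", "f")),  # merge commit, tagged v2
]
commits = {c.sha: c for c in history}

# Hand-labeled reference for v2 in this toy example only.
expected = {"d", "e", "f", "m"}

by_time = time_based("b", "m", commits)    # {"d", "f", "x", "m"}
by_range = range_based("b", "m", commits)  # {"d", "e", "f", "m"}
```

In this toy history the time-based strategy misses `e`, which was committed before v1 but delivered only in v2, and wrongly includes `x`, which was committed between the two releases but never merged into v2, so its precision and recall are both 0.75. The range-based result matches the reference here only by construction; as the abstract notes, real releases can still be misclassified under the range-based strategy.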

Data Availability

The data that support the findings of this study and the scripts we used to run our experiments are available in our replication package: https://github.com/gems-uff/release-mining-extended.

Notes

  1. https://github.com/gems-uff/release-mining-extended

  2. https://insights.stackoverflow.com/survey/2019/#technology

  3. https://github.com/gems-uff/releasy

Acknowledgements

The authors would like to thank CNPq (grant 311955/2020-7) and FAPERJ (grant E26/201.038/2021) for their financial support.

Author information

Corresponding author

Correspondence to Felipe Curty do Rego Pinto.

Ethics declarations

Competing interests

The authors declare that they have no competing interests. Furthermore, this publication is the sole initiative of the authors and does not represent the position of any company or organization.

Additional information

Communicated by: Rick Kazman, Marouane Kessentini, Yuanfang Cai

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Software Analysis, Evolution and Reengineering (SANER).

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pinto, F.C.d.R., Murta, L.G.P. On the assignment of commits to releases. Empir Software Eng 28, 32 (2023). https://doi.org/10.1007/s10664-022-10263-x

  • DOI: https://doi.org/10.1007/s10664-022-10263-x
