
Pull request latency explained: an empirical overview

Empirical Software Engineering

Abstract

Pull request latency evaluation is an essential application of effort estimation in pull-based development. It can help reviewers sort the pull request queue, inform developers of the expected review time, speed up the review process, and accelerate software development. However, no prior work systematically organizes the factors that affect pull request latency, and none discusses how the importance of these factors varies across scenarios and contexts. In this paper, we collected the relevant factors through a literature review and assessed their relative importance in five scenarios and six contexts using mixed-effects linear regression models. The most important factor differs by scenario: the length of the description matters most when a pull request is submitted, while the existence of comments matters most at close time, when CI tools are used, and when the contributor and the integrator are different people; when comments exist, the latency of the first comment is most important. The influence of individual factors may also change with context. For example, the number of commits in a pull request has a larger impact on latency at close time than at submission time, due to changes to the contribution introduced during the review process. Both human and bot comments are positively correlated with pull request latency; compared with human comments, bots' first comments are more strongly correlated with latency, while the number of bot comments is less correlated. Future research and tool implementations need to consider the impact of different contexts. Researchers can build on our publicly available datasets and replication scripts.
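The abstract describes assessing the relative importance of factors with mixed-effects linear regression models, where projects serve as a natural grouping for random effects. The following is a minimal sketch of that style of model in Python using `statsmodels` on synthetic data; the factor names, effect sizes, and the project-level random intercept here are illustrative assumptions, not the paper's actual pipeline (see its replication scripts for that).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
projects = rng.integers(0, 20, n)  # 20 hypothetical projects as the grouping factor

df = pd.DataFrame({
    "project": projects,
    "description_length": rng.poisson(40, n),      # words in the PR description
    "num_commits": rng.poisson(3, n) + 1,          # commits in the PR
    "has_comments": rng.integers(0, 2, n),         # whether any comment exists
})

# Synthetic log-latency with a per-project random intercept plus noise
project_effect = rng.normal(0, 0.5, 20)[projects]
df["log_latency"] = (0.01 * df["description_length"]
                     + 0.2 * df["num_commits"]
                     + 0.8 * df["has_comments"]
                     + project_effect
                     + rng.normal(0, 0.3, n))

# Mixed-effects linear model: fixed effects for the factors,
# random intercept per project (the default for MixedLM groups)
model = smf.mixedlm(
    "log_latency ~ description_length + num_commits + has_comments",
    df, groups=df["project"])
result = model.fit()
print(result.summary())
```

Fitting on real data would additionally require the transformation and factor-selection steps the paper applies before modeling; the fixed-effect coefficients then indicate each factor's direction and relative strength within a scenario.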


Notes

  1. https://en.wikipedia.org/wiki/Software_development_effort_estimation

  2. https://github.com/zhangxunhui/ESE_pull_request_latency

  3. http://ghtorrent-downloads.ewi.tudelft.nl/mysql/mysql-2019-06-01.tar.gz

  4. https://quantifyinghealth.com/correlation-collinearity-multicollinearity/

  5. https://www.statstest.com/cramers-v-2/

  6. http://www.real-statistics.com/chi-square-and-f-distributions/effect-size-chi-square/

  7. https://github.com/zhangxunhui/ESE_pull_request_latency

  8. https://github.com/zhangxunhui/ESE_pull_request_latency/blob/main/report.pdf

  9. https://github.com/zhangxunhui/ESE_pull_request_latency/blob/main/ese_latency_factor_selection.py

  10. https://github.com/ajaxorg/ace/pull/865

  11. https://github.com/palantir/atlasdb/pull/900

  12. https://github.com/ansible/ansible/pull/38773

  13. https://github.com/openshift/openshift-tools/pull/2937
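Notes 4–6 above link to background on multicollinearity, Cramér's V, and the chi-square effect size, which relate to factor screening before regression. As a minimal illustration (not the paper's actual factor-selection script, which is linked in note 9), Cramér's V for a contingency table is the square root of the chi-square statistic divided by the sample size times one less than the smaller table dimension:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V effect size for a contingency table.

    V = sqrt(chi2 / (n * (min(rows, cols) - 1))), ranging from 0 to 1.
    """
    table = np.asarray(table)
    chi2, _, _, _ = chi2_contingency(table)  # Yates correction applies to 2x2 by default
    n = table.sum()
    k = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))

# Example: association between two hypothetical binary factors
print(round(cramers_v([[30, 10], [10, 30]]), 3))  # → 0.475
```

Highly associated factor pairs (large V) would be candidates for dropping one member before fitting the regression, analogous to removing collinear numeric predictors.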


Acknowledgements

This work is supported by the National Grand R&D Plan (Grant No. 2020AAA0103504).

Author information

Corresponding author

Correspondence to Yue Yu.

Additional information

Communicated by: Igor Steinmacher

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhang, X., Yu, Y., Wang, T. et al. Pull request latency explained: an empirical overview. Empir Software Eng 27, 126 (2022). https://doi.org/10.1007/s10664-022-10143-4
