Abstract
The pull-based development model, widely used by distributed software teams in open source communities, can efficiently gather the wisdom of crowds. Instead of sharing access to a central repository, contributors create a fork, update it locally, and request to have their changes merged back, i.e., submit a pull-request. This model lowers the barrier to entry for potential contributors, since anyone can submit pull-requests to any repository, but it also increases the burden on integrators, who are responsible for assessing the proposed patches and integrating suitable changes into the central repository. The role of integrators in pull-based development is crucial. They must not only ensure that pull-requests meet the project’s quality standards before being accepted, but also complete their evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration (CI) is widely adopted to automatically build and test every pull-request at the time of submission. CI provides additional evidence about the quality of pull-requests, which helps integrators make the final decision (i.e., accept or reject). In this paper, we present a quantitative study of which factors affect the pull-based development process, in terms of both acceptance and latency, in the context of CI. Using regression modeling on data extracted from a sample of GitHub projects that use the Travis-CI service, we find that the evaluation process is complex, requiring many independent variables to explain adequately. In particular, CI is a dominant factor: it not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.
Cite this article
Yu, Y., Yin, G., Wang, T. et al. Determinants of pull-based development in the context of continuous integration. Sci. China Inf. Sci. 59, 080104 (2016). https://doi.org/10.1007/s11432-016-5595-8
DOI: https://doi.org/10.1007/s11432-016-5595-8