Determinants of pull-based development in the context of continuous integration

Abstract

The pull-based development model, widely used by distributed software teams in open source communities, efficiently harnesses the wisdom of the crowd. Instead of sharing write access to a central repository, contributors fork it, update their copies locally, and request to have their changes merged back, i.e., they submit pull-requests. On the one hand, this model lowers the barrier to entry for potential contributors, since anyone can submit a pull-request to any repository; on the other hand, it increases the burden on integrators, who are responsible for assessing the proposed patches and integrating the suitable changes into the central repository. The role of integrators in pull-based development is crucial: they must not only ensure that pull-requests meet the project’s quality standards before being accepted, but also complete the evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration (CI) is widely adopted to automatically build and test every pull-request at submission time. CI provides additional evidence about the quality of pull-requests, which helps integrators make the final decision (i.e., accept or reject). In this paper, we present a quantitative study of the factors that affect the pull-based development process, in terms of both acceptance and latency, in the context of CI. Using regression modeling on data extracted from a sample of GitHub projects deploying the Travis-CI service, we find that the evaluation process is a complex issue that requires many independent variables to explain adequately. In particular, CI is a dominant factor in the process: it not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.
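The kind of regression modeling the abstract describes can be illustrated with a minimal sketch: a logistic regression relating pull-request features to acceptance. The feature names (`ci_passed`, `log_churn`) and all coefficients below are invented for illustration; the actual study uses mixed-effects models over many more predictors extracted from GitHub and Travis-CI data.

```python
# Illustrative sketch only: logistic regression of pull-request
# acceptance on two hypothetical predictors, fitted on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical predictors: whether the CI build passed, and the
# (log-scaled) size of the change.
ci_passed = rng.integers(0, 2, n)
log_churn = rng.normal(3.0, 1.0, n)

# Synthetic ground truth: passing CI raises acceptance odds, larger
# changes lower them (coefficients are made up for this sketch).
logits = -0.5 + 1.8 * ci_passed - 0.4 * log_churn
accepted = rng.random(n) < 1 / (1 + np.exp(-logits))

# Fit by Newton-Raphson (iteratively reweighted least squares).
X = np.column_stack([np.ones(n), ci_passed, log_churn])
beta = np.zeros(3)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))          # predicted probabilities
    grad = X.T @ (accepted - p)              # score vector
    hess = X.T @ (X * (p * (1 - p))[:, None])  # observed information
    beta += np.linalg.solve(hess, grad)

print(beta)  # intercept, CI effect (positive), churn effect (negative)
```

On this synthetic sample the fit recovers the signs of both effects, mirroring the paper’s finding that CI outcomes and change size are informative predictors of acceptance.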

Author information

Corresponding author

Correspondence to Yue Yu.

Cite this article

Yu, Y., Yin, G., Wang, T. et al. Determinants of pull-based development in the context of continuous integration. Sci. China Inf. Sci. 59, 080104 (2016). https://doi.org/10.1007/s11432-016-5595-8

Keywords

  • pull-request
  • continuous integration
  • GitHub
  • distributed software development
  • empirical analysis