Determinants of pull-based development in the context of continuous integration

  • Yue Yu
  • Gang Yin
  • Tao Wang
  • Cheng Yang
  • Huaimin Wang
Research Paper

Abstract

The pull-based development model, widely used by distributed software teams in open source communities, can efficiently gather the wisdom of crowds. Instead of sharing access to a central repository, contributors create a fork, update it locally, and request to have their changes merged back, i.e., they submit a pull-request. On the one hand, this model lowers the barrier to entry for potential contributors, since anyone can submit pull-requests to any repository; on the other hand, it increases the burden on integrators, who are responsible for assessing the proposed patches and integrating the suitable changes into the central repository. The role of integrators in pull-based development is crucial: they must not only ensure that pull-requests meet the project’s quality standards before being accepted, but also complete their evaluations in a timely manner. To keep up with the volume of incoming pull-requests, continuous integration (CI) is widely adopted to automatically build and test every pull-request at the time of submission. CI provides extra evidence about the quality of pull-requests, which helps integrators make the final decision (i.e., accept or reject). In this paper, we present a quantitative study of the factors that affect the pull-based development process, including acceptance and latency, in the context of CI. Using regression modeling on data extracted from a sample of GitHub projects deploying the Travis-CI service, we find that the evaluation process is a complex issue that requires many independent variables to explain adequately. In particular, CI is a dominant factor: it not only has a great influence on the evaluation process per se, but also changes the effects of some traditional predictors.
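
To make the described methodology concrete, the sketch below shows how such a regression analysis is commonly set up in R with lme4 (a standard package for mixed-effects modeling): a mixed-effects logistic regression of pull-request acceptance on a CI outcome and a few traditional predictors, with a random intercept per project to account for repeated observations from the same repository. This is an illustrative sketch under assumed variable names (accepted, ci_passed, churn, core_contributor, project), not the authors' exact model specification.

```r
# Minimal sketch of a GLMM for pull-request acceptance; column names are
# hypothetical, not taken from the paper's data.
library(lme4)   # glmer() for generalized linear mixed-effects models
library(MuMIn)  # r.squaredGLMM() for marginal/conditional R^2
library(pROC)   # roc()/auc() for ROC analysis

prs <- read.csv("pull_requests.csv")  # one row per pull-request (hypothetical file)

# Fixed effects: CI outcome plus traditional predictors; random intercept
# per project, since pull-requests are nested within repositories.
m <- glmer(accepted ~ ci_passed + log(churn + 1) + core_contributor +
             (1 | project),
           data = prs, family = binomial)

summary(m)                         # coefficient estimates and significance
r.squaredGLMM(m)                   # variance explained: fixed vs. fixed + random effects
auc(roc(prs$accepted, fitted(m)))  # in-sample discrimination (AUC)
```

For the latency outcome, an analogous linear mixed model (lmer) on a log-transformed evaluation time would be a common choice, since latency distributions are heavily skewed.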

Keywords

pull-request · continuous integration · GitHub · distributed software development · empirical analysis

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Yue Yu 1,2
  • Gang Yin 1,2
  • Tao Wang 1,2
  • Cheng Yang 1,2
  • Huaimin Wang 1,2

  1. College of Computer, National University of Defense Technology, Changsha, China
  2. National Laboratory for Parallel and Distributed Processing, Changsha, China