How does code style inconsistency affect pull request integration? An exploratory study on 117 GitHub projects

Abstract

GitHub is a popular code platform that provides infrastructure to facilitate collaborative development. The Pull Request (PR) is one of its key mechanisms for supporting collaboration: developers are encouraged to submit PRs to request the integration of their contributions. In practice, project maintainers do not integrate every submitted PR into the codebase. Existing studies have investigated factors affecting PR integration. Nevertheless, the code style of PRs, which project maintainers largely take into account, has not yet been studied in depth. In this paper, we performed an exploratory analysis of the effect of code style on PR integration in GitHub. We modeled code style via the inconsistency between a submitted PR and the existing code in its target codebase. This modeling keeps our study from being limited to a specific definition of code style. We conducted our experiments on 50,092 closed PRs in 117 Java projects. Our findings show that: (1) Code style inconsistency does exist between PRs and the codebase. (2) Several code style criteria, covering the use of spaces or indents, commenting, and keeping code lines to a suitable length, tend to show more inconsistency among PRs. (3) A PR that is consistent with the current code style tends to be merged into the codebase more easily. (4) A PR that violates the current code style is likely to take more time to get closed. Our study provides evidence to developers on how to deliver better contributions and facilitate efficient collaboration.
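
The modeling idea in the abstract can be pictured with a small sketch. It is an illustration only and not the study's actual metric: it assumes, hypothetically, that each code style criterion is summarized as a violation rate per line of code, and that inconsistency is the absolute gap between the PR's rate and the codebase's rate. The function names, the criterion labels, and all counts below are made up for the example.

```python
from typing import Dict


def violation_rates(violations: Dict[str, int], lines_of_code: int) -> Dict[str, float]:
    """Convert raw violation counts per criterion into violations per line."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return {criterion: count / lines_of_code for criterion, count in violations.items()}


def style_inconsistency(pr_rates: Dict[str, float],
                        codebase_rates: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical measure: absolute gap between PR and codebase rates, per criterion.

    A criterion missing on one side is treated as having a rate of 0.
    """
    criteria = sorted(set(pr_rates) | set(codebase_rates))
    return {c: abs(pr_rates.get(c, 0.0) - codebase_rates.get(c, 0.0)) for c in criteria}


# Made-up counts, standing in for what a style checker might report.
codebase = violation_rates({"LineLength": 50, "Indentation": 10}, lines_of_code=10_000)
pr = violation_rates({"LineLength": 4, "Indentation": 0}, lines_of_code=200)

gaps = style_inconsistency(pr, codebase)
for criterion, gap in gaps.items():
    print(f"{criterion}: {gap:.4f}")
```

Measuring a gap relative to the target codebase, rather than counting violations against a fixed rule set, is what lets this kind of measure stay independent of any single definition of code style: a PR with long lines is only "inconsistent" here if the surrounding project does not also use long lines.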

Notes

  1. http://github.com

  2. http://github.com/features

  3. In GitHub, a “repository” denotes a project in general. In this paper, we use “repository” and “project” interchangeably.

  4. http://github.com/rails/rails

  5. https://github.com/AndlyticsProject/andlytics

  6. http://github.com/AndlyticsProject/andlytics#contributing

  7. http://www.gnu.org/prep/standards_toc.html

  8. http://github.com/google/styleguide

  9. It is common for a project to have a readme file or a contribution file. The readme file broadly describes the project, while the contribution file mainly gives tips on contributing to it.

  10. http://guides.github.com/activities/contributing-to-open-source/#contributing

  11. http://github.com/mongodb/mongo

  12. http://github.com/rubinius/rubinius/pull/3512

  13. http://github.com/querydsl/querydsl

  14. In this study, some motivating examples (i.e., GNU, Google, and GitHub) mainly came from a manual search of code-style-related documentation in well-known open source communities or company-originated open source projects, while other examples (i.e., mongodb/mongo, rubinius/rubinius, and querydsl/querydsl) were collected by manually checking the documents and commit logs of some randomly selected popular projects on GitHub.

  15. http://githut.info

  16. https://www.githubarchive.org/

  17. https://help.github.com/articles/filtering-issues-and-pull-requests-by-assignees/

  18. http://github.com/filipg/amu_automata_2011

  19. http://checkstyle.sourceforge.net/

  20. Checkstyle is a highly configurable tool for checking code style. It supports the code styles defined by Google and Oracle. In our experiment, we configured Checkstyle to check whether a piece of code violates 37 code style criteria.

  21. http://github.com/magefree/mage

  22. http://github.com/CUTR-at-USF/OpenTripPlanner-for-Android

  23. http://sourceforge.net

  24. http://bitbucket.org/

  25. http://findbugs.sourceforge.net

Acknowledgments

The authors would like to thank our lab members, Yufeng Zhao, Yiming Chen, and Mengting Zhou, for crawling GitHub project data for the experiments. This work is partly supported by the National Natural Science Foundation of China (Grant Nos. 61690201, 61772014, 61802171, 61872273, 61572375) and the China Scholarship Council Scholarship. Any opinions, findings, and conclusions in this paper are those of the authors only and do not necessarily reflect the views of our sponsors.

Author information

Corresponding author

Correspondence to Zhenyu Chen.

Additional information

Communicated by: Ahmed E. Hassan

About this article

Cite this article

Zou, W., Xuan, J., Xie, X. et al. How does code style inconsistency affect pull request integration? An exploratory study on 117 GitHub projects. Empir Software Eng 24, 3871–3903 (2019). https://doi.org/10.1007/s10664-019-09720-x

Keywords

  • Pull request
  • Code style inconsistency
  • Exploratory study