Empirical Software Engineering, Volume 22, Issue 1, pp 547–578

Why and how developers fork what from whom in GitHub

  • Jing Jiang
  • David Lo
  • Jiahuan He
  • Xin Xia
  • Pavneet Singh Kochhar
  • Li Zhang


Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own, and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys and analyze the programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, and so on. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of the developers we surveyed agree that an automated recommendation tool would be useful in helping them pick repositories to fork, while more than 44.4% do not value a recommendation tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer’s preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. Compared with unattractive repository owners, attractive repository owners include a higher percentage of organizations, have more followers, and registered on GitHub earlier. Our results show that forking is mainly used for contributing to the original repositories, and that it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.


Keywords: Fork, Open source software, GitHub



This work is supported by the National Natural Science Foundation of China under Grant No. 61300006, the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2015ZX-24, and the Beijing Natural Science Foundation under Grant No. 4163074.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. State Key Laboratory of Software Development Environment, Beihang University, Beijing, China
  2. School of Information Systems, Singapore Management University, Singapore, Singapore
  3. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
