Empirical Software Engineering, Volume 22, Issue 3, pp 1537–1577

License usage and changes: a large-scale study on GitHub

  • Christopher Vendome
  • Gabriele Bavota
  • Massimiliano Di Penta
  • Mario Linares-Vásquez
  • Daniel German
  • Denys Poshyvanyk


Open source software licenses determine, from a legal point of view, under which conditions software can be integrated and redistributed. The reason why developers of a project adopt (or change) a license may depend on various factors, e.g., the need to ensure compatibility with certain third-party components, the perspective towards redistribution or commercialization of the software, or the need to protect against somebody else’s commercial usage of the software. This paper reports a large empirical study aimed at quantitatively and qualitatively investigating when and why developers adopt or change software licenses. Specifically, we first identify license changes in 1,731,828 commits, representing the entire history of 16,221 Java projects hosted on GitHub. Then, to understand the rationale of license changes, we perform a qualitative analysis of 1,160 projects written in seven different programming languages, namely C, C++, C#, Java, JavaScript, Python, and Ruby. Following an open coding approach inspired by grounded theory, we analyze commit messages and issue tracker discussions concerning licensing topics and, whenever possible, try to build traceability links between discussions and changes. On the one hand, our results highlight how, in different contexts, license adoption or changes can be triggered by various reasons. On the other hand, the results also highlight a lack of traceability of when and why licensing changes are made. This can be a major concern, because a change in the license of a system can negatively impact those that reuse it. In conclusion, the results of the study point to the need for better tool support in guiding developers in choosing/changing licenses and in keeping track of the rationale of license changes.


Software licenses, Mining software repositories, Empirical studies



This work is supported in part by the NSF CAREER CCF-1253837 grant. Massimiliano Di Penta is partially supported by the Markos project, funded by the European Commission under Contract Number FP7-317743. Any opinions, findings, and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Christopher Vendome (1)
  • Gabriele Bavota (2)
  • Massimiliano Di Penta (3)
  • Mario Linares-Vásquez (1)
  • Daniel German (4)
  • Denys Poshyvanyk (1)

  1. The College of William and Mary, Williamsburg, USA
  2. Free University of Bozen-Bolzano, Bozen-Bolzano, Italy
  3. University of Sannio, Benevento, Italy
  4. University of Victoria, British Columbia, Canada
