A Model of the Commit Size Distribution of Open Source

  • Conference paper
SOFSEM 2013: Theory and Practice of Computer Science (SOFSEM 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7741)

Abstract

A fundamental unit of work in programming is the code contribution (“commit”) that a developer makes to the code base of the project they are working on. We use statistical methods to derive a model of the probability distribution of commit sizes in open source projects, and we show that the model is applicable to different project sizes. We use both graphical and statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming, we help improve software development tools and our understanding of software development.
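
As a rough illustration of what fitting a distribution to commit sizes and checking its goodness of fit can look like, the following minimal sketch fits a plain Pareto distribution by maximum likelihood and measures the Kolmogorov-Smirnov distance between the empirical and fitted distributions. The abstract does not name the model family or the tests used, so the Pareto choice, the function names, and the synthetic data are all assumptions made for illustration, not the authors' method or data.

```python
import math
import random


def fit_pareto_shape(sizes, x_min=1.0):
    """Maximum-likelihood estimate of the Pareto shape for sizes >= x_min."""
    logs = [math.log(s / x_min) for s in sizes]
    return len(sizes) / sum(logs)


def pareto_cdf(x, shape, x_min=1.0):
    """Pareto CDF: 1 - (x_min / x) ** shape for x >= x_min, else 0."""
    return 0.0 if x < x_min else 1.0 - (x_min / x) ** shape


def ks_statistic(sizes, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    ordered = sorted(sizes)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered):
        model = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance, abs(model - i / n), abs((i + 1) / n - model))
    return distance


# Synthetic stand-in for commit sizes (an assumption, NOT data from the paper).
rng = random.Random(42)
commit_sizes = [rng.paretovariate(1.5) for _ in range(5000)]

shape_hat = fit_pareto_shape(commit_sizes)
d_stat = ks_statistic(commit_sizes, lambda x: pareto_cdf(x, shape_hat))
print(f"estimated shape: {shape_hat:.3f}, KS distance: {d_stat:.4f}")
```

A small KS distance indicates that the fitted curve tracks the empirical distribution closely. A graphical check, such as plotting both CDFs on log-log axes, would complement the numeric one.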

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kolassa, C., Riehle, D., Salim, M.A. (2013). A Model of the Commit Size Distribution of Open Source. In: van Emde Boas, P., Groen, F.C.A., Italiano, G.F., Nawrocki, J., Sack, H. (eds) SOFSEM 2013: Theory and Practice of Computer Science. SOFSEM 2013. Lecture Notes in Computer Science, vol 7741. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35843-2_6

  • DOI: https://doi.org/10.1007/978-3-642-35843-2_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35842-5

  • Online ISBN: 978-3-642-35843-2

  • eBook Packages: Computer Science, Computer Science (R0)
