Abstract
A fundamental unit of work in programming is the code contribution (“commit”) that a developer makes to the code base of the project being worked on. We use statistical methods to derive a model of the probability distribution of commit sizes in open source projects, and we show that the model is applicable to projects of different sizes. We use both graphical and statistical methods to validate the goodness of fit of our model. By measuring and modeling a fundamental dimension of programming, we help improve software development tools and our understanding of software development.
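To make "statistical methods to validate the goodness of fit" concrete, here is a minimal sketch of one such check: the Kolmogorov-Smirnov distance between the empirical distribution of commit sizes and a candidate model CDF. The abstract does not name the model distribution. As an assumption, the sketch uses a generalized Pareto distribution (GPD), a common choice for heavy-tailed data. The function names `gpd_cdf` and `ks_statistic`, the sample commit sizes, and the parameter values are all hypothetical placeholders and are not the paper's data or estimates.

```python
import math
from typing import Callable, Sequence


def gpd_cdf(x: float, loc: float, scale: float, shape: float) -> float:
    """CDF of the generalized Pareto distribution (assumed model).

    loc = threshold mu, scale = sigma > 0, shape = xi.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if x <= loc:
        return 0.0
    z = (x - loc) / scale
    if shape == 0.0:
        # Limit case xi -> 0: the GPD reduces to an exponential distribution.
        return 1.0 - math.exp(-z)
    t = 1.0 + shape * z
    if t <= 0.0:
        # Only reachable for shape < 0: x lies at or beyond the bounded upper end.
        return 1.0
    return 1.0 - t ** (-1.0 / shape)


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Kolmogorov-Smirnov distance D between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    # Hypothetical commit sizes (e.g. lines changed); toy numbers, not data from the paper.
    commit_sizes = [1, 2, 2, 3, 5, 8, 13, 40, 120, 900]
    # Placeholder parameters; in practice they would be estimated, e.g. by maximum likelihood.
    params = {"loc": 0.0, "scale": 5.0, "shape": 0.8}
    d = ks_statistic(commit_sizes, lambda x: gpd_cdf(x, **params))
    print(f"KS distance D = {d:.3f}")
```

A smaller D means the model CDF tracks the observed commit sizes more closely. A graphical check, such as a quantile-quantile plot of the same sample against the fitted model, would complement this single number.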
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kolassa, C., Riehle, D., Salim, M.A. (2013). A Model of the Commit Size Distribution of Open Source. In: van Emde Boas, P., Groen, F.C.A., Italiano, G.F., Nawrocki, J., Sack, H. (eds) SOFSEM 2013: Theory and Practice of Computer Science. SOFSEM 2013. Lecture Notes in Computer Science, vol 7741. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35843-2_6
DOI: https://doi.org/10.1007/978-3-642-35843-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35842-5
Online ISBN: 978-3-642-35843-2
eBook Packages: Computer Science, Computer Science (R0)