Abstract
Context: Software systems continuously change for various reasons, such as adding new features, fixing bugs, or refactoring. Changes may either increase the source code complexity and disorganization or help to reduce it.

Aim: This paper empirically investigates the relationship of source code complexity and disorganization (measured using source code change entropy) with four factors, namely the presence of refactoring activities, the number of developers working on a source code file, the participation of classes in design patterns, and the different kinds of changes occurring on the system, classified in terms of their topics extracted from commit notes.

Method: We carried out an exploratory study on an interval of the lifetime of four open source systems, namely ArgoUML, Eclipse-JDT, Mozilla, and Samba, with the aim of analyzing the relationship between the source code change entropy and four factors: refactoring activities, number of contributors for a file, participation of classes in design patterns, and change topics.

Results: The study shows that (i) the change entropy decreases after refactoring, (ii) files changed by a higher number of developers tend to exhibit a higher change entropy than others, (iii) classes participating in certain design patterns exhibit a higher change entropy than others, and (iv) changes related to different topics exhibit different change entropy; for example, bug fixes exhibit a limited change entropy while changes introducing new features exhibit a high change entropy.

Conclusions: The results provided in this paper indicate that the nature of changes (in particular changes related to refactorings), the software design, and the number of active developers are factors related to change entropy. Our findings contribute to understanding the software aging phenomenon and are preliminary to identifying better ways to counteract it.
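To make the notion of change entropy concrete, the following is a minimal sketch and not the paper's exact definition. It assumes the common formulation in which Shannon entropy (Shannon 1948) is computed over the distribution of changes across files within one time period. The function name `change_entropy`, the `n_files` normalization parameter, and the sample file names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def change_entropy(changed_files, n_files=None):
    """Shannon entropy of how changes are spread over files in one period.

    changed_files: one entry per file modification observed in the period
                   (assumed input format, e.g. extracted from commit logs).
    n_files:       optional total number of files in the system; when given
                   (and > 1) the result is divided by log2(n_files) so that
                   it lies in [0, 1]. This normalization is an assumption.
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p is the fraction of the period's changes that touched a given file.
    probs = [c / total for c in counts.values()]
    h = sum(p * math.log2(1 / p) for p in probs)
    if n_files is not None and n_files > 1:
        h /= math.log2(n_files)
    return h


# Changes concentrated in one file: low entropy.
focused = change_entropy(["A.java"] * 4)
# Changes scattered evenly over four files: high entropy.
scattered = change_entropy(["A.java", "B.java", "C.java", "D.java"])
```

Under this reading, a period in which changes are scattered over many files scores higher than one in which the same number of changes is concentrated in a few files.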
Notes
In the following, for Eclipse-JDT we refer to classes rather than to files, since in the discussed cases each class corresponds to a single file.
References
Aversano L, Canfora G, Cerulo L, Del Grosso C, Di Penta M (2007) An empirical study on the evolution of design patterns. In: ESEC-FSE ’07: proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM Press, New York, pp 385–394
Aversano L, Cerulo L, Di Penta M (2009) The relationship between design patterns defects and crosscutting concern scattering degree: an empirical study. IET Softw 3(5):395–409
Bianchi A, Caivano D, Lanubile F, Visaggio G (2001) Evaluating software degradation through entropy. In: METRICS ’01: Proceedings of the 7th international symposium on software metrics. IEEE Computer Society, Washington, DC, p 210
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Canfora G, Cerulo L, Di Penta M, Pacilio F (2010) An exploratory study of factors influencing change entropy. In: The 18th IEEE international conference on program comprehension, ICPC 2010, Braga, Minho, Portugal, 30 June–2 July 2010. IEEE Computer Society, Washington, DC, pp 134–143
Capiluppi A, Fernández-Ramil J, Higman J, Sharp HC, Smith N (2007) An empirical study of the evolution of an agile-developed software system. In: 29th international conference on software engineering (ICSE 2007), Minneapolis, MN, USA, 20–26 May 2007. IEEE Computer Society, Washington, DC, pp 511–518
Chapin N (1995) An entropy metric for software maintainability. In: Proceedings of the 28th Hawaii international conference on system sciences, pp 522–523
Chikofsky EJ, Cross JH II (1990) Reverse engineering and design recovery: a taxonomy. IEEE Softw 7(1):13–17
Di Penta M, Germán DM (2009) Who are source code contributors and how do they change? In: 16th working conference on reverse engineering, WCRE 2009, 13–16 October 2009, Lille, France. IEEE Computer Society, Washington, DC, pp 11–20
Di Penta M, Germán DM, Guéhéneuc Y-G, Antoniol G (2010) An exploratory study of the evolution of software licensing. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, ICSE 2010, Cape Town, South Africa, 1–8 May 2010. ACM, New York, pp 145–154
Eick SG, Graves TL, Karr AF, Marron JS, Mockus A (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12
Fowler M, Beck K, Brant J, Opdyke W, Roberts D (1999) Refactoring: improving the design of existing code. Addison-Wesley, Reading
Gall H, Jazayeri M, Krajewski J (2003) CVS release history data for detecting logical couplings. In: IWPSE ’03: Proceedings of the 6th international workshop on principles of software evolution. IEEE Computer Society, Washington, DC, pp 13–23
Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley, Reading
Grissom RJ, Kim JJ (2005) Effect sizes for research: a broad practical approach, 2nd edn. Lawrence Erlbaum Associates, Hillsdale
Harrison W (1992) An entropy-based measure of software complexity. IEEE Trans Softw Eng 18(11):1025–1029
Hassan AE (2009) Predicting faults using the complexity of code changes. In: 31st international conference on software engineering, ICSE 2009, 16–24 May 2009, Vancouver, Canada, pp 78–88
Hassan AE, Holt RC (2003) The chaos of software development. In: IWPSE ’03: Proceedings of the 6th international workshop on principles of software evolution. IEEE Computer Society, Washington, DC, p 84
Holm S (1979) A simple sequentially rejective Bonferroni test procedure. Scand J Statist 6:65–70
Kuhn A, Ducasse S, Gírba T (2007) Semantic clustering: identifying topics in source code. Inf Softw Technol 49:230–243
Lehman MM (1980) Programs, life cycles, and laws of software evolution. Proc IEEE 68(9):1060–1076
Lehman MM, Belady LA (1985) Software evolution—processes of software change. Academic, London
Linstead E, Baldi P (2009) Mining the coherence of gnome bug reports with statistical topic models. In: Proceedings of the 2009 6th IEEE international working conference on mining software repositories, MSR ’09. IEEE Computer Society, Washington, DC, pp 99–102
Nagappan N, Ball T (2007) Using software dependencies and churn metrics to predict field failures: an empirical case study. In: Proceedings of the first international symposium on empirical software engineering and measurement, ESEM 2007, 20–21 September 2007, Madrid, Spain. IEEE Computer Society, Washington, DC, pp 364–373
Parnas DL (1994) Software aging. In: Proceedings of the international conference on software engineering, pp 279–287
Ratzinger J, Sigmund T, Gall H (2008) On the relation of refactorings and software defect prediction. In: Proceedings of the 2008 international working conference on mining software repositories, MSR 2008, Leipzig, Germany, 10–11 May 2008. ACM, New York, pp 35–38
Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 625–656
Sheskin DJ (2007) Handbook of parametric and nonparametric statistical procedures, 4th edn. Chapman & Hall, London
Thomas SW, Adams B, Hassan AE, Blostein D (2010) Validating the use of topic models for software evolution. In: IEEE international workshop on source code analysis and manipulation. IEEE Computer Society, Los Alamitos, pp 55–64
Tsantalis N, Chatzigeorgiou A, Stephanides G, Halkidis ST (2006) Design pattern detection using similarity scoring. IEEE Trans Softw Eng 32(11):896–909
van Rijsbergen CJ, Robertson SE, Porter MF (1980) New models in probabilistic information retrieval. In: British Library research and development report, no. 5587. British Library, London
Zimmermann T, Weisgerber P, Diehl S, Zeller A (2004) Mining version histories to guide software changes. In: ICSE ’04: Proceedings of the 26th international conference on software engineering. IEEE Computer Society, Washington, DC, pp 563–572
Additional information
This paper is an extension of the paper “An Exploratory Study of Factors Influencing Change Entropy” (Canfora et al. 2010).
Appendix: Detailed Analyses
Cite this article
Canfora, G., Cerulo, L., Cimitile, M. et al. How changes affect software entropy: an empirical study. Empir Software Eng 19, 1–38 (2014). https://doi.org/10.1007/s10664-012-9214-z
DOI: https://doi.org/10.1007/s10664-012-9214-z