Empirical Software Engineering, Volume 19, Issue 1, pp 1–38

How changes affect software entropy: an empirical study

  • Gerardo Canfora
  • Luigi Cerulo
  • Marta Cimitile
  • Massimiliano Di Penta


Context: Software systems continuously change for various reasons, such as adding new features, fixing bugs, or refactoring. Changes may either increase source code complexity and disorganization or help to reduce it.

Aim: This paper empirically investigates the relationship between source code complexity and disorganization, measured using source code change entropy, and four factors: the presence of refactoring activities, the number of developers working on a source code file, the participation of classes in design patterns, and the different kinds of changes occurring in the system, classified by topics extracted from commit notes.

Method: We carried out an exploratory study on an interval of the lifetime of four open source systems, namely ArgoUML, Eclipse-JDT, Mozilla, and Samba, analyzing the relationship between source code change entropy and the four factors: refactoring activities, number of contributors to a file, participation of classes in design patterns, and change topics.

Results: The study shows that (i) change entropy decreases after refactoring, (ii) files changed by a higher number of developers tend to exhibit a higher change entropy than others, (iii) classes participating in certain design patterns exhibit a higher change entropy than others, and (iv) changes related to different topics exhibit different change entropy; for example, bug fixes exhibit a limited change entropy, while changes introducing new features exhibit a high change entropy.

Conclusions: The results indicate that the nature of changes (in particular, changes related to refactoring), the software design, and the number of active developers are factors related to change entropy. Our findings contribute to understanding the software aging phenomenon and are a preliminary step toward identifying better ways to counteract it.
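The abstract names change entropy without giving its definition. As a rough illustration only, the sketch below assumes the common Shannon-entropy formulation, in which the entropy of a time period is computed over the share of changes that each file receives. The function name `change_entropy` and the sample file names are hypothetical, and the paper's exact measure (for example, its windowing or normalization) may differ.

```python
import math
from collections import Counter


def change_entropy(changed_files):
    """Shannon entropy (in bits) of how changes are spread across files.

    `changed_files` lists one file name per recorded change in a period.
    This is an assumed formulation, not necessarily the paper's exact one.
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no changes in the period, so nothing to measure
    # H = sum over files of p_i * log2(1 / p_i), with p_i = share of changes
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Changes concentrated in one file give the minimum entropy.
focused = ["Parser.java"] * 4
# Changes spread evenly over four files give the maximum for four files.
scattered = ["Parser.java", "Lexer.java", "Ast.java", "Main.java"]

print(change_entropy(focused))    # 0.0
print(change_entropy(scattered))  # 2.0
```

Under this reading, a period in which edits are scattered across many files scores higher than one in which they are confined to a few, which is the sense of "disorganization" the abstract appeals to.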


Keywords: Software entropy, Software complexity, Mining software repositories



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Gerardo Canfora (1)
  • Luigi Cerulo (2)
  • Marta Cimitile (3)
  • Massimiliano Di Penta (1)

  1. Department of Engineering-RCOST, University of Sannio, Benevento, Italy
  2. Department of Biological and Environmental Studies, University of Sannio, Benevento, Italy
  3. Department of Jurisprudence, Unitelma Sapienza, Napoli, Italy
