Empirical Software Engineering, Volume 22, Issue 3, pp 1146–1193

A large-scale study of architectural evolution in open-source software systems

  • Pooyan Behnamghader
  • Duc Minh Le
  • Joshua Garcia
  • Daniel Link
  • Arman Shahbazian
  • Nenad Medvidovic


From its very inception, the study of software architecture has recognized architectural decay as a regularly occurring phenomenon in long-lived systems. Architectural decay is caused by repeated, sometimes careless changes to a system during its lifespan. Despite decay’s prevalence, there is a relative dearth of empirical data regarding the nature of architectural changes that may lead to decay, and of developers’ understanding of those changes. In this paper, we take a step toward addressing that scarcity by introducing an architecture recovery framework, ARCADE, for conducting large-scale replicable empirical studies of architectural change across different versions of a software system. ARCADE includes two novel architectural change metrics, which are the key to enabling large-scale empirical studies of architectural change. We utilize ARCADE to conduct an empirical study of changes found in software architectures spanning several hundred versions of 23 open-source systems. Our study reveals several new findings regarding the frequency of architectural changes in software systems, the common points of departure in a system’s architecture during the system’s maintenance and evolution, the difference between system-level and component-level architectural change, and the suitability of a system’s implementation-level structure as a proxy for its architecture.


Software architecture · Architectural change · Software evolution · Open-source software · Architecture recovery



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Computer Science Department, University of Southern California, Los Angeles, USA
  2. Institute for Software Research, University of California, Irvine, USA
