A large-scale study of architectural evolution in open-source software systems


From its very inception, the study of software architecture has recognized architectural decay as a regularly occurring phenomenon in long-lived systems. Architectural decay is caused by repeated, sometimes careless changes to a system during its lifespan. Despite decay’s prevalence, there is a relative dearth of empirical data regarding the nature of architectural changes that may lead to decay, and of developers’ understanding of those changes. In this paper, we take a step toward addressing that scarcity by introducing an architecture recovery framework, ARCADE, for conducting large-scale replicable empirical studies of architectural change across different versions of a software system. ARCADE includes two novel architectural change metrics, which are the key to enabling large-scale empirical studies of architectural change. We utilize ARCADE to conduct an empirical study of changes found in software architectures spanning several hundred versions of 23 open-source systems. Our study reveals several new findings regarding the frequency of architectural changes in software systems, the common points of departure in a system’s architecture during the system’s maintenance and evolution, the difference between system-level and component-level architectural change, and the suitability of a system’s implementation-level structure as a proxy for its architecture.

  1. The current version of ARCADE (ARCADE 2015) also analyzes and quantifies different symptoms of architectural decay for a given system. However, these features are currently under evaluation and are outside the scope of this paper.


  • Agnew B, Hofmeister C, Purtilo J (1994) Planning for change: a reconfiguration language for distributed systems. Distrib Syst Eng 1(5):313

  • Amazon (2015) Amazon command line interface.

  • Apache (2014a) Apache portable runtime versioning.

  • Apache (2014b) Hadoop releases.

  • Apache (2014c) Lucene wiki.

  • Apache (2015a) Apache ant.

  • Apache (2015b) Apache maven.

  • ARCADE (2015) arcade:start [USC SoftArch Wiki].

  • Bitbucket (2015) Bitbucket.

  • Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–84

  • Bouwers E, Correia JP, van Deursen A, Visser J (2011a) Quantifying the analyzability of software architectures. In: 9th working IEEE/IFIP conference on software architecture (WICSA), 2011. IEEE, pp 83–92

  • Bouwers E, van Deursen A, Visser J (2011b) Dependency profiles for software architecture evaluations. In: 27th IEEE international conference on software maintenance (ICSM), 2011. IEEE, pp 540–543

  • Bouwers E, van Deursen A, Visser J (2013) Evaluating usefulness of software metrics: an industrial experience report. In: ICSE. IEEE Press, pp 921–930

  • Chatzigeorgiou A, Manakos A (2010) Investigating the evolution of bad smells in object-oriented code. In: 17th international conference on the quality of information and communications technology (QUATIC), 2010. IEEE, pp 106–115

  • D’Ambros M, Gall H, Lanza M, Pinzger M (2008) Analysing software repositories to understand software evolution. In: Software evolution. Springer, pp 37–67

  • Ducasse S, Pollet D (2009) Software architecture reconstruction: a process-oriented taxonomy. IEEE Trans Softw Eng 35(4):573–591

  • Eick SG, Graves TL, Karr AF, Marron JS, Mockus A (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12

  • Garcia J, Popescu D, Mattmann C, Medvidovic N, Cai Y (2011) Enhancing architectural recovery using concerns. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering. IEEE Computer Society, pp 552–555

  • Garcia J, Krka I, Medvidovic N, Douglas C (2012) A framework for obtaining the ground-truth in architectural recovery. In: Joint working IEEE/IFIP conference on software architecture (WICSA) and European conference on software architecture (ECSA), 2012. IEEE, pp 292–296

  • Garcia J, Ivkovic I, Medvidovic N (2013a) A comparative analysis of software architecture recovery techniques. In: IEEE/ACM 28th international conference on automated software engineering (ASE), 2013. IEEE, pp 486–496

  • Garcia J, Krka I, Mattmann C, Medvidovic N (2013b) Obtaining ground-truth software architectures. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 901–910

  • Ghezzi G, Gall HC (2013) Replicating mining studies with sofas. In: Proceedings of the 10th working conference on mining software repositories. IEEE Press, pp 363–372

  • Git (2014) Git log.

  • Git (2015) Github.

  • Godfrey MW, Tu Q (2000) Evolution in open source software: a case study. In: Proceedings of the international conference on software maintenance, 2000. IEEE, pp 131–142

  • Google (2015a) Google cloud platform.

  • Google (2015b) Guava.

  • Holt R, Pak JY (1996) Gase: visualizing software evolution-in-the-large. In: Proceedings of the 3rd working conference on reverse engineering, 1996. IEEE, pp 163–167

  • Kim M, Sazawal V, Notkin D, Murphy G (2005) An empirical study of code clone genealogies. In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 187–196

  • Koschke R (2005) What architects should know about reverse engineering and reengineering. In: Null. IEEE, pp 4–10

  • Koschke R (2009) Architecture reconstruction. In: Software engineering. Springer, pp 140–173

  • Kruchten PB (1995) The 4+ 1 view model of architecture. IEEE Softw 12(6):42–50

  • Langhammer M, Shahbazian A, Medvidovic N, Reussner R (2016) Automated extraction of rich software models from limited system information. In: Proceedings of the 13th working IEEE/IFIP conference on software architecture (WICSA). IEEE

  • Le DM, Behnamghader P, Garcia J, Link D, Shahbazian A, Medvidovic N (2015) An empirical study of architectural change in open-source software systems. In: Proceedings of the 12th working conference on mining software repositories (MSR)

  • Le DM, Carrillo C, Capilla R, Medvidovic N (2016) Relating architectural decay and sustainability of software systems. In: Proceedings of the 13th working IEEE/IFIP conference on software architecture (WICSA). IEEE

  • Lehman MM (1980) Programs, life cycles, and laws of software evolution. Proc IEEE

  • Lutellier T, Chollack D, Garcia J, Tan L, Rayside D, Medvidovic N, Kroeger R (2015) Comparing software architecture recovery techniques using accurate dependencies. In: Proceedings of the 37th international conference on software engineering (ICSE 2015). Software Engineering in Practice Track

  • Mahajan S, Li B, Behnamghader P, Halfond WG (2016) Using visual symptoms for debugging presentation failures in web applications. In: Proceedings of the 9th IEEE international conference on software testing, verification, and validation (ICST)

  • Maqbool O, Babri H et al (2007) Hierarchical clustering for software architecture recovery. IEEE Trans Softw Eng 33(11):759–780

  • McCallum A (2002) Mallet: A machine learning for language toolkit

  • Medvidovic N (1996) Adls and dynamic architecture changes. In: Joint proceedings of the second international software architecture workshop (ISAW-2) and international workshop on multiple perspectives in software development (Viewpoints’ 96) on SIGSOFT’96 workshops. ACM, pp 24–27

  • Mengué O (2014) Svn graph branches.

  • Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38

  • Murgia A, Concas G, Pinna S, Tonelli R, Turnu I (2009) Empirical study of software quality evolution in open source projects using agile practices. In: Proceedings of the 1st international symposium on emerging trends in software metrics, 2009.

  • Nakamura T, Basili VR (2005) Metrics of software architecture changes based on structural distance. In: 11th IEEE international symposium on software metrics, 2005. IEEE, pp 24–24

  • Oreizy P, Medvidovic N, Taylor RN (1998) Architecture-based runtime software evolution. In: Proceedings of the 20th international conference on Software engineering. IEEE Computer Society, pp 177–186

  • Perry DE, Wolf AL (1992) Foundations for the study of software architecture. ACM SIGSOFT Softw Eng Notes 17(4):40–52

  • PMD (2015) Pmd documentation.

  • Robles G (2010) Replicating msr: a study of the potential replicability of papers published in the mining software repositories proceedings. In: 7th IEEE working conference on mining software repositories (MSR), 2010. IEEE, pp 171–180

  • Shahbazian A, Edwards G, Medvidovic N (2016) An end-to-end domain specific modeling and analysis platform. In: IEEE/ACM 38th international conference on software engineering (ICSE), 2016. IEEE

  • Shirali S, Vasudeva HL (2005) Metric spaces. Springer Science & Business Media

  • Struts (2014) Struts wiki.

  • Taylor R, Medvidovic N, Dashofy E (2009) Software architecture: foundations, theory, and practice

  • Tu Q, Godfrey MW (2002) An integrated approach for studying architectural evolution. In: Proceedings of 10th international workshop on program comprehension, 2002. IEEE, pp 127–136

  • Tzerpos V, Holt RC (1999) Mojo: a distance metric for software clusterings. In: Proceedings of 6th working conference on reverse engineering, 1999. IEEE, pp 187–193

  • Tzerpos V, Holt RC (2000) Acdc: an algorithm for comprehension-driven clustering. In: WCRE. IEEE, p 258

  • Van Deursen A, Hofmeister C, Koschke R, Moonen L, Riva C (2004) Symphony: view-driven software architecture reconstruction. In: Proceedings of the 4th working IEEE/IFIP conference on software architecture, 2004 (WICSA 2004). IEEE, pp 122–132

  • Wen Z, Tzerpos V (2004) An effectiveness measure for software clustering algorithms. In: Proceedings of the 12th IEEE international workshop on program comprehension, 2004. IEEE, pp 194–203

  • Wettel R, Lanza M (2008) Visual exploration of large-scale system evolution. In: 15th working conference on reverse engineering, 2008 (WCRE’08). IEEE, pp 219–228

  • Xing EP, Jordan MI, Russell S, Ng AY (2002) Distance metric learning with application to clustering with side-information. In: Advances in neural information processing systems, pp 505–512

Author information


Corresponding author

Correspondence to Pooyan Behnamghader.

Additional information

Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei

Pooyan Behnamghader and Duc Minh Le contributed equally to this work.

About this article

Cite this article

Behnamghader, P., Le, D.M., Garcia, J. et al. A large-scale study of architectural evolution in open-source software systems. Empir Software Eng 22, 1146–1193 (2017).


  • Software architecture
  • Architectural change
  • Software evolution
  • Open-source software
  • Architecture recovery