A large-scale study of architectural evolution in open-source software systems

Abstract

From its very inception, the study of software architecture has recognized architectural decay as a regularly occurring phenomenon in long-lived systems. Architectural decay is caused by repeated, sometimes careless changes to a system during its lifespan. Despite decay’s prevalence, there is a relative dearth of empirical data regarding the nature of architectural changes that may lead to decay, and of developers’ understanding of those changes. In this paper, we take a step toward addressing that scarcity by introducing an architecture recovery framework, ARCADE, for conducting large-scale replicable empirical studies of architectural change across different versions of a software system. ARCADE includes two novel architectural change metrics, which are the key to enabling large-scale empirical studies of architectural change. We utilize ARCADE to conduct an empirical study of changes found in software architectures spanning several hundred versions of 23 open-source systems. Our study reveals several new findings regarding the frequency of architectural changes in software systems, the common points of departure in a system’s architecture during the system’s maintenance and evolution, the difference between system-level and component-level architectural change, and the suitability of a system’s implementation-level structure as a proxy for its architecture.

Notes

  1. The current version of ARCADE (ARCADE 2015) also analyzes and quantifies different symptoms of architectural decay for a given system. However, these features are currently under evaluation and are outside the scope of this paper.

References

  1. Agnew B, Hofmeister C, Purtilo J (1994) Planning for change: a reconfiguration language for distributed systems. Distrib Syst Eng 1(5):313

  2. Amazon (2015) Amazon command line interface. https://aws.amazon.com/cli/

  3. Apache (2014a) Apache portable runtime versioning. http://apr.apache.org/versioning.html

  4. Apache (2014b) Hadoop releases. http://hadoop.apache.org/releases.html#News

  5. Apache (2014c) Lucene wiki. http://en.wikipedia.org/wiki/Lucene

  6. Apache (2015a) Apache ant. http://ant.apache.org/

  7. Apache (2015b) Apache maven. http://maven.apache.org/

  8. ARCADE (2015) arcade:start [USC SoftArch Wiki]. http://softarch.usc.edu/wiki/doku.php?id=arcade:start

  9. Bitbucket (2015) Bitbucket. https://bitbucket.org

  10. Blei DM (2012) Probabilistic topic models. Commun ACM 55(4):77–84

  11. Bouwers E, Correia JP, van Deursen A, Visser J (2011a) Quantifying the analyzability of software architectures. In: 9th working IEEE/IFIP conference on software architecture (WICSA), 2011. IEEE, pp 83–92

  12. Bouwers E, van Deursen A, Visser J (2011b) Dependency profiles for software architecture evaluations. In: 27th IEEE international conference on software maintenance (ICSM), 2011. IEEE, pp 540–543

  13. Bouwers E, van Deursen A, Visser J (2013) Evaluating usefulness of software metrics: an industrial experience report. In: ICSE. IEEE Press, pp 921–930

  14. Chatzigeorgiou A, Manakos A (2010) Investigating the evolution of bad smells in object-oriented code. In: 17th international conference on the quality of information and communications technology (QUATIC), 2010. IEEE, pp 106–115

  15. D’Ambros M, Gall H, Lanza M, Pinzger M (2008) Analysing software repositories to understand software evolution. In: Software evolution. Springer, pp 37–67

  16. Ducasse S, Pollet D (2009) Software architecture reconstruction: a process-oriented taxonomy. IEEE Trans Softw Eng 35(4):573–591

  17. Eick SG, Graves TL, Karr AF, Marron JS, Mockus A (2001) Does code decay? Assessing the evidence from change management data. IEEE Trans Softw Eng 27(1):1–12

  18. Garcia J, Popescu D, Mattmann C, Medvidovic N, Cai Y (2011) Enhancing architectural recovery using concerns. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering. IEEE Computer Society, pp 552–555

  19. Garcia J, Krka I, Medvidovic N, Douglas C (2012) A framework for obtaining the ground-truth in architectural recovery. In: Joint working IEEE/IFIP conference on software architecture (WICSA) and European conference on software architecture (ECSA), 2012. IEEE, pp 292–296

  20. Garcia J, Ivkovic I, Medvidovic N (2013a) A comparative analysis of software architecture recovery techniques. In: IEEE/ACM 28th international conference on automated software engineering (ASE), 2013. IEEE, pp 486–496

  21. Garcia J, Krka I, Mattmann C, Medvidovic N (2013b) Obtaining ground-truth software architectures. In: Proceedings of the 2013 international conference on software engineering. IEEE Press, pp 901–910

  22. Ghezzi G, Gall HC (2013) Replicating mining studies with sofas. In: Proceedings of the 10th working conference on mining software repositories. IEEE Press, pp 363–372

  23. Git (2014) Git log. http://git-scm.com/docs/git-log

  24. Git (2015) Github. https://github.com

  25. Godfrey MW, Tu Q (2000) Evolution in open source software: a case study. In: Proceedings of the international conference on software maintenance, 2000. IEEE, pp 131–142

  26. Google (2015a) Google cloud platform. https://cloud.google.com

  27. Google (2015b) Guava. https://code.google.com/p/guava-libraries/

  28. Holt R, Pak JY (1996) Gase: visualizing software evolution-in-the-large. In: Proceedings of the 3rd working conference on reverse engineering, 1996. IEEE, pp 163–167

  29. Kim M, Sazawal V, Notkin D, Murphy G (2005) An empirical study of code clone genealogies. In: ACM SIGSOFT software engineering notes, vol 30. ACM, pp 187–196

  30. Koschke R (2005) What architects should know about reverse engineering and reengineering. In: Null. IEEE, pp 4–10

  31. Koschke R (2009) Architecture reconstruction. In: Software engineering. Springer, pp 140–173

  32. Kruchten PB (1995) The 4+1 view model of architecture. IEEE Softw 12(6):42–50

  33. Langhammer M, Shahbazian A, Medvidovic N, Reussner R (2016) Automated extraction of rich software models from limited system information. In: Proceedings of the 13th working IEEE/IFIP conference on software architecture (WICSA). IEEE

  34. Le DM, Behnamghader P, Garcia J, Link D, Shahbazian A, Medvidovic N (2015) An empirical study of architectural change in open-source software systems. In: Proceedings of the 12th working conference on mining software repositories (MSR)

  35. Le DM, Carrillo C, Capilla R, Medvidovic N (2016) Relating architectural decay and sustainability of software systems. In: Proceedings of the 13th working IEEE/IFIP conference on software architecture (WICSA). IEEE

  36. Lehman MM (1980) Programs, life cycles, and laws of software evolution. Proc IEEE

  37. Lutellier T, Chollack D, Garcia J, Tan L, Rayside D, Medvidovic N, Kroeger R (2015) Comparing software architecture recovery techniques using accurate dependencies. In: Proceedings of the 37th international conference on software engineering (ICSE 2015). Software Engineering in Practice Track

  38. Mahajan S, Li B, Behnamghader P, Halfond WG (2016) Using visual symptoms for debugging presentation failures in web applications. In: Proceedings of the 9th IEEE international conference on software testing, verification, and validation (ICST)

  39. Maqbool O, Babri H et al (2007) Hierarchical clustering for software architecture recovery. IEEE Trans Softw Eng 33(11):759–780

  40. McCallum A (2002) Mallet: A machine learning for language toolkit

  41. Medvidovic N (1996) Adls and dynamic architecture changes. In: Joint proceedings of the second international software architecture workshop (ISAW-2) and international workshop on multiple perspectives in software development (Viewpoints’ 96) on SIGSOFT’96 workshops. ACM, pp 24–27

  42. Mengué O (2014) Svn graph branches. https://code.google.com/p/svn-graph-branches/

  43. Munkres J (1957) Algorithms for the assignment and transportation problems. J Soc Ind Appl Math 5(1):32–38

  44. Murgia A, Concas G, Pinna S, Tonelli R, Turnu I (2009) Empirical study of software quality evolution in open source projects using agile practices. In: Proceedings of the 1st international symposium on emerging trends in software metrics, 2009. Lulu.com

  45. Nakamura T, Basili VR (2005) Metrics of software architecture changes based on structural distance. In: 11th IEEE international symposium on software metrics, 2005. IEEE, pp 24–24

  46. Oreizy P, Medvidovic N, Taylor RN (1998) Architecture-based runtime software evolution. In: Proceedings of the 20th international conference on Software engineering. IEEE Computer Society, pp 177–186

  47. Perry DE, Wolf AL (1992) Foundations for the study of software architecture. ACM SIGSOFT Softw Eng Notes 17(4):40–52

  48. PMD (2015) Pmd documentation. http://pmd.sourceforge.net

  49. Robles G (2010) Replicating msr: a study of the potential replicability of papers published in the mining software repositories proceedings. In: 7th IEEE working conference on mining software repositories (MSR), 2010. IEEE, pp 171–180

  50. Shahbazian A, Edwards G, Medvidovic N (2016) An end-to-end domain specific modeling and analysis platform. In: IEEE/ACM 38th international conference on software engineering (ICSE), 2016. IEEE

  51. Shirali S, Vasudeva HL (2005) Metric spaces. Springer Science & Business Media

  52. Struts (2014) Struts wiki. http://en.wikipedia.org/wiki/Apache_Struts

  53. Taylor R, Medvidovic N, Dashofy E (2009) Software architecture: foundations, theory, and practice

  54. Tu Q, Godfrey MW (2002) An integrated approach for studying architectural evolution. In: Proceedings of 10th international workshop on program comprehension, 2002. IEEE, pp 127–136

  55. Tzerpos V, Holt RC (1999) Mojo: a distance metric for software clusterings. In: Proceedings of 6th working conference on reverse engineering, 1999. IEEE, pp 187–193

  56. Tzerpos V, Holt RC (2000) Acdc: An algorithm for comprehension-driven clustering. In: wcre. IEEE, p 258

  57. Van Deursen A, Hofmeister C, Koschke R, Moonen L, Riva C (2004) Symphony: view-driven software architecture reconstruction. In: Proceedings of the 4th working IEEE/IFIP conference on software architecture, 2004 (WICSA 2004). IEEE, pp 122–132

  58. Wen Z, Tzerpos V (2004) An effectiveness measure for software clustering algorithms. In: Proceedings of the 12th IEEE international workshop on program comprehension, 2004. IEEE, pp 194–203

  59. Wettel R, Lanza M (2008) Visual exploration of large-scale system evolution. In: 15th working conference on reverse engineering, 2008 (WCRE’08). IEEE, pp 219–228

  60. Xing EP, Jordan MI, Russell S, Ng AY (2002) Distance metric learning with application to clustering with side-information. In: Advances in neural information processing systems, pp 505–512

Author information

Corresponding author

Correspondence to Pooyan Behnamghader.

Additional information

Pooyan Behnamghader and Duc Minh Le contributed equally to this work.

Communicated by: Romain Robbes, Martin Pinzger and Yasutaka Kamei

About this article

Cite this article

Behnamghader, P., Le, D.M., Garcia, J. et al. A large-scale study of architectural evolution in open-source software systems. Empir Software Eng 22, 1146–1193 (2017). https://doi.org/10.1007/s10664-016-9466-0

Keywords

  • Software architecture
  • Architectural change
  • Software evolution
  • Open-source software
  • Architecture recovery