Mining architectural violations from version history

Maffort, Cristiano; Valente, Marco Tulio; Terra, Ricardo; Bigonha, Mariza; Anquetil, Nicolas; Hora, André

doi:10.1007/s10664-014-9348-2

Mining architectural violations from version history

Published: 29 January 2015

Volume 21, pages 854–895, (2016)
Cite this article

Empirical Software Engineering Aims and scope Submit manuscript

Cristiano Maffort^1,2,
Marco Tulio Valente²,
Ricardo Terra³,
Mariza Bigonha²,
Nicolas Anquetil⁴ &
…
André Hora⁴

719 Accesses
10 Citations
1 Altmetric
Explore all metrics

Abstract

Software architecture conformance is a key software quality control activity that aims to reveal the progressive gap normally observed between concrete and planned software architectures. However, formally specifying an architecture can be difficult, as it must be done by an expert of the system having a high level understanding of it. In this paper, we present a lightweighted approach for architecture conformance based on a combination of static and historical source code analysis. The proposed approach relies on four heuristics for detecting absences (something expected was not found) and divergences (something prohibited was found) in source code based architectures. We also present an architecture conformance process based on the proposed approach. We followed this process to evaluate the architecture of two industrial-strength information systems, achieving an overall precision of 62.7 % and 53.8 %. We also evaluated our approach in an open-source information retrieval library, achieving an overall precision of 59.2 %. We envision that an heuristic-based approach for architecture conformance can be used to rapidly raise architectural warnings, without deeply involving experts in the process.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Maintaining Security in Software Evolution

Exploring the suitability of source code metrics for indicating architectural inconsistencies

Article Open access 08 March 2018

Test-Driven Reuse: Key to Improving Precision of Search Engines for Software Reuse

Notes

In a depends predicate, the pattern _ (underscore) matches any value.
https://gforge.inria.fr/projects/verveinej.
To improve the paper’s comprehension, we translated the class names from Portuguese to English.
Mean precision is the average precision of the iterations evaluated by the architect, whereas Overall precision is the total number of true warnings by the total number of warnings.

References

Araujo JE, Souza S, Valente MT (2011) Study on the relevance of the warnings reported by Java bug-finding tools. IET Software 5 (4):366–374
Article Google Scholar
Baeza-Yates R, Ribeiro-Neto B (2011) Modern Information Retrieval: The Concepts and Technology Behind Search. Addison Wesley Professional
Bittencourt RA (2012) Enabling Static Architecture Conformance Checking of Evolving Software. Universidade Federal de Campina Grande, PhD thesis
Google Scholar
Brunet J, Guerreiro D, Figueiredo J (2011) Structural conformance checking with design tests: An evaluation of usability and scalability. In: 27th International Conference on Software Maintenance (ICSM), pp 143–152
Brunet J, Murphy GC, Serey D, Figueiredo J (2014) Five years of software architecture checking: A case study of Eclipse. IEEE Software:1–6
Copeland T (2005) PMD Applied. Centennial Books
Ducasse S, Anquetil N, Bhatti MU, Hora A, Laval J, Girba T (2011) MSE and FAMIX 3.0 an Interexchange Format and Source Code Model Family. Technical report. Software Composition Group - SCG, RMOD - INRIA Lille - Nord Europe
Google Scholar
Ducasse S, Pollet D (2009) Software architecture reconstruction: A process-oriented taxonomy. IEEE Trans Software Eng 35 (4):573–591
Article Google Scholar
Eichberg M, Kloppenburg S, Klose K, Mezini M (2008) Defining and continuous checking of structural program dependencies. In: 30th International Conference on Software Engineering (ICSE), pp 391–400
Fowler M (2002) Patterns of Enterprise Application Architecture. Addison-Wesley
Hammad M, Collard ML, Maletic JI (2009) Automatically identifying changes that impact code-to-design traceability. In: 17th IEEE/ACM International Conference on Program Comprehension (ICPC), pp 20–29
Holt RC (1998) Structural manipulations of software architecture using tarski relational algebra. In: 5th Working Conference on Reverse Engineering (WCRE), pp 210–219
Hora A, Anquetil N, Ducasse S, Valente MT (2013) Mining system specific rules from change patterns. In: 20th Working Conference on Reverse Engineering (WCRE), pp 1–10
Hou D, Hoover JH (2006) Using SCL to specify and check design intent in source code. IEEE Trans Software Eng 32(6):404–423
Article Google Scholar
Hovemeyer D, Pugh W (2004) Finding bugs is easy. SIGPLAN Notices 39(12):92–106
Article Google Scholar
Johnson SC (1977) Lint: A C program checker. Technical Report 65, Bell Laboratories
Kim S, Ernst MD (2007) Which warnings should I fix first?. In: 15th International Symposium on Foundations of Software Engineering (FSE), pp 45–54
Kim S, Pan K, Whitehead EEJ Jr. (2006) Memories of bug fixes. In: 14th International Symposium on Foundations of Software Engineering (FSE), pp 35–45
Knodel J, Popescu D (2007) A comparison of static architecture compliance checking approaches. In: 6th Working IEEE/IFIP Conference on Software Architecture (WICSA), p 12
Koschke R (2010) Incremental reflexion analysis. In: 14th European Conference on Software Maintenance and Reengineering (CSMR), pp 1–10
Koschke R, Simon D (2003) Hierarchical reflexion models. In: 10th Working Conference on Reverse Engineering (WCRE), pp 36–45
Larus JR, Ball T, Das M, DeLine R, Fahndrich M, Pincus J, Rajamani SK., Venkatapathy R (2004) Righting software. IEEE Software 21(3):92–100
Article Google Scholar
Livshits B, Zimmermann T (2005) DynaMine: finding common error patterns by mining software revision histories. In: 13th International Symposium on Foundations of Software Engineering (FSE), pp 296–305
Maffort C, Valente MT, Anquetil N, Hora A, Bigonha M (2013) Heuristics for discovering architectural violations. In: 20th Working Conference on Reverse Engineering (WCRE), pp 222–231
Maffort C, Valente MT, Bigonha M, Hora A, Anquetil N (2013) Mining architectural patterns using association rules. In: 25th International Conference on Software Engineering and Knowledge Engineering (SEKE), pp 375–380
Mens K, Kellens A, Pluquet F, Wuyts Roel (2006) Co-evolving code and design with intensional views A case study. Comput Languages Syst & Struct 32(2-3):140–156
Article MATH Google Scholar
Mileva YM, Wasylkowski A, Zeller A (2011) Mining evolution of object usage. In: 25th European conference on Object-oriented programming, pp 105–129
Murphy G, Notkin D, Sullivan K (1995) Software reflexion models Bridging the gap between source and high-level models. In: 3rd Symposium on Foundations of Software Engineering (FSE), pp 18–28
Murphy G, Notkin D, Sullivan KJ (2001) Software reflexion models bridging the gap between design and implementation. IEEE Trans Software Eng 27:364–380
Article Google Scholar
Nguyen TT, Nguyen HA, Pham NH, Al-Kofahi J, Nguyen TN (2010) Recurring bug fixes in object-oriented programs. In: 32nd International Conference on Software Engineering (ICSE), pp 315–324
Nierstrasz O, Ducasse S, Gı̌rba T (2005) The story of Moose: an agile reengineering environment. In: Foundations of Software Engineering (FSE), pp 1–10
Palomba F, Bavota G, Penta MD, Oliveto R, Lucia AD, Poshyvanyk D (2013) Detecting bad smells in source code using change history information. In: 28th International Conference on Automated Software Engineering (ASE), pp 268–278
Passos L, Terra R, Diniz R, Valente MT, Mendonca N (2010) Static architecture-conformance checking An illustrative overview. IEEE Software 27(5):82–89
Article Google Scholar
Perry DE, Wolf AL (1992) Foundations for the study of software architecture. Software Eng Notes 17(4):40–52
Article Google Scholar
Sarkar S, Maskeri G, Ramachandran S (2009) Discovery of architectural layers and measurement of layering violations in source code. J Syst Software 82:1891–1905
Article Google Scholar
Santonu S, Ramachandran S, Kumar GS, Madhu K, Iyengar K, Rangarajan, Sivagnanam S (2009) Modularization of a large-scale business application: A case study. IEEE Software 26:28–35
Article Google Scholar
Silva L, Valente MT, Maia M (2014) Assessing modularity using co-change clusters. In: 13th International Conference on Modularity, pp 49–60
Terra R, Valente MT (2009) A dependency constraint language to manage object-oriented software architectures. Software: Practice and Experience 32(12):1073–1094
Google Scholar
Terra R, Valente MT, Czarnecki K, Bigonha R (2012) Recommending refactorings to reverse software architecture erosion. In: 16th European Conference on Software Maintenance and Reengineering (CSMR), pp 335–340
Terra R, Valente MT, Czarnecki K, Bigonha RS (2013) A recommendation system for repairing violations detected by static architecture conformance checking. Software: Practice and Experience:1–36

Download references

Acknowledgments

Our research is supported by CAPES, FAPEMIG, and CNPq. We thank the architects of the SGA and M2M systems for validating the warnings raised by the proposed approach.

Author information

Authors and Affiliations

Department of Computer Engineering, CEFET-MG, Minas Gerais, Brazil
Cristiano Maffort
Department of Computer Science, UFMG, Minas Gerais, Brazil
Cristiano Maffort, Marco Tulio Valente & Mariza Bigonha
Department of Computer Science, UFLA, Lavras, Brazil
Ricardo Terra
RMoD Team, INRIA, Lille Nord Europe, France
Nicolas Anquetil & André Hora

Authors

Cristiano Maffort
View author publications
You can also search for this author in PubMed Google Scholar
Marco Tulio Valente
View author publications
You can also search for this author in PubMed Google Scholar
Ricardo Terra
View author publications
You can also search for this author in PubMed Google Scholar
Mariza Bigonha
View author publications
You can also search for this author in PubMed Google Scholar
Nicolas Anquetil
View author publications
You can also search for this author in PubMed Google Scholar
André Hora
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Cristiano Maffort.

Additional information

Communicated by: Rocco Oliveto, Massimiliano Di Penta and Romain Robbes

Appendices

Appendix A: Formal Definition

In this appendix section, we describe the heuristics proposed by ArchLint.

1.1 Notation

The definition of the heuristics relies on the following notation:

$\mathcal {C}=\{c_{1}, c_{2}, ..., c_{n}\}$ is the set of all classes in the system under analysis.
C P={c p ₁,c p ₂,...,c p _n} is the set of components in the high-level component model.
d e p e n d s(c ₁,c ₂,t,v) indicates that class c ₁ has a dependency of type t with class c ₂ in a given version v.
c o m p(c) is the component cp of a class c.
m o d(c) is the module m of a class c.
fi r s t(c) is the version in which class c was originally inserted in the repository.
H is the identifier of the last version of the system in the repository.

In a depends predicate, the pattern _ (underscore) matches any value. For example,

d e p e n d s(c ₁,c ₂,_,_) indicates that class c ₁ depends on class c ₂, despite the dependency type and the version.

1.2 Detecting Absences

D e p C o m p C l a s s(c,t,c p) is the set of classes in a component cp that—in the current version of the system—have a dependency of type t with a class c, as follows:

$$\begin{array}{@{}rcl@{}} DepCompClass(c,t,cp) = \left\{ x\in \mathcal{C}\ |\ depends(x,x,t,H) \wedge comp(x) = cp \right\} \end{array} $$

C l a s s C o m p(c p) is the set of classes in the component cp, as follows:

$$\begin{array}{@{}rcl@{}} ClassComp(cp) = \left\{ x \in \mathcal{C} \ | \ comp(x) = cp \right\} \end{array} $$

D e p S c a R a t e(c,t,c p) is the ratio between (i) the number of classes in component cp that have a dependency of type t with a target class c and (ii) the total number of classes in component cp, as follows:

$$\begin{array}{@{}rcl@{}} DepScaRate(c,t,cp)=\frac{|DepCompClass(c,t,cp)|}{|ClassComp(cp)|} \end{array} $$

C r e a t e d W i t h o u t D e p(c,t,c p) is the set of classes of a component cp that were committed in the repository for the first time without a dependency of type t with a target class c, as defined next:

$$\begin{array}{@{}rcl@{}} CreatedWithoutDep(c,t,cp) = \left\{x \in \mathcal{C}\ |\ comp(x) = cp\ \wedge \neg depends(x,c,t,first(x))\right\} \end{array} $$

D e p A d d(c,t,c p) is the set of classes in component cp initially created without a dependency of type t with a target class c but that later were maintained to include this dependency, as follows:

$$\begin{array}{@{}rcl@{}} DepAdd(c,t,cp) = \left\{x \in CreatedWithoutDep(c,t,cp)\ |\ depends(x,c,t,H)\right\} \end{array} $$

D e p I n s R a t e(c,t,c p) is the ratio between (i) the number of classes in the component cp originally created without a dependency of type t with a target class c, but that have this dependency in the last version of the system under analysis, and (ii) the total number of classes in component cp originally created without a dependency of type t with class c, as follows:

$$\begin{array}{@{}rcl@{}} DepInsRate(c,t,cp) = \frac{|DepAdd(c,t,cp)|}{|CreatedwithoutDep(c,t,p)|} \end{array} $$

Finally, the candidates for absences in a component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Absences(cp) = \left\{(x,c,t) \ |\ comp(x) = cp\ \wedge \neg depends(x,c,t,H)\ \wedge \right. \\ DepScaRate(c,t,p) \geq A_{sca}\ \wedge \\ \left. DepScaRate(c,t,p) \geq A_{ins}\right\} \end{array} $$

1.3 Detecting Divergences

1.3.1 Heuristic #1

D e p S y s M o d(m) is the set of classes in the current version of the system that have a dependency with classes of a module m, as follows:

$$\begin{array}{@{}rcl@{}} DepSysMod(m) = \left\{x \in \mathcal{C}\ |\ depends(x,c,\_,H)\ \wedge\ mod(c)= m\right\} \end{array} $$

D e p C o m p M o d(m,c p) is the set of classes in component cp that have a dependency with a module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepCompMod(m,cp) = \left\{ x \in DepSysMod(m)\ |\ comp(x) = cp\right\} \end{array} $$

D e p S c a R a t e(m,c p) is the ratio between (i) the number of classes in component cp that have a dependency with a module m and (ii) the total number of classes in the current version of the system that have a dependency with classes of m, as follows:

$$\begin{array}{@{}rcl@{}} DepScaRate(m,cp) = \frac{|DepCompMod(m,cp|)}{|DepSysMod(m)|} \end{array} $$

D e p A d d A n y(m,c p) is the set of classes in component cp that have established—in any version of the system—a dependency with a class in module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepAddAny(m,cp) = \left\{x \in \mathcal{C}\ |\ comp(x) = cp\ \wedge\ depends(x,c,\_,\_)\ \wedge mod(c) = m \right\} \end{array} $$

D e p D e l(m,c p) is the set of classes returned by D e p A d d A n y(m,c p) that in the current version of the system no longer have a dependency with classes in module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepDel(m,cp) = \left\{x \in DepAddAny(m,cp)\ |\ \neg depends(x,c,\_H)\ \wedge \mod(c) = m\right\} \end{array} $$

D e p D e l R a t e(m,c p) is the ratio between (i) the number of classes in component cp that no longer have a dependency with classes in module m and (ii) the total number of classes in component cp that have established a dependency with any class in module m, as defined next:

$$\begin{array}{@{}rcl@{}} DepDelRate(m,cp)= \frac{|DepDel(m,cp)|}{|DepAddAny(m,cp)|} \end{array} $$

H e a v y U s e r(m) is a function that returns the component whose classes mostly depend on classes located in module m, i.e., the component cp that provides the following maximal value:

$$\underset{\forall cp\ \in \ CP}{\max} \left(\frac{|DepCompMod(m,cp)|}{|DepSysMod(m)|}\right) $$

However, this maximal value must be greater than 0.5. Otherwise, the function H e a v y U s e r returns null.

Finally, the candidates for divergences in a given component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Div_{1}(cp) = \lbrace \ (x, c) & |\ \ comp(x) = cp\ \wedge\ mod(c) = m\ \wedge\ &depends(x, c,\!\_,H)\ \wedge \\ &{}DepScaRate(m, cp)\ \leq\ D_{sca} \wedge \\ &{\kern-2.5pt}DepDelRate(m, cp)\ \geq\ D_{del} \wedge \\ &{\kern-2.9pc}HeavyUser(m)\ \neq\ cp \ \rbrace \\ \end{array} $$

1.3.2 Heuristic #2

D e p A d d A n y(c,t,c p) is the set of classes in component cp that have established—in any version of the system—a dependency of type t with a class c, as defined next:

$$\begin{array}{@{}rcl@{}} DepAdd(c,t,cp) = \left\{x \in CreatedWithoutDep(c,t,cp)\ |\ depends(x,c,t,H)\right\} \end{array} $$

D e p D e l(c,t,c p) is the set of classes returned by D e p A d d A n y(c,t,c p) that no longer have a dependency of type t with a class c (i.e., the dependencies were removed), as defined next:

$$DepDel(c,t,cp)\,=\, \lbrace \; x \in DepAddAny(c,t,cp) \, | \, comp(x)\! =\! cp \ \wedge \ \neg depends(x, c, t,H) \; \rbrace $$

Additionally, D e p D e l R a t e(c,t,c p) is the ratio between (i) the number of classes in component cp that no longer have a dependency of type t with a class c, and (ii) the total number of classes in component cp that have established a dependency of type t with a class c, as defined next:

$$DepDelRate(c,t,cp) = \frac{|DepDel(c,t,cp)|}{|DepAddAny(c,t,cp)|} $$

Finally, the candidates for divergences in a given component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Div_{2}(cp) = \lbrace & (x, c, t)\; | \;comp(x) = cp \ \wedge \ depends(x, c, t,H) \wedge\\ &{\kern.1pc}DepScaRate(c,t,cp) \leq D_{sca} \wedge\\ &{\kern-.2pc}DepDelRate(c,t,cp) \geq D_{del} \;\rbrace \end{array} $$

1.3.3 Heuristic #3

This heuristic assumes that r f(c p ₁,c p ₂) denotes the number of references from classes in component c p ₁ to classes in component c p ₂, as defined next:

D e p D i r W e i g h t(c p ₁,c p ₂) is defined as follows:

$$\begin{array}{@{}rcl@{}} DepDirWeight(cp_{1},cp_{2})=\frac{rf(cp_{1},cp_{2})}{rf(cp_{1},cp_{2})+rf(cp_{2},cp_{1})} \end{array} $$

Finally, the candidates for divergences in a given component cp are defined as follows:

$$\begin{array}{@{}rcl@{}} Div_{3}(cp_{1}) = \left\{ (x,c)\ |\ comp(x) = cp_{1}\ \wedge comp(c) = cp_{2}\ \wedge cp_{1} \neq cp_{2}\ \wedge \right. \\ depends(x,c,\_,H)\ \wedge \\ \left. D_{dir} \leqslant DepDirWeight(cp_{1},cp_{2}) < 0.5 \right\} \end{array} $$

Appendix B: M2M Conformance Process

In this section, we show the results achieved after each iteration when detecting architectural violations in the M2M system. Table 14 shows the iterations performed for detecting absences. Tables 15 and 16 shows the results achieved by the second and third heuristics for detecting divergences, respectively. Heuristic #1 for divergences did not report warnings in the M2M system.

Table 14 Detecting absences in the M2M system

Full size table

Table 15 Detecting divergences in the M2M system using Heuristic #2

Full size table

Table 16 Detecting divergences in the M2M system using Heuristic #3

Full size table

Appendix C: Lucene Conformance Process

In this section, we show the results achieved after each iteration when detecting architectural violations in the Lucene system. Tables 17, 18, and 19 shows the results achieved by the first, second and third heuristics for detecting divergences, respectively. The heuristic for absences did not report warnings in the Lucene system.

Table 17 Detecting divergences in Lucene using Heuristic #1

Full size table

Table 18 Detecting divergences in Lucene using Heuristic #2

Full size table

Table 19 Detecting divergences in Lucene using Heuristic #3

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Maffort, C., Valente, M.T., Terra, R. et al. Mining architectural violations from version history. Empir Software Eng 21, 854–895 (2016). https://doi.org/10.1007/s10664-014-9348-2

Download citation

Published: 29 January 2015
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10664-014-9348-2

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Mining architectural violations from version history

Abstract

Access this article

Similar content being viewed by others

Maintaining Security in Software Evolution

Exploring the suitability of source code metrics for indicating architectural inconsistencies

Test-Driven Reuse: Key to Improving Precision of Search Engines for Software Reuse

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Appendix A: Formal Definition

1.1 Notation

1.2 Detecting Absences

1.3 Detecting Divergences

1.3.1 Heuristic #1

1.3.2 Heuristic #2

1.3.3 Heuristic #3

Appendix B: M2M Conformance Process

Appendix C: Lucene Conformance Process

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Mining architectural violations from version history

Abstract

Access this article

Similar content being viewed by others

Maintaining Security in Software Evolution

Exploring the suitability of source code metrics for indicating architectural inconsistencies

Test-Driven Reuse: Key to Improving Precision of Search Engines for Software Reuse

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Additional information

Appendices

Appendix A: Formal Definition

1.1 Notation

1.2 Detecting Absences

1.3 Detecting Divergences

1.3.1 Heuristic #1

1.3.2 Heuristic #2

1.3.3 Heuristic #3

Appendix B: M2M Conformance Process

Appendix C: Lucene Conformance Process

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation