
Characteristics of multiple-component defects and architectural hotspots: a large system case study

Empirical Software Engineering

Abstract

The architecture of a large software system is widely considered important for such reasons as: providing a common goal to the stakeholders in realising the envisaged system; helping to organise the various development teams; and capturing foundational design decisions early in the development. Studies have shown that defects originating in system architectures can consume twice as much correction effort as other defects. Clearly, then, scientific studies on architectural defects are important for their improved treatment and prevention. Previous research has focused on the extent of architectural defects in software systems. For this paper, we were motivated to ask the following two complementary questions in a case study: (i) How do multiple-component defects (MCDs)—which are of architectural importance—differ from other types of defects in terms of (a) complexity and (b) persistence across development phases and releases? and (ii) How do highly MCD-concentrated components (the so-called architectural hotspots) differ from other types of components in terms of their (a) interrelationships and (b) persistence across development phases and releases? Results indicate that MCDs are complex to fix and are persistent across phases and releases. Compared to a non-MCD, an MCD requires over 20 times more changes to fix and is 6 to 8 times more likely to cross a phase or a release. These findings have implications for defect detection and correction. Results also show that 20% of the subject system’s components contain over 80% of the MCDs and that these components are 2–3 times more likely to persist across multiple system releases than other components in the system. Such MCD-concentrated components constitute architectural “hotspots”, which management can focus upon for preventive maintenance and architectural quality improvement.
The findings described are from an empirical study of a large legacy software system of size over 20 million lines of code and age over 17 years.

Fig. 1
Fig. 2
Fig. 3

Notes

  1. We do not exclude the possibility of a single-component architectural defect; however, this is not the focus of this paper.

  2. The Eclipse project website: http://www.eclipse.org (last accessed January 2011).

  3. The Mozilla project website: http://www.mozilla.org (last accessed January 2011).

  4. There are at least two ways to log MCDs in a defect database. The first is as described in this study—based on parent and children references fields. The second is through change logs, which would describe changes made to multiple components for a given defect fix. We believe that change logs are more common than parent-children relationship logs.

  5. The criterion to use the “top 20%” is based on our analysis that the top 20% of the system’s components contain over 80% of MCDs (see Section 4.3). Note that in a “very clean” software system (i.e., one with relatively few MCDs), there are still architectural hotspots based on the “top 20%” criterion relative to the quality of such a system. Likewise, in a “very defective” system, the “top 20%” criterion would identify over 80% of the MCDs. Thus, the quality of the system has no bearing on the “top 20%” criterion used.

  6. While many of the defect attributes logged are common to those found in the literature (Chillarege et al. 1992), the use of parent-children references is, to our knowledge, not reported in the literature.

  7. The Pearson correlation coefficient is a measure of the linear correlation between two arrays of data, ranging from −1 to +1. A value of +1 indicates a perfect positive linear correlation; a value of −1 indicates a perfect negative linear correlation; and a value at or near 0 indicates no significant correlation.

  8. In the system, approx. 70% of DPCs (i.e., the top 20% most-defective components) are architectural hotspots, and approx. 70% of hotspots are DPCs. Thus, the DPCs and hotspots have a significant overlap.

  9. In our earlier paper (Li et al. 2009), we used the term “pervasive”; however, we consider the term “insidious” as a better fit.

  10. We cannot be definitive about the “difficulty” of fixing MCDs because this would require detailed knowledge of how these MCDs were fixed—data that was not recorded in the historical database.

  11. For a given component (e.g., C3), the total number of MCDs on the fix relationships of that component will be greater than or equal to the number of MCDs in the component. However, this figure shows only the fix relationships among the components that the compiler contains. The compiler is part of the whole, much larger, system; there are other fix relationships between C3 and components outside the compiler, and these are not shown in Fig. 2. Thus, for component C3, the comparison “(54 + 2 = 56) < 84” is only a partial view.

  12. Note that this finding indicates that some hotspots do not have DPFRs.

  13. The subject system follows a rigorous routine: a defect is first recorded in the defect-tracking database, then fixed, and finally the fields of the defect record are updated. Thus, all the defects fixed in the subject system were first recorded in the defect-tracking database.

  14. SWEBOK stands for IEEE’s Software Engineering Body of Knowledge; see www.swebok.org.
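The second logging approach in Note 4—inferring MCDs from change logs that record which components each defect fix touched—can be sketched as follows. This is a minimal illustration; the defect IDs and component names are hypothetical.

```python
from collections import defaultdict

# Hypothetical change-log records: one (defect_id, component) pair per change.
change_log = [
    ("D1", "parser"), ("D1", "optimizer"),                  # D1's fix spans two components
    ("D2", "parser"),                                       # D2's fix spans one component
    ("D3", "codegen"), ("D3", "parser"), ("D3", "linker"),  # D3's fix spans three
]

# Group the changed components by defect.
components_per_defect = defaultdict(set)
for defect_id, component in change_log:
    components_per_defect[defect_id].add(component)

# A defect whose fix touched more than one component is an MCD.
mcds = {d for d, comps in components_per_defect.items() if len(comps) > 1}
print(sorted(mcds))  # → ['D1', 'D3']
```

Parent-children reference fields, by contrast, would link such defect records explicitly rather than leaving the relationship to be mined from fix data.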
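The “top 20%” hotspot criterion of Note 5 amounts to ranking components by their MCD counts and taking the top fifth. A minimal sketch, with hypothetical counts chosen so that—as observed for the subject system—the top fifth of components carries over 80% of the MCDs:

```python
# Hypothetical per-component MCD counts for a 10-component system.
mcd_counts = {"C1": 40, "C2": 35, "C3": 3, "C4": 2, "C5": 2,
              "C6": 1, "C7": 1, "C8": 1, "C9": 1, "C10": 0}

# Rank components by MCD count and take the top 20% as architectural hotspots.
ranked = sorted(mcd_counts, key=mcd_counts.get, reverse=True)
top_n = max(1, len(ranked) * 20 // 100)
hotspots = ranked[:top_n]

# Fraction of all MCDs concentrated in the hotspots.
covered = sum(mcd_counts[c] for c in hotspots) / sum(mcd_counts.values())
print(hotspots, round(covered, 2))  # → ['C1', 'C2'] 0.87
```

Because the cut is relative (a rank, not an absolute defect threshold), the same procedure applies to a “very clean” and a “very defective” system alike.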
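The Pearson coefficient described in Note 7 can be computed directly from its definition; the following is a minimal sketch (in practice a library routine such as `scipy.stats.pearsonr` or `numpy.corrcoef` would be used):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length arrays."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly positively related: r ≈ +1
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly negatively related: r ≈ −1
```

Note the coefficient is undefined when either array is constant (zero standard deviation).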

References

  • Abdelmoez W, Shereshevsky M, Gunnalan R, Bogazzi BYS, Korkmaz M, Ammar HH (2004) Software architectures change propagation tool (SACPT). In: Proc. of the 20th int’l conf. on software maintenance (ICSM’04). Chicago, USA, pp 517–517

  • Adams EN (1984) Optimizing preventive service of software products. IBM Res J 28(1):2–14

  • Andersson C, Runeson P (2007) A replicated quantitative analysis of fault distribution in complex software systems. IEEE Trans Softw Eng 33(5):273–286

  • Andrews A, Stringfellow C (2001) Quantitative analysis of development defects to guide testing: a case study. Softw Qual J 9:195–214

  • Bachmann F, Bass L, Klein M (2003) Preliminary design of ArchE: a software architecture design assistant. Technical Report CMU/SEI-2003-TR-021; via: http://www.sei.cmu.edu/reports/03tr021.pdf

  • Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun ACM 27(1):42–52

  • Basili VR, Shull F (2005) Evolving defect “folklore”: a cross-study analysis of software defect behavior. In: Proc. of the int’l software process workshop (ISPW’05). Beijing, China, pp 1–9

  • Bass L, Berenbach B (2008) Leadership and management in software architecture (LMSA’08)—a report on an ICSE workshop. Leipzig, Germany

  • Bass L, Clements P, Kazman R (2003) Software architecture in practice, 2nd edn. Addison-Wesley Professional

  • Bertolino A, Inverardi P (1996) Architecture-based software testing. In: Proc. of the SIGSOFT 96 workshop. California, USA, pp 62–64

  • Boehm B, Basili VR (2001) Software defect reduction top 10 list. Computer 34(1):135–137

  • Booch G (2007) The economics of architecture-first. IEEE Softw 24(5):18–20

  • Booch G (2008) Nine things you can do with old software. IEEE Softw 25(5):93–94

  • Chikofsky EJ, Cross II JH (1990) Reverse engineering and design recovery: a taxonomy. IEEE Softw 7(1):13–17

  • Chillarege R, Bhandari IS, Chaar JK, Halliday MJ, Moebus DS, Ray BK, Wong MY (1992) Orthogonal defect classification—a concept for in-process measurements. IEEE Trans Softw Eng 18(11):943–956

  • Clements PC, Northrop LM (1996) Software architecture: an executive overview. Technical Report, CMU/SEI-96-TR-003

  • Clements P, Bachmann F, Bass L, Garlan D, Ivers J, Little R, Nord R, Stafford J (2002) Documenting software architectures: views and beyond. Addison-Wesley Professional

  • Compton BT, Withrow C (1990) Prediction and control of Ada software defects. J Syst Softw 12(3):199–207

  • Creswell JW (2002) Research design: qualitative, quantitative, and mixed methods approaches, 2nd edn. Sage Publications

  • Dobrica L, Niemela E (2002) A survey of software architecture analysis methods. IEEE Trans Softw Eng 28(7):638–653

  • Ebert C (1997) Experiences with criticality predictions in software development. ACM SIGSOFT Softw Eng Notes 22(6):278–293

  • Ebert C, Dumke R, Bundschuh M, Schimietendorf A (2005) Defect detection and quality improvement. In: “Best practices in software measurement”. Springer, pp 133–156

  • El-Ramly M (2006) Experience in teaching a software reengineering course. In: Proc. of the 28th int’l conf. on software engineering (ICSE’06). Shanghai, China, pp 699–702

  • Endres A (1975) An analysis of errors and their causes in system programs. ACM SIGPLAN Not 10(6):327–336

  • Fahmy H, Holt RC (2000) Software architecture transformations. In: Proc. of the 16th int’l conf. on software maintenance (ICSM’00). San Jose, California, pp 88–96

  • Fanta R, Rajlich V (1999) Removing clones from the code. J Softw Maint, Res & Pract 11(4):223–243

  • Fenton N, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814

  • Fowler M (2003) Who needs an architect? IEEE Softw 20(5):11–13

  • Gittens M, Kim Y, Godwin D (2005) The vital few versus the trivial many: Examining the pareto principle for software. In: Proc. of the 29th annual int’l computer software and applications conf. (COMPSAC’05). Edinburgh, Scotland, pp 179–185

  • Grünbacher P, Egyed A, Medvidovic N (2001) Reconciling software requirements and architectures: the CBSP approach. In: Software and systems modeling. Springer, pp 202–211

  • Inverardi P, Wolf AL (1995) Formal specifications and analysis of software architectures using the chemical abstract machine model. IEEE Trans Softw Eng 21(4):100–114

  • Jansen A, Bosch J (2005) Software architecture as a set of architectural design decisions. In: Proc. of the 5th working IEEE/IFIP conf. on software architecture (WICSA’05). Pittsburgh, Pennsylvania, pp 109–120

  • Kazman R, O’Brien L, Verhoef C (2002) Architecture reconstruction guidelines, 3rd edn. Technical Report CMU/SEI-2002-TR-034

  • Kulkarni S (2008) Software defect rediscoveries: causes, taxonomy and significance. M.Sc. thesis, The University of Western Ontario

  • Leszak M, Perry DE, Stoll D (2000) A case study in root cause defect analysis. In: Proc. of the 22nd int’l conf. on software engineering (ICSE’00). Limerick, Ireland, pp 428–437

  • Li Z, Gittens M, Murtaza SS, Madhavji NH, Miranskyy AV, Godwin D, Cialini E (2009) Analysis of pervasive multiple-component defects in a large software system. In: Proc. of the 25th IEEE int’l conf. on software maintenance (ICSM’09). Edmonton, Alberta, pp 265–273

  • Muccini H, Dias M, Richardson DJ (2006) Software architecture-based regression testing. J Syst Softw 79(10):1379–1396

  • Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proc. of the 28th int’l conf. on software engineering (ICSE’06). Shanghai, China, pp 452–461

  • Nakajo T, Kume H (1991) A case history analysis of software error cause-effect relationships. IEEE Trans Softw Eng 17(8):830–838

  • Nedstam J, Karlsson E-A, Host M (2004) The architectural change process. In: Proc. of the 2004 int’l symposium on empirical software engineering (ISESE’04). Redondo Beach, CA, pp 27–36

  • Ohlsson N, Alberg H (1996) Predicting fault-prone software modules in telephone switches. IEEE Trans Softw Eng 21(12):886–894

  • Ohlsson MC, Wohlin C (1998) Identification of green, yellow and red legacy components. In: Proc. of the 14th int’l conf. on software maintenance (ICSM’98). Bethesda, Washington DC, pp 6–15

  • Ostrand TJ, Weyuker EJ (2002) The distribution of faults in a large industrial software system. In: Proc. of the 2002 ACM SIGSOFT int’l symposium on software testing and analysis (ISSTA’02). Rome, Italy, pp 55–64

  • Ostrand TJ, Weyuker E, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355

  • Pareto V (1969) Manual of political economy (English version). Augustus M Kelley Pubs

  • Perry DE, Evangelist WM (1987) An empirical study of software interface faults. In: Proc. of the 20th annual Hawaii int’l conf. on systems sciences. Hawaii, pp 113–126

  • Schroter A, Zimmermann T, Zeller A (2006) Predicting component failures at design time. In: Proc. of the 5th int’l symposium on empirical software engineering (ISESE’06). Rio de Janeiro, Brazil

  • Shin ME, Xu Y, Paniagua F, An JH (2006) Detection of anomalies in software architecture with connectors. Sci Comput Program 61:16–26

  • Stringfellow C, Amory CD, Potnuri D, Andrews A, Georg M (2006) Comparison of software architecture reverse engineering methods. Inf Softw Technol 48:484–497

  • Valenti S (2002) Successful software reengineering. IRM Press

  • von Mayrhauser A, Ohlsson MC, Wohlin C (2000) Deriving fault architectures from defect history. J Softw Maint, Res & Pract 12(5):287–304

  • Weiss DM (1979) Evaluating software development by error analysis: the data from the architecture research facility. J Syst Softw 1:57–70

  • Weiss DM, Basili VR (1985) Evaluating software development by analysis of changes: some data from the software engineering laboratory. IEEE Trans Softw Eng 11(2):157–168

  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction. Kluwer Academic Publishers

  • Ying ATT, Murphy GC, Ng R, Chu-Carroll MC (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586

  • Yu WD (1998) A software fault prevention approach in coding and root cause analysis. Bell Labs Tech J 3(2):3–21

  • Yu T-J, Shen VY, Dunsmore HE (1988) An analysis of several software defect models. IEEE Trans Softw Eng 14(9):1261–1270

  • Zimmermann T, Weibgerber P, Diehl S, Zeller A (2004) Mining version histories to guide software changes. In: Proc. of the 26th int’l conf. on software engineering (ICSE’04). Edinburgh, UK, pp 563–572

Acknowledgements

We are very thankful to Remo Ferrari of the University of Western Ontario for invaluable discussions on investigating software architectures, and to the anonymous reviewers for their excellent comments and suggestions.

Author information

Corresponding author

Correspondence to Zude Li.

Additional information

Editors: Muhammad Ali Babar, Arie van Deursen and Patricia Lago

This paper is an enhanced version of (Li et al. 2009). This research is, in part, supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Centre for Advanced Studies, IBM Canada.

The opinions expressed in this paper are those of the authors and not necessarily of IBM Corporation.

About this article

Cite this article

Li, Z., Madhavji, N.H., Murtaza, S.S. et al. Characteristics of multiple-component defects and architectural hotspots: a large system case study. Empir Software Eng 16, 667–702 (2011). https://doi.org/10.1007/s10664-011-9155-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10664-011-9155-y
