
Characteristics of multiple-component defects and architectural hotspots: a large system case study

Empirical Software Engineering

Abstract

The architecture of a large software system is widely considered important for such reasons as: providing a common goal to the stakeholders in realising the envisaged system; helping to organise the various development teams; and capturing foundational design decisions early in the development. Studies have shown that defects originating in system architectures can consume twice as much correction effort as other defects. Clearly, then, scientific studies on architectural defects are important for their improved treatment and prevention. Previous research has focused on the extent of architectural defects in software systems. For this paper, we were motivated to ask the following two complementary questions in a case study: (i) How do multiple-component defects (MCDs)—which are of architectural importance—differ from other types of defects in terms of (a) complexity and (b) persistence across development phases and releases? and (ii) How do highly MCD-concentrated components (the so-called architectural hotspots) differ from other types of components in terms of their (a) interrelationships and (b) persistence across development phases and releases? Results indicate that MCDs are complex to fix and are persistent across phases and releases. Compared to a non-MCD, an MCD requires over 20 times more changes to fix and is 6 to 8 times more likely to cross a phase or a release. These findings have implications for defect detection and correction. Results also show that 20% of the subject system’s components contain over 80% of the MCDs and that these components are 2–3 times more likely to persist across multiple system releases than other components in the system. Such MCD-concentrated components constitute architectural “hotspots”, which management can focus upon for preventive maintenance and architectural quality improvement.
The findings described are from an empirical study of a large legacy software system of size over 20 million lines of code and age over 17 years.

Fig. 1
Fig. 2
Fig. 3

Notes

  1. We do not exclude the possibility of a single-component architectural defect; however, this is not the focus of this paper.

  2. The Eclipse project website: http://www.eclipse.org (last accessed January 2011).

  3. The Mozilla project website: http://www.mozilla.org (last accessed January 2011).

  4. There are at least two ways to log MCDs in a defect database. The first is as described in this study—based on parent and children references fields. The second is through change logs, which would describe changes made to multiple components for a given defect fix. We believe that change logs are more common than parent-children relationship logs.

  5. The criterion to use the “top 20%” is based on our analysis that the top 20% of the system’s components contain over 80% of MCDs (see Section 4.3). Note that in a “very clean” software system (i.e., one with relatively few MCDs), there are still architectural hotspots based on the “top 20%” criterion relative to the quality of such a system. Likewise, in a “very defective” system, the “top 20%” criterion would identify over 80% of the MCDs. Thus, the quality of the system has no bearing on the “top 20%” criterion used.

  6. While many of the defect attributes logged are common to those found in the literature (Chillarege et al. 1992), the use of parent-children references is, to our knowledge, not reported in the literature.

  7. The Pearson correlation coefficient is a measure of the linear correlation between two arrays of data, ranging from −1 to +1. A value of +1 indicates a perfect positive linear correlation; a value of −1 indicates a perfect negative linear correlation; and a value at or near 0 indicates no significant correlation.

  8. In the system, approx. 70% of DPCs (i.e., the top 20% most-defective components) are architectural hotspots, and approx. 70% of hotspots are DPCs. Thus, the DPCs and hotspots have a significant overlap.

  9. In our earlier paper (Li et al. 2009), we used the term “pervasive”; however, we consider the term “insidious” as a better fit.

  10. We cannot be definitive about the “difficulty” of fixing MCDs because this would require detailed knowledge of how these MCDs were fixed—data that was not recorded in the historical database.

  11. For a given component (e.g., C3), the total number of MCDs on the fix relationships of that component will be greater than or equal to the number of MCDs in the component. However, this figure shows only the fix relationships among the components that the compiler contains. The compiler is part of the whole, much larger, system; there are other fix relationships between C3 and components outside the compiler, and these are not shown in Fig. 2. Thus, for component C3, the comparison “(54 + 2 = 56) < 84” is only a partial view.

  12. Note that this finding indicates that some hotspots do not have DPFRs.

  13. The subject system follows a rigorous routine: a defect is first recorded in the defect-tracking database, then fixed, and finally the fields of the defect record are updated. Thus, all the defects fixed in the subject system were first recorded in the defect-tracking database.

  14. SWEBOK stands for IEEE’s Software Engineering Body of Knowledge; see www.swebok.org.
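The second logging approach in Note 4—inferring MCDs from change logs that record which components each defect fix touched—can be sketched as follows. This is a minimal illustration; the defect IDs and component names are hypothetical.

```python
from collections import defaultdict

# Hypothetical change-log records: one (defect_id, component) pair per change.
change_log = [
    ("D1", "parser"), ("D1", "optimizer"),                  # D1's fix spans two components
    ("D2", "parser"),                                       # D2's fix spans one component
    ("D3", "codegen"), ("D3", "parser"), ("D3", "linker"),  # D3's fix spans three
]

# Group the changed components by defect.
components_per_defect = defaultdict(set)
for defect_id, component in change_log:
    components_per_defect[defect_id].add(component)

# A defect whose fix touched more than one component is an MCD.
mcds = {d for d, comps in components_per_defect.items() if len(comps) > 1}
print(sorted(mcds))  # → ['D1', 'D3']
```

Parent-children reference fields, by contrast, would link such defect records explicitly rather than leaving the relationship to be mined from fix data.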
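The “top 20%” hotspot criterion of Note 5 amounts to ranking components by their MCD counts and taking the top fifth. A minimal sketch, with hypothetical counts chosen so that—as observed for the subject system—the top fifth of components carries over 80% of the MCDs:

```python
# Hypothetical per-component MCD counts for a 10-component system.
mcd_counts = {"C1": 40, "C2": 35, "C3": 3, "C4": 2, "C5": 2,
              "C6": 1, "C7": 1, "C8": 1, "C9": 1, "C10": 0}

# Rank components by MCD count and take the top 20% as architectural hotspots.
ranked = sorted(mcd_counts, key=mcd_counts.get, reverse=True)
top_n = max(1, len(ranked) * 20 // 100)
hotspots = ranked[:top_n]

# Fraction of all MCDs concentrated in the hotspots.
covered = sum(mcd_counts[c] for c in hotspots) / sum(mcd_counts.values())
print(hotspots, round(covered, 2))  # → ['C1', 'C2'] 0.87
```

Because the cut is relative (a rank, not an absolute defect threshold), the same procedure applies to a “very clean” and a “very defective” system alike.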
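The Pearson coefficient described in Note 7 can be computed directly from its definition; the following is a minimal sketch (in practice a library routine such as `scipy.stats.pearsonr` or `numpy.corrcoef` would be used):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length arrays."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

r_pos = pearson([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly positively related: r ≈ +1
r_neg = pearson([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly negatively related: r ≈ −1
```

Note the coefficient is undefined when either array is constant (zero standard deviation).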

References

  • Abdelmoez W, Shereshevsky M, Gunnalan R, Bogazzi BYS, Korkmaz M, Ammar HH (2004) Software architectures change propagation tool (SACPT). In: Proc. of the 20th int’l conf. on software maintenance (ICSM’04). Chicago, USA, pp 517–517

  • Adams EN (1984) Optimizing preventive service of software products. IBM Res J 28(1):2–14

  • Andersson C, Runeson P (2007) A replicated quantitative analysis of fault distribution in complex software systems. IEEE Trans Softw Eng 33(5):273–286

  • Andrews A, Stringfellow C (2001) Quantitative analysis of development defects to guide testing: a case study. Softw Qual J 9:195–214

  • Bachmann F, Bass L, Klein M (2003) Preliminary design of ArchE: a software architecture design assistant. Technical Report CMU/SEI-2003-TR-021; via: http://www.sei.cmu.edu/reports/03tr021.pdf

  • Basili VR, Perricone BT (1984) Software errors and complexity: an empirical investigation. Commun ACM 27(1):42–52

  • Basili VR, Shull F (2005) Evolving defect “folklore”: a cross-study analysis of software defect behavior. In: Proc. of the int’l software process workshop (ISPW’05). Beijing, China, pp 1–9

  • Bass L, Berenbach B (2008) Leadership and management in software architecture (LMSA’08)—a report on an ICSE workshop. Leipzig, Germany

  • Bass L, Clements P, Kazman R (2003) Software architecture in practice, 2nd edn. Addison-Wesley Professional

  • Bertolino A, Inverardi P (1996) Architecture-based software testing. In: Proc. of the SIGSOFT 96 workshop. California, USA, pp 62–64

  • Boehm B, Basili VR (2001) Software defect reduction top 10 list. Computer 34(1):135–137

  • Booch G (2007) The economics of architecture-first. IEEE Softw 24(5):18–20

  • Booch G (2008) Nine things you can do with old software. IEEE Softw 25(5):93–94

  • Chikofsky EJ, Cross II JH (1990) Reverse engineering and design recovery: a taxonomy. IEEE Softw 7(1):13–17

  • Chillarege R, Bhandari IS, Chaar JK, Halliday MJ, Moebus DS, Ray BK, Wong MY (1992) Orthogonal defect classification—a concept for in-process measurements. IEEE Trans Softw Eng 18(11):943–956

  • Clements PC, Northrop LM (1996) Software architecture: an executive overview. Technical Report, CMU/SEI-96-TR-003

  • Clements P, Bachmann F, Bass L, Garlan D, Ivers J, Little R, Nord R, Stafford J (2002) Documenting software architectures: views and beyond. Addison-Wesley Professional

  • Compton BT, Withrow C (1990) Prediction and control of Ada software defects. J Syst Softw 12(3):199–207

  • Creswell JW (2002) Research design: qualitative, quantitative, and mixed methods approaches, 2nd edn. Sage Publications

  • Dobrica L, Niemela E (2002) A survey of software architecture analysis methods. IEEE Trans Softw Eng 28(7):638–653

  • Ebert C (1997) Experiences with criticality predictions in software development. ACM SIGSOFT Softw Eng Notes 22(6):278–293

  • Ebert C, Dumke R, Bundschuh M, Schimietendorf A (2005) Defect detection and quality improvement. In: “Best practices in software measurement”. Springer, pp 133–156

  • El-Ramly M (2006) Experience in teaching a software reengineering course. In: Proc. of the 28th int’l conf. on software engineering (ICSE’06). Shanghai, China, pp 699–702

  • Endres A (1975) An analysis of errors and their causes in system programs. ACM SIGPLAN Not 10(6):327–336

  • Fahmy H, Holt RC (2000) Software architecture transformations. In: Proc. of the 16th int’l conf. on software maintenance (ICSM’00). San Jose, California, pp 88–96

  • Fanta R, Rajlich V (1999) Removing clones from the code. J Softw Maint, Res & Pract 11(4):223–243

  • Fenton N, Ohlsson N (2000) Quantitative analysis of faults and failures in a complex software system. IEEE Trans Softw Eng 26(8):797–814

  • Fowler M (2003) Who needs an architect? IEEE Softw 20(5):11–13

  • Gittens M, Kim Y, Godwin D (2005) The vital few versus the trivial many: Examining the pareto principle for software. In: Proc. of the 29th annual int’l computer software and applications conf. (COMPSAC’05). Edinburgh, Scotland, pp 179–185

  • Grünbacher P, Egyed A, Medvidovic N (2001) Reconciling software requirements and architectures: the CBSP approach. In: Software and systems modeling. Springer, pp 202–211

  • Inverardi P, Wolf AL (1995) Formal specifications and analysis of software architectures using the chemical abstract machine model. IEEE Trans Softw Eng 21(4):100–114

  • Jansen A, Bosch J (2005) Software architecture as a set of architectural design decisions. In: Proc. of the 5th working IEEE/IFIP conf. on software architecture (WICSA’05). Pittsburgh, Pennsylvania, pp 109–120

  • Kazman R, O’Brien L, Verhoef C (2002) Architecture reconstruction guidelines, 3rd edn. Technical Report CMU/SEI-2002-TR-034

  • Kulkarni S (2008) Software defect rediscoveries: causes, taxonomy and significance. M.Sc. thesis, The University of Western Ontario

  • Leszak M, Perry DE, Stoll D (2000) A case study in root cause defect analysis. In: Proc. of the 22nd int’l conf. on software engineering (ICSE’00). Limerick, Ireland, pp 428–437

  • Li Z, Gittens M, Murtaza SS, Madhavji NH, Miranskyy AV, Godwin D, Cialini E (2009) Analysis of pervasive multiple-component defects in a large software system. In: Proc. of the 25th IEEE int’l conf. on software maintenance (ICSM’09). Edmonton, Alberta, pp 265–273

  • Muccini H, Dias M, Richardson DJ (2006) Software architecture-based regression testing. J Syst Softw 79(10):1379–1396

  • Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proc. of the 28th int’l conf. on software engineering (ICSE’06). Shanghai, China, pp 452–461

  • Nakajo T, Kume H (1991) A case history analysis of software error cause-effect relationships. IEEE Trans Softw Eng 17(8):830–838

  • Nedstam J, Karlsson E-A, Host M (2004) The architectural change process. In: Proc. of the 2004 int’l symposium on empirical software engineering (ISESE’04). Redondo Beach, CA, pp 27–36

  • Ohlsson N, Alberg H (1996) Predicting fault-prone software modules in telephone switches. IEEE Trans Softw Eng 21(12):886–894

  • Ohlsson MC, Wohlin C (1998) Identification of green, yellow and red legacy components. In: Proc. of the 14th int’l conf. on software maintenance (ICSM’98). Bethesda, Washington DC, pp 6–15

  • Ostrand TJ, Weyuker EJ (2002) The distribution of faults in a large industrial software system. In: Proc. of the 2002 ACM SIGSOFT int’l symposium on software testing and analysis (ISSTA’02). Rome, Italy, pp 55–64

  • Ostrand TJ, Weyuker E, Bell RM (2005) Predicting the location and number of faults in large software systems. IEEE Trans Softw Eng 31(4):340–355

  • Pareto V (1969) Manual of political economy (English version). Augustus M Kelley Pubs

  • Perry DE, Evangelist WM (1987) An empirical study of software interface faults. In: Proc. of the 20th annual Hawaii int’l conf. on systems sciences. Hawaii, pp 113–126

  • Schroter A, Zimmermann T, Zeller A (2006) Predicting component failures at design time. In: Proc. of the 5th int’l symposium on empirical software engineering (ISESE’06). Rio de Janeiro, Brazil

  • Shin ME, Xu Y, Paniagua F, An JH (2006) Detection of anomalies in software architecture with connectors. Sci Comput Program 61:16–26

  • Stringfellow C, Amory CD, Potnuri D, Andrews A, Georg M (2006) Comparison of software architecture reverse engineering methods. Inf Softw Technol 48:484–497

  • Valenti S (2002) Successful software reengineering. IRM Press

  • von Mayrhauser A, Ohlsson MC, Wohlin C (2000) Deriving fault architectures from defect history. J Softw Maint, Res & Pract 12(5):287–304

  • Weiss DM (1979) Evaluating software development by error analysis: the data from the architecture research facility. J Syst Softw 1:57–70

  • Weiss DM, Basili VR (1985) Evaluating software development by analysis of changes: some data from the software engineering laboratory. IEEE Trans Softw Eng 11(2):157–168

  • Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction. Kluwer Academic Publishers

  • Ying ATT, Murphy GC, Ng R, Chu-Carroll MC (2004) Predicting source code changes by mining change history. IEEE Trans Softw Eng 30(9):574–586

  • Yu WD (1998) A software fault prevention approach in coding and root cause analysis. Bell Labs Tech J 3(2):3–21

  • Yu T-J, Shen VY, Dunsmore HE (1988) An analysis of several software defect models. IEEE Trans Softw Eng 14(9):1261–1270

  • Zimmermann T, Weibgerber P, Diehl S, Zeller A (2004) Mining version histories to guide software changes. In: Proc. of the 26th int’l conf. on software engineering (ICSE’04). Edinburgh, UK, pp 563–572

Acknowledgements

We are very thankful to Remo Ferrari of the University of Western Ontario for invaluable discussions on investigating software architectures, and to the anonymous reviewers for their excellent comments and suggestions.

Author information

Corresponding author

Correspondence to Zude Li.

Additional information

Editors: Muhammad Ali Babar, Arie van Deursen and Patricia Lago

This paper is an enhanced version of (Li et al. 2009). This research is, in part, supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Centre for Advanced Studies, IBM Canada.

The opinions expressed in this paper are those of the authors and not necessarily of IBM Corporation.

About this article

Cite this article

Li, Z., Madhavji, N.H., Murtaza, S.S. et al. Characteristics of multiple-component defects and architectural hotspots: a large system case study. Empir Software Eng 16, 667–702 (2011). https://doi.org/10.1007/s10664-011-9155-y

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10664-011-9155-y
