Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models

Empirical Software Engineering

Abstract

Fault prediction by negative binomial regression models is shown to be effective for four large production software systems from industry. A model developed originally with data from systems with regularly scheduled releases was successfully adapted to a system without releases to identify 20% of that system’s files that contained 75% of the faults. A model with a pre-specified set of variables derived from earlier research was applied to three additional systems, and proved capable of identifying averages of 81, 94 and 76% of the faults in those systems. A primary focus of this paper is to investigate the impact on predictive accuracy of using data about the number of developers who access individual code units. For each system, including the cumulative number of developers who had previously modified a file yielded no more than a modest improvement in predictive accuracy. We conclude that while many factors can “spoil the broth” (lead to the release of software with too many defects), the number of developers is not a major influence.
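The headline figures in the abstract (for example, 20% of the files containing 75% of the faults) can be read as a ranking measure: order files by predicted fault count, take the top 20%, and report the share of actual faults they contain. The following is a minimal sketch of that calculation only. It does not fit the negative binomial regression model, and the function name `faults_in_top_files` and the ten-file data set are hypothetical illustrations, not taken from the paper.

```python
import math


def faults_in_top_files(predicted, actual, fraction=0.20):
    """Share of actual faults found in the top `fraction` of files,
    where files are ranked by predicted fault count (highest first)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    if total == 0:
        return 0.0
    # Number of files in the selected top slice (rounded up).
    n_top = math.ceil(fraction * len(predicted))
    # Rank file indices by predicted fault count, largest first.
    ranked = sorted(range(len(predicted)),
                    key=lambda i: predicted[i], reverse=True)
    captured = sum(actual[i] for i in ranked[:n_top])
    return captured / total


# Hypothetical data for ten files (assumed values, not from the paper).
predicted = [5.1, 0.2, 3.4, 0.1, 0.0, 0.3, 1.2, 0.0, 0.4, 0.1]
actual = [6, 0, 3, 0, 0, 1, 1, 0, 1, 0]

share = faults_in_top_files(predicted, actual)
print(f"{share:.0%} of faults fall in the top 20% of files")
```

In practice the `predicted` values would come from a fitted model, such as the negative binomial regression the abstract describes, and `actual` from the fault counts recorded for the following release or time period.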



Author information

Corresponding author

Correspondence to Thomas J. Ostrand.

Additional information

Editor: Tim Menzies


About this article

Cite this article

Weyuker, E.J., Ostrand, T.J. & Bell, R.M. Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empir Software Eng 13, 539–559 (2008). https://doi.org/10.1007/s10664-008-9082-8


  • DOI: https://doi.org/10.1007/s10664-008-9082-8
