Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models

Weyuker, Elaine J.; Ostrand, Thomas J.; Bell, Robert M.

doi:10.1007/s10664-008-9082-8

Published: 23 July 2008

Volume 13, pages 539–559, (2008)

Empirical Software Engineering

Abstract

Fault prediction by negative binomial regression models is shown to be effective for four large production software systems from industry. A model developed originally with data from systems with regularly scheduled releases was successfully adapted to a system without releases to identify 20% of that system’s files that contained 75% of the faults. A model with a pre-specified set of variables derived from earlier research was applied to three additional systems, and proved capable of identifying averages of 81, 94 and 76% of the faults in those systems. A primary focus of this paper is to investigate the impact on predictive accuracy of using data about the number of developers who access individual code units. For each system, including the cumulative number of developers who had previously modified a file yielded no more than a modest improvement in predictive accuracy. We conclude that while many factors can “spoil the broth” (lead to the release of software with too many defects), the number of developers is not a major influence.
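
The evaluation measure quoted in the abstract (the share of faults contained in the 20% of files ranked most fault-prone) can be illustrated with a minimal sketch. It assumes a model has already produced a predicted fault count per file; the function name `faults_in_top_files` and the toy numbers are hypothetical and are not taken from the paper.

```python
def faults_in_top_files(predicted, actual, fraction=0.20):
    """Percentage of actual faults found in the top `fraction` of files,
    where files are ranked by predicted fault count (highest first)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total_faults = sum(actual)
    if total_faults == 0:
        return 0.0
    # Number of files selected for inspection (at least one).
    n_selected = max(1, round(len(predicted) * fraction))
    # File indices sorted by predicted fault count, descending.
    ranked = sorted(range(len(predicted)),
                    key=lambda i: predicted[i], reverse=True)
    faults_found = sum(actual[i] for i in ranked[:n_selected])
    return 100.0 * faults_found / total_faults


# Hypothetical toy data: ten files, predicted vs. observed fault counts.
predicted = [5.1, 0.2, 3.3, 0.1, 0.4, 0.0, 1.2, 0.3, 0.1, 0.2]
actual = [6, 0, 2, 0, 1, 0, 1, 0, 0, 0]
print(faults_in_top_files(predicted, actual))  # 80.0
```

In this toy case the two highest-ranked files (20% of ten) hold 8 of the 10 observed faults. Comparing this percentage for a model with and without a developer-count variable is one way to read the "modest improvement" reported above.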

Author information

Authors and Affiliations

AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ, 07932, USA
Elaine J. Weyuker, Thomas J. Ostrand & Robert M. Bell

Corresponding author

Correspondence to Thomas J. Ostrand.

Additional information

Editor: Tim Menzies

About this article

Cite this article

Weyuker, E.J., Ostrand, T.J. & Bell, R.M. Do too many cooks spoil the broth? Using the number of developers to enhance defect prediction models. Empir Software Eng 13, 539–559 (2008). https://doi.org/10.1007/s10664-008-9082-8

Published: 23 July 2008
Issue Date: October 2008
DOI: https://doi.org/10.1007/s10664-008-9082-8
