Mining software defect data to support software testing management

Hewett, Rattikorn

doi:10.1007/s10489-009-0193-8

Published: 19 September 2009

Applied Intelligence, volume 34, pages 245–257 (2011)

Abstract

Achieving high quality software would be easier if effective software development practices were known and deployed in appropriate contexts. Because our theoretical knowledge of the underlying principles of software development is far from complete, empirical analysis of past experience in software projects is essential for acquiring useful software practices. As advances in software technology continue to facilitate automated tracking and data collection, more software data become available. Our research aims to develop methods to exploit such data for improving software development practices.

This paper proposes an empirical approach, based on the analysis of defect data, that provides support for software testing management in two ways: (1) construction of a predictive model for defect repair times, and (2) a method for assessing testing quality across multiple releases. The approach employs data mining techniques including statistical methods and machine learning. To illustrate the proposed approach, we present a case study using the defect reports created during the development of three releases of a large medical software system, produced by a large well-established software company. We validate our proposed testing quality assessment using a statistical test at a significance level of 0.1. Despite the limitations of the available data, our predictive models give accuracies as high as 93%.
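
As a rough illustration of the first kind of support described above (a predictive model for defect repair times), the sketch below trains a small naive Bayes classifier on categorical defect attributes. It is not the paper's method: the abstract does not say which learners or attributes were used, so the choice of naive Bayes, the attribute names (`severity`, `component`), the repair-time classes (`fast`, `slow`), and every record in `TRAIN` are assumptions made up for this example.

```python
from collections import Counter, defaultdict
import math

# Hypothetical defect reports: (severity, component, repair-time class).
# These records are invented for illustration; they are not the paper's data.
TRAIN = [
    ("high", "ui", "slow"),
    ("high", "db", "slow"),
    ("low", "ui", "fast"),
    ("low", "db", "fast"),
    ("medium", "ui", "fast"),
    ("high", "ui", "slow"),
    ("medium", "db", "slow"),
    ("low", "ui", "fast"),
]


def train_naive_bayes(rows):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(label for *_, label in rows)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for *attrs, label in rows:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, attrs):
    """Return the repair-time class with the highest log-posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for i, value in enumerate(attrs):
            # Laplace (add-one) smoothing avoids zero probabilities.
            numerator = value_counts[(i, label)][value] + 1
            denominator = n + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, ("high", "db")))
print(predict(model, ("low", "ui")))
```

In practice, a model like this would be evaluated on defect reports held out from training (for example, from a later release) before its accuracy is reported.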

Author information

Authors and Affiliations

Department of Computer Science, Texas Tech University, 302 Pine Street, Abilene, TX, 79601, USA
Rattikorn Hewett

Corresponding author

Correspondence to Rattikorn Hewett.

About this article

Cite this article

Hewett, R. Mining software defect data to support software testing management. Appl Intell 34, 245–257 (2011). https://doi.org/10.1007/s10489-009-0193-8

Received: 25 February 2008
Accepted: 28 August 2009
Published: 19 September 2009
Issue Date: April 2011
DOI: https://doi.org/10.1007/s10489-009-0193-8
