Software Quality Journal, Volume 14, Issue 2, pp 85–111

An empirical study of predicting software faults with case-based reasoning

  • Taghi M. Khoshgoftaar
  • Naeem Seliya
  • Nandini Sundaresh

DOI: 10.1007/s11219-006-7597-z

Cite this article as:
Khoshgoftaar, T.M., Seliya, N. & Sundaresh, N. Software Qual J (2006) 14: 85. doi:10.1007/s11219-006-7597-z

Abstract

The resources allocated for software quality assurance and improvement have not kept pace with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operation. We present a software fault prediction modeling approach with case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The choice of similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on prediction accuracy is also explored, and the benefits of using metric-selection procedures for our CBR system are evaluated. Case studies of a large legacy telecommunications system are used for our analysis. The CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models performed better than models based on multiple linear regression.
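The abstract describes a CBR fault predictor as three parts: a similarity function, a number k of nearest neighbor cases, and a solution algorithm. The following is a minimal sketch of that structure using the Mahalanobis distance and an inverse distance weighted average, the combination reported as best. It is not the authors' implementation. Several details are assumptions made for illustration: the weights are taken as 1/d, the covariance matrix is estimated from the case base itself, the unweighted mean stands in for the second (unnamed here) solution algorithm, and the function names (`covariance`, `invert`, `mahalanobis`, `predict_faults`) and the toy metric data are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix of the metrics in the case base (assumed estimator)."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j]
    return [[c / (n - 1) for c in row] for row in cov]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for a few metrics)."""
    p = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(p)] for i, row in enumerate(m)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(p):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[p:] for row in a]


def mahalanobis(x: Sequence[float], y: Sequence[float], inv_cov: Matrix) -> float:
    """Mahalanobis distance sqrt((x - y)^T S^-1 (x - y))."""
    d = [a - b for a, b in zip(x, y)]
    q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off


def predict_faults(cases, faults, target, k, inv_cov, weighted=True) -> float:
    """Estimate the fault count of `target` from its k most similar past modules."""
    ranked = sorted(
        (mahalanobis(c, target, inv_cov), f) for c, f in zip(cases, faults)
    )[:k]
    if not weighted:  # assumed alternative: plain mean of the k neighbors' faults
        return sum(f for _, f in ranked) / len(ranked)
    exact = [f for d, f in ranked if d == 0.0]
    if exact:  # identical past module(s): 1/d is undefined, reuse their fault counts
        return sum(exact) / len(exact)
    weights = [1.0 / d for d, _ in ranked]  # assumed inverse distance weights 1/d
    return sum(w * f for w, (_, f) in zip(weights, ranked)) / sum(weights)


# Hypothetical case base: (lines of code, branch count) -> observed faults
cases = [[10, 2], [20, 5], [30, 5], [40, 9], [50, 10]]
faults = [0, 1, 2, 3, 5]
inv_cov = invert(covariance(cases))
print(predict_faults(cases, faults, [35, 7], k=3, inv_cov=inv_cov))
```

Scaling distances by the inverse covariance matrix is what distinguishes the Mahalanobis distance from a plain Euclidean distance: correlated metrics (for example, size and branch count) are not double-counted when judging how similar two modules are.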

Keywords

Software quality; Case-based reasoning; Software fault prediction; Similarity functions; Solution algorithm; Software metrics

Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  • Taghi M. Khoshgoftaar (1)
  • Naeem Seliya (1)
  • Nandini Sundaresh (1)
  1. Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA