Software Quality Journal, Volume 14, Issue 2, pp 85–111

An empirical study of predicting software faults with case-based reasoning

  • Taghi M. Khoshgoftaar
  • Naeem Seliya
  • Nandini Sundaresh


The resources allocated for software quality assurance and improvement have not kept pace with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach based on case-based reasoning (CBR), a part of the computational intelligence field that focuses on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The choice of similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on prediction accuracy is also explored, and the benefits of using metric-selection procedures with our CBR system are evaluated. Case studies of a large legacy telecommunications system are used for our analysis. We observe that the CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models performed better than models based on multiple linear regression.
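The three components named in the abstract (a similarity function, a number of nearest neighbor cases, and a solution algorithm) can be illustrated with a minimal sketch. It pairs a Mahalanobis distance with an inverse distance weighted average, the combination the abstract reports as performing best. The abstract does not give the paper's exact formulas, so the function names, the use of a pseudo-inverse for the covariance matrix, the handling of exact matches, and the small case library below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def mahalanobis_distances(target, cases, cov_inv):
    """Mahalanobis distance from one target module to every case in the library."""
    diff = cases - target
    # Row-wise quadratic form diff_i^T * cov_inv * diff_i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(d2, 0.0))


def predict_faults(target, case_metrics, case_faults, k=3, eps=1e-12):
    """Predict a fault count as the inverse distance weighted average
    of the fault counts of the k most similar past modules."""
    case_metrics = np.asarray(case_metrics, dtype=float)
    case_faults = np.asarray(case_faults, dtype=float)
    target = np.asarray(target, dtype=float)

    # Assumption: covariance is estimated from the case library itself,
    # and a pseudo-inverse guards against a singular covariance matrix.
    cov = np.atleast_2d(np.cov(case_metrics, rowvar=False))
    cov_inv = np.linalg.pinv(cov)

    dist = mahalanobis_distances(target, case_metrics, cov_inv)
    nearest = np.argsort(dist)[:k]

    # Assumption: if a neighbor matches exactly, average the exact matches
    # instead of dividing by a zero distance.
    exact = nearest[dist[nearest] < eps]
    if exact.size:
        return float(case_faults[exact].mean())

    weights = 1.0 / dist[nearest]
    return float(np.sum(weights * case_faults[nearest]) / np.sum(weights))


# Hypothetical case library: two software metrics per module (made-up values)
# and the number of faults later found in each module.
library_metrics = [[10, 2.0], [20, 4.5], [30, 5.0], [40, 9.0], [50, 9.5]]
library_faults = [0, 1, 2, 4, 5]

prediction = predict_faults([25, 5.0], library_metrics, library_faults, k=3)
print(f"predicted faults: {prediction:.2f}")
```

Changing `k` varies the number of nearest neighbor cases, and replacing `mahalanobis_distances` or the weighting step swaps in a different similarity function or solution algorithm, which mirrors the kind of comparison the study describes.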


Keywords: Software quality, Case-based reasoning, Software fault prediction, Similarity functions, Solution algorithm, Software metrics





Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  • Taghi M. Khoshgoftaar (1)
  • Naeem Seliya (1)
  • Nandini Sundaresh (1)

  1. Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA
