An empirical study of predicting software faults with case-based reasoning

Software Quality Journal

Abstract

The resources allocated for software quality assurance and improvement have not kept pace with the ever-increasing need for better software quality. A targeted software quality inspection can detect faulty modules and reduce the number of faults occurring during operations. We present a software fault prediction modeling approach based on case-based reasoning (CBR), a part of the computational intelligence field focusing on automated reasoning processes. A CBR system functions as a software fault prediction model by quantifying, for a module under development, the expected number of faults based on similar modules that were previously developed. Such a system is composed of a similarity function, the number of nearest neighbor cases used for fault prediction, and a solution algorithm. The choice of similarity function and solution algorithm may affect the prediction accuracy of a CBR-based software fault prediction system. This paper presents an empirical study investigating the effects of using three different similarity functions and two different solution algorithms on the prediction accuracy of our CBR system. The influence of varying the number of nearest neighbor cases on prediction accuracy is also explored, and the benefits of using metric-selection procedures for our CBR system are evaluated. Case studies of a large legacy telecommunications system are used for our analysis. The CBR system using the Mahalanobis distance similarity function and the inverse distance weighted solution algorithm yielded the best fault prediction. In addition, the CBR models performed better than models based on multiple linear regression.
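The three components named in the abstract (a similarity function, a number of nearest neighbor cases, and a solution algorithm) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the sketch assumes the textbook Mahalanobis distance and a weight of 1/distance for each neighbor. The function names, the parameter `k`, and the sample data are all hypothetical, and the covariance matrix of the metrics is assumed to be non-singular.

```python
import math
from statistics import mean


def covariance(rows):
    """Sample covariance matrix of the metric columns in `rows`."""
    n, m = len(rows), len(rows[0])
    mu = [mean(r[j] for r in rows) for j in range(m)]
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(m)] for i in range(m)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - s) / aug[i][i]
    return x


def mahalanobis_sq(u, v, cov):
    """Squared Mahalanobis distance (u - v)^T cov^{-1} (u - v)."""
    d = [a - b for a, b in zip(u, v)]
    return sum(di * yi for di, yi in zip(d, solve(cov, d)))


def predict_faults(target, cases, k=3):
    """Inverse-distance-weighted mean of faults over the k nearest cases.

    `cases` is a list of (metrics, faults) pairs (assumed layout).
    """
    cov = covariance([m for m, _ in cases])
    scored = sorted(
        (math.sqrt(max(0.0, mahalanobis_sq(target, m, cov))), f)
        for m, f in cases
    )
    nearest = scored[:k]
    for dist, faults in nearest:
        if dist == 0:  # identical past module: reuse its fault count
            return float(faults)
    weights = [1.0 / dist for dist, _ in nearest]
    return sum(w * f for w, (_, f) in zip(weights, nearest)) / sum(weights)


# Hypothetical past modules: ([lines of code, cyclomatic complexity], faults)
cases = [([100, 5], 2), ([200, 9], 4), ([150, 7], 3),
         ([400, 20], 9), ([50, 2], 0)]
print(predict_faults([120, 6], cases, k=3))
```

Swapping `mahalanobis_sq` for another distance, or the weighted mean for a plain average of the neighbors, gives the kind of variation in similarity function and solution algorithm that the study compares.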

Author information

Taghi M. Khoshgoftaar is a professor in the Department of Computer Science and Engineering, Florida Atlantic University, and the Director of the Empirical Software Engineering Laboratory. His research interests are in software engineering, software metrics, software reliability and quality engineering, computational intelligence, computer performance evaluation, data mining, and statistical modeling. He has published more than 200 refereed papers in these areas. He has been a principal investigator and project leader in a number of projects with industry, government, and other research-sponsoring agencies. He is a member of the Association for Computing Machinery, the IEEE Computer Society, and the IEEE Reliability Society. He served as the general chair of the 1999 International Symposium on Software Reliability Engineering (ISSRE’99), and the general chair of the 2001 International Conference on Engineering of Computer Based Systems. He has also served on technical program committees of various international conferences, symposia, and workshops. He has served as North American editor of the Software Quality Journal, and is on the editorial boards of the journals Empirical Software Engineering, Software Quality, and Fuzzy Systems.

Naeem Seliya received the M.S. degree in Computer Science from Florida Atlantic University, Boca Raton, FL, USA, in 2001. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering at Florida Atlantic University. His research interests include software engineering, computational intelligence, data mining, software measurement, software reliability and quality engineering, software architecture, computer data security, and network intrusion detection. He is a student member of the IEEE Computer Society and the Association for Computing Machinery.

Cite this article

Khoshgoftaar, T.M., Seliya, N. & Sundaresh, N. An empirical study of predicting software faults with case-based reasoning. Software Qual J 14, 85–111 (2006). https://doi.org/10.1007/s11219-006-7597-z

  • DOI: https://doi.org/10.1007/s11219-006-7597-z
