Empirical Software Engineering, Volume 10, Issue 2, pp 183–218

Assessment of a New Three-Group Software Quality Classification Technique: An Empirical Case Study

  • Taghi M. Khoshgoftaar
  • Naeem Seliya
  • Kehan Gao

Original Article


The primary aim of risk-based software quality classification models is to detect, prior to testing or operations, the components that are most likely to be high-risk. Their practical usefulness as quality assurance tools is gauged by the prediction accuracy and cost-effectiveness of the models. Classifying modules into two risk groups is the more commonly practiced approach. Such models assume that all modules predicted as high-risk will be subjected to quality improvements. Because reliability improvement resources are always limited and the quality risk factor varies, a more focused classification model may be desired to achieve cost-effective software quality assurance goals. In such cases, calibrating a three-group (high-risk, medium-risk, and low-risk) classification model is more rewarding. We present a method that circumvents the complexities, computational overhead, and difficulties involved in calibrating pure or direct three-group classification models. With the proposed method, practitioners can apply an existing two-group classification algorithm three times to yield the three risk-based classes. An empirical approach is taken to investigate the effectiveness and validity of the proposed technique. Several commonly used classification techniques are studied to demonstrate the proposed methodology: the C4.5 decision tree algorithm, discriminant analysis, and case-based reasoning. For the first two, we compare the three-group model calibrated directly with each technique against the one built by applying the proposed method. Any two-group classification technique can be employed by the proposed method, including those that do not provide a direct three-group classification model, e.g., logistic regression and certain binary classification trees such as CART. Based on a case study of a large-scale industrial software system, the proposed method yielded promising results.
For a given classification technique, the expected cost of misclassification of the proposed three-group models was generally significantly lower than that of the technique's direct three-group model. In addition, the proposed method is evaluated against an alternative indirect three-group classification method.
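The core idea stated above, reusing a two-group classification algorithm three times to obtain three risk classes, can be sketched as follows. The abstract does not spell out how the three binary models are combined, so the pairwise majority-vote scheme and the deliberately simple one-metric threshold learner below are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch: build a three-group (low/medium/high risk) classifier
# from three pairwise two-group models, combined by majority vote. The binary
# learner here is a trivial one-metric threshold rule; any two-group technique
# (e.g., logistic regression, CART) could be substituted for it.

def train_threshold(samples, lo_label, hi_label):
    """Fit a one-metric threshold rule: the midpoint of the two group means."""
    def mean(label):
        vals = [v for v, l in samples if l == label]
        return sum(vals) / len(vals)
    t = (mean(lo_label) + mean(hi_label)) / 2.0
    return lambda v: hi_label if v >= t else lo_label

def three_group_predict(models, value):
    """Majority vote over the three pairwise two-group models."""
    votes = [m(value) for m in models]
    return max(set(votes), key=votes.count)

# Toy training data: (software metric value, risk class).
data = [(2, "low"), (3, "low"), (10, "medium"), (12, "medium"),
        (30, "high"), (35, "high")]

def pair(a, b):
    """Restrict the data to the two groups one binary model discriminates."""
    return [(v, l) for v, l in data if l in (a, b)]

# The two-group learner is applied thrice: one model per pair of classes.
models = [
    train_threshold(pair("low", "medium"), "low", "medium"),
    train_threshold(pair("medium", "high"), "medium", "high"),
    train_threshold(pair("low", "high"), "low", "high"),
]

print(three_group_predict(models, 4))   # -> low
print(three_group_predict(models, 11))  # -> medium
print(three_group_predict(models, 40))  # -> high
```

With a single metric and monotone thresholds, every vote here has a strict majority; in general, a tie among the three pairwise votes would need an explicit tie-breaking rule.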


Keywords: Software quality prediction · three-group classification · discriminant analysis · decision trees · case-based reasoning · expected cost of misclassification





Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  • Taghi M. Khoshgoftaar (1)
  • Naeem Seliya (1)
  • Kehan Gao (1)

  1. Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA
