Software Quality Journal, Volume 15, Issue 3, pp 327–344

Software quality estimation with limited fault data: a semi-supervised learning perspective

  • Naeem Seliya
  • Taghi M. Khoshgoftaar


We address the important problem of software quality analysis when there is limited software fault or fault-proneness data. A software quality model is typically trained using software measurement and fault data obtained from a previous release or a similar project. Such an approach assumes that fault data is available for all the training modules. In practice, various issues in software development may limit the availability of fault-proneness data, so that only some of the training modules are labeled. Consequently, the available labeled training dataset is such that the trained software quality model may not provide reliable predictions. More specifically, the small set of modules with known fault-proneness labels is not sufficient for capturing the software quality trends of the project. We investigate semi-supervised learning with the Expectation Maximization (EM) algorithm for software quality estimation with limited fault-proneness data. The hypothesis is that knowledge stored in the software attributes of the unlabeled program modules will aid in improving software quality estimation. Software data collected from a large NASA software project is used during the semi-supervised learning process. The software quality model is evaluated with multiple test datasets collected from other NASA software projects. Compared to software quality models trained only with the available set of labeled program modules, the EM-based semi-supervised learning scheme improves the generalization performance of the software quality models.
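As a rough illustration of the general idea summarized above, the following sketch runs EM over a small labeled set and a larger unlabeled set of program modules. It is not the authors' implementation: the abstract does not specify the base classifier, the metrics, or the data, so the Gaussian naive Bayes model, the two synthetic metrics (lines of code and cyclomatic complexity), and all function names are assumptions made purely for illustration.

```python
import math
import random

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def fit_weighted(X, R):
    """M-step: estimate per-class prior, means, and variances from
    soft class memberships R (one [p_nfp, p_fp] row per module)."""
    n_feat = len(X[0])
    params = []
    for c in range(2):
        w = [r[c] for r in R]
        wsum = sum(w)  # assumed > 0: each class has at least one labeled module
        means = [sum(wi * x[j] for wi, x in zip(w, X)) / wsum
                 for j in range(n_feat)]
        variances = [sum(wi * (x[j] - means[j]) ** 2 for wi, x in zip(w, X)) / wsum
                     + 1e-6  # small floor to avoid zero variance
                     for j in range(n_feat)]
        params.append((wsum / len(X), means, variances))
    return params

def posterior(x, params):
    """E-step for one module: class probabilities [p_nfp, p_fp]."""
    logs = [math.log(prior)
            + sum(gaussian_logpdf(xj, m, v) for xj, m, v in zip(x, means, variances))
            for prior, means, variances in params]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]

def predict(x, params):
    """Return 1 (fault-prone) or 0 (not fault-prone)."""
    p = posterior(x, params)
    return 1 if p[1] > p[0] else 0

def em_semi_supervised(X_lab, y_lab, X_unl, n_iter=20):
    """Train on labeled modules, then alternate E and M steps so the
    unlabeled modules contribute through their estimated class probabilities."""
    R_lab = [[1.0 - y, float(y)] for y in y_lab]  # labeled memberships stay fixed
    params = fit_weighted(X_lab, R_lab)           # initial model: labeled data only
    for _ in range(n_iter):
        R_unl = [posterior(x, params) for x in X_unl]         # E-step
        params = fit_weighted(X_lab + X_unl, R_lab + R_unl)   # M-step
    return params

def make_modules(n, loc_mean, cc_mean, rng):
    """Synthetic modules: [lines of code, cyclomatic complexity] (hypothetical)."""
    return [[rng.gauss(loc_mean, 5.0), rng.gauss(cc_mean, 1.5)] for _ in range(n)]

rng = random.Random(0)
# Small labeled set: 5 not-fault-prone (0) and 5 fault-prone (1) modules.
X_lab = make_modules(5, 20.0, 3.0, rng) + make_modules(5, 80.0, 12.0, rng)
y_lab = [0] * 5 + [1] * 5
# Larger unlabeled set drawn from the same two synthetic groups.
X_unl = make_modules(100, 20.0, 3.0, rng) + make_modules(100, 80.0, 12.0, rng)

params = em_semi_supervised(X_lab, y_lab, X_unl)
print(predict([75.0, 11.0], params), predict([22.0, 4.0], params))
```

In this toy setup the two groups are well separated, so the unlabeled modules mainly refine the mean and variance estimates obtained from the ten labeled modules. Real software measurement data would be far noisier, and whether the unlabeled data helps there is exactly the empirical question the paper studies.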


Semi-supervised learning, Software quality estimation, Unlabeled data, Software metrics, Expectation maximization



We thank the anonymous reviewers for their comments and suggestions, which went toward improving this paper. We are grateful to Liam Mayron, Lili Zhao and Renee Zuleta for their assistance with editorial reviews. We thank the staff of the NASA Metrics Data Program for making the software measurement data available.



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Computer and Information Science, University of Michigan – Dearborn, Dearborn, USA
  2. Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA
