
Improving Software Quality Prediction by Noise Filtering Techniques

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The accuracy of machine learners is affected by the quality of the data on which they are induced. In this paper, the quality of the training dataset is improved by removing instances detected as noisy by the Partitioning Filter. The fit dataset is first split into subsets, and different base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned by the filter's voting scheme and the number of iterations. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and unfiltered training datasets. A case study of software measurement data from a high-assurance software project is performed. It is shown that the predictive performance of models built on the filtered fit datasets and evaluated on a noisy test dataset is generally better than that of models built on the noisy (unfiltered) fit dataset. However, the predictive performance of certain aggressive filters is affected by the presence of noise in the evaluation dataset.
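
The filtering scheme described above can be summarized in a short sketch. The code below is a minimal illustration assuming scikit-learn decision trees as base learners; the partition count, voting threshold, and stopping rule are illustrative placeholders, not the exact settings used in the paper.

    # Minimal sketch of an Iterative-Partitioning Filter (illustrative only).
    # Assumes numpy arrays and scikit-learn decision trees as base learners.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def partitioning_filter(X, y, n_partitions=5, vote_threshold=None,
                            max_iterations=3, random_state=0):
        """Return the indices of instances kept after iterative noise filtering.

        In each iteration the current fit data are split into n_partitions
        disjoint subsets, one base learner is induced per subset, and every
        remaining instance is classified by all base learners. An instance is
        flagged as noisy if at least vote_threshold learners misclassify it
        (consensus voting when vote_threshold == n_partitions).
        """
        X, y = np.asarray(X), np.asarray(y)
        if vote_threshold is None:
            vote_threshold = n_partitions        # consensus voting by default
        rng = np.random.RandomState(random_state)
        kept = np.arange(len(y))                 # indices still considered clean

        for _ in range(max_iterations):
            # Randomly partition the indices of the current fit data.
            order = rng.permutation(kept)
            partitions = np.array_split(order, n_partitions)

            # Induce one base learner per partition; each learner then
            # classifies every instance that is still in the fit data.
            votes = np.zeros(len(y), dtype=int)
            for part in partitions:
                learner = DecisionTreeClassifier(random_state=random_state)
                learner.fit(X[part], y[part])
                votes[kept] += (learner.predict(X[kept]) != y[kept]).astype(int)

            # Instances misclassified by enough learners are treated as noisy.
            noisy = kept[votes[kept] >= vote_threshold]
            if len(noisy) == 0:                  # stopping rule: nothing removed
                break
            kept = np.setdiff1d(kept, noisy)

        return kept

With the voting threshold equal to the number of partitions the filter uses consensus voting and removes only instances that every base learner misclassifies; lowering the threshold toward a simple majority makes the filter more aggressive, which is the regime the abstract flags as more sensitive to noise in the evaluation dataset. The surviving indices, X[kept] and y[kept], then serve as the filtered fit dataset for the final classification model.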

Author information

Corresponding author

Correspondence to Taghi M. Khoshgoftaar.

About this article

Cite this article

Khoshgoftaar, T.M., Rebours, P. Improving Software Quality Prediction by Noise Filtering Techniques. J Comput Sci Technol 22, 387–396 (2007). https://doi.org/10.1007/s11390-007-9054-2


