
Improving Software Quality Prediction by Noise Filtering Techniques

  • Regular Paper
Journal of Computer Science and Technology

Abstract

The accuracy of machine learners is affected by the quality of the data on which they are induced. In this paper, the quality of the training dataset is improved by removing instances detected as noisy by the Partitioning Filter. The fit dataset is first split into subsets, and different base learners are induced on each of these splits. The predictions are combined in such a way that an instance is identified as noisy if it is misclassified by a certain number of base learners. Two versions of the Partitioning Filter are used: the Multiple-Partitioning Filter and the Iterative-Partitioning Filter. The number of instances removed by the filters is tuned by the filter's voting scheme and the number of iterations. The primary aim of this study is to compare the predictive performance of the final models built on the filtered and unfiltered training datasets. A case study of software measurement data from a high-assurance software project is performed. It is shown that the predictive performance of models built on the filtered fit datasets and evaluated on a noisy test dataset is generally better than that of models built on the noisy (unfiltered) fit dataset. However, the predictive performance of certain aggressive filters is affected by the presence of noise in the evaluation dataset.
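
The filtering scheme described above can be summarized in a short sketch. The code below is a minimal illustration assuming scikit-learn decision trees as base learners; the partition count, voting threshold, and stopping rule are illustrative placeholders, not the exact settings used in the paper.

    # Minimal sketch of an Iterative-Partitioning Filter (illustrative only).
    # Assumes numpy arrays and scikit-learn decision trees as base learners.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def partitioning_filter(X, y, n_partitions=5, vote_threshold=None,
                            max_iterations=3, random_state=0):
        """Return the indices of instances kept after iterative noise filtering.

        In each iteration the current fit data are split into n_partitions
        disjoint subsets, one base learner is induced per subset, and every
        remaining instance is classified by all base learners. An instance is
        flagged as noisy if at least vote_threshold learners misclassify it
        (consensus voting when vote_threshold == n_partitions).
        """
        X, y = np.asarray(X), np.asarray(y)
        if vote_threshold is None:
            vote_threshold = n_partitions        # consensus voting by default
        rng = np.random.RandomState(random_state)
        kept = np.arange(len(y))                 # indices still considered clean

        for _ in range(max_iterations):
            # Randomly partition the indices of the current fit data.
            order = rng.permutation(kept)
            partitions = np.array_split(order, n_partitions)

            # Induce one base learner per partition; each learner then
            # classifies every instance that is still in the fit data.
            votes = np.zeros(len(y), dtype=int)
            for part in partitions:
                learner = DecisionTreeClassifier(random_state=random_state)
                learner.fit(X[part], y[part])
                votes[kept] += (learner.predict(X[kept]) != y[kept]).astype(int)

            # Instances misclassified by enough learners are treated as noisy.
            noisy = kept[votes[kept] >= vote_threshold]
            if len(noisy) == 0:                  # stopping rule: nothing removed
                break
            kept = np.setdiff1d(kept, noisy)

        return kept

With the voting threshold equal to the number of partitions the filter uses consensus voting and removes only instances that every base learner misclassifies; lowering the threshold toward a simple majority makes the filter more aggressive, which is the regime the abstract flags as more sensitive to noise in the evaluation dataset. The surviving indices, X[kept] and y[kept], then serve as the filtered fit dataset for the final classification model.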

Author information

Corresponding author

Correspondence to Taghi M. Khoshgoftaar.

About this article

Cite this article

Khoshgoftaar, T.M., Rebours, P. Improving Software Quality Prediction by Noise Filtering Techniques. J Comput Sci Technol 22, 387–396 (2007). https://doi.org/10.1007/s11390-007-9054-2


