Abstract
Software Fault Prediction (SFP) is vital for predicting the fault-proneness of software modules: it allows software engineers to focus development activities on fault-prone modules, prioritize and optimize testing, improve software quality, and make better use of resources. Machine learning has been successfully applied to the classification problems that underlie SFP. Nevertheless, the variety of software metrics, the presence of redundant and irrelevant features, and the imbalanced nature of software datasets pose growing challenges for these classification problems. The objective of this study is therefore to independently examine software metrics with multiple Feature Selection Techniques (FST) combined with Data Balancing (DB) using the Synthetic Minority Oversampling Technique (SMOTE) to improve classification performance. Accordingly, a new framework that efficiently handles these challenges in combined form on both Object-Oriented Metrics (OOM) and Static Code Metrics (SCM) datasets is proposed. The experimental results confirm that prediction performance can be compromised without a suitable FST, and that the data must additionally be balanced; the combined technique thus ensures robust performance. Furthermore, the combination of Random Forest (RF) with Information Gain (IG) feature selection yields the highest Receiver Operating Characteristic (ROC) value (0.993) and is the best combination when SCM are used, whereas the combination of RF with Correlation-based Feature Selection (CFS) yields the highest ROC value (0.909) and is the best choice when OOM are used. Therefore, as this study shows, the software metrics used to predict the fault-proneness of software modules must be carefully examined, a suitable FST must be cautiously selected for them, and DB must be applied to obtain robust performance.
By addressing the challenges mentioned above, the proposed framework delivers remarkable classification performance and lays a pathway toward software quality assurance.
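The combined pipeline the abstract describes can be sketched in code. The following is a minimal, illustrative stand-in using scikit-learn (the study itself used WEKA): feature selection via mutual information (an information-gain-style criterion), a hand-rolled SMOTE-style oversampler on the training split, and a Random Forest scored by the area under the ROC curve. The synthetic dataset and all parameter values here are assumptions for illustration, not the paper's data or settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def smote(X, y, minority=1, k=5, seed=0):
    """Minimal SMOTE: synthesize minority samples by interpolating between
    each sampled minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))  # oversample to balance
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])       # random near neighbor
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return (np.vstack([X, np.array(synth)]),
            np.concatenate([y, np.full(n_new, minority)]))

# Imbalanced toy "software metrics" dataset: the fault-prone class is rare.
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# 1) Feature selection (mutual information approximates Information Gain).
fs = SelectKBest(mutual_info_classif, k=8).fit(X_tr, y_tr)
# 2) Data balancing, applied to the training split only.
X_bal, y_bal = smote(fs.transform(X_tr), y_tr)
# 3) Random Forest classifier, evaluated by area under the ROC curve.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_bal, y_bal)
auc = roc_auc_score(y_te, clf.predict_proba(fs.transform(X_te))[:, 1])
print(f"ROC-AUC: {auc:.3f}")
```

Note that balancing only the training data, never the test split, keeps the evaluation faithful to the original class distribution.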
Rights and permissions
This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
Yohannese, C.W., Li, T. A Combined-Learning Based Framework for Improved Software Fault Prediction. Int J Comput Intell Syst 10, 647–662 (2017). https://doi.org/10.2991/ijcis.2017.10.1.43