Abstract
Cross-project defect prediction (CPDP) is used in software defect prediction (SDP) research to overcome the limitations of within-project defect prediction (WPDP). CPDP trains a machine learning model on data from one project to predict defects in another. Because the source and target projects differ in a CPDP setting and are structured differently, a given source-target pair is not always a good match. This study presents a categorical data set ensemble technique in which multiple data sets are aggregated to form the source data instead of using a single data set. The method was evaluated on nine data sets taken from a publicly accessible repository, using two performance indicators. The data set ensemble approach improved prediction performance in more than 65% of the combinations compared with traditional CPDP models. The results also show that train-test pairs from the same category (homogeneous) give high performance, whereas prediction performance mostly collapses when the data sets come from different categories. The proposed scheme is therefore recommended as an alternative that can improve defect prediction in most cases compared with traditional cross-project SDP models.
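The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the category labels, the 0.5 threshold on the share of defective rows, and the toy data are all assumptions made for illustration. The sketch only shows how several source projects of one category might be pooled into a single training set while the target project is held out.

```python
# Illustrative sketch only. Each row is (feature_vector, label),
# where label 1 = defective and 0 = non-defective.

def category(rows, threshold=0.5):
    """Assumed rule: categorize a data set by its share of defective rows."""
    defective = sum(label for _, label in rows)
    return "mostly_defective" if defective / len(rows) >= threshold else "mostly_clean"


def pool_sources(projects, target, wanted):
    """Concatenate all non-target projects whose category equals `wanted`."""
    pooled = []
    for name, rows in projects.items():
        if name != target and category(rows) == wanted:
            pooled.extend(rows)
    return pooled


# Made-up toy projects; real studies would load metric data from a repository.
projects = {
    "A": [([1.0, 2.0], 0), ([2.0, 1.0], 0), ([9.0, 8.0], 1)],
    "B": [([1.5, 1.0], 0), ([8.0, 9.0], 1), ([2.0, 2.0], 0), ([1.0, 1.0], 0)],
    "C": [([7.0, 9.0], 1), ([8.0, 8.0], 1), ([1.0, 1.0], 0)],
    "T": [([1.0, 1.5], 0), ([9.0, 9.0], 1), ([2.0, 2.5], 0)],
}

# In an experiment the target labels are known, so its category can be computed;
# in practice the category would have to be supplied some other way.
source = pool_sources(projects, target="T", wanted=category(projects["T"]))
```

Here the pooled `source` would replace the single source project of a traditional CPDP setup; any classifier could then be trained on it and applied to the target project `T`.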
Notes
- 1.
In this study, data sets are called homogeneous or heterogeneous based on the numbers of non-defective and defective class instances in a data set.
- 2.
Copyright information
© 2020 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Sohan, M.F., Kabir, M.A., Rahman, M., Hasan Mahmud, S.M., Bhuiyan, T. (2020). Training Data Selection Using Ensemble Dataset Approach for Software Defect Prediction. In: Bhuiyan, T., Rahman, M.M., Ali, M.A. (eds) Cyber Security and Computer Science. ICONCS 2020. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-52856-0_19
DOI: https://doi.org/10.1007/978-3-030-52856-0_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52855-3
Online ISBN: 978-3-030-52856-0
eBook Packages: Computer Science, Computer Science (R0)