An effective fault prediction model developed using an extreme learning machine with various kernel methods

  • Lov Kumar
  • Anand Tirkey
  • Santanu-Ku. Rath


System analysts often use software fault prediction models to identify fault-prone modules during the design phase of the software development life cycle. The models help predict faulty modules based on the software metrics that are input to the models. In this study, we consider 20 types of metrics to develop a model using an extreme learning machine associated with various kernel methods. We evaluate the effectiveness of the model using a proposed framework based on the cost and efficiency in the testing phases. The evaluation process is carried out by considering case studies for 30 object-oriented software systems. Experimental results demonstrate that the application of a fault prediction model is suitable for projects in which the percentage of faulty classes is below a certain threshold, which depends on the efficiency of fault identification (low: 47.28%; median: 39.24%; high: 25.72%). We consider nine feature selection techniques to remove the irrelevant metrics and to select the best set of source code metrics for fault prediction.
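To make the modeling idea concrete, the following is a minimal sketch of a kernel-based extreme learning machine used as a binary fault classifier. It is not the implementation used in the study. It assumes the standard closed-form kernel ELM solution, beta = (K + I/C)^(-1) T, with an RBF kernel; the class name `KernelELM`, the parameters `C` and `gamma`, and the two-feature synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class KernelELM:
    """Hypothetical kernel ELM for binary labels (1 = faulty, 0 = not faulty)."""

    def __init__(self, C=10.0, gamma=1.0):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        # Encode the two classes as targets +1 (faulty) and -1 (not faulty).
        T = np.where(np.asarray(y) == 1, 1.0, -1.0)
        K = rbf_kernel(self.X_, self.X_, self.gamma)
        # Closed-form output weights: solve (K + I/C) beta = T.
        self.beta_ = np.linalg.solve(K + np.eye(len(T)) / self.C, T)
        return self

    def decision_function(self, X):
        return rbf_kernel(np.asarray(X, dtype=float), self.X_, self.gamma) @ self.beta_

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


# Synthetic stand-in for two software metrics per class (not real project data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),  # non-faulty classes
    rng.normal(5.0, 0.3, size=(20, 2)),  # faulty classes
])
y = np.array([0] * 20 + [1] * 20)

model = KernelELM(C=10.0, gamma=1.0).fit(X, y)
print(model.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))
```

Swapping `rbf_kernel` for a linear or polynomial kernel function changes only the kernel matrix; the solve step stays the same, which is what makes it straightforward to compare several kernel methods on one metric set.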

Key words

CK metrics; Cost analysis; Extreme learning machine; Feature selection techniques; Object-oriented software




The researchers are grateful to the FIST project of DST, Government of India, for sponsoring the work on web engineering and cloud-based computing. The researchers are also thankful to the Computer Science & Engineering Department, NIT Rourkela, for providing all facilities and guidance.



Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Engineering, National Institute of Technology Rourkela, Rourkela, India