Abstract
Numerous research studies have claimed that search-based algorithms have the potential to be effectively used in various software engineering domains. An important task in software organizations is to efficiently recognize change prone classes of a software, as it is crucial to plan efficient resource utilization and to take precautionary design measures as early as possible in the software product lifecycle. This assures development of good quality software products at lower costs. The current study attempts to evaluate the capability of search-based algorithms while developing prediction models for identification of the change prone classes in a software. Though previous literature has evaluated the use of statistical category and machine learning category of algorithms in this domain, the suitability of search-based algorithms needs extensive investigation in this area. Furthermore, the study compares the performance of search-based classifiers with statistical and machine learning classifiers, by empirically validating the results on fourteen open source data sets. The results indicate comparable and in some cases even better performance of search based algorithms in comparison to other evaluated categories of algorithms.
Similar content being viewed by others
References
Abdelhalim M.B, Habib SED (2009). Particle swarm optimization for HW/SW partitioning. In: Lazinica A (ed) Particle swarm optimization. In-Tech Publication, pp 49–76
Abdi Y, Parsa S, Seyfari Y (2015) A hybrid one-class rule learning approach based on swarm intelligence for software fault prediction. Innov Syst Softw Eng 11(4):289–301
Aggarwal KK, Singh Y, Kaur A, Malhotra R (2006) Empirical study of object-oriented metrics. J Object Technol 5(8):149–173
Aguilar-Ruiz JS, Riquelme JC, Toro M (2003) Evolutionary learning of hierarchical decision rules. IEEE Trans Syst Man Cybern Part B (Cybern) 33(2):324–331
Ali S, Briand LC, Hemmati H, Panesar-Walawege RK (2010) A systematic review of the application and empirical investigation of search-based test case generation. IEEE Trans Softw Eng 36(6):742–762
Arcuri A, Fraser G (2013) Parameter tuning or default values? an empirical investigation in search-based software engineering. Empir Softw Eng 18(3):594–623
Arisholm E, Briand LC, Foyen A (2004) Dynamic coupling measurement for object-oriented software. IEEE Trans Softw Eng 30(8):491–506
Azar D (2010) A genetic algorithm for improving accuracy of software quality predictive models: a search-based software engineering approach. Int J Comput Intell Appl 9(02):125–136
Azar D, Vybihal J (2011) An ant colony optimization algorithm to improve software quality prediction models: case of class stability. Inf Softw Technol 53(4):388–393
Bacardit J (2004) Pittsburgh genetics-based machine learning in the data mining era: representations, generalization, and run-time. Doctoral dissertation, Ramon Llull University, Barcelona, Catalonia, Spain
Bacardit J, Garrell JM (2003) Evolving multiple discretizations with adaptive intervals for a pittsburgh rule-based learning classifier system. In: Genetic and evolutionary computation conference 2003, pp. 1818–1831. Springer, Berlin
Bacardit J, Krasnogor N (2009) Performance and efficiency of memetic pittsburgh learning classifier systems. Evol Comput 17(3):307–342
Bansal A (2017) Empirical analysis of search based algorithms to identify change prone classes of open source software. Comput Lang Syst Struct 47:211–231
Bardsiri VK, Jawawi DN, Hashim SZ, Khatibi E (2013) A PSO-based model to increase the accuracy of software development effort estimation. Softw Qual J 21(3):501–526
Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761
Bernadó-Mansilla E, Garrell-Guiu JM (2003) Accuracy-based learning classifier systems: models, analysis and applications to classification tasks. Evol Comput 11(3):209–238
Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using matthews correlation coefficient metric. PloS one 12(6):p.e0177678
Briand LC, Daly JW, Wüst JK (1998) A unified framework for cohesion measurement in object-oriented systems. Empir Softw Eng 3(1):65–117
Briand LC, Daly JW, Wust JK (1999) A unified framework for coupling measurement in object-oriented systems. IEEE Trans Softw Eng 25(1):91–121
Briand LC, Wüst J, Daly JW, Porter DV (2000) Exploring the relationships between design measures and software quality in object-oriented systems. J Syst Softw 51(3):245–273
Briand LC, Wüst J, Lounis H (2001) Replicated case studies for investigating quality factors in object-oriented designs. Empir Softw Eng 6(1):11–58
Burgess CJ, Lefley M (2001) Can genetic programming improve software effort estimation? A comparative evaluation. Inf Softw Technol 43(14):863–873
Butz MV, Kovacs T, Lanzi PL, Wilson SW (2001) How XCS evolves accurate classifiers. In: Pesic B (ed) Proceedings of the 3rd annual conference on genetic and evolutionary computation. morgan kaufmann publishers inc, USA, pp. 927–934
Chicco D, Jurman G (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21(1):6
Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
Cortes C, Vapnik V (1995) Support-vector networks. Mach learn 20(3):273–97
De Carvalho AB, Pozo A, Vergilio SR (2010) A symbolic fault-prediction model based on multiobjective particle swarm optimization. J Syst Softw 83(5):868–882
Demšar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach learn res 7:1–30
Elish MO, Al-Rahman Al-Khiaty M (2013) A suite of metrics for quantifying historical changes to predict future change-prone classes in object-oriented software. J Softw Evolut Process 25(5):407–437
Eski S, Buzluca F (2011) An empirical study on object-oriented metrics and software evolution in order to reduce testing costs by predicting change-prone classes. In: 2011 IEEE fourth international conference on software testing, verification and validation workshops, pp. 566–571. IEEE.
Ferreira C (2001) Gene expression programming: a new adaptive algorithm for solving problems. Complex Syst 13(2):89–129
Ferrucci F, Salza P, Sarro F (2018) Using hadoop mapreduce for parallel genetic algorithms: a comparison of the global, grid and island models. Evol Comput 26(4):535–567
Fogel DB (1997) The advantages of evolutionary computation. In: Proceedings of biocomputing and emergent computation, pp. 1–11
Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701
Giger E, Pinzger M, Gall HC (2012). Can we predict types of code changes? an empirical analysis. In: 2012 9th IEEE working conference on mining software repositories (MSR), pp. 217–226. IEEE
Harman M (2010a) The relationship between search based software engineering and predictive modeling. In: Proceedings of the 6th international conference on predictive models in software engineering, pp. 1–13. ACM
Harman M (2010b) Why the virtual nature of software makes it ideal for search based optimization. In: International conference on fundamental approaches to software engineering, pp. 1–12. Springer, Berlin
Harman M, Clark J (2004) Metrics are fitness functions too. In: 10th international symposium on software metrics, pp. 58–69. IEEE
Harman M, Jones BF (2001) Search-based software engineering. Inf Softw Technol 43(14):833–839
Harman M, McMinn P, De Souza JT, Yoo S (2012) Search based software engineering: techniques, taxonomy, tutorial. Empirical software engineering and verification. Springer, Berlin, pp 1–59
Harman M, Islam S, Jia Y, Minku LL, Sarro F, Srivisut K (2014) Less is more: temporal fault predictive performance over multiple hadoop releases. international symposium on search based software engineering. Springer, Cham, pp 240–246
Haykin S, Network N (2004) A comprehensive foundation. Neural networks Pearson Education, Delhi
Hosseini S, Turhan B, Mäntylä M (2018) A benchmark study on the effectiveness of search-based data selection and feature selection for cross project defect prediction. Inf Softw Technol 95:296–312
Jin C, Jin SW (2015) Prediction approach of software fault-proneness based on hybrid artificial neural network and quantum particle swarm optimization. Appl Soft Comput 35:717–725
Kaur L, Mishra A, (2018). A comparative analysis of evolutionary algorithms for the prediction of software change. In: International conference on innovations in information technology, pp. 187–192. IEEE
Koru AG, Liu H (2007) Identifying and characterizing change-prone classes in two large-scale open-source products. J Syst Softw 80(1):63–73
Koru AG, Tian J (2005) Comparing high-change modules and modules with the highest measurement values in two large-scale open-source products. IEEE Trans Software Eng 31(8):625–642
Kubat M, Matwin S (1997) Addressing the curse of imbalanced training sets: one-sided selection. Int Conf Mach Learn 97:179–186
Kumar S, Pal SK, Singh RP (2016) Intelligent energy conservation: indoor temperature forecasting with extreme learning machine. In: International symposium on intelligent systems technologies and applications, pp. 977–988. Springer, Cham
Kumar S, Kalia A, Sharma A (2017) Predictive analysis of alertness related features for driver drowsiness detection. In: International conference on intelligent systems design and applications , pp. 368–377. Springer, Cham
Kumar L, Behera RK, Rath S, Sureka A (2017) Transfer learning for cross-project change-proneness prediction in object-oriented software systems: a feasibility analysis. ACM SIGSOFT Softw Eng Notes 42(3):1–1
Kumar S, Singh J, Singh O (2020) Ensemble-based extreme learning machine model for occupancy detection with ambient attributes. Int J Syst Assur Eng Manag 11:173–183
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496
Lu H, Zhou Y, Xu B, Leung H, Chen L (2012) The ability of object-oriented metrics to predict change-proneness: a meta-analysis. Empir Softw Eng 17(3):200–242
Malhotra R, Khanna M (2013) Investigation of relationship between object-oriented metrics and change proneness. Int J Mach Learn Cybern 4(4):273–286
Malhotra R, Khanna M (2014) The ability of search-based algorithms to predict change-prone classes. Softw Qual Prof 17(1):17
Malhotra R, Khanna M (2017) An empirical study for software change prediction using imbalanced data. Empir Softw Eng 22(6):2806–2851
Malhotra R, Khanna M (2017) An exploratory study for software change prediction in object-oriented systems using hybridized techniques. Autom Softw Eng 24(3):673–717
Malhotra R, Khanna M (2018) Prediction of change prone classes using evolution-based and object-oriented metrics. J Intell Fuzzy Syst 34(3):1755–1766
Malhotra R, Khanna M, Raje RR (2017) On the application of search-based techniques for software engineering predictive modeling: a systematic review and future directions. Swarm Evol Comput 32:85–109
Menzies T, Greenwald J, Frank A (2007) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13
Rathore SS, Gupta A (2012) Validating the effectiveness of object-oriented metrics over multiple releases for predicting fault proneness. In: 2012 19th Asia-Pacific software engineering conference, Vol. 1, pp. 350–355. IEEE
Romano D, Pinzger M (2011) Using source code metrics to predict change-prone java interfaces. In: 2011 27th IEEE international conference on software maintenance (ICSM) ,pp. 303–312. IEEE
Ryu D, Baik J (2016) Effective multi-objective naïve Bayes learning for cross-project defect prediction. Appl Soft Comput 49:1062–1077
Singh Y, Malhotra R (2012) Object-oriented software engineering. PHI Learning, New Delhi
Singh Y, Kaur A, Malhotra R (2010) Empirical validation of object-oriented metrics for predicting fault proneness models. Softw Qual J 18(1):3–35
Sousa T, Silva A, Neves A (2004) Particle swarm based data mining algorithms for classification tasks. Parallel Comput 30(5–6):767–783
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J Roy Stat Soc: Ser B (Methodol) 36(2):111–133
Xia X, Lo D, Pan SJ, Nagappan N, Wang X (2016) Hydra: massively compositional model for cross-project defect prediction. IEEE Trans Softw Eng 42(10):977–998
Zhou Y, Leung H, Xu B (2009) Examining the potentially confounding effect of class size on the associations between object-oriented metrics and change-proneness. IEEE Trans Softw Eng 35(5):607–623
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Malhotra, R., Khanna, M. On the applicability of search-based algorithms for software change prediction. Int J Syst Assur Eng Manag 14, 55–73 (2023). https://doi.org/10.1007/s13198-021-01099-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13198-021-01099-7