An exploratory study for software change prediction in object-oriented systems using hybridized techniques

Malhotra, Ruchika; Khanna, Megha

doi:10.1007/s10515-016-0203-0

Published: 17 August 2016

Volume 24, pages 673–717 (2017)

Automated Software Engineering

Ruchika Malhotra¹ & Megha Khanna¹,²

Abstract

Variation in software requirements, technological upgrades and the occurrence of defects necessitate change in software for its effective use. Early detection of the classes of a software system that are prone to change is critical for software developers and project managers, as it aids the efficient allocation of limited resources. Moreover, change-prone classes should be efficiently restructured and designed to prevent the introduction of defects. Recently, the use of search-based techniques and their hybridized counterparts has been advocated in software engineering predictive modeling, as these techniques help identify optimal solutions for a specific problem by testing the goodness of a number of possible solutions. In this paper, we propose a novel approach for change prediction using search-based techniques and hybridized techniques. Further, we address the following issues: (i) low repeatability of empirical studies, (ii) limited use of statistical tests for comparing the effectiveness of models, and (iii) non-assessment of the trade-off between runtime and predictive performance of various techniques. This paper presents an empirical validation of search-based techniques and their hybridized versions, which yields unbiased, accurate and repeatable results. The study analyzes and compares the predictive performance of five search-based techniques, five hybridized techniques, four widely used machine learning techniques and one statistical technique for predicting change-prone classes in six application packages of a popular mobile operating system, Android. The results of the study advocate the use of hybridized techniques for developing models to identify change-prone classes.

Author information

Authors and Affiliations

Delhi Technological University, Delhi, India
Ruchika Malhotra & Megha Khanna
Sri Guru Gobind Singh College of Commerce, University of Delhi, Delhi, India
Megha Khanna

Corresponding author

Correspondence to Ruchika Malhotra.

Appendix

1.1 Descriptive statistics

This appendix presents the descriptive statistics of each data set. Tables 12, 13, 14, 15, 16, and 17 report the minimum (Min.), maximum (Max.), mean (Mean), standard deviation (SD), 25th percentile and 75th percentile of every OO metric used as an independent variable in the study, one table per data set.
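As a rough illustration of how the six statistics listed above can be computed for a single metric, the following is a minimal sketch using only the Python standard library. The metric values are hypothetical and are not taken from the study's data sets. The choice of sample standard deviation and of the "inclusive" quantile method are also assumptions, since the appendix does not state which definitions were used.

```python
import statistics


def describe(values):
    """Return the six summary statistics named in the appendix for one metric."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    # The "inclusive" method is an assumption; the source does not specify one.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "Max.": max(values),
        "Mean": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample SD (n - 1); an assumption
        "25th percentile": q1,
        "75th percentile": q3,
    }


# Hypothetical per-class values of one OO metric (illustrative only).
metric_values = [2, 4, 4, 5, 7, 9, 12, 15]
summary = describe(metric_values)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Applying `describe` to each metric column of a data set would yield one row per metric, which matches the layout the table captions suggest.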

Table 12 Descriptive statistics for Android Bluetooth data set

Table 13 Descriptive statistics for Android Contacts data set

Table 14 Descriptive statistics for Android Calendar data set

Table 15 Descriptive statistics for Android Gallery2 data set

Table 16 Descriptive statistics for Android MMS data set

Table 17 Descriptive statistics for Android Telephony data set

About this article

Cite this article

Malhotra, R., Khanna, M. An exploratory study for software change prediction in object-oriented systems using hybridized techniques. Autom Softw Eng 24, 673–717 (2017). https://doi.org/10.1007/s10515-016-0203-0

Received: 01 April 2015
Accepted: 02 August 2016
Published: 17 August 2016
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10515-016-0203-0
