An exploratory study for software change prediction in object-oriented systems using hybridized techniques

Automated Software Engineering

Abstract

Variation in software requirements, technological upgrades, and the occurrence of defects necessitate changes to software for its effective use. Early detection of the classes of a software system that are prone to change is critical for software developers and project managers, as it aids in the efficient allocation of limited resources. Moreover, change-prone classes should be efficiently restructured and designed to prevent the introduction of defects. Recently, the use of search-based techniques and their hybridized counterparts has been advocated in software engineering predictive modeling, as these techniques help identify optimal solutions for a specific problem by testing the goodness of a number of possible solutions. In this paper, we propose a novel approach for change prediction using search-based and hybridized techniques. Further, we address the following issues: (i) low repeatability of empirical studies, (ii) limited use of statistical tests for comparing the effectiveness of models, and (iii) non-assessment of the trade-off between runtime and predictive performance of various techniques. This paper presents an empirical validation of search-based techniques and their hybridized versions, which yields unbiased, accurate and repeatable results. The study analyzes and compares the predictive performance of five search-based techniques, five hybridized techniques, four widely used machine learning techniques, and one statistical technique for predicting change-prone classes in six application packages of Android, a popular mobile operating system. The results of the study advocate the use of hybridized techniques for developing models to identify change-prone classes.
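
To illustrate what "testing the goodness of a number of possible solutions" can look like in practice, the following is a minimal sketch of a search-based technique. It is not one of the techniques evaluated in the study. The toy data, the single-threshold rule, the accuracy-based fitness function, and all names (`DATA`, `fitness`, `evolve_threshold`) are assumptions made purely for illustration.

```python
import random

# Hypothetical toy data: (OO metric value, 1 if the class changed else 0).
# These numbers are invented for illustration and are not from the study.
DATA = [(2, 0), (3, 0), (5, 0), (8, 1), (9, 1), (12, 1), (4, 0), (10, 1)]


def fitness(threshold, data):
    """Goodness of a candidate: accuracy of the rule 'metric > threshold'."""
    correct = sum((value > threshold) == bool(label) for value, label in data)
    return correct / len(data)


def evolve_threshold(data, generations=30, pop_size=10, seed=0):
    """Tiny evolutionary search over candidate thresholds.

    Each generation keeps the fitter half of the population unchanged
    and refills the other half with mutated copies of the survivors.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    low = min(value for value, _ in data)
    high = max(value for value, _ in data)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, data), reverse=True)
        survivors = population[: pop_size // 2]
        children = [t + rng.gauss(0, 1.0) for t in survivors]
        population = survivors + children
    return max(population, key=lambda t: fitness(t, data))


best = evolve_threshold(DATA)
print(f"threshold={best:.2f}, accuracy={fitness(best, DATA):.2f}")
```

Fixing the random seed makes repeated runs return the same threshold, which is one simple way to support the repeatability concern raised in the abstract. A hybridized variant would, under the same assumptions, use such a search to tune or select inputs for a conventional learner rather than to produce the rule directly.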

Author information

Corresponding author

Correspondence to Ruchika Malhotra.

Appendix

1.1 Descriptive statistics

This appendix presents the descriptive statistics of each data set. Tables 12 to 17 report the minimum (Min.), maximum (Max.), mean (Mean), standard deviation (SD), 25th percentile, and 75th percentile of every OO metric used as an independent variable in the study, with one table per data set.

Table 12 Descriptive statistics for Android Bluetooth data set
Table 13 Descriptive statistics for Android Contacts data set
Table 14 Descriptive statistics for Android Calendar data set
Table 15 Descriptive statistics for Android Gallery2 data set
Table 16 Descriptive statistics for Android MMS data set
Table 17 Descriptive statistics for Android Telephony data set
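
As a rough sketch of how the six statistics reported in these tables could be computed for a single metric, the snippet below uses Python's standard `statistics` module. The metric values are hypothetical, and the choice of the sample standard deviation and the inclusive quantile method are assumptions, since the text here does not state which variants were used.

```python
import statistics


def describe(values):
    """Return the six descriptive statistics for one metric column."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "Max.": max(values),
        "Mean": statistics.mean(values),
        "SD": statistics.stdev(values),  # sample SD (assumption)
        "25th percentile": q1,
        "75th percentile": q3,
    }


# Hypothetical values of one OO metric across five classes.
metric_values = [1, 2, 3, 4, 5]
summary = describe(metric_values)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Applying `describe` to each metric column of a data set would yield one row per metric, matching the layout the table captions suggest.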

Cite this article

Malhotra, R., Khanna, M. An exploratory study for software change prediction in object-oriented systems using hybridized techniques. Autom Softw Eng 24, 673–717 (2017). https://doi.org/10.1007/s10515-016-0203-0
