
Investigation of various data analysis techniques to identify change prone parts of an open source software

  • Original Article
  • Published in: International Journal of System Assurance Engineering and Management

Abstract

Identifying and examining the change-prone parts of software is gaining wide importance in the field of software engineering. It helps software practitioners allocate testing and maintenance resources judiciously. Software metrics can be used to construct classification models that allow timely identification of change-prone classes. Various machine learning classification models have been proposed in the literature; however, because results vary across studies, more research is needed to increase confidence in the results and support sound conclusions. In this paper, we use a number of data analysis techniques (14 machine learning techniques and a statistical technique) to construct change prediction models, and we perform statistical testing to compare the performance of these models. Applying a large number of techniques allows a fair evaluation and thus increases the conclusion validity of the study. The results are validated on five releases of 'Android', an open source operating system widely used in mobile phones and tablet computers. To make the results more generalizable, we also conduct inter-release and cross-project predictions. The results show that machine learning techniques are effective in predicting change-prone classes and should therefore be widely used by researchers and practitioners to reduce maintenance effort and enable more efficient development of better software.
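To make the workflow described above concrete, the following is a minimal sketch (in Python with scikit-learn, which the paper itself does not prescribe) of building several change-prediction classifiers from object-oriented metrics and comparing their cross-validated performance with a statistical test. The file name android_release_metrics.csv, the 'changed' label column, the particular five classifiers, and the use of ROC AUC with a Friedman test are illustrative assumptions, not the authors' exact experimental setup.

```python
# Sketch: predict change-prone classes from OO metrics and compare classifiers.
# Dataset layout (hypothetical): one row per class, metric columns plus a binary
# "changed" label derived from version history.
import pandas as pd
from scipy.stats import friedmanchisquare
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

data = pd.read_csv("android_release_metrics.csv")  # hypothetical file name
X = data.drop(columns=["changed"])
y = data["changed"]

# A statistical technique (logistic regression) alongside a few machine
# learning techniques; the paper evaluates a larger set of 14.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# 10-fold stratified cross-validation with ROC AUC as the performance measure.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    for name, model in models.items()
}
for name, scores in auc_scores.items():
    print(f"{name}: mean AUC = {scores.mean():.3f}")

# Friedman test over the per-fold AUCs: are the performance differences
# among the classifiers statistically significant?
stat, p_value = friedmanchisquare(*auc_scores.values())
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")
```

In the same spirit, an inter-release or cross-project prediction would fit the models on the metrics of one release (or project) and evaluate them on another, instead of cross-validating within a single release.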


Author information

Corresponding author

Correspondence to Ruchika Malhotra.


About this article

Cite this article

Malhotra, R., Bansal, A. Investigation of various data analysis techniques to identify change prone parts of an open source software. Int J Syst Assur Eng Manag 9, 401–426 (2018). https://doi.org/10.1007/s13198-017-0686-5
