
Investigation of various data analysis techniques to identify change prone parts of an open source software

  • Original Article
  • Published in: International Journal of System Assurance Engineering and Management

Abstract

Identifying and examining the change-prone parts of software is gaining wide importance in the field of software engineering. It helps software practitioners allocate testing and maintenance resources judiciously. Software metrics can be used to construct classification models that allow timely identification of change-prone classes. Various machine learning classification models have been proposed in the literature; however, because results vary across studies, more research is needed to increase confidence in the results and support sound conclusions. In this paper, we use a number of data analysis techniques (14 machine learning techniques and a statistical technique) to construct change prediction models, and we perform statistical testing to compare the performance of these models. Applying a large number of techniques allows a fair evaluation and thus increases the conclusion validity of the study. The results are validated on five releases of 'Android', an open source operating system widely used in mobile phones and tablet computers. To make the results more generalizable, we also conduct inter-release and cross-project predictions. The results show that machine learning techniques are effective in predicting change-prone classes and should therefore be widely used by researchers and practitioners to reduce maintenance effort and enable more efficient development of better software.
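To make the workflow described above concrete, the following is a minimal sketch (in Python with scikit-learn, which the paper itself does not prescribe) of building several change-prediction classifiers from object-oriented metrics and comparing their cross-validated performance with a statistical test. The file name android_release_metrics.csv, the 'changed' label column, the particular five classifiers, and the use of ROC AUC with a Friedman test are illustrative assumptions, not the authors' exact experimental setup.

```python
# Sketch: predict change-prone classes from OO metrics and compare classifiers.
# Dataset layout (hypothetical): one row per class, metric columns plus a binary
# "changed" label derived from version history.
import pandas as pd
from scipy.stats import friedmanchisquare
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

data = pd.read_csv("android_release_metrics.csv")  # hypothetical file name
X = data.drop(columns=["changed"])
y = data["changed"]

# A statistical technique (logistic regression) alongside a few machine
# learning techniques; the paper evaluates a larger set of 14.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NaiveBayes": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# 10-fold stratified cross-validation with ROC AUC as the performance measure.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
auc_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    for name, model in models.items()
}
for name, scores in auc_scores.items():
    print(f"{name}: mean AUC = {scores.mean():.3f}")

# Friedman test over the per-fold AUCs: are the performance differences
# among the classifiers statistically significant?
stat, p_value = friedmanchisquare(*auc_scores.values())
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")
```

In the same spirit, an inter-release or cross-project prediction would fit the models on the metrics of one release (or project) and evaluate them on another, instead of cross-validating within a single release.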


Author information

Corresponding author

Correspondence to Ruchika Malhotra.


About this article

Cite this article

Malhotra, R., Bansal, A. Investigation of various data analysis techniques to identify change prone parts of an open source software. Int J Syst Assur Eng Manag 9, 401–426 (2018). https://doi.org/10.1007/s13198-017-0686-5
