
Does class size matter? An in-depth assessment of the effect of class size in software defect prediction


In the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, modelling has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have shown that size cannot simply be controlled for or ignored when building prediction models. As such, the question remains of whether, and when, to control for class size. This study provides a new in-depth examination of the impact of class size on the relationship between OO metrics and software defects or defect-proneness. We assess the impact of class size on the number of defects and defect-proneness in software systems by employing regression-based mediation (with bootstrapping) and moderation analyses to investigate the direct and indirect effects of class size in count and binary defect prediction. Our results show that the size effect is not always significant for all metrics. Of the seven OO metrics we investigated, size consistently has a significant mediation impact only on the relationship between Coupling Between Objects (CBO) and defects/defect-proneness, and a potential moderation impact on the relationship between Fan-out and defects/defect-proneness. Other metrics show mixed results, in that they are significant for some systems but not for others. Based on our results we make three recommendations. One, we encourage researchers and practitioners to examine the impact of class size for the specific data they have in hand and through the use of the proposed statistical mediation/moderation procedures. Two, we encourage empirical studies to investigate the indirect effect of possible additional variables in their models when relevant. Three, the statistical procedures adopted in this study could be used in other empirical software engineering research to investigate the influence of potential mediators/moderators.
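To make the mediation/moderation idea concrete, the following is a minimal sketch and not the study's actual procedure. It assumes ordinary least squares on synthetic data, whereas the study works with count and binary defect outcomes, for which a count or logistic regression would be the natural choice. All names (`indirect_effect`, `bootstrap_ci`, `interaction_coef`) and the simulated values for `cbo`, `size` and `defects` are hypothetical illustrations. The mediated (indirect) effect is estimated as the product a·b, where a is the effect of the metric on size and b is the effect of size on defects with the metric held constant; a percentile bootstrap gives its confidence interval. Moderation is probed through the coefficient of the metric × size interaction term.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already hold an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def indirect_effect(x, m, y):
    """Mediated effect a*b: (x -> m) times (m -> y, controlling for x)."""
    ones = np.ones_like(x)
    a = ols(np.column_stack([ones, x]), m)[1]
    b = ols(np.column_stack([ones, x, m]), y)[2]
    return a * b

def bootstrap_ci(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample classes with replacement
        estimates[i] = indirect_effect(x[idx], m[idx], y[idx])
    lower, upper = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return lower, upper

def interaction_coef(x, m, y):
    """Moderation check: coefficient of the (mean-centred) x*m interaction term."""
    xc, mc = x - x.mean(), m - m.mean()
    X = np.column_stack([np.ones_like(xc), xc, mc, xc * mc])
    return ols(X, y)[3]

# Synthetic, hypothetical data: size partly driven by CBO, defects mostly by size.
rng = np.random.default_rng(42)
n = 500
cbo = rng.normal(size=n)
size = 0.6 * cbo + rng.normal(size=n)
defects = 0.5 * size + 0.1 * cbo + rng.normal(size=n)

point = indirect_effect(cbo, size, defects)
lo, hi = bootstrap_ci(cbo, size, defects)
inter = interaction_coef(cbo, size, defects)

print(f"indirect effect a*b = {point:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
print(f"interaction (moderation) coefficient = {inter:.3f}")
```

Under these assumptions, a bootstrap interval that excludes zero would suggest that size mediates the metric-defect relationship in the data at hand, while an interaction coefficient clearly different from zero would point to moderation. The simulated data contain no interaction by construction, so the second quantity should be close to zero here.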




  1. 1.

  2. 2.

    Version 3.6.0

  3. 3.

    All associations are statistically significant at p < 0.05.




The authors would like to thank the reviewers for their detailed and constructive comments on an earlier version of this paper, which were instrumental in improving the quality of the work.

Author information



Corresponding author

Correspondence to Amjed Tahir.

Additional information


Communicated by: Audris Mockus


About this article


Cite this article

Tahir, A., Bennin, K.E., Xiao, X. et al. Does class size matter? An in-depth assessment of the effect of class size in software defect prediction. Empir Software Eng 26, 106 (2021).



Keywords

  • Defect prediction
  • Class size
  • Metrics
  • Software quality