Abstract
Empirical validation of code metrics has a long history of success. Many metrics have been shown to be good predictors of external features, such as correlation with bugs. Our study provides an alternative explanation for such validation, attributing it to the confounding effect of size. In contradiction to received wisdom, we argue that the validity of a metric can be explained by its correlation with the size of the code artifact. In fact, this work came about after we failed to find a metric that is both valid and free of this confounding effect. Our main discovery is that, with the appropriate (non-parametric) transformations, the validity of a metric can be accurately predicted from its correlation with size, with R-squared values at times as high as 0.97. The reported results are with respect to a suite of 26 metrics that includes the famous Chidamber and Kemerer metrics. Concretely, it is shown that the more a metric is correlated with size, the better it predicts external feature values, and vice versa. We consider two methods for controlling for size by linear transformations. As it turns out, metrics controlled for size tend to lose their predictive capabilities. We also show that the famous Chidamber and Kemerer metrics are no better than other metrics in our suite. Overall, our results suggest that code size is the only “unique” valid metric.
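To make the notions of "correlation with size" and "controlling for size" concrete, the following Python sketch may help. It is illustrative only and is not the procedure used in the study: the abstract does not name the two linear transformations, so the residual form (`residualize`) and the ratio form (`per_size`) are assumptions, as are all function names and the toy data. Rank (Spearman) correlation stands in for the non-parametric treatment mentioned above.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def residualize(metric, size):
    """Assumed control #1: subtract the least-squares fit metric ~ a + b * size."""
    ms, mm = mean(size), mean(metric)
    slope = (sum((s - ms) * (m - mm) for s, m in zip(size, metric))
             / sum((s - ms) ** 2 for s in size))
    intercept = mm - slope * ms
    return [m - (intercept + slope * s) for s, m in zip(size, metric)]


def per_size(metric, size):
    """Assumed control #2: express the metric per unit of size."""
    return [m / s for m, s in zip(metric, size)]


# Hypothetical toy data: one entry per code artifact.
size = [10, 20, 30, 40, 50, 60]      # e.g. number of statements
metric = [3, 5, 8, 9, 13, 14]        # some metric that grows with size
bugs = [0, 1, 1, 2, 3, 3]            # an external feature

print("metric vs size:", spearman(metric, size))
print("metric vs bugs:", spearman(metric, bugs))
print("residual metric vs bugs:", spearman(residualize(metric, size), bugs))
print("per-size metric vs bugs:", spearman(per_size(metric, size), bugs))
```

Comparing the second printed value with the last two gives a rough sense of how much of a metric's apparent predictive power survives once size is factored out, under the assumptions stated above.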
Notes
A full year before Java (Arnold and Gosling 1996) even came out.
Non-Java files, though found in many of the projects, were ignored.
We used the first to download from the top members of both lists.
The rationale is that CHAM is the best approximation (within the limits of the depth of our code analysis) of the “Chameleonicity” (Gil et al. 1998) metric, which was shown to be linked to the algorithmic complexity of analyzing virtual function calls.
Both are based on a syntactic code analysis: tokens are the language's lexical tokens, while the term “statements” refers to the respective production in the language's EBNF, except that bracketed blocks are ignored.
The number of positive integers smaller than a given number which are relatively prime to it.
i.e., with no meetings with other team-members and no guidance.
References
Alan O, Catal C (2011) Thresholds based outlier detection approach for mining class outliers: an empirical case study on software measurement datasets. Expert Syst Appl 38(4):3440–3445
Allamanis M, Charles S (2013) Mining source code repositories at massive scale using language modeling. In: The 10th working conference on mining software repositories. IEEE, pp 207–216
Arnold K, Gosling J (1996) The Java programming language. The Java Series. Addison-Wesley, Reading MA
Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761
Baxter G, Frean M, Noble J, Rickerby M, Smith H, Visser M, Melton H, Tempero E (2006) Understanding the shape of Java software. In: Tarr PL, Cook WR (eds) Proceedings of the 21st Ann. Conf. on OO Prog. Sys., Lang., & Appl. (OOPSLA’06). ACM, Portland, Oregon
Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 121–130
Caldiera G, Basili VR (1991) Identifying and qualifying reusable software components. IEEE Comp. pp 61–70
Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
Coleman D, Ash D, Lowther B, Oman P (1994) Using metrics to evaluate software system maintainability. Computer 27(8):44–49
Deutsch LP (1996) Gzip file format specification version 4.3. RFC #1952
Eastlake D, Jones P (2001) US secure hash algorithm 1 (SHA1)
El Emam K, Benlarbi S, Goel N, Rai SN (2001) The confounding effect of class size on the validity of object-oriented metrics. IEEE Trans Softw Eng 27(7):630–650
Fayad ME, Altman A (2001) Thinking objectively: an introduction to software stability. Commun ACM 44(9):95
Fenton N, Bieman J (2014) Software metrics: a rigorous and practical approach, 3rd edn. CRC Press, Inc., Boca Raton, FL
Gil J, Itai A, Jul E (1998) The complexity of type analysis of object oriented programs, vol 1445. Springer, Brussels, Belgium
Gillies A (2011) Software quality: theory and management. http://Lulu.com
Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910
Halstead MH (1977) Elements of software science. Elsevier, NY
Harrison WA, Magel KI (1981) A complexity measure based on nesting level. ACM Sigplan Notices 16(3):63–74
Henderson-Sellers B (1996) Software metrics. Prentice-Hall
Herraiz I, Gonzalez-Barahona JM, Robles G (2007) Towards a theoretical model for software growth. In: Proceedings of the 4th international workshop on mining software repositories. IEEE Computer Society, p 21
Hindle A, Godfrey M, Holt R (2008) Reading beside the lines: Indentation as a proxy for complexity metric. In: The 16th IEEE international conference on program comprehension (ICPC’08), pp 133–142
Jbara A, Feitelson DG (2014) On the effect of code regularity on comprehension. In: ICPC, pp 189–200
Khoshgoftaar TM, Munson JC (1990) Predicting software development errors using software complexity metrics. IEEE J Sel Areas Commun 8(2):253–261
Li HF, Cheung WK (1987) An empirical study of software metrics. IEEE Trans Softw Eng 13(6):697–708
Li W, Henry S (1993) Object-oriented metrics that predict maintainability. Sys Soft 23:111–122
Lorenz M, Kidd J (1994) Object-oriented software metrics: a practical guide. Prentice-Hall
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320
Meyer B (1988) Object-oriented software construction. International Series in Computer Science. Prentice-Hall
Mohagheghi P, Conradi R, Killi OM, Schwarz H (2004) An empirical study of software reuse vs. defect-density and stability. In: Proceedings of the 26th international conference on software engineering (ICSE’04). IEEE Computer Society Press, Edinburgh, Scotland, United Kingdom, pp 282–291
Olague HM, Etzkorn LH, Gholston S, Quattlebaum S (2007) Empirical validation of three software metrics suites to predict fault-proneness of object-oriented classes developed using highly iterative or agile software development processes. IEEE Trans Softw Eng 33(6):402–419
Piwowarski P (1982) A nesting level complexity measure. ACM Sigplan Notices 17(9):44–50
Shepperd M (1988) A critique of cyclomatic complexity as a software metric. Softw Eng J 3(2):30–36
Subramanyam R, Krishnan M (2003) Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310. doi:10.1109/TSE.2003.1191795
Zhou Y, Leung H, Xu B (2009) Examining the potentially confounding effect of class size on the associations between object-oriented metrics and change-proneness. IEEE Trans Softw Eng 35(5):607–623
Ziv J, Lempel A (1978) Compression of individual sequences via variable-rate coding. IEEE Trans Inf Theory 24(5):530–536
Additional information
Communicated by: Richard Paige, Jordi Cabot and Neil Ernst
This research was supported by the Israel Science Foundation (ISF), grant No. 1803/13.
Cite this article
Gil, Y., Lalouche, G. On the correlation between size and metric validity. Empir Software Eng 22, 2585–2611 (2017). https://doi.org/10.1007/s10664-017-9513-5
DOI: https://doi.org/10.1007/s10664-017-9513-5