On the correlation between size and metric validity

Empirical Software Engineering

Abstract

Empirical validation of code metrics has a long history of success. Many metrics have been shown to be good predictors of external features, such as the occurrence of bugs. Our study provides an alternative explanation for such validation, attributing it to the confounding effect of size. In contradiction to received wisdom, we argue that the validity of a metric can be explained by its correlation with the size of the code artifact. In fact, this work came about after our failure to find a metric that is both valid and free of this confounding effect. Our main discovery is that, with the appropriate (non-parametric) transformations, the validity of a metric can be accurately predicted from its correlation with size, with R-squared values at times as high as 0.97. The reported results are with respect to a suite of 26 metrics that includes the famous Chidamber and Kemerer metrics. Concretely, it is shown that the more a metric is correlated with size, the better it predicts the values of external features, and vice versa. We consider two methods for controlling for size by linear transformations. As it turns out, metrics controlled for size tend to lose their predictive capabilities. We also show that the famous Chidamber and Kemerer metrics are no better than other metrics in our suite. Overall, our results suggest that code size is the only “unique” valid metric.
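The confounding effect described in the abstract can be illustrated with a minimal sketch on synthetic data. This is not the authors' procedure: the data, the coefficients, and the helper names `pearson` and `residualize` are all assumptions made for illustration, and least-squares residualization is only one possible linear way of controlling for size. The sketch builds a hypothetical metric and a hypothetical external feature that both depend on size, then compares the metric's correlation with the feature before and after the linear size component is removed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residualize(metric, size):
    """Subtract the least-squares linear fit of `metric` on `size`.

    Assumption: this stands in for "controlling for size by a linear
    transformation"; the paper's two methods may differ.
    """
    n = len(size)
    ms, mm = sum(size) / n, sum(metric) / n
    slope = (sum((s - ms) * (m - mm) for s, m in zip(size, metric))
             / sum((s - ms) ** 2 for s in size))
    return [m - mm - slope * (s - ms) for s, m in zip(size, metric)]


# Synthetic, hypothetical data: both quantities are driven by size plus noise.
rng = random.Random(0)
size = [rng.lognormvariate(5, 1) for _ in range(2000)]
metric = [0.1 * s + rng.gauss(0, 5) for s in size]   # hypothetical code metric
bugs = [0.01 * s + rng.gauss(0, 1) for s in size]    # hypothetical external feature

controlled_metric = residualize(metric, size)
raw = pearson(metric, bugs)
controlled = pearson(controlled_metric, bugs)

print("metric vs. size:           ", pearson(metric, size))
print("metric vs. bugs (raw):     ", raw)
print("metric vs. bugs (size out):", controlled)
```

In this constructed example the metric carries no information beyond size, so its apparent validity should largely disappear once the size component is subtracted. Real metrics would need the empirical analysis the paper reports.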

Notes

  1. A full year before Java (Arnold and Gosling 1996) even came out.

  2. Non-Java files, though found in many of the projects, were ignored.

  3. We used the first to download from the top members of both lists.

  4. https://github.com/trending?l=java&since=monthly.

  5. http://groups.inf.ed.ac.uk/cup/javaGithub/.

  6. http://github.com/.

  7. https://code.google.com/.

  8. The rationale is that CHAM is the best approximation (within the limits of the depth of our code analysis) of the “Chameleonicity” (Gil et al. 1998) metric, which was shown to be linked to the algorithmic complexity of analyzing virtual function calls.

  9. Both are based on a syntactical code analysis: tokens are the language's lexical tokens, while the term “statements” refers to the respective production in the language's EBNF, except that bracketed blocks are ignored.

  10. Similar metrics are mentioned in the literature, e.g., Harrison and Magel (1981) suggested “a complexity measure based on nesting level”. See also the work of Piwowarski (1982) and that of Hindle et al. (2008).

  11. The number of positive integers smaller than a given number which are relatively prime to it.

  12. https://github.com/GalLalouche/validity_independence.

  13. That is, with no meetings with other team members and no guidance.

References

  • Alan O, Catal C (2011) Thresholds based outlier detection approach for mining class outliers: an empirical case study on software measurement datasets. Expert Syst Appl 38(4):3440–3445

  • Allamanis M, Sutton C (2013) Mining source code repositories at massive scale using language modeling. In: The 10th working conference on mining software repositories. IEEE, pp 207–216

  • Arnold K, Gosling J (1996) The Java programming language. The Java Series. Addison-Wesley, Reading MA

  • Basili VR, Briand LC, Melo WL (1996) A validation of object-oriented design metrics as quality indicators. IEEE Trans Softw Eng 22(10):751–761

  • Baxter G, Frean M, Noble J, Rickerby M, Smith H, Visser M, Melton H, Tempero E (2006) Understanding the shape of Java software. In: Tarr PL, Cook WR (eds) Proceedings of the 21st Ann. Conf. on OO Prog. Sys., Lang., & Appl. (OOPSLA’06). ACM, Portland, Oregon

  • Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering. ACM, pp 121–130

  • Caldiera G, Basili VR (1991) Identifying and qualifying reusable software components. IEEE Comp. pp 61–70

  • Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  • Coleman D, Ash D, Lowther B, Oman P (1994) Using metrics to evaluate software system maintainability. Computer 27(8):44–49

  • Deutsch LP (1996) Gzip file format specification version 4.3. RFC #1952

  • Eastlake D, Jones P (2001) US secure hash algorithm 1 (SHA1)

  • El Emam K, Benlarbi S, Goel N, Rai SN (2001) The confounding effect of class size on the validity of object-oriented metrics. IEEE Trans Softw Eng 27(7):630–650

  • Fayad ME, Altman A (2001) Thinking objectively: an introduction to software stability. Commun ACM 44(9):95

  • Fenton N, Bieman J (2014) Software metrics: a rigorous and practical approach, 3rd edn. CRC Press, Inc., Boca Raton, FL

  • Gil J, Itai A, Jul E (1998) The complexity of type analysis of object oriented programs, vol 1445. Springer, Brussels, Belgium

  • Gillies A (2011) Software quality: theory and management. http://Lulu.com

  • Gyimothy T, Ferenc R, Siket I (2005) Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Trans Softw Eng 31(10):897–910

  • Halstead MH (1977) Elements of software science. Elsevier, NY

  • Harrison WA, Magel KI (1981) A complexity measure based on nesting level. ACM Sigplan Notices 16(3):63–74

  • Henderson-Sellers B (1996) Software metrics. Prentice-Hall

  • Herraiz I, Gonzalez-Barahona JM, Robles G (2007) Towards a theoretical model for software growth. In: Proceedings of the 4th international workshop on mining software repositories. IEEE Computer Society, p 21

  • Hindle A, Godfrey M, Holt R (2008) Reading beside the lines: Indentation as a proxy for complexity metric. In: The 16th IEEE international conference on program comprehension (ICPC’08), pp 133–142

  • Jbara A, Feitelson DG (2014) On the effect of code regularity on comprehension. In: ICPC, pp 189–200

  • Khoshgoftaar TM, Munson JC (1990) Predicting software development errors using software complexity metrics. IEEE J Sel Areas Commun 8(2):253–261

  • Li HF, Cheung WK (1987) An empirical study of software metrics. IEEE Trans Softw Eng 13(6):697–708

  • Li W, Henry S (1993) Object-oriented metrics that predict maintainability. J Syst Softw 23(2):111–122

  • Lorenz M, Kidd J (1994) Object-oriented software metrics: a practical guide. Prentice-Hall

  • McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 2(4):308–320

  • Meyer B (1988) Object-oriented software construction. International Series in Computer Science. Prentice-Hall

  • Mohagheghi P, Conradi R, Killi OM, Schwarz H (2004) An empirical study of software reuse vs. defect-density and stability. In: Proceedings of the 26th international conference on software engineering (ICSE’04). IEEE Computer Society Press, Edinburgh, Scotland, United Kingdom, pp 282–291

  • Olague HM, Etzkorn LH, Gholston S, Quattlebaum S (2007) Empirical validation of three software metrics suites to predict fault-proneness of object-oriented classes developed using highly iterative or agile software development processes. IEEE Trans Softw Eng 33(6):402–419

  • Piwowarski P (1982) A nesting level complexity measure. ACM Sigplan Notices 17(9):44–50

  • Shepperd M (1988) A critique of cyclomatic complexity as a software metric. Softw Eng J 3(2):30–36

  • Subramanyam R, Krishnan M (2003) Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Trans Softw Eng 29(4):297–310. doi:10.1109/TSE.2003.1191795

  • Zhou Y, Leung H, Xu B (2009) Examining the potentially confounding effect of class size on the associations between object-oriented metrics and change-proneness. IEEE Trans Softw Eng 35(5):607–623

  • Ziv J, Lempel A (1978) Compression of individual sequences via variable-rate coding. IEEE Trans Inf Theory 24(5):530–536

Author information

Corresponding author

Correspondence to Yossi Gil.

Additional information

Communicated by: Richard Paige, Jordi Cabot and Neil Ernst

This research was supported by the Israel Science Foundation (ISF), grant No. 1803/13.

About this article

Cite this article

Gil, Y., Lalouche, G. On the correlation between size and metric validity. Empir Software Eng 22, 2585–2611 (2017). https://doi.org/10.1007/s10664-017-9513-5

  • DOI: https://doi.org/10.1007/s10664-017-9513-5
