Abstract
The size of the effect of the difference in two groups with respect to a variable of interest may be estimated by the classical Cohen’s d. A recently proposed generalized estimator allows conditioning on further independent variables within the framework of a linear regression model. In this note, it is demonstrated how unbiased estimation of the effect size parameter together with a corresponding standard error may be obtained based on the non-central t distribution. The resulting estimator may be considered a natural generalization of the unbiased Hedges’ g. In addition, confidence interval estimation for the unknown parameter is demonstrated by applying the so-called inversion confidence interval principle. These properties reduce to already known ones when no additional independent variables are present. The remarks are illustrated with a publicly available data set.
1 Introduction
Consider independent samples of sizes \(n_{0}\) and \(n_{1}\) of a statistical variable Y in two groups such that Y follows a normal distribution with expectation \(\mu _{0}\) and variance \(\sigma ^2\) in group “0” and expectation \(\mu _{1}\) and the same variance \(\sigma ^2\) in group “1”. As an estimator for the size d of the group effect, Cohen (1988, p. 66ff) considers
\[ {\widehat{d}} = \frac{{\overline{Y}}_{1} - {\overline{Y}}_{0}}{\sqrt{(Q_{0} + Q_{1})/(n_{0} + n_{1} - 2)}} , \tag{1} \]
where \({\overline{Y}}_{0}\) and \({\overline{Y}}_{1}\) are the sample means and \(Q_{0}\) and \(Q_{1}\) are the sums of squared differences from the respective sample means in the two groups. The estimator \({\widehat{d}}\) is related to the statistic t of the usual two-sample t test of the null hypothesis \(H_{0}: \mu _{0} = \mu _{1}\) versus the alternative \(H_{1}: \mu _{0} \not = \mu _{1}\) by the formula
\[ {\widehat{d}} = t \sqrt{2/{\widetilde{n}}} , \tag{2} \]
see (2.5.3) in Cohen (1988). Here, \({\widetilde{n}}\) denotes the harmonic mean of the two numbers \(n_{0}\) and \(n_{1}\), so that \(2/{\widetilde{n}} = (n_{0}+n_{1})/(n_{0} \cdot n_{1})\). Cohen suggests values \(|{\widehat{d}}| =0.2\), \(|{\widehat{d}}|=0.5\) and \(|{\widehat{d}}|=0.8\) as an indication for a small, medium and large effect, respectively.
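The relation between the pooled-variance definition of \({\widehat{d}}\) and the two-sample t statistic can be checked numerically. The following is a minimal sketch in Python; the function name `cohens_d` and the sample values are made up for illustration and are not taken from the paper.

```python
import math

def cohens_d(y0, y1):
    """Return (d_hat, t) for two samples, assuming equal variances in both groups."""
    n0, n1 = len(y0), len(y1)
    mean0 = sum(y0) / n0
    mean1 = sum(y1) / n1
    # Q0 and Q1: sums of squared differences from the respective group means
    q0 = sum((y - mean0) ** 2 for y in y0)
    q1 = sum((y - mean1) ** 2 for y in y1)
    pooled_sd = math.sqrt((q0 + q1) / (n0 + n1 - 2))
    d_hat = (mean1 - mean0) / pooled_sd
    # Two-sample t statistic with pooled variance, using the difference mean1 - mean0
    t = (mean1 - mean0) / (pooled_sd * math.sqrt(1 / n0 + 1 / n1))
    return d_hat, t

# Hypothetical observations, for illustration only
y0 = [9.0, 11.0, 10.0, 8.0]
y1 = [12.0, 10.0, 13.0, 11.0, 14.0]

d_hat, t = cohens_d(y0, y1)
n0, n1 = len(y0), len(y1)
# Same value recovered from t via sqrt(2 / n_tilde) = sqrt((n0 + n1) / (n0 * n1))
d_from_t = t * math.sqrt((n0 + n1) / (n0 * n1))
print(d_hat, d_from_t)
```

Both printed values agree up to floating point rounding, since the factor \(\sqrt{2/{\widetilde{n}}}\) cancels the standard error scaling in t.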
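When the variable Y of interest is considered to possibly depend on a number of explanatory variables, one may consider a linear regression model described by
\[ Y = \beta _{0} + \beta _{1} X_{1} + \beta _{2} X_{2} + \cdots + \beta _{1+k} X_{1+k} + \varepsilon , \tag{3} \]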
where \(\varepsilon \) follows a normal distribution with expectation 0 and variance \(\sigma ^2 >0\). The grouping variable \(X_{1}\) takes the value 0 when a response observation falls into group “0” and the value 1 when an observation falls into group “1”. There are k further explanatory variables \(X_{2}, \ldots , X_{1+k}\), which are assumed to be absent in the simple case \(k=0\).
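From Eq. (3) the expectation of Y conditional on the group is given as
\[ \mu _{0} = \text {E}[Y|X_{1} = 0] = \beta _{0} + \beta _{2} X_{2} + \cdots + \beta _{1+k} X_{1+k} \tag{4} \]
in group “0” and \(\mu _{1} = \text {E}[Y|X_{1} = 1] = \text {E}[Y|X_{1} = 0] + \beta _{1}\) in group “1”. Hence,
\[ d = \frac{\mu _{1} - \mu _{0}}{\sigma } = \frac{\beta _{1}}{\sigma } \tag{5} \]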
is the unknown population effect size, see Cohen (1988, (2.5.1)).
Recently, Groß and Möller (2023) considered a natural generalization of Cohen’s estimator from (2) based on the above outlined properties in the regression setting with additional explanatory variables. This estimator is given by
\[ {\widehat{d}} = \frac{{\widehat{\beta }}_{1}}{\sqrt{{\widehat{\sigma }}^{2}}} , \tag{6} \]
where \({\widehat{\beta }}_{1}\) is the least squares estimator of \(\beta _{1}\) and \({\widehat{\sigma }}^2\) is the usual unbiased estimator for \(\sigma ^2\) under model (3). In matrix notation the model may also be written as
\[ {\varvec{Y}} = {\varvec{X}} {\varvec{\beta }} + {\varvec{\varepsilon }} , \tag{7} \]
where \({\varvec{Y}}\) represents the \(n \times 1\) vector of observations of the explained variable Y, the \(n\times (2 +k)\) model matrix \({\varvec{X}}\) is assumed to be of full column rank, and \({\varvec{\varepsilon }}\) follows an n-variate normal distribution with expectation vector \({\varvec{0}}\) and variance-covariance matrix \(\sigma ^2 {\varvec{I}}_{n}\) with \({\varvec{I}}_{n}\) denoting the \(n\times n\) identity matrix. Generalizations of this setting are reviewed e.g. by Groß (2004). Under model (7),
\[ {\widehat{\beta }}_{1} = {\varvec{e}}^{\prime } ({\varvec{X}}^{\prime } {\varvec{X}})^{-1} {\varvec{X}}^{\prime } {\varvec{Y}} , \]
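where \({\varvec{e}}\) is a \((2+k)\times 1\) vector of 0s except for a 1 at the position of \(\beta _{1}\) in the \((2+k)\times 1\) parameter vector \({\varvec{\beta }} =(\beta _{0}, \beta _{1}, \beta _{2}, \ldots , \beta _{1+k})^{\prime }\) with \(\beta _{2}\) up to \(\beta _{1+k}\) considered as absent in case \(k=0\). Moreover,
\[ \text {var}({\widehat{\beta }}_{1}) = \sigma ^2 v_{1}^{2} , \qquad v_{1}^{2} = {\varvec{e}}^{\prime } ({\varvec{X}}^{\prime } {\varvec{X}})^{-1} {\varvec{e}} . \]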
In this setting, as noted by Groß and Möller (2023), both formulas (6) and (2) yield identical estimates for the special case of \(k=0\).
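To make the regression-based estimator concrete, the sketch below computes \({\widehat{\beta }}_{1}\), \({\widehat{\sigma }}^2\), the diagonal element \(v_{1}^{2}\) of \(({\varvec{X}}^{\prime } {\varvec{X}})^{-1}\) and \({\widehat{d}}\) with plain Python. The helper names `invert` and `regression_d` and all data values are assumptions made for illustration; in practice a regression routine of a statistics package would be used instead of the hand-written matrix inversion.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(a)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def regression_d(y, group, covariates=()):
    """Return (d_hat, v1_sq, m) from a least squares fit.

    group is the 0/1 indicator X1, covariates is a sequence of k further columns.
    """
    n = len(y)
    rows = [[1.0, float(group[i])] + [float(c[i]) for c in covariates]
            for i in range(n)]
    p = len(rows[0])                       # p = 2 + k columns of the model matrix
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    m = n - p                              # degrees of freedom m = n - 2 - k
    sigma2_hat = sum(e * e for e in resid) / m
    v1_sq = inv[1][1]                      # diagonal element belonging to beta_1
    d_hat = beta[1] / math.sqrt(sigma2_hat)
    return d_hat, v1_sq, m

# Hypothetical data, for illustration only
y = [9.0, 11.0, 10.0, 8.0, 12.0, 10.0, 13.0, 11.0, 14.0]
group = [0, 0, 0, 0, 1, 1, 1, 1, 1]
x2 = [1, 2, 1, 3, 1, 2, 2, 1, 3]

d_k0, v1_k0, m_k0 = regression_d(y, group)        # k = 0: classical Cohen's d
d_k1, v1_k1, m_k1 = regression_d(y, group, [x2])  # k = 1: conditional on x2
print(d_k0, v1_k0, m_k0)
print(d_k1, v1_k1, m_k1)
```

For \(k=0\) the returned estimate coincides with the classical pooled-variance value and `v1_k0` equals \((n_{0}+n_{1})/(n_{0} \cdot n_{1})\), in line with the remark on identical estimates.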
In the following Sect. 2 we state some theoretical results referring to and also generalizing known results in the literature specifically with respect to unbiased estimation of Cohen’s d in the presence of covariates. Section 3 then provides a concise workflow for application of the proposed method on the basis of a publicly available data set.
2 Statistical properties of Cohen’s d
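Consider the two quantities
\[ \tau = \frac{\beta _{1}}{\sqrt{\sigma ^2 v_{1}^{2}}} = \frac{d}{\sqrt{v_{1}^{2}}} \quad \text {and} \quad m = n - 2 - k . \tag{9} \]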
For example from Seber and Lee (2003, Theorem 3.5), it is seen that the random variable \(X = {\widehat{\beta }}_{1}/\sqrt{\sigma ^2 v_{1}^{2}}\) follows a normal distribution with expectation \(\tau \) and variance 1, and the random variable \(Y = m {\widehat{\sigma }}^{2}/\sigma ^2\) independently follows a (central) chi squared distribution with m degrees of freedom. Then, the ratio \(X/\sqrt{Y/m} = {\widehat{d}}/\sqrt{v_{1}^{2}}\) follows a non-central t distribution, see e.g. Johnson and Welch (1940). Thus, one may state the following.
Proposition 1
Under the assumptions of model (7),
\[ \frac{{\widehat{d}}}{\sqrt{v_{1}^{2}}} \sim t(m, \tau ) , \tag{10} \]
where \(t(m, \tau )\) denotes the non-central t distribution with m degrees of freedom and non-centrality parameter \(\tau \).
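In case \(k=0\) the model matrix \({\varvec{X}}\) becomes
\[ {\varvec{X}} = \begin{pmatrix} {\varvec{1}}_{n_{0}} & {\varvec{0}} \\ {\varvec{1}}_{n_{1}} & {\varvec{1}}_{n_{1}} \end{pmatrix} , \tag{11} \]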
where \({\varvec{1}}_{\nu }\) denotes the \(\nu \times 1\) vector of 1s, and a little matrix algebra reveals
\[ v_{1}^{2} = \frac{n_{0} + n_{1}}{n_{0} \cdot n_{1}} = \frac{2}{{\widetilde{n}}} . \tag{12} \]
Then Proposition 1 is easily seen to reduce to the result given by Hedges (1981, Sect. 3).
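The matrix algebra for the case \(k=0\) can be written out as follows. This is a worked sketch using only the definitions given so far, with \(n = n_{0} + n_{1}\) and a model matrix consisting of a column of 1s and the 0/1 group indicator.

```latex
\[
{\varvec{X}}^{\prime } {\varvec{X}}
  = \begin{pmatrix} n & n_{1} \\ n_{1} & n_{1} \end{pmatrix} ,
\qquad
\det ({\varvec{X}}^{\prime } {\varvec{X}}) = n\, n_{1} - n_{1}^{2} = n_{0}\, n_{1} ,
\]
\[
({\varvec{X}}^{\prime } {\varvec{X}})^{-1}
  = \frac{1}{n_{0}\, n_{1}}
    \begin{pmatrix} n_{1} & -n_{1} \\ -n_{1} & n \end{pmatrix} ,
\qquad
v_{1}^{2} = \frac{n}{n_{0}\, n_{1}}
          = \frac{n_{0} + n_{1}}{n_{0} \cdot n_{1}}
          = \frac{2}{{\widetilde{n}}} .
\]
```

Hence \({\widehat{d}}/\sqrt{v_{1}^{2}} = {\widehat{d}} \sqrt{{\widetilde{n}}/2}\) is the usual two-sample t statistic, and the degrees of freedom are \(m = n - 2\).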
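From Johnson and Welch (1940), the expectation and variance of the \(t(m, \tau )\) distribution are given as
\[ \text {E}(T) = c(m)\, \tau \quad (m> 1) , \qquad \text {var}(T) = \frac{m}{m-2} (1 + \tau ^2) - c(m)^{2} \tau ^2 \quad (m > 2) , \tag{13} \]
where T denotes a random variable following the \(t(m, \tau )\) distribution and
\[ c(m) = \sqrt{\frac{m}{2}}\; \frac{\Gamma ((m-1)/2)}{\Gamma (m/2)} . \tag{14} \]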
Remark 1
From Tricomi and Erdélyi (1951) one may conclude
\[ c(m) = 1 + \frac{3}{4 m} + \frac{25}{32 m^{2}} + O(m^{-3}) , \tag{15} \]
implying that \(\lim _{m\rightarrow \infty } c(m) = 1\).
From Remark 1, the number \(1 + 3/(4 m)\) may serve as a rough approximation of c(m), which, however, is less precise than the proposal
\[ c(m) \approx \left( 1 - \frac{3}{4 m - 1} \right) ^{-1} \tag{16} \]
from Hedges (1981), see also Table 2 in Goulet-Pelletier and Cousineau (2018) for a comparison of exact values with corresponding approximations. As another approximation not further investigated here one may consider \(c(m)\approx \sqrt{2 m/(2m-3)}\) for larger m, which may be concluded from Theorem 2.1 in Laforgia and Natalini (2012).
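The quality of the different approximations of c(m) can be inspected directly. The sketch below evaluates the exact gamma function ratio through `math.lgamma` and compares it with the first order term \(1 + 3/(4m)\), the reciprocal of Hedges' \(1 - 3/(4m-1)\), and the square root proposal. The function names are chosen here for illustration only.

```python
import math

def c_exact(m):
    """c(m) = sqrt(m/2) * Gamma((m-1)/2) / Gamma(m/2), computed via log-gamma."""
    return math.sqrt(m / 2.0) * math.exp(
        math.lgamma((m - 1) / 2.0) - math.lgamma(m / 2.0)
    )

def c_first_order(m):
    """Rough approximation 1 + 3/(4m)."""
    return 1.0 + 3.0 / (4.0 * m)

def c_hedges(m):
    """Reciprocal of the approximation 1 - 3/(4m - 1) from Hedges (1981)."""
    return 1.0 / (1.0 - 3.0 / (4.0 * m - 1.0))

def c_sqrt(m):
    """Approximation sqrt(2m / (2m - 3)), intended for larger m."""
    return math.sqrt(2.0 * m / (2.0 * m - 3.0))

for m in (5, 20, 100, 393):
    print(m, c_exact(m), c_first_order(m), c_hedges(m), c_sqrt(m))
```

Working on the log scale avoids overflow of the gamma function for large m, where a direct call to `math.gamma` would fail.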
Now, combining Proposition 1 with (13) and noting \(d = \tau \sqrt{v_{1}^{2}}\) gives the following.
Proposition 2
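Under the assumptions of model (7) with \(m = n - 2 - k > 2\),
\[ \text {E}({\widehat{d}}) = c(m)\, d \quad \text {and} \quad \text {var}({\widehat{d}}) = \frac{m}{m-2} \left( v_{1}^{2} + d^{2} \right) - c(m)^{2} d^{2} \]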
for \({\widehat{d}}\) from (6).
When considering the asymptotic behaviour of \({\widehat{d}}\) for increasing number of observations it is reasonable to assume that the group size proportion remains constant, i.e.
\[ n_{1}/n = \gamma \]
for some \(0< \gamma <1\) and any positive integer n. Then, letting n approach \(\infty \) is equivalent to letting m approach \(\infty \), provided the number k of additional independent variables does not depend on the number of observations.
Remark 2
From the above Proposition 2 and Remark 1, it follows that \({\widehat{d}}\) is asymptotically unbiased for d, i.e. \(\lim _{m\rightarrow \infty } \text {E}({\widehat{d}}) = d\). If \(\lim _{m\rightarrow \infty } \text {var}({\widehat{\beta }}_{1}) = 0\), then \(\lim _{m\rightarrow \infty } \text {var}({\widehat{d}}) = 0\), in which case \({\widehat{d}}\) is consistent in quadratic mean for d under model (7).
From (12) it is easily seen that the assumption \(\lim _{m\rightarrow \infty } \text {var}({\widehat{\beta }}_{1}) = 0\) is satisfied when \(k=0\). The consistency of \({\widehat{d}}\) in this case has already been noted by Hedges (1981, p. 112).
From Proposition 2 an unbiased estimator for d is provided by \({\widehat{d}}_{u} = c(m)^{-1} {\widehat{d}}\), which has also been called Hedges’ g when \(k=0\), cf. Hedges (1981). A corresponding standard error may be derived from Proposition 2 by considering the square root of \(\text {var}({\widehat{d}}_{u})\) with d replaced by \({\widehat{d}}_{u}\).
Remark 3
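An unbiased estimator for the parameter d is given by \({\widehat{d}}_{u} = c(m)^{-1} {\widehat{d}}\) with corresponding standard error
\[ \text {se}({\widehat{d}}_{u}) = \sqrt{ \frac{m}{(m-2)\, c(m)^{2}} \left( v_{1}^{2} + {\widehat{d}}_{u}^{2} \right) - {\widehat{d}}_{u}^{2} } . \]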
As also noted in Goulet-Pelletier and Cousineau (2018), unbiased estimation is to be preferred over biased estimation, but for large m the difference between \({\widehat{d}}\) and \({\widehat{d}}_{u}\) is quite small.
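The bias correction and the standard error follow from dividing the expectation and variance of \({\widehat{d}}\) by c(m) and \(c(m)^{2}\), respectively, and then plugging in \({\widehat{d}}_{u}\) for d. A minimal sketch is given below; the function names are illustrative, and the numerical input uses the values \({\widehat{\beta }}_{1} = 1.162903\), \(\sqrt{{\widehat{\sigma }}^{2}} = 4.561543\) and the group sizes 88 and 307 reported for the case \(k=0\) in Sect. 3.

```python
import math

def c_factor(m):
    """Exact c(m) = sqrt(m/2) * Gamma((m-1)/2) / Gamma(m/2)."""
    return math.sqrt(m / 2.0) * math.exp(
        math.lgamma((m - 1) / 2.0) - math.lgamma(m / 2.0)
    )

def unbiased_d(d_hat, v1_sq, m):
    """Return (d_u, se) with d_u = d_hat / c(m) and the plug-in standard error."""
    if m <= 2:
        raise ValueError("m must exceed 2 for the variance to exist")
    c = c_factor(m)
    d_u = d_hat / c
    # var(d_u) = m / ((m - 2) c(m)^2) * (v1^2 + d^2) - d^2, with d replaced by d_u
    var_u = m / ((m - 2.0) * c * c) * (v1_sq + d_u * d_u) - d_u * d_u
    return d_u, math.sqrt(var_u)

# Case k = 0 of the data example: m = n - 2 and v1^2 = (n0 + n1) / (n0 * n1)
d_hat = 1.162903 / 4.561543
v1_sq = (88 + 307) / (88 * 307)
d_u, se = unbiased_d(d_hat, v1_sq, 88 + 307 - 2)
print(round(d_u, 4), se)
```

The sketch uses the exact c(m) rather than an approximation; for m of this size the two agree to about six decimal places, so the resulting \({\widehat{d}}_{u}\) rounds to the same value.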
From Proposition 1 it is possible to construct a confidence interval for the parameter \(\tau \) defined in (9) by applying the inversion confidence interval principle from Proposition 2 in Steiger and Fouladi (1997). For this, let \(F(\tau ) \equiv \text {Pr}((-\infty , {\widehat{d}}/\sqrt{v_{1}^{2}}])\) be the cumulative distribution function of the \(t(m,\tau )\) distribution with \(m=n-2 - k\) degrees of freedom at \({\widehat{d}}/\sqrt{v_{1}^{2}}\), considered as a function of the non-centrality parameter \(\tau \). For a specified \(\alpha \) with \(0< \alpha < 1\) let \(\tau _{1}\) satisfy \(F(\tau _{1}) = 1-\alpha /2\) and let \(\tau _{2}\) satisfy \(F(\tau _{2}) = \alpha /2\). Then the interval \([\tau _{1}, \tau _{2}]\) specifies a \((1-\alpha )\) confidence interval for \(\tau \). Hence we may state the following.
Remark 4
Let \(\tau _{1}\) and \(\tau _{2}\) be obtained as described above. Then the interval
\[ \left[ \tau _{1} \sqrt{v_{1}^{2}} ,\; \tau _{2} \sqrt{v_{1}^{2}} \right] \]
specifies a \((1-\alpha )\) confidence interval for the parameter d.
Note that this approach has also been illustrated in Example 3 by Steiger and Fouladi (1997) for the special case \(k=0\).
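The inversion principle amounts to solving \(F(\tau ) = 1 - \alpha /2\) and \(F(\tau ) = \alpha /2\) numerically. The sketch below does this with the Python standard library only: the non-central t distribution function is evaluated by Simpson integration of \(\Phi (t \sqrt{V/m} - \tau )\) over a chi squared variable V, and the two equations are solved by bisection. This is one possible numerical route under the stated assumptions (m at least 3, moderate values of the observed statistic), not the method used in the paper; all function names are illustrative.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf

def nct_cdf(t, m, tau, steps=2000):
    """P(T <= t) for T ~ t(m, tau), using E[Phi(t * sqrt(V/m) - tau)], V ~ chi2(m)."""
    half = m / 2.0
    log_norm = half * math.log(2.0) + math.lgamma(half)

    def integrand(v):
        if v <= 0.0:
            return 0.0
        log_pdf = (half - 1.0) * math.log(v) - v / 2.0 - log_norm
        return _PHI(t * math.sqrt(v / m) - tau) * math.exp(log_pdf)

    # Integration range covering essentially all chi squared probability mass
    width = 12.0 * math.sqrt(2.0 * m) + 50.0
    lo, hi = max(0.0, m - width), m + width
    h = (hi - lo) / steps                  # steps must be even for Simpson's rule
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0

def solve_tau(t_obs, m, target, tol=1e-8):
    """Find tau with F(tau) = target by bisection; F is decreasing in tau."""
    lo, hi = t_obs - 40.0, t_obs + 40.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if nct_cdf(t_obs, m, mid) > target:
            lo = mid                       # F still too large, tau must increase
        else:
            hi = mid
    return (lo + hi) / 2.0

def d_confidence_interval(d_hat, v1_sq, m, alpha=0.05):
    """Return the (1 - alpha) inversion confidence interval for d."""
    root = math.sqrt(v1_sq)
    t_obs = d_hat / root
    tau1 = solve_tau(t_obs, m, 1.0 - alpha / 2.0)
    tau2 = solve_tau(t_obs, m, alpha / 2.0)
    return tau1 * root, tau2 * root

# Observed statistic and degrees of freedom as reported in Sect. 3 for k = 2
tau1 = solve_tau(1.136455, 391, 0.975)
tau2 = solve_tau(1.136455, 391, 0.025)
print(tau1, tau2)

# Case k = 0 of the data example, with v1^2 = (n0 + n1) / (n0 * n1)
d_lo, d_hi = d_confidence_interval(1.162903 / 4.561543, 395 / (88 * 307), 393)
print(d_lo, d_hi)
```

Statistical software with a built-in non-central t distribution function makes the integration step unnecessary; only the root finding in \(\tau \) remains.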
3 Data example
To illustrate and apply the above listed properties, a data set available from the UCI machine learning repository is employed, see Dua and Graff (2017). It contains student achievement measurements in secondary education of two Portuguese schools, see Cortez and Silva (2008). The following computations are carried out with the statistical software R (R Core Team 2023).
We consider the final Mathematics grade with integer values between 0 and 20 as the dependent variable Y from \(n=395\) students in two groups. Group 0 is defined by home address indicated as ‘rural’ with \(n_{0} = 88\) observations, while group 1 is defined by home address indicated as ‘urban’ with \(n_{1} =307\) observations, see Fig. 1. On average, students from group 1 perform better (\({\overline{Y}}_{1} = 10.674267\)) than students from group 0 (\({\overline{Y}}_{0} = 9.511364\)). The corresponding two-sample two-sided t test statistic with equal variances reads \(|t| = 2.1084\) admitting a p-value of 0.03563. Hence, one may conclude a significant difference at significance level 0.05.
Cohen’s d estimator from (2) reads
\[ {\widehat{d}} = 2.1084 \sqrt{\frac{395}{88 \cdot 307}} \approx 0.2549 , \]
indicating a rather small effect. The very same value may also be obtained from the R package effectsize, see Ben-Shachar et al. (2020), except for an opposite sign. This is supposed to originate from using the difference \({\overline{Y}}_{0} - {\overline{Y}}_{1}\) instead of \({\overline{Y}}_{1} - {\overline{Y}}_{0}\) in the involved formulas. As a matter of fact the t test statistic in R is computed by applying the first difference, while the regression coefficient \({\widehat{\beta }}_{1}\) is identical to the second difference for the special case of \(k=0\). In the general regression context with \(k>0\) the coefficient \({\widehat{\beta }}_{1}\) is the estimated positive or negative increment of the intercept in group 1 compared to group 0 conditional on the independent variables – and may even receive an opposite sign to the unconditional mean difference \({\overline{Y}}_{1} - {\overline{Y}}_{0}\).
From fitting the model \(Y= \beta _{0} + \beta _{1} X_{1} + \varepsilon \) one gets \({\widehat{\beta }}_{1} = 1.162903\) and \(\sqrt{{\widehat{\sigma }}^{2}} = 4.561543\) yielding the same estimate \({\widehat{d}}\) by (6) as before. By using the approximation \(c(n-2) \approx 1.001913\) from (16) one gets \({\widehat{d}}_{u} = c(n-2)^{-1} {\widehat{d}} = 0.2544\). Except for the sign, this is also exactly the value of Hedges’ g computed from the package effectsize.
Now two additional independent variables are considered: the home to school travel time (incorporated as a discrete variable \(X_{2}\) with values 1 to 4 corresponding to travel times less than 15 min, 15 to 30 min, 30 to 60 min and more than 60 min) and the number of past class failures (incorporated as a discrete variable \(X_{3}\) with values from 0 to 4 where 4 is noted for more than 3 failures). Then from fitting a model
\[ Y = \beta _{0} + \beta _{1} X_{1} + \beta _{2} X_{2} + \beta _{3} X_{3} + \varepsilon \]
one gets
where \(v_{1}^2\) is computed as before from \(({\varvec{X}}^{\prime } {\varvec{X}})^{-1}\) on the \(395\times 4\) model matrix \({\varvec{X}}\). As noted by Groß and Möller (2023), the estimated regression coefficient \({\widehat{\beta }}_{1}\) is also identical to the group mean difference \({\overline{Y}}_{*1} - {\overline{Y}}_{*0}\) when a new variable \(Y_{*} = Y - {\widehat{\beta }}_{2}X_{2} - {\widehat{\beta }}_{3}X_{3}\) is created. This does, however, not imply that the classical effect size formulas may simply be applied with Y replaced by \(Y_{*}\), since that would ignore an additional required adjustment for the degrees of freedom, see formula (4) in Groß and Möller (2023). The (biased) effect size estimate then reads
implying a very small net group effect size when home school travel time and number of past failures are held constant. As noted by Groß and Möller (2023), this value may also be converted to Cohen’s \(f^2\) as
being in line with the indication of a very small effect size. The approximation (16) gives \(c(m)\approx 1.001923\). Then, by Remark 3
The usual approximate normal \(95\%\) confidence interval \({\widehat{d}}_{u} \pm 1.96 \,\text {se}({\widehat{d}}_{u})\) reads
To obtain a 95% confidence interval by the inversion principle, the cumulative distribution function \(F(\tau )\) of the \(t(m, \tau )\) distribution with \(m=391\) is considered at \({\widehat{d}}/\sqrt{v_{1}^{2}} = 1.136455\). Then \(F(\tau _{1}) = 0.975\) for \(\tau _{1} = -0.8258511\) and \(F(\tau _{2}) = 0.025\) for \(\tau _{2} = 3.0973105\). Remark 4 yields
as the corresponding \(95\%\) confidence interval for d, quite similar to the above.
4 Conclusion
As illustrated above, a linear regression model may be applied to obtain the net effect size of a group difference with respect to a response variable of interest when other variables are held constant. The size of the group difference effect with respect to each of the incorporated additional variables, however, naturally remains unaccounted for. Nonetheless the above remarks show that some classical results on Cohen’s d carry over within a more general regression framework by (a) applying the estimator from (6), (b) replacing the number of the degrees of freedom \(n-2\) by \(n-2 -k\), with k being the number of additional independent variables, and (c) replacing the quantity \(2/ {\widetilde{n}} = (n_{0} + n_{1})/(n_{0} \cdot n_{1})\) by \(v_{1}^2\), the latter being the diagonal element of the matrix \(({\varvec{X}}^{\prime } {\varvec{X}})^{-1}\) corresponding to the group 1 regression coefficient. The results fit in between the classical effect size measure for the (unconditional) difference in two groups and the effect size measure \(f^2\) usually considered within an even more general regression context.
References
Ben-Shachar MS, Lüdecke D, Makowski D (2020) Effectsize: Estimation of effect size indices and standardized parameters. J Open Source Softw 5:2815. https://doi.org/10.21105/joss.02815
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, Mahwah
Cortez P, Silva AMG (2008) Using data mining to predict secondary school student performance. In: Carvalho Brito AES, Feliz-Teixeira JM (eds) 15th European Concurrent Engineering Conference, 5th Future Business Technology Conference. EUROSIS-ETI, Ghent, Belgium, pp 5–12
Dua D, Graff C (2017) UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
Goulet-Pelletier J-C, Cousineau D (2018) A review of effect sizes and their confidence intervals, Part I: the Cohen’s d family. Quant Methods Psychol 14:242–265
Groß J (2004) The general Gauss-Markov model with possibly singular dispersion matrix. Statistical Papers 45:311–336
Groß J, Möller A (2023) A note on Cohen’s d from a partitioned linear regression model. J Stat Theory Practice 17:22
Hedges LV (1981) Distribution theory for Glass’s estimator of effect size and related estimators. J Educational Stat 6:107–128
Johnson N, Welch B (1940) Applications of the non-central t-distribution. Biometrika 31:362–389
Laforgia A, Natalini P (2012) On the asymptotic expansion of a ratio of gamma functions. J Math Anal Appl 389:833–837
R Core Team (2023) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Seber GA, Lee AJ (2003) Linear Regression Analysis, 2nd edn. John Wiley, Hoboken
Steiger JH, Fouladi RT (1997) Noncentrality interval estimation and the evaluation of statistical models. In: Harlow LL, Mulaik SA, Steiger JH (eds) What if there were no significance tests. Lawrence Erlbaum Associates, Mahwah, pp 221–257
Tricomi FG, Erdélyi A (1951) The asymptotic expansion of a ratio of gamma functions. Pacific J Math 1:133–142
Acknowledgements
The second author gratefully acknowledges support by the Helmholtz Association’s pilot project ‘Uncertainty Quantification’.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Groß, J., Möller, A. Some additional remarks on statistical properties of Cohen’s d in the presence of covariates. Stat Papers (2024). https://doi.org/10.1007/s00362-023-01527-9