Abstract
In this note, we introduce a generalized formula for Cohen's d in the presence of additional independent variables, providing a measure for the size of a possible location difference of a variable between two groups. This is done by employing the so-called Frisch–Waugh–Lovell theorem in a partitioned linear regression model. The generalization is motivated by demonstrating its relationship to appropriate t and F statistics. Our discussion is further illustrated by inference on a publicly available data set.
1 Introduction
When applying statistical testing of hypotheses to data, it is often recommended not only to report the corresponding p value, but also to provide a measure for the effect associated with a possible rejection of the null hypothesis, see e.g. Wilkinson [18]. Such a measure may be useful when sample sizes are to be determined during the planning phase of a study, or when it is desired to assess the relevance of an actual rejection when the given sample sizes are large. Effect size measures are strongly related to power analysis as carried out in the seminal book by Cohen [3].
A widely used measure is the so-called Cohen's d, see also Hedges [12] and Kraemer [13], which is an effect size measure for the two-sample t test with equal variances. Consider independent samples of sizes \(n_{1}\) and \(n_{2}\) of a statistical variable y in two groups, such that y follows a normal distribution with expectation \(\mu _{1}\) and variance \(\sigma ^2\) in group 1 and expectation \(\mu _{2}\) and the same variance \(\sigma ^2\) in group 2. Let t denote the usual two-sample test statistic for the null hypothesis \(H_{0}: \mu _{1} = \mu _{2}\) versus the alternative \(H_{1}: \mu _{1} \not = \mu _{2}\). As a measure for the size of an effect, Cohen [3, p. 66ff] considers the absolute value of
$$ d = \frac{\overline{y}_{2} - \overline{y}_{1}}{s}, \qquad s = \sqrt{\frac{s_{1}^2 + s_{2}^2}{n_{1} + n_{2} - 2}}, \qquad\qquad (1) $$
where \({\overline{y}}_{j}\) is the sample mean in group j, \(j=1,2\), and \(s_{j}^2 =\sum _{i} (y_{i} - {\overline{y}}_{j})^2\), where summation is carried out with respect to all observations from group j. The effect size d is related to the test statistic t by the formula
$$ d = t\, \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}, \qquad\qquad (2) $$
see (2.5.3) in Cohen [3]. According to Cohen, values \(|d|=0.2\), \(|d|=0.5\) and \(|d|=0.8\) indicate a small, medium and large effect, respectively.
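As a numerical illustration (not part of the original text), the following Python sketch computes Cohen's d for two simulated samples and checks the stated relation to the equal-variances two-sample t statistic; the sample sizes and distribution parameters are arbitrary.

```python
# Sketch: Cohen's d for two simulated samples and the relation
# |d| = |t| * sqrt(1/n1 + 1/n2) to the equal-variances t statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y1 = rng.normal(12.0, 3.0, size=40)  # hypothetical group 1
y2 = rng.normal(11.0, 3.0, size=30)  # hypothetical group 2
n1, n2 = len(y1), len(y2)

# s_j^2 are *sums* of squared deviations, as defined in the text
s1_sq = np.sum((y1 - y1.mean()) ** 2)
s2_sq = np.sum((y2 - y2.mean()) ** 2)
s = np.sqrt((s1_sq + s2_sq) / (n1 + n2 - 2))  # pooled standard deviation

d = (y2.mean() - y1.mean()) / s               # Cohen's d
t, _ = stats.ttest_ind(y1, y2, equal_var=True)
```

Since the sign of t depends on the ordering of the two samples, the comparison is made in absolute value.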
It may also be of interest to have a corresponding measure when the variable y depends on further independent variables. In his Chapter 9, Cohen [3] deals with such a multiple regression situation and discusses the effect size measure \(f^2\) at length, as will further be explicated in our Section 4.
However, a measure analogous to d is rarely found; see Wilson [19, Sect. 3.14] and Lipsey and Wilson [14] for such a proposal. Nonetheless, it may be of particular interest to have comparable measures of an effect size for the very same grouped variable y but additionally depending on different sets of independent variables. This is carried out exemplarily in our Section 5. In the following, we introduce such a measure as a generalization of d by considering the linear regression model
$$ y = \beta_{0} + \beta_{1} z + \beta_{2} x_{1} + \cdots + \beta_{w+1} x_{w} + \varepsilon, \qquad\qquad (3) $$
where z takes the value \(z_{i} = 0\) if the corresponding observation \(y_{i}\) of the dependent variable y belongs to group 1 and \(z_{i} = 1\) if \(y_{i}\) belongs to group 2, \(i= 1,\ldots , n_{1} + n_{2}\). It is assumed that there are w independent variables \(x_{1}, \ldots , x_{w}\). The error variable \(\varepsilon \) is assumed to follow a normal distribution with expectation 0 and variance \(\sigma ^2\).
As will be shown in the following Sections 2, 3, and 4, a natural generalization of Cohen's d is given by
$$ d_{*} = \frac{\overline{y}_{*2} - \overline{y}_{*1}}{\sqrt{(s_{*1}^2 + s_{*2}^2)/(n_{1} + n_{2} - 2 - w)}}, \qquad\qquad (4) $$
with group means \(\overline{y}_{*j}\) and sums of squared deviations \(s_{*j}^2\) formed in analogy to (1),
where \(y_{*} = y - {\widehat{\beta }}_{2} x_{1} - \cdots - {\widehat{\beta }}_{w+1} x_{w}\) is the dependent variable adjusted for the independent variables. The \({\widehat{\beta }}_{k}\) are the ordinary least squares estimates of the regression coefficients \(\beta _{k}\), \(k=2,\ldots , w+1\) in model (3). In case \(w=0\), the adjusted \(y_{*}\) coincides with the original y, so that (4) reduces to (1) and therefore can be seen as a natural generalization of Cohen’s d.
2 Partitioned Linear Regression
Let \(n= n_{1} + n_{2}\) be the total sample size. The above model (3) may also be written in vector-matrix notation as
$$ y = X_{1} \delta_{1} + X_{2} \delta_{2} + \varepsilon, \qquad\qquad (5) $$
where now y represents the \(n \times 1\) vector of observations of the dependent variable. Without loss of generality, it is assumed that the first \(n_{1}\) observations belong to group 1, while the last \(n_{2}\) observations belong to group 2. By introducing the notation \(1_{m}\) for an \(m\times 1\) vector of ones, the \(n\times 2\) matrix \(X_{1}\) and the corresponding \(2\times 1\) parameter vector \(\delta _{1}\) may be written as
$$ X_{1} = (1_{n},\, z), \qquad z = \begin{pmatrix} 0 \cdot 1_{n_{1}} \\ 1_{n_{2}} \end{pmatrix}, \qquad \delta_{1} = \begin{pmatrix} \beta_{0} \\ \beta_{1} \end{pmatrix}. \qquad\qquad (6) $$
The \(n\times w\) matrix \(X_{2}\) contains the observations of the independent variables with corresponding regression coefficients \(\delta _{2}^{T} = (\beta _{2}, \ldots , \beta _{w+1})\), where the T superscript denotes transposition. The \(n\times 1\) random vector \(\varepsilon \) is assumed to follow a multivariate normal distribution with expectation vector 0 and variance-covariance matrix \(\sigma ^2 I_{n}\), where \(I_{n}\) stands for the \(n\times n\) identity matrix. It is assumed that the \(n\times (2+w)\) model matrix \((X_{1}, X_{2})\) has full column rank \(2 +w\). Equation (5) represents a partitioned linear regression model as considered e.g. in Fiebig et al [7]. Generalizations and further properties are investigated by Puntanen [16], Groß and Puntanen [10, 11], Ding [5], among others.
Under model (5), the ordinary least squares estimator for the parameter vector \((\delta _{1}^{T}, \delta _{2}^{T})\) is given by
$$ \begin{pmatrix} \widehat{\delta}_{1} \\ \widehat{\delta}_{2} \end{pmatrix} = (X^{T} X)^{-1} X^{T} y, \qquad X = (X_{1}, X_{2}). \qquad\qquad (7) $$
The Frisch–Waugh–Lovell theorem, see Fiebig et al [7], Lovell [15], Frisch and Waugh [8], states that
$$ \widehat{\delta}_{2} = (X_{2}^{T} M_{1} X_{2})^{-1} X_{2}^{T} M_{1}\, y, \qquad M_{1} = I_{n} - X_{1} (X_{1}^{T} X_{1})^{-1} X_{1}^{T}. \qquad\qquad (8) $$
For the specific choice (6), the matrix \(M_{1}\) becomes
$$ M_{1} = \begin{pmatrix} C_{n_{1}} & 0 \\ 0 & C_{n_{2}} \end{pmatrix}, \qquad C_{m} = I_{m} - m^{-1} 1_{m} 1_{m}^{T}. \qquad\qquad (9) $$
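The block-diagonal structure of \(M_{1}\) can be verified numerically. The following NumPy sketch (with arbitrary small group sizes, not taken from the paper) builds \(X_{1}\) as an intercept plus dummy and compares the resulting residual-maker matrix with the within-group centering matrices.

```python
# Sketch: for X1 = (1_n, z), the residual maker M1 is block diagonal with
# within-group centering matrices on the diagonal (arbitrary small sizes).
import numpy as np

n1, n2 = 5, 4
n = n1 + n2
z = np.concatenate([np.zeros(n1), np.ones(n2)])
X1 = np.column_stack([np.ones(n), z])
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T

def centering(m):
    # C_m = I_m - m^{-1} 1_m 1_m^T
    return np.eye(m) - np.ones((m, m)) / m

expected = np.block([[centering(n1), np.zeros((n1, n2))],
                     [np.zeros((n2, n1)), centering(n2)]])
```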
The following result is not restricted to the case (6) but remains valid in situations where the matrix \(X_{1}\) corresponds to an arbitrary set of v independent variables such that the assumptions of (5) are satisfied.
Theorem 1
Under the partitioned linear regression model (5),
$$ \widehat{\delta}_{1} = (X_{1}^{T} X_{1})^{-1} X_{1}^{T} (y - X_{2} \widehat{\delta}_{2}) \qquad\qquad (10) $$
is the ordinary least squares estimator of \(\delta _{1}\).
A proof is given in the appendix. Theorem 1 means that if \({\widehat{\delta }}_{2}\) is known (e.g. computed by (8)), then the remaining parameters \(\delta _{1}\) can be estimated by regressing the adjusted
$$ y_{*} = y - X_{2} \widehat{\delta}_{2} \qquad\qquad (11) $$
on the remaining \(X_{1}\) and this procedure just yields the identical estimate of \(\delta _{1}\) from (7).
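The statement of Theorem 1 and the Frisch–Waugh–Lovell identity (8) are easy to check numerically. The following NumPy sketch (simulated data with arbitrary coefficients, not from the paper) compares the full-model least squares estimates with those obtained from the two-step procedure.

```python
# Sketch: numerical check of the Frisch-Waugh-Lovell identity and of
# Theorem 1 on simulated data with arbitrary coefficients.
import numpy as np

rng = np.random.default_rng(7)
n1, n2, w = 30, 25, 2
n = n1 + n2
z = np.concatenate([np.zeros(n1), np.ones(n2)])
X1 = np.column_stack([np.ones(n), z])
X2 = rng.normal(size=(n, w))
y = 10 + 0.8 * z + X2 @ np.array([0.5, -0.3]) + rng.normal(size=n)

# one-step: full model (X1, X2)
X = np.hstack([X1, X2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
delta1_full, delta2_full = coef[:2], coef[2:]

# Frisch-Waugh-Lovell: delta2 from regressing M1 y on M1 X2
M1 = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
delta2_fwl = np.linalg.lstsq(M1 @ X2, M1 @ y, rcond=None)[0]

# Theorem 1: regress the adjusted y_star on X1 alone
y_star = y - X2 @ delta2_fwl
delta1_adj = np.linalg.lstsq(X1, y_star, rcond=None)[0]
```

Both two-step estimates agree with the one-step full-model estimates up to floating-point error.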
Theorem 2
Under the partitioned linear regression model (5) and (6),
$$ \widehat{\sigma}^2 = \frac{y_{*}^{T} M_{1}\, y_{*}}{n - 2 - w} \qquad\qquad (12) $$
$$ \phantom{\widehat{\sigma}^2} = \frac{s_{*1}^2 + s_{*2}^2}{n - 2 - w}, \qquad s_{*j}^2 = \sum_{i} (y_{*i} - \overline{y}_{*j})^2, \qquad\qquad (13) $$
is an unbiased estimator for \(\sigma ^2\).
As a matter of fact, \({\widehat{\sigma }}^2\) coincides with the usual estimator for \(\sigma ^2\) in model (5). Identity (13) follows immediately from (9) when the above \(y_{*}\) is partitioned into two vectors of length \(n_{1}\) and \(n_{2}\), respectively.
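The coincidence of \({\widehat{\sigma}}^2\) with the usual residual variance estimator can likewise be checked numerically. In the sketch below (simulated data, arbitrary coefficients, not from the paper), the group-wise sums of squared deviations of the adjusted variable reproduce the residual sum of squares of the full model.

```python
# Sketch: the estimator of Theorem 2 equals the usual residual variance
# estimator SSE / (n - 2 - w) of the full model (simulated data).
import numpy as np

rng = np.random.default_rng(3)
n1, n2, w = 20, 15, 2
n = n1 + n2
z = np.concatenate([np.zeros(n1), np.ones(n2)])
X2 = rng.normal(size=(n, w))
X = np.column_stack([np.ones(n), z, X2])
y = 5 + 1.0 * z + X2 @ np.array([0.4, 0.2]) + rng.normal(size=n)

coef = np.linalg.lstsq(X, y, rcond=None)[0]
sse = np.sum((y - X @ coef) ** 2)       # residual sum of squares, full model

y_star = y - X2 @ coef[2:]              # adjusted dependent variable
g1, g2 = y_star[:n1], y_star[n1:]
s1_sq = np.sum((g1 - g1.mean()) ** 2)   # group-wise sums of squares of y_star
s2_sq = np.sum((g2 - g2.mean()) ** 2)
sigma2_hat = (s1_sq + s2_sq) / (n - 2 - w)
```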
3 Testing for a group effect
From Theorem 1 with \(X_{1}\) from (6), it follows that
$$ \widehat{\beta}_{0} = \overline{y}_{*1}, \qquad \widehat{\beta}_{1} = \overline{y}_{*2} - \overline{y}_{*1}, \qquad\qquad (14) $$
where \(\overline{y}_{*j}\) denotes the sample mean of \(y_{*}\) in group j.
Hence, it is seen that \(|d_{*}|\) from (4) is identical to
$$ |d_{*}| = \frac{|\widehat{\beta}_{1}|}{\widehat{\sigma}} \qquad\qquad (15) $$
with \({\widehat{\sigma }}\) being the square root of \({\widehat{\sigma }}^2\) from Theorem 2. The statistic \(d_{*}\) is closely related to the test statistic \(t_{*}\) (defined below) for the null hypothesis \(H_{0}: \beta _{1} = 0\) in model (5).
Theorem 3
Under the partitioned linear regression model (5) and (6) let \(M_{2} = I_{n} - P_{2}\), \(P_{2} = X_{2} (X_{2}^{T} X_{2})^{-1} X_{2}^{T}\), and let \(\gamma \) be the lower-right element of the \(2\times 2\) matrix \((X_{1}^{T} M_{2} X_{1})^{-1}\). Then, the statistic
$$ t_{*} = \frac{\widehat{\beta}_{1}}{\widehat{\sigma} \sqrt{\gamma}} \qquad\qquad (16) $$
follows a central t distribution with \(n - 2-w\) degrees of freedom, provided \(\beta _{1} =0\).
In the above theorem, \(\gamma \) is the scaled variance of \({\widehat{\beta }}_{1}\), i.e. \(\text {Var}({\widehat{\beta }}_{1}) = \sigma ^2 \gamma \). See the proof of Theorem 3 in the appendix. The standard error of \({\widehat{\beta }}_{1}\) is thus \(\text {se}({\widehat{\beta }}_{1}) = {\widehat{\sigma }} \sqrt{\gamma }\) with \({\widehat{\sigma }}\) being the square root of \({\widehat{\sigma }}^2\) from Theorem 2.
Note that if \(w = 0\), then \(M_{2}=I_{n}\), and one gets
$$ (X_{1}^{T} X_{1})^{-1} = \begin{pmatrix} \frac{1}{n_{1}} & -\frac{1}{n_{1}} \\ -\frac{1}{n_{1}} & \frac{1}{n_{1}} + \frac{1}{n_{2}} \end{pmatrix}, \qquad \gamma = \frac{1}{n_{1}} + \frac{1}{n_{2}}, \qquad\qquad (17) $$
and hence
$$ d_{*} = t_{*}\, \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}, \qquad\qquad (18) $$
which is just a reformulation of (2). These considerations show that \(d_{*}\) is a natural extension of Cohen’s d in the context of additional independent variables.
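For \(w=0\), this reduction can be confirmed numerically. The following sketch (simulated two-group data, not from the paper) computes \(d_{*}\) from the regression of y on intercept and dummy and compares it with the classical Cohen's d.

```python
# Sketch: with no additional independent variables (w = 0), the
# regression-based d_star coincides with the classical Cohen's d.
import numpy as np

rng = np.random.default_rng(11)
n1, n2 = 25, 20
n = n1 + n2
y = np.concatenate([rng.normal(10, 2, n1), rng.normal(9, 2, n2)])
z = np.concatenate([np.zeros(n1), np.ones(n2)])
X1 = np.column_stack([np.ones(n), z])

coef = np.linalg.lstsq(X1, y, rcond=None)[0]
sigma_hat = np.sqrt(np.sum((y - X1 @ coef) ** 2) / (n - 2))
d_star = coef[1] / sigma_hat            # beta_1 / sigma_hat

# classical Cohen's d from the group means and pooled deviations
y1, y2 = y[:n1], y[n1:]
s = np.sqrt((np.sum((y1 - y1.mean()) ** 2)
             + np.sum((y2 - y2.mean()) ** 2)) / (n - 2))
d = (y2.mean() - y1.mean()) / s
```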
4 Effect Size in Multiple Regression
In his Chapter 9, Cohen [3] discusses the effect size measure \(f^{2}\) based on the F test of a linear hypothesis. It may be applied when \(X_{1}\) comprises not only an intercept and one dummy variable as under model (5), but a total of u independent variables of arbitrary type. Then, it might be of interest to measure the effect size of the set of variables in \(X_{1}\) given the set in \(X_{2}\), which is Cohen's case 1. Cohen suggests values \(f^2=0.02\), \(f^2=0.15\) and \(f^2=0.35\) for a small, medium and large effect, respectively. Since the measure \(d_{*}\) refers to one dummy variable (\(u=1\)), one might expect a relationship between \(d_{*}\) and the corresponding \(f^2\). Indeed, as noted in our Remark below, such a relationship can be specified.
The measure \(f^2\) for Cohen's case 1 is given by
$$ f^2 = \frac{F}{n - 2 - w}, \qquad\qquad (19) $$
where under model (5) F is the F statistic for testing the null hypothesis \(H_{0}: \beta _{1}=0\). From (9.2.3) in Cohen [3],
$$ f^2 = \frac{R^2 - R_{0}^2}{1 - R^2}, \qquad\qquad (20) $$
where \(R^2\) is the coefficient of determination from model (5) and \(R_{0}^2\) is the coefficient of determination in the reduced model with \(\beta _{1}=0\), admitting model matrix \(X_{0}=(1_{n}, X_{2})\). If P denotes the orthogonal projector onto the column space of the model matrix of a regression model with intercept, the coefficient of determination is given by
$$ R^2 = \frac{y^{T} (P - n^{-1} 1_{n} 1_{n}^{T})\, y}{y^{T} C y} \qquad\qquad (21) $$
with \(C= I_{n} - n^{-1} 1_{n} 1_{n}^{T}\) being the so-called centering matrix, e.g. see [9, Sect. 6.2]. From this, (20) becomes
$$ f^2 = \frac{y^{T} (P - P_{0})\, y}{y^{T} (I_{n} - P)\, y} \qquad\qquad (22) $$
with \(P=X(X^{T} X)^{-1} X^{T}\), \(X=(X_{1}, X_{2})\), and \(P_{0} = X_{0} (X_{0}^{T} X_{0})^{-1} X_{0}^{T}\). In view of \(\text {rank}(P-P_{0}) =1\) and \(\text {rank}(I_{n} - P) = n - (2 + w)\), the corresponding F statistic reads
$$ F = \frac{y^{T} (P - P_{0})\, y}{y^{T} (I_{n} - P)\, y / (n - 2 - w)}. \qquad\qquad (23) $$
Then, from Theorem 3.2.1 (ii) in Christensen [2], F follows a central F distribution with 1 and \(n-2-w\) degrees of freedom, provided \(\beta _{1}=0\).
Now, it is well known and readily verified that the squared t statistic for the null hypothesis \(H_{0}: \beta _{1}=0\) is identical to the test statistic of the F test for the very same hypothesis. Thus, by combining (16) and (19), the following is true.
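This identity between the squared t statistic and the F statistic can be verified numerically. The sketch below (simulated data with arbitrary coefficients, not from the paper) computes \(f^2\) from the two coefficients of determination and compares \((n-2-w)\, f^2\) with \(t_{*}^2\).

```python
# Sketch: numerical check that the squared t statistic for beta_1 equals
# the F statistic obtained from the two coefficients of determination.
import numpy as np

rng = np.random.default_rng(5)
n1, n2, w = 30, 28, 2
n = n1 + n2
z = np.concatenate([np.zeros(n1), np.ones(n2)])
X2 = rng.normal(size=(n, w))
y = 4 + 0.6 * z + X2 @ np.array([1.0, -0.5]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), z, X2])   # full model matrix
X0 = np.column_stack([np.ones(n), X2])     # reduced model (beta_1 = 0)

def r_squared(A, y):
    fit = A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - np.sum((y - fit) ** 2) / np.sum((y - y.mean()) ** 2)

R2, R2_0 = r_squared(X, y), r_squared(X0, y)
f2 = (R2 - R2_0) / (1 - R2)
F = (n - 2 - w) * f2

# t statistic for beta_1 in the full model
coef = np.linalg.lstsq(X, y, rcond=None)[0]
sigma2_hat = np.sum((y - X @ coef) ** 2) / (n - 2 - w)
gamma = np.linalg.inv(X.T @ X)[1, 1]       # scaled variance of beta_1
t_star = coef[1] / np.sqrt(sigma2_hat * gamma)
```

Here the element of \((X^{T}X)^{-1}\) belonging to the dummy coefficient coincides with the lower-right element of \((X_{1}^{T} M_{2} X_{1})^{-1}\) by the partitioned inverse formula.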
Remark
The identity
$$ f^2 = \frac{d_{*}^2}{(n - 2 - w)\, \gamma} \qquad\qquad (24) $$
specifies the exact relationship between the effect size measures \(f^2\) and \(d_{*}\) from above.
Note that in case \(w=0\), the above identity reads
$$ f^2 = \frac{n_{1}\, n_{2}}{n\, (n - 2)}\, d_{*}^2, \qquad\qquad (25) $$
which slightly differs from formula (9.3.5) in Cohen [3] and lacks some of its beauty. Since (25) is expected to coincide with (9.3.5), this reveals a fallacy in the latter formula. Formula (25) may also be verified independently by directly assuming model (5) without any additional independent variables \(X_{2}\). In most cases, actual computations of the two formulas in question only differ in a later digit after the decimal point (say the fifth or sixth), so the difference usually has no practical meaning. The correctness of (24) and (25) is additionally confirmed by applications to real data.
5 Data Example
To give a possible outline for applications and an illustration of the previous formulas, we employ a data set available from the UCI machine learning repository, see Dua and Graff [6]. It contains data on student achievement in secondary education at two Portuguese schools, see Cortez and Silva [4]. In the following, computations are carried out with the statistical software R [17].
As the dependent variable y, we consider the final grade in Portuguese language (variable G3), with integer values ranging between 0 and 20, of \(n_{1} = 383\) female and \(n_{2} = 266\) male students. The dummy variable z takes the value 0 for female and 1 for male students. As also indicated by Figure 1, female students perform better, with an average of \({\overline{y}}_{1} = 12.25326\) compared to \({\overline{y}}_{2} = 11.40602\) for male students. The corresponding equal-variances two-sample test statistic is \(t=3.310938\) with p value 0.0009815287. Although this indicates strong significance, the corresponding effect size from (2) reads \(d = 0.264261\), thereby indicating only a slightly more than small effect. This value may also be obtained by the function cohens_d from the R package effectsize, see Ben-Shachar et al [1].
As additional independent variables, we consider the education of the father (Fedu) \(x_{1}\) and the travel time from home to school (traveltime) \(x_{2}\). Both variables are measured on an ordinal scale with integer values ranging from 0 to 4 and 1 to 4, respectively, and are included as quantitative variables in our regression approach, implying \(w=2\).
Least squares estimates of the coefficients with corresponding t statistic values are given in Table 1. The intercept estimate \({\widehat{\beta }}_{0}=11.41385\) is the average of the adjusted final grade \(y_{*}\) of females, while the dummy variable estimate \({\widehat{\beta }}_{1}=-0.9406209\) is the difference between the average of \(y_{*}\) in the male group and the average in the female group, as given in (14). As is also seen, higher education of the father comes along with better grades (positive \({\widehat{\beta }}_{2}\)), while longer travel times to school come along with lower grades (negative \({\widehat{\beta }}_{3}\)), when the other variables are held constant. The coefficient of determination from this model reads \(R^2 = 0.07238847\), while for the reduced model (omitting sex) it is \(R_{0}^2 = 0.05207054\). Then, \(f^2\) from (20) is \(f^2 = 0.0219035\), implying a slightly more than small effect concerning the difference between female and male final grades, ceteris paribus.
The relationship (24) may also be used to infer Cohen's \(d_{*}\) from \(f^2\). For this, the \(2\times 2\) matrix \((X_{1}^{T} M_{2} X_{1})^{-1}\) is required, the lower-right element being the scaled variance \(\gamma \) of \({\widehat{\beta }}_{1}\). With \({\widehat{\sigma }} = 3.118756\), it follows that \(d_{*}= 0.3016013\). This indicates a slightly stronger effect when the variables \(x_{1}\) and \(x_{2}\) are held constant than seen before from \(d = 0.264261\), which does not take any additional independent variables into account. Alternatively, \(d_{*}\) may be computed by either of the formulas (15) or (4), yielding the very same absolute value.
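A reader wishing to reproduce this workflow without access to the original data may use the following Python sketch. The group sizes mirror the example, but the data are simulated with hypothetical coefficients, so the numerical values differ from those reported above; it computes \(d_{*}\) via (15) and the corresponding \(f^2\) via (24).

```python
# Sketch of the data-example workflow on *simulated* data (the UCI student
# data are not bundled here); group sizes mirror the example, values differ.
import numpy as np

rng = np.random.default_rng(2023)
n1, n2, w = 383, 266, 2
n = n1 + n2
z = np.concatenate([np.zeros(n1), np.ones(n2)])
x1 = rng.integers(0, 5, n).astype(float)   # stand-in for father's education
x2 = rng.integers(1, 5, n).astype(float)   # stand-in for travel time
y = 12 - 0.9 * z + 0.4 * x1 - 0.3 * x2 + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), z, x1, x2])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
sigma_hat = np.sqrt(np.sum((y - X @ coef) ** 2) / (n - 2 - w))
d_star = abs(coef[1]) / sigma_hat          # |beta_1| / sigma_hat

gamma = np.linalg.inv(X.T @ X)[1, 1]       # scaled variance of beta_1
f2 = d_star ** 2 / ((n - 2 - w) * gamma)   # relationship between f2 and d_star
```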
References
Ben-Shachar MS, Lüdecke D, Makowski D (2020) Effectsize: estimation of effect size indices and standardized parameters. J Open Source Soft 5:2815. https://doi.org/10.21105/joss.02815
Christensen R (2020) Plane answers to complex questions, 5th edn. Springer, Germany
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, New Jersey
Cortez P, Silva AMG (2008) Using data mining to predict secondary school student performance. In: Brito A, Teixeira J (eds) EUROSIS, pp 5–22
Ding P (2021) The Frisch-Waugh-Lovell theorem for standard errors. Stat Probab Lett 168:108945
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Fiebig DG, Bartels R, Krämer W (1996) The Frisch-Waugh theorem and generalized least squares. Economet Rev 15:431–443
Frisch R, Waugh FV (1933) Partial time regressions as compared with individual trends. Econometrica: J Econ Soc 1:387–401
Groß J (2003) Linear regression. Lecture Notes in Statistics, vol 175. Springer
Groß J, Puntanen S (2000) Estimation under a general partitioned linear model. Linear Algebra Appl 321:131–144
Groß J, Puntanen S (2005) Extensions of the Frisch-Waugh-Lovell theorem. Discuss Math Probabil Stat 25:39–49
Hedges LV (1981) Distribution theory for Glass’s estimator of effect size and related estimators. J Educ Stat 6:107–128
Kraemer HC (1983) Theory of estimation and testing of effect sizes: Use in meta-analysis. J Educ Stat 8:93–101
Lipsey MW, Wilson DB (2001) Practical meta-analysis. SAGE publications, Inc, California
Lovell MC (1963) Seasonal adjustment of economic time series and multiple regression analysis. J Am Stat Assoc 58(304):993–1010
Puntanen S (1996) Some matrix results related to a partitioned singular linear model. Commun Stat-Theory Methods 25:269–279
R Core Team (2022) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/
Wilkinson L (1999) Statistical methods in psychology journals: Guidelines and explanations. Am Psychol 54:594–604
Wilson DB (2016) Formulas used by the “practical meta-analysis effect size calculator”. Practical meta-analysis https://mason.gmu.edu/~dwilsonb/
Acknowledgements
Annette Möller acknowledges support by the Helmholtz Association’s pilot project “Uncertainty Quantification”. We are thankful to two anonymous referees for helpful comments which improved the presentation.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A Proofs of Theorems
In this section, we give short proofs of the three stated theorems. Although most of the supporting formulas and derivations may already be found throughout the literature, we present them here in order to provide a unified and self-contained treatment of the topic.
Proof of Theorem 1
From (7), it follows that
$$ X_{1} \widehat{\delta}_{1} + X_{2} \widehat{\delta}_{2} = P y, \qquad\qquad (A1) $$
where \(P=X(X^{T}X)^{-1} X^{T}\) is the orthogonal projector onto the column space of the model matrix \(X=(X_{1}, X_{2})\). Since the column space of \(X_{1}\) is contained in the column space of X, it follows
$$ P_{1} P = P P_{1} = P_{1} \qquad\qquad (A2) $$
for \(P_{1} =X_{1} (X_{1}^{T} X_{1})^{-1} X_{1}^{T}\). Hence,
$$ X_{1} \widehat{\delta}_{1} = P_{1} (y - X_{2} \widehat{\delta}_{2}) = X_{1} (X_{1}^{T} X_{1})^{-1} X_{1}^{T} (y - X_{2} \widehat{\delta}_{2}). \qquad\qquad (A3) $$
Since \(X_{1}\) is of full column rank, Theorem 1 follows. \(\square \)
Proof of Theorem 2
From (8)
$$ X_{2} \widehat{\delta}_{2} = X_{2} (X_{2}^{T} M_{1} X_{2})^{-1} X_{2}^{T} M_{1}\, y. \qquad\qquad (A4) $$
Then,
$$ y_{*}^{T} M_{1}\, y_{*} = y^{T} L y, \qquad L = M_{1} - M_{1} X_{2} (X_{2}^{T} M_{1} X_{2})^{-1} X_{2}^{T} M_{1}. \qquad\qquad (A5) $$
From Theorem 1.3.2 in Christensen [2], the expectation of the quadratic form \(y^{T} L y\) is given by
$$ \text{E}(y^{T} L y) = \sigma^2\, \text{trace}(L) + \mu^{T} L \mu \qquad\qquad (A6) $$
with \(\mu = X_{1} \delta _{1} + X_{2} \delta _{2}\). From (9) it follows \(\text {trace}(M_{1}) =n_{1} - 1 + n_{2} -1\) and therefore \(\text {trace}(L) = n_{1} + n_{2} - 2- w\). In addition \(L X_{1}=0\) and \(L X_{2}=0\), implying \(\text {E}(y^{T} L y) = \sigma ^2 (n_{1} + n_{2} - 2- w)\). This gives Theorem 2. \(\square \)
Proof of Theorem 3
Similarly to (8), the Frisch–Waugh–Lovell theorem states that
$$ \widehat{\delta}_{1} = (X_{1}^{T} M_{2} X_{1})^{-1} X_{1}^{T} M_{2}\, y. \qquad\qquad (A7) $$
Hence, it is seen that \({\widehat{\delta }}_{1}\) follows a multivariate normal distribution with expectation vector \(\delta _{1}\) and variance-covariance matrix \(\sigma ^2 (X_{1}^{T} M_{2} X_{1})^{-1}\), e.g. by Exercise 1.8 in Christensen [2]. Then, \({\widehat{\beta }}_{1}\) follows a univariate normal distribution with expectation \(\beta _{1}\) and variance \(\sigma ^2 \gamma \), where \(\gamma \) is the lower-right element of the \(2\times 2\) matrix \((X_{1}^{T} M_{2} X_{1})^{-1}\). If \(\beta _{1} =0\), then
$$ U = \frac{\widehat{\beta}_{1}}{\sigma \sqrt{\gamma}} \qquad\qquad (A8) $$
follows a standard normal distribution. Let
$$ V = \frac{y^{T} L y}{\sigma^2} \qquad\qquad (A9) $$
with L from (A5). It is readily verified that L is symmetric and idempotent, so that Theorem 1.3.3 in Christensen [2] implies that V follows a central \(\chi ^2\) distribution with \(\text {rank}(L) = \text {trace}(L) =n-2-w\) degrees of freedom. From
$$ F = (X_{1}^{T} M_{2} X_{1})^{-1} X_{1}^{T} M_{2}, \qquad M_{2} L = L, \qquad X_{1}^{T} L = 0, \qquad\qquad (A10) $$
it is seen that \(F L=0\), implying that \(Fy\) and \(y^{T} L y\) are independent, see Theorem 1.3.7 in Christensen [2]. Then, also U and V are independent and
$$ t_{*} = \frac{U}{\sqrt{V/(n-2-w)}} = \frac{\widehat{\beta}_{1}}{\widehat{\sigma} \sqrt{\gamma}} \qquad\qquad (A11) $$
follows a central t distribution with \(n-2-w\) degrees of freedom when \(\beta _{1}=0\). \(\square \)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Groß, J., Möller, A. A Note on Cohen’s d From a Partitioned Linear Regression Model. J Stat Theory Pract 17, 22 (2023). https://doi.org/10.1007/s42519-023-00323-w
Keywords
- Hypothesis testing
- Effect size
- Cohen’s d
- Partitioned linear regression
- Frisch–Waugh–Lovell theorem
- Multivariate normal distribution