A Note on Cohen’s d From a Partitioned Linear Regression Model

In this note, we introduce a generalized formula for Cohen's d in the presence of additional independent variables, providing a measure for the size of a possible location difference of a variable between two groups. This is done by employing the so-called Frisch–Waugh–Lovell theorem in a partitioned linear regression model. The generalization is motivated by demonstrating its relationship to appropriate t and F statistics. Our discussion is further illustrated by inference on a publicly available data set.


Introduction
When applying statistical tests of hypotheses to data, it is often recommended not only to report the corresponding p-value, but in addition to provide a measure for the effect associated with a possible rejection of the null hypothesis, see e.g. Wilkinson (1999). Such a measure may be useful when sample sizes are to be determined during the planning phase of a study, or when it is desired to assess the relevance of an actual rejection when the given sample sizes are large. Effect size measures are strongly related to power analysis as carried out in the seminal book by Cohen (1988).
A widely used measure is the so-called Cohen's d, see also Hedges (1981); Kraemer (1983), which is an effect size measure for the two-sample t test with equal variances. Consider independent samples of sizes $n_1$ and $n_2$ of a statistical variable $y$ in two groups such that $y$ follows a normal distribution with expectation $\mu_1$ and variance $\sigma^2$ in group 1 and expectation $\mu_2$ and the same variance $\sigma^2$ in group 2. Let $t$ denote the usual two-sample test statistic for the null hypothesis $H_0: \mu_1 = \mu_2$ versus the alternative $H_1: \mu_1 \neq \mu_2$. As a measure for the size of an effect, Cohen (1988, p. 66ff) considers the absolute value of

$$ d = \frac{\bar{y}_1 - \bar{y}_2}{s}, \qquad s^2 = \frac{s_1^2 + s_2^2}{n_1 + n_2 - 2}, \qquad (1) $$

where $\bar{y}_j$ is the sample mean in group $j$, $j = 1, 2$, and $s_j^2 = \sum_i (y_i - \bar{y}_j)^2$, where summation is carried out with respect to all observations from group $j$. The effect size $d$ is related to the test statistic $t$ by the formula

$$ |d| = |t| \sqrt{\frac{n_1 + n_2}{n_1 n_2}}, \qquad (2) $$

see Cohen (1988). According to Cohen, values $|d| = 0.2$, $|d| = 0.5$ and $|d| = 0.8$ indicate a small, medium and large effect, respectively. It may also be of interest to have a corresponding measure when the variable $y$ depends on further independent variables. In his Chapter 9, Cohen (1988) deals with such a multiple regression situation and discusses the effect size measure $f^2$ at length, as will further be explicated in our Section 4.
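As an illustration, Cohen's d from (1) and its relation (2) to the two-sample t statistic can be computed directly. The following Python sketch uses small made-up samples; the function name and the data are illustrative only:

```python
import math

def cohens_d_and_t(group1, group2):
    """Cohen's d as in (1) and the two-sample t statistic via relation (2)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    # s_j^2 as defined in the text: sums of squared deviations per group
    ss1 = sum((y - m1) ** 2 for y in group1)
    ss2 = sum((y - m2) ** 2 for y in group2)
    s = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # pooled standard deviation
    d = (m1 - m2) / s
    t = d * math.sqrt(n1 * n2 / (n1 + n2))      # relation (2), rearranged
    return d, t

# illustrative (made-up) grades in two groups
g1 = [12.1, 13.4, 11.8, 12.9, 13.0, 12.2]
g2 = [11.2, 12.0, 11.5, 10.9, 11.8]
d, t = cohens_d_and_t(g1, g2)
# check relation (2): |d| = |t| * sqrt((n1 + n2) / (n1 * n2))
assert abs(abs(d) - abs(t) * math.sqrt((6 + 5) / (6 * 5))) < 1e-12
```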
However, an analogous measure to d is rarely found, see Wilson (2016, Sect. 3.14) and Lipsey and Wilson (2001) for such a proposal. Nonetheless, it may be of particular interest to have comparable measures of an effect size for the very same grouped variable $y$ but additionally depending on different sets of independent variables. This is exemplarily carried out in our Section 5. In the following, we introduce such a measure as a generalization of d by considering the linear regression model

$$ y_i = \beta_0 + \beta_1 z_i + \beta_2 x_{1,i} + \cdots + \beta_{w+1} x_{w,i} + \varepsilon_i, \qquad (3) $$

where $z$ takes the value $z_i = 0$ if the corresponding observation $y_i$ of the dependent variable $y$ belongs to group 1 and $z_i = 1$ if $y_i$ belongs to group 2, $i = 1, \ldots, n_1 + n_2$. It is assumed that there are $w$ independent variables $x_1, \ldots, x_w$. The error variable $\varepsilon$ is assumed to follow a normal distribution with expectation 0 and variance $\sigma^2$.
As will be shown in the following Sections 2, 3, and 4, a natural generalization of Cohen's d is given by

$$ d_* = \frac{\bar{y}^*_1 - \bar{y}^*_2}{\hat{\sigma}}, \qquad (4) $$

where $y^* = y - \sum_{k=2}^{w+1} \hat{\beta}_k x_{k-1}$ is the dependent variable adjusted for the independent variables, $\bar{y}^*_j$ is the sample mean of the adjusted values in group $j$, and $\hat{\sigma}$ denotes the square root of the usual unbiased estimator of $\sigma^2$ in model (3). The $\hat{\beta}_k$ are the ordinary least squares estimates of the regression coefficients $\beta_k$, $k = 2, \ldots, w+1$, in model (3). In case $w = 0$, the adjusted $y^*$ coincides with the original $y$, so that (4) reduces to (1) and therefore can be seen as a natural generalization of Cohen's d.

Partitioned Linear Regression
Let $n = n_1 + n_2$ be the total sample size. The above model (3) may also be written in vector-matrix notation as

$$ y = X_1 \delta_1 + X_2 \delta_2 + \varepsilon, \qquad (5) $$

where now $y$ represents the $n \times 1$ vector of observations of the dependent variable. Without loss of generality it is assumed that the first $n_1$ observations belong to group 1, while the last $n_2$ observations belong to group 2. By introducing the notation $1_m$ for an $m \times 1$ vector of ones, the $n \times 2$ matrix $X_1$ and the corresponding $2 \times 1$ parameter vector $\delta_1$ may be written as

$$ X_1 = \begin{pmatrix} 1_{n_1} & 0 \\ 1_{n_2} & 1_{n_2} \end{pmatrix}, \qquad \delta_1 = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}. \qquad (6) $$

The $n \times w$ matrix $X_2$ contains the observations of the independent variables with corresponding regression coefficients $\delta_2^T = (\beta_2, \ldots, \beta_{w+1})$, where the superscript $T$ denotes transposition. The $n \times 1$ random vector $\varepsilon$ is assumed to follow a multivariate normal distribution with expectation vector 0 and variance-covariance matrix $\sigma^2 I_n$, where $I_n$ stands for the $n \times n$ identity matrix. It is assumed that the $n \times (2 + w)$ model matrix $(X_1, X_2)$ has full column rank $2 + w$. Equation (5) represents a partitioned linear regression model as considered e.g. in Fiebig et al. (1996). Generalizations and further properties are investigated by Puntanen (1996); Groß and Puntanen (2000, 2005); Ding (2021), among others.
Under model (5) the ordinary least squares estimator for the parameter vector $(\delta_1^T, \delta_2^T)^T$ is given by

$$ \begin{pmatrix} \hat{\delta}_1 \\ \hat{\delta}_2 \end{pmatrix} = \big( (X_1, X_2)^T (X_1, X_2) \big)^{-1} (X_1, X_2)^T y. \qquad (7) $$

The Frisch–Waugh–Lovell theorem, see Fiebig et al. (1996); Lovell (1963); Frisch and Waugh (1933), states that

$$ \hat{\delta}_2 = (X_2^T M_1 X_2)^{-1} X_2^T M_1 y, \qquad M_1 = I_n - X_1 (X_1^T X_1)^{-1} X_1^T. \qquad (8) $$

For the specific choice (6), the matrix $M_1$ becomes

$$ M_1 = \begin{pmatrix} C_{n_1} & 0 \\ 0 & C_{n_2} \end{pmatrix}, \qquad C_m = I_m - \frac{1}{m} 1_m 1_m^T. \qquad (9) $$

The following result is not restricted to the case (6) but remains valid in situations where the matrix $X_1$ corresponds to an arbitrary set of $v$ independent variables such that the assumptions of (5) are satisfied.
Theorem 1. Under the partitioned linear regression model (5),

$$ \hat{\delta}_1 = (X_1^T X_1)^{-1} X_1^T y^*, \qquad y^* = y - X_2 \hat{\delta}_2. $$

A proof is given in the appendix. Theorem 1 means that if $\hat{\delta}_2$ is known (e.g. computed by (8)), then the remaining parameters $\delta_1$ can be estimated by regressing the adjusted $y^*$ on the remaining $X_1$, and this procedure yields exactly the estimate of $\delta_1$ from (7).
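The Frisch–Waugh–Lovell identity (8) and Theorem 1 are easy to verify numerically. The following Python/NumPy sketch uses simulated data (sample sizes, coefficients, and noise level are arbitrary choices, not part of the paper) and checks that both routes yield the same least squares estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, w = 6, 5, 2
n = n1 + n2

# X1: intercept and group dummy as in (6); X2: w further covariates
z = np.concatenate([np.zeros(n1), np.ones(n2)])
X1 = np.column_stack([np.ones(n), z])
X2 = rng.normal(size=(n, w))
y = 10 + 0.8 * z + X2 @ np.array([0.5, -0.3]) + rng.normal(scale=0.4, size=n)

X = np.column_stack([X1, X2])
delta = np.linalg.solve(X.T @ X, X.T @ y)     # full OLS, eq. (7)
d1_full, d2_full = delta[:2], delta[2:]

# Frisch-Waugh-Lovell, eq. (8): project out X1, then regress
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
d2_fwl = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)

# Theorem 1: regress the adjusted y* = y - X2 d2 on X1 alone
y_star = y - X2 @ d2_fwl
d1_adj = np.linalg.solve(X1.T @ X1, X1.T @ y_star)

assert np.allclose(d2_full, d2_fwl)
assert np.allclose(d1_full, d1_adj)
```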
As a matter of fact, $\hat{\sigma}^2$ coincides with the usual estimator for $\sigma^2$ in model (5). Identity (13) follows immediately from (9) together with the above definition of $y^*$.

Testing for a Group Effect
From Theorem 1 with $X_1$ from (6) it follows that

$$ \hat{\beta}_0 = \bar{y}^*_1, \qquad \hat{\beta}_1 = \bar{y}^*_2 - \bar{y}^*_1, \qquad (14) $$

where $\bar{y}^*_j$ denotes the sample mean of the adjusted $y^*$ in group $j$. Hence, it is seen that $|d_*|$ from (4) is identical to

$$ |d_*| = \frac{|\hat{\beta}_1|}{\hat{\sigma}}. \qquad (15) $$

The statistic $d_*$ is closely related to the test statistic $t_*$ for the null hypothesis $H_0: \beta_1 = 0$ in model (5).

Theorem 3. Under the partitioned linear regression model (5) and (6), let $M_2 = I_n - X_2 (X_2^T X_2)^{-1} X_2^T$, and let $\gamma$ be the lower-right element of the $2 \times 2$ matrix $(X_1^T M_2 X_1)^{-1}$. Then the statistic

$$ t_* = \frac{\hat{\beta}_1}{\hat{\sigma} \sqrt{\gamma}} \qquad (16) $$

follows a t distribution with $n - 2 - w$ degrees of freedom, provided $\beta_1 = 0$.

In the above theorem, $\gamma$ is the scaled variance of $\hat{\beta}_1$, i.e. $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 \gamma$, see the proof of Theorem 3 in the appendix. The standard error of $\hat{\beta}_1$ is thus $\mathrm{se}(\hat{\beta}_1) = \hat{\sigma} \sqrt{\gamma}$ with $\hat{\sigma}$ being the square root of $\hat{\sigma}^2$ from Theorem 2. Note that in case $w = 0$, by setting $M_2 = I_n$ one gets

$$ \gamma = \frac{n_1 + n_2}{n_1 n_2} \qquad (17) $$

and hence

$$ |d| = |t| \sqrt{\frac{n_1 + n_2}{n_1 n_2}}, \qquad (18) $$

which is just a reformulation of (2). These considerations show that $d_*$ is a natural extension of Cohen's d in the context of additional independent variables.

Effect Size in Multiple Regression
In his Chapter 9, Cohen (1988) discusses the effect size measure $f^2$ based on the F test of a linear hypothesis. It may be applied when $X_1$ comprises not only the intercept and one dummy as under model (5), but a total of $u$ independent variables. Then it might be of interest to measure the effect size of the set of variables in $X_1$ given the set in $X_2$, which is Cohen's case 1. Cohen suggests values $f^2 = 0.02$, $f^2 = 0.15$ and $f^2 = 0.35$ for a small, medium and large effect, respectively. Since the measure $d_*$ refers to one dummy ($u = 1$), one might expect a relationship between $d_*$ and the corresponding $f^2$. Indeed, as noted in our Remark below, such a relationship can be specified.
The measure $f^2$ for Cohen's case 1 is given by

$$ f^2 = \frac{F}{n - 2 - w}, \qquad (19) $$

where under model (5) $F$ is the F statistic for testing the null hypothesis $H_0: \beta_1 = 0$. From (9.2.3) in Cohen (1988),

$$ f^2 = \frac{R^2 - R_0^2}{1 - R^2}, \qquad (20) $$

where $R^2$ is the coefficient of determination from model (5) and $R_0^2$ is the coefficient of determination in the reduced model with $\beta_1 = 0$, admitting model matrix $X_3 = (1_n, X_2)$. If $P$ denotes the orthogonal projector onto the column space of the model matrix of a regression model with intercept, the coefficient of determination is given by

$$ R^2 = \frac{y^T (P - n^{-1} 1_n 1_n^T) y}{y^T C_n y}, \qquad (21) $$

with $C_n = I_n - n^{-1} 1_n 1_n^T$ being the so-called centering matrix, e.g. see Groß (2003, Sect. 6.2). From this, (20) becomes

$$ f^2 = \frac{y^T (P - P_3) y}{y^T (I_n - P) y} \qquad (22) $$

with $P = X (X^T X)^{-1} X^T$, $X = (X_1, X_2)$, and $P_3 = X_3 (X_3^T X_3)^{-1} X_3^T$. In view of $\mathrm{rank}(P - P_3) = 1$ and $\mathrm{rank}(I_n - P) = n - (2 + w)$, the corresponding F statistic reads

$$ F = \frac{y^T (P - P_3) y / \mathrm{rank}(P - P_3)}{y^T (I_n - P) y / \mathrm{rank}(I_n - P)}. \qquad (23) $$
Then, from Theorem 3.2.1 (ii) in Christensen (2020), $F$ follows a central F distribution with 1 and $n - 2 - w$ degrees of freedom, provided $\beta_1 = 0$. Now, it is well known and readily verified that the squared t statistic for the null hypothesis $H_0: \beta_1 = 0$ is identical to the test statistic of the F test for the very same hypothesis. Thus, by combining (16) and (19) the following is true.
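The equivalence of the $R^2$-based form (20) and the projector form (22) of $f^2$ can likewise be checked numerically. A Python/NumPy sketch with simulated data (sample sizes and coefficients are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, w = 30, 25, 2
n = n1 + n2

z = np.concatenate([np.zeros(n1), np.ones(n2)])
X1 = np.column_stack([np.ones(n), z])
X2 = rng.normal(size=(n, w))
y = 10 + 0.8 * z + X2 @ np.array([0.5, -0.3]) + rng.normal(size=n)

def proj(A):
    """Orthogonal projector onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

P = proj(np.column_stack([X1, X2]))            # full model
P3 = proj(np.column_stack([np.ones(n), X2]))   # reduced model (beta1 = 0)
Cn = np.eye(n) - np.ones((n, n)) / n           # centering matrix, eq. (21)

R2 = y @ (P - np.ones((n, n)) / n) @ y / (y @ Cn @ y)
R2_0 = y @ (P3 - np.ones((n, n)) / n) @ y / (y @ Cn @ y)

f2_from_R2 = (R2 - R2_0) / (1 - R2)                          # eq. (20)
f2_from_proj = y @ (P - P3) @ y / (y @ (np.eye(n) - P) @ y)  # eq. (22)
assert np.isclose(f2_from_R2, f2_from_proj)
```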

Remark. The identity

$$ f^2 = \frac{d_*^2 / \gamma}{n - 2 - w} \qquad (24) $$

specifies the exact relationship between the effect size measures $f^2$ and $d_*$ from above.
Note that in case $w = 0$ the above identity reads

$$ f^2 = \frac{d^2 \, n_1 n_2}{n (n - 2)}, \qquad (25) $$

which slightly differs from formula (9.3.5) in Cohen (1988) and lacks some of its beauty. Since (25) is expected to coincide with (9.3.5), this reveals a fallacy in the latter formula. Formula (25) may also be verified independently by directly assuming model (5) without any additional independent variables $X_2$. In most cases, actual computations of the two formulas in question differ only in a later digit after the decimal point (say the fifth or sixth), so the difference usually has no practical meaning. The correctness of (24) and (25) is additionally confirmed by applications to real data.
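Identity (25) can be verified numerically. A small self-contained Python check with made-up two-group data (the case $w = 0$, so $F = t^2$ and $f^2 = F/(n-2)$ by (19)):

```python
import math

# two small illustrative samples (w = 0, no extra covariates)
g1 = [12.0, 13.5, 11.5, 12.5, 13.0, 12.5]
g2 = [11.0, 12.5, 11.5, 10.5, 12.0]
n1, n2 = len(g1), len(g2)
n = n1 + n2

m1, m2 = sum(g1) / n1, sum(g2) / n2
ss = sum((y - m1) ** 2 for y in g1) + sum((y - m2) ** 2 for y in g2)
s = math.sqrt(ss / (n - 2))               # pooled standard deviation
d = (m1 - m2) / s                         # Cohen's d, eq. (1)
t = d * math.sqrt(n1 * n2 / n)            # eq. (2), rearranged

f2_from_F = t ** 2 / (n - 2)              # f^2 = F / (n - 2), F = t^2
f2_from_d = d ** 2 * n1 * n2 / (n * (n - 2))  # identity (25)
assert abs(f2_from_F - f2_from_d) < 1e-12
```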

Data Example
To give a possible outline for applications and an illustration of the previous formulas, we employ a data set available from the UCI machine learning repository, see Dua and Graff (2017). It contains data on student achievement in secondary education at two Portuguese schools, see Cortez and Silva (2008). In the following, computations are carried out with the statistical software R (R Core Team, 2022). As the dependent variable $y$ we consider the final grade in Portuguese language (variable G3), with integer values ranging between 0 and 20, of $n_1 = 383$ female and $n_2 = 266$ male students. The dummy variable $z$ takes the value 0 for female and 1 for male students. As also indicated by Figure 1, female students perform better, with an average of $\bar{y}_1 = 12.25326$ compared to $\bar{y}_2 = 11.40602$ for male students. The corresponding equal-variances two-sample test statistic is $t = 3.310938$ with p-value 0.0009815287. Although this implies strong significance, the corresponding effect size from (2) reads $d = 0.264261$, thereby indicating only a slightly more than low effect. This value may also be obtained by the function cohens_d from the R package effectsize, see Ben-Shachar et al. (2020).
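Using the sample sizes and the t value reported above, relation (2) reproduces the stated effect size. A quick Python check (all numeric values taken from the text):

```python
import math

n1, n2 = 383, 266
t = 3.310938                               # two-sample t statistic from the text
d = t * math.sqrt((n1 + n2) / (n1 * n2))   # relation (2)
assert abs(d - 0.264261) < 1e-6            # value reported in the text
```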
As additional independent variables we consider the education of the father (Fedu) $x_1$ and the travel time from home to school (traveltime) $x_2$. Both variables are measured on an ordinal scale with integer values ranging from 0 to 4 and from 1 to 4, respectively, and are included as quantitative variables in our regression approach, implying $w = 2$. The intercept estimate $\hat{\beta}_0 = 11.41385$ is the average of the adjusted final grade $y^*$ of females, while the dummy variable estimate $\hat{\beta}_1 = -0.9406209$ is the difference between the average of $y^*$ in the male group and the average in the female group, as given in (14). As is also seen, better father's education comes along with better grades (positive $\hat{\beta}_2$), while longer travel times to school come along with lower grades (negative $\hat{\beta}_3$), when the other variables are held constant. The coefficient of determination from this model reads $R^2 = 0.07238847$, while for the reduced model (omitting sex) it is $R_0^2 = 0.05207054$. Then $f^2$ from (20) is $f^2 = 0.0219035$, implying a slightly more than small effect concerning the difference between female and male final grades, ceteris paribus.
The relationship (24) may also be used to infer Cohen's $d_*$ from $f^2$. For this,

$$ (X_1^T M_2 X_1)^{-1} = \begin{pmatrix} 0.019062813 & -0.001591796 \\ -0.001591796 & 0.006438624 \end{pmatrix}, \qquad (26) $$

the lower-right element being the scaled variance $\gamma$ of $\hat{\beta}_1$. With $\hat{\sigma} = 3.118756$ it follows that $d_* = 0.3016013$. This indicates a slightly stronger effect when variables $x_1$ and $x_2$ are held constant than seen before from $d = 0.264261$, which does not take any additional independent variables into account. Alternatively, $d_*$ may be computed by either of the formulas (15) or (4), yielding the very same absolute value.
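The reported quantities can be cross-checked against (15) and (24). A short Python verification using only values stated in the text:

```python
# values reported in the data example
n, w = 649, 2
beta1 = -0.9406209      # dummy coefficient estimate
sigma = 3.118756        # square root of the usual variance estimator
gamma = 0.006438624     # lower-right element of (X1' M2 X1)^{-1}, eq. (26)

d_star = abs(beta1) / sigma                 # formula (15)
f2 = (d_star ** 2 / gamma) / (n - 2 - w)    # identity (24)
assert abs(d_star - 0.3016013) < 1e-6       # d_* reported in the text
assert abs(f2 - 0.0219035) < 1e-6           # f^2 reported in the text
```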

Figure 1. Frequency of final results (variable G3) in Portuguese language of n = 649 students.

Table 1. Extract from R output by fitting the complete model (5) with function lm, where variable sex is included as a factor, implying that ZM represents the dummy z.

Least squares estimates of the coefficients with corresponding t statistic values are given in Table 1.