1 Introduction

When applying statistical hypothesis testing to data, it is often recommended not only to report the corresponding p value, but also to provide a measure of the effect associated with a possible rejection of the null hypothesis, see e.g. Wilkinson [18]. Such a measure may be useful when sample sizes are to be fixed during the planning phase of a study, or when the relevance of an actual rejection is to be assessed in the presence of large sample sizes. Effect size measures are strongly related to power analysis as carried out in the seminal book by Cohen [3].

A widely used measure is the so-called Cohen’s d, see also Hedges [12] and Kraemer [13], which is an effect size measure for the two-sample t test with equal variances. Consider independent samples of sizes \(n_{1}\) and \(n_{2}\) of a statistical variable y in two groups such that y follows a normal distribution with expectation \(\mu _{1}\) and variance \(\sigma ^2\) in group 1 and expectation \(\mu _{2}\) and the same variance \(\sigma ^2\) in group 2. Let t denote the usual two-sample test statistic for the null hypothesis \(H_{0}: \mu _{1} = \mu _{2}\) versus the alternative \(H_{1}: \mu _{1} \not = \mu _{2}\). As a measure for the size of an effect, Cohen [3, p. 66ff] considers the absolute value of

$$\begin{aligned} d = \frac{{\overline{y}}_{1}-{\overline{y}}_{2}}{\sqrt{\frac{s_{1}^2 + s_{2}^2}{n_{1} + n_{2} -2}}}\; , \end{aligned}$$
(1)

where \({\overline{y}}_{j}\) is the sample mean in group j, \(j=1,2\), and \(s_{j}^2 =\sum _{i} (y_{i} - {\overline{y}}_{j})^2\), with summation carried out over all observations from group j. The effect size d is related to the test statistic t by the formula

$$\begin{aligned} d = t \sqrt{\frac{n_{1}+ n_{2}}{n_{1} n_{2}}}\; , \end{aligned}$$
(2)

see (2.5.3) in Cohen [3]. According to Cohen, values \(|d|=0.2\), \(|d|=0.5\) and \(|d|=0.8\) indicate a small, medium and large effect, respectively.
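
As an illustration, the following minimal R sketch computes d according to (1) for two simulated samples and confirms relation (2) via the built-in equal-variance two-sample t test; the data and sample sizes are purely illustrative.

```r
## Minimal sketch of (1) and (2) with simulated data (illustrative only)
set.seed(1)
y1 <- rnorm(30, mean = 0.5)   # group 1
y2 <- rnorm(40, mean = 0.0)   # group 2
n1 <- length(y1); n2 <- length(y2)

## Cohen's d according to (1): s1sq and s2sq are sums of squared deviations
s1sq <- sum((y1 - mean(y1))^2)
s2sq <- sum((y2 - mean(y2))^2)
d <- (mean(y1) - mean(y2)) / sqrt((s1sq + s2sq) / (n1 + n2 - 2))

## Relation (2): d equals t times sqrt((n1 + n2)/(n1 * n2))
t_stat <- t.test(y1, y2, var.equal = TRUE)$statistic
all.equal(d, unname(t_stat) * sqrt((n1 + n2) / (n1 * n2)))   # TRUE
```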

It may also be of interest to have a corresponding measure when the variable y depends on further independent variables. In his Chapter 9, Cohen [3] deals with such a multiple regression situation and discusses the effect size measure \(f^2\) at length, as will be further explicated in Section 4.

However, a measure analogous to d is rarely found; see Wilson [19, Sect. 3.14] and Lipsey and Wilson [14] for such a proposal. Nonetheless, it may be of particular interest to have comparable effect size measures for the very same grouped variable y under different sets of additional independent variables. This is exemplified in Section 5. In the following, we introduce such a measure as a generalization of d by considering the linear regression model

$$\begin{aligned} y = \beta _{0}+ \beta _{1} z + \beta _{2} x_{1} + \cdots + \beta _{w+1} x_{w} +\varepsilon \; , \end{aligned}$$
(3)

where z takes the value \(z_{i} = 0\) if the corresponding observation \(y_{i}\) of the dependent variable y belongs to group 1 and \(z_{i} = 1\) if \(y_{i}\) belongs to group 2, \(i= 1,\ldots , n_{1} + n_{2}\). It is assumed that there are w independent variables \(x_{1}, \ldots , x_{w}\). The error variable \(\varepsilon \) is assumed to follow a normal distribution with expectation 0 and variance \(\sigma ^2\).

As will be shown in the following Sections 2, 3, and 4, a natural generalization of Cohen’s d is given by

$$\begin{aligned} d_{*} = \frac{\overline{y_{*}}_{1}-\overline{y_{*}}_{2}}{\sqrt{\frac{s_{*1}^2 + s_{*2}^2}{n_{1} + n_{2} - 2 - w}}}, \quad s_{*j}^2 = \sum _{i} (y_{*i} - \overline{y_{*}}_{j})^2, \quad j=1,2\; , \end{aligned}$$
(4)

where \(y_{*} = y - {\widehat{\beta }}_{2} x_{1} - \cdots - {\widehat{\beta }}_{w+1} x_{w}\) is the dependent variable adjusted for the independent variables. The \({\widehat{\beta }}_{k}\) are the ordinary least squares estimates of the regression coefficients \(\beta _{k}\), \(k=2,\ldots , w+1\) in model (3). In case \(w=0\), the adjusted \(y_{*}\) coincides with the original y, so that (4) reduces to (1) and therefore can be seen as a natural generalization of Cohen’s d.
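
In R, \(d_{*}\) may be computed along the following lines; this is a minimal sketch assuming simulated data with a group dummy z and \(w=2\) covariates, where all variable names, sample sizes and coefficients are illustrative.

```r
## Sketch of (4) with simulated data and w = 2 covariates (illustrative only)
set.seed(2)
n1 <- 50; n2 <- 60; n <- n1 + n2; w <- 2
z  <- rep(0:1, times = c(n1, n2))
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * z + 0.3 * x1 - 0.2 * x2 + rnorm(n)

## fit model (3) and adjust y for the covariates only
fit    <- lm(y ~ z + x1 + x2)
y_star <- y - coef(fit)["x1"] * x1 - coef(fit)["x2"] * x2

## d_star according to (4)
m1  <- mean(y_star[z == 0]); m2 <- mean(y_star[z == 1])
ss1 <- sum((y_star[z == 0] - m1)^2)
ss2 <- sum((y_star[z == 1] - m2)^2)
d_star <- (m1 - m2) / sqrt((ss1 + ss2) / (n - 2 - w))
d_star
```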

2 Partitioned Linear Regression

Let \(n= n_{1} + n_{2}\) be the total sample size. The above model (3) may also be written in vector-matrix notation as

$$\begin{aligned} y = X_{1} \delta _{1} + X_{2} \delta _{2} + \varepsilon \; , \end{aligned}$$
(5)

where now y represents the \(n \times 1\) vector of observations of the dependent variable. Without loss of generality, it is assumed that the first \(n_{1}\) observations belong to group 1, while the last \(n_{2}\) observations belong to group 2. By introducing the notation \(1_{m}\) for an \(m\times 1\) vector of ones, the \(n\times 2\) matrix \(X_{1}\) and the corresponding \(2\times 1\) parameter vector \(\delta _{1}\) may be written as

$$\begin{aligned} X_{1} = \begin{pmatrix} 1_{n_{1}} & 0\\ 1_{n_{2}} & 1_{n_{2}} \end{pmatrix} \quad \text {and}\quad \delta _{1} = \begin{pmatrix} \beta _{0}\\ \beta _{1} \end{pmatrix}\; . \end{aligned}$$
(6)

The \(n\times w\) matrix \(X_{2}\) contains the observations of the independent variables with corresponding regression coefficients \(\delta _{2}^{T} = (\beta _{2}, \ldots , \beta _{w+1})\), where the T superscript denotes transposition. The \(n\times 1\) random vector \(\varepsilon \) is assumed to follow a multivariate normal distribution with expectation vector 0 and variance-covariance matrix \(\sigma ^2 I_{n}\), where \(I_{n}\) stands for the \(n\times n\) identity matrix. It is assumed that the \(n\times (2+w)\) model matrix \((X_{1}, X_{2})\) has full column rank \(2 +w\). Equation (5) represents a partitioned linear regression model as considered e.g. by Fiebig et al. [7]. Generalizations and further properties are investigated by Puntanen [16], Groß and Puntanen [10, 11], and Ding [5], among others.

Under model (5), the ordinary least squares estimator for the parameter vector \((\delta _{1}^{T}, \delta _{2}^{T})\) is given by

$$\begin{aligned} \begin{pmatrix} {\widehat{\delta }}_{1}\\ {\widehat{\delta }}_{2} \end{pmatrix} = (X^{T} X)^{-1} X^{T} y, \quad X=(X_{1}, X_{2})\; . \end{aligned}$$
(7)

The Frisch–Waugh–Lovell theorem, see Fiebig et al. [7], Lovell [15], and Frisch and Waugh [8], states that

$$\begin{aligned} {\widehat{\delta }}_{2} = (X_{2}^{T} M_{1} X_{2})^{-1} X_{2}^{T} M_{1} y,\quad M_{1} = I_{n}- X_{1} (X_{1}^{T} X_{1})^{-1} X_{1}^{T}\; . \end{aligned}$$
(8)
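
A small numerical check of (8) may be carried out in R; the following sketch uses simulated data and compares \({\widehat{\delta }}_{2}\) from (8) with the corresponding part of the full least squares solution (7).

```r
## Numerical check of the Frisch-Waugh-Lovell result (8), simulated data
set.seed(3)
n1 <- 40; n2 <- 50; n <- n1 + n2
X1 <- cbind(1, rep(0:1, times = c(n1, n2)))   # intercept and dummy as in (6)
X2 <- cbind(rnorm(n), rnorm(n))               # w = 2 covariates
y  <- X1 %*% c(1, 0.5) + X2 %*% c(0.3, -0.2) + rnorm(n)

M1     <- diag(n) - X1 %*% solve(t(X1) %*% X1) %*% t(X1)
delta2 <- solve(t(X2) %*% M1 %*% X2, t(X2) %*% M1 %*% y)   # right-hand side of (8)

X    <- cbind(X1, X2)
full <- solve(t(X) %*% X, t(X) %*% y)                      # full solution (7)
all.equal(c(delta2), c(full[3:4]))                         # TRUE
```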

For the specific choice (6), the matrix \(M_{1}\) becomes

$$\begin{aligned} M_{1} = \begin{pmatrix} C_{1} & 0\\ 0 & C_{2} \end{pmatrix},\quad C_{j} = I_{n_{j}} - n_{j}^{-1} 1_{n_{j}} 1_{n_{j}}^{T},\quad j=1,2\; . \end{aligned}$$
(9)

The following result is not restricted to the case (6) but remains valid in situations where the matrix \(X_{1}\) corresponds to an arbitrary set of v independent variables such that the assumptions of (5) are satisfied.

Theorem 1

Under the partitioned linear regression model (5),

$$\begin{aligned} {\widehat{\delta }}_{1} = (X_{1}^{T}X_{1})^{-1} X_{1}^{T} (y - X_{2} {\widehat{\delta }}_{2}) \end{aligned}$$
(10)

is the ordinary least squares estimator of \(\delta _{1}\).

A proof is given in the appendix. Theorem 1 means that if \({\widehat{\delta }}_{2}\) is known (e.g. computed by (8)), then the remaining parameters \(\delta _{1}\) can be estimated by regressing the adjusted

$$\begin{aligned} y_{*} = y - X_{2} {\widehat{\delta }}_{2} \end{aligned}$$
(11)

on the remaining \(X_{1}\); this procedure yields exactly the same estimate of \(\delta _{1}\) as in (7).

Theorem 2

Under the partitioned linear regression model (5) and (6),

$$\begin{aligned} {\widehat{\sigma }}^2 &= (y - X_{2} {\widehat{\delta }}_{2})^{T} M_{1} (y - X_{2} {\widehat{\delta }}_{2})/(n-2-w) \end{aligned}$$
(12)
$$\begin{aligned} &= (s_{*1}^{2} + s_{*2}^{2})/(n-2-w) \end{aligned}$$
(13)

is an unbiased estimator for \(\sigma ^2\).

As a matter of fact, \({\widehat{\sigma }}^2\) coincides with the usual estimator for \(\sigma ^2\) in model (5). Identity (13) follows immediately from (9) when the above \(y_{*}\) is partitioned into two vectors of length \(n_{1}\) and \(n_{2}\), respectively.
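
Both theorems are easily verified numerically. The following R sketch (simulated data, illustrative names) checks that regressing the adjusted \(y_{*}\) on the group dummy alone reproduces the estimates of \(\beta _{0}\) and \(\beta _{1}\) from the full fit, and that (13) reproduces the residual variance estimate reported by lm.

```r
## Numerical check of Theorems 1 and 2 with simulated data (illustrative only)
set.seed(4)
n1 <- 40; n2 <- 50; n <- n1 + n2; w <- 2
z  <- rep(0:1, times = c(n1, n2))
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * z + 0.3 * x1 - 0.2 * x2 + rnorm(n)

full   <- lm(y ~ z + x1 + x2)
y_star <- y - coef(full)["x1"] * x1 - coef(full)["x2"] * x2   # adjusted y, cf. (11)

## Theorem 1: regression of y_star on the dummy alone gives the same delta_1
part <- lm(y_star ~ z)
all.equal(unname(coef(part)), unname(coef(full)[c("(Intercept)", "z")]))   # TRUE

## Theorem 2: within-group sums of squares of y_star, divided by n - 2 - w,
## reproduce the residual variance estimate of the full fit, cf. (13)
ss <- tapply(y_star, z, function(v) sum((v - mean(v))^2))
all.equal(sum(ss) / (n - 2 - w), summary(full)$sigma^2)                    # TRUE
```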

3 Testing for a group effect

From Theorem 1 with \(X_{1}\) from (6), it follows that

$$\begin{aligned} {\widehat{\delta }}_{1} = \begin{pmatrix} {\widehat{\beta }}_{0}\\ {\widehat{\beta }}_{1} \end{pmatrix}= \begin{pmatrix} n_{1}^{-1} 1_{n_{1}}^{T} & 0\\ - n_{1}^{-1} 1_{n_{1}}^{T} & n_{2}^{-1} 1_{n_{2}}^{T} \end{pmatrix} y_{*} = \begin{pmatrix} \overline{y_{*}}_{1}\\ \overline{y_{*}}_{2} - \overline{y_{*}}_{1} \end{pmatrix}\; . \end{aligned}$$
(14)

Hence, it is seen that \(|d_{*}|\) from (4) is identical to

$$\begin{aligned} |d_{*}|= \frac{|{\widehat{\beta }}_{1}|}{{\widehat{\sigma }}} \end{aligned}$$
(15)

with \({\widehat{\sigma }}\) being the square root of \({\widehat{\sigma }}^2\) from Theorem 2. The statistic \(d_{*}\) is closely related to the test statistic \(t_{*}\) (defined below) for the null hypothesis \(H_{0}: \beta _{1} = 0\) in model (5).

Theorem 3

Under the partitioned linear regression model (5) and (6) let \(M_{2} = I_{n} - P_{2}\), \(P_{2} = X_{2} (X_{2}^{T} X_{2})^{-1} X_{2}^{T}\), and let \(\gamma \) be the lower-right element of the \(2\times 2\) matrix \((X_{1}^{T} M_{2} X_{1})^{-1}\). Then, the statistic

$$\begin{aligned} t_{*} = d_{*} /\sqrt{\gamma } \end{aligned}$$
(16)

follows a central t distribution with \(n - 2-w\) degrees of freedom, provided \(\beta _{1} =0\).

In the above theorem, \(\gamma \) is the scaled variance of \({\widehat{\beta }}_{1}\), i.e. \(\text {Var}({\widehat{\beta }}_{1}) = \sigma ^2 \gamma \). See the proof of Theorem 3 in the appendix. The standard error of \({\widehat{\beta }}_{1}\) is thus \(\text {se}({\widehat{\beta }}_{1}) = {\widehat{\sigma }} \sqrt{\gamma }\) with \({\widehat{\sigma }}\) being the square root of \({\widehat{\sigma }}^2\) from Theorem 2.
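
The quantities appearing in Theorem 3 are readily computed in R; the following sketch (simulated data as before) obtains \(\gamma \) as the lower-right element of \((X_{1}^{T} M_{2} X_{1})^{-1}\) and confirms that \(|t_{*}| = |d_{*}|/\sqrt{\gamma }\) coincides with the absolute t value reported by summary(lm) for the dummy, and that \({\widehat{\sigma }}\sqrt{\gamma }\) is its standard error.

```r
## Numerical check of Theorem 3 with simulated data (illustrative only)
set.seed(5)
n1 <- 40; n2 <- 50; n <- n1 + n2; w <- 2
z  <- rep(0:1, times = c(n1, n2))
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * z + 0.3 * x1 - 0.2 * x2 + rnorm(n)

fit    <- lm(y ~ z + x1 + x2)
y_star <- y - coef(fit)["x1"] * x1 - coef(fit)["x2"] * x2
ss     <- tapply(y_star, z, function(v) sum((v - mean(v))^2))
d_star <- (mean(y_star[z == 0]) - mean(y_star[z == 1])) / sqrt(sum(ss) / (n - 2 - w))

## gamma: lower-right element of (X1' M2 X1)^{-1}
X1 <- cbind(1, z); X2 <- cbind(x1, x2)
M2 <- diag(n) - X2 %*% solve(t(X2) %*% X2) %*% t(X2)
gamma <- solve(t(X1) %*% M2 %*% X1)[2, 2]

## |t_star| equals the |t value| of the dummy (signs differ because d_star
## takes group 1 minus group 2, while the dummy codes group 2)
t_star <- d_star / sqrt(gamma)
all.equal(abs(t_star), abs(summary(fit)$coefficients["z", "t value"]))     # TRUE
all.equal(summary(fit)$sigma * sqrt(gamma),
          summary(fit)$coefficients["z", "Std. Error"])                    # TRUE
```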

Note that if \(w = 0\), then \(M_{2}=I_{n}\), and one gets

$$\begin{aligned} (X_{1}^{T} X_{1})^{-1} = \begin{pmatrix} n_{1}^{-1} & - n_{1}^{-1}\\ - n_{1}^{-1} & \gamma \end{pmatrix}, \quad \gamma = \frac{n_{1} + n_{2}}{n_{1} n_{2}}\; , \end{aligned}$$
(17)

and hence

$$\begin{aligned} d_{*} = t_{*} \sqrt{\frac{n_{1} + n_{2}}{n_{1} n_{2}}}\; , \end{aligned}$$
(18)

which is just a reformulation of (2). These considerations show that \(d_{*}\) is a natural extension of Cohen’s d in the context of additional independent variables.

4 Effect Size in Multiple Regression

In his Chapter 9, Cohen [3] discusses the effect size measure \(f^{2}\) based on the F test of a linear hypothesis. It may be applied when \(X_{1}\) comprises not only an intercept and one dummy as under model (5), but a total of u independent variables of arbitrary type. Then, it might be of interest to measure the effect size of the set of variables in \(X_{1}\) given the set in \(X_{2}\), which is Cohen’s case 1. Cohen suggests values \(f^2=0.02\), \(f^2=0.15\) and \(f^2=0.35\) for a small, medium and large effect, respectively. Since the measure \(d_{*}\) refers to one dummy (\(u=1\)), one might expect a relationship between \(d_{*}\) and the corresponding \(f^2\). As noted in the Remark below, such a relationship can indeed be specified.

The measure \(f^2\) for Cohen’s case 1 is given by

$$\begin{aligned} f^2 = F \frac{u}{v},\quad v = n-u-w- 1\; , \end{aligned}$$
(19)

where under model (5) F is the F statistic for testing the null hypothesis \(H_{0}: \beta _{1}=0\). From (9.2.3) in Cohen [3],

$$\begin{aligned} f^2 = \frac{R^2 - R_{0}^2}{1 -R^2}\; , \end{aligned}$$
(20)

where \(R^2\) is the coefficient of determination from model (5) and \(R_{0}^2\) is the coefficient of determination in the reduced model with \(\beta _{1}=0\), admitting model matrix \(X_{0}=(1_{n}, X_{2})\). If P denotes the orthogonal projector onto the column space of the model matrix of a regression model with intercept, the coefficient of determination is given by

$$\begin{aligned} R^2 = 1 - \frac{y^{T} (I_{n} - P) y}{y^{T} C y} \end{aligned}$$
(21)

with \(C= I_{n} - n^{-1} 1_{n} 1_{n}^{T}\) being the so-called centering matrix, e.g. see [9, Sect. 6.2]. From this, (20) becomes

$$\begin{aligned} f^2 = \frac{y^{T}(P - P_{0}) y}{y^{T} (I_{n} - P) y} \end{aligned}$$
(22)

with \(P=X(X^{T} X)^{-1} X^{T}\), \(X=(X_{1}, X_{2})\), and \(P_{0} = X_{0} (X_{0}^{T} X_{0})^{-1} X_{0}^{T}\). In view of \(\text {rank}(P-P_{0}) =1\) and \(\text {rank}(I_{n} - P) = n - (2 + w)\), the corresponding F statistic reads

$$\begin{aligned} F = \frac{y^{T}(P - P_{0}) y/\text {rank}(P-P_{0})}{y^{T} (I_{n} - P) y/\text {rank}(I_{n} - P)}\; . \end{aligned}$$
(23)

Then, from Theorem 3.2.1 (ii) in Christensen [2], F follows a central F distribution with 1 and \(n-2-w\) degrees of freedom, provided \(\beta _{1}=0\).

Now, it is well known and readily verified that the squared t statistic for the null hypothesis \(H_{0}: \beta _{1}=0\) is identical to the test statistic of the F test for the very same hypothesis. Thus, by combining (16) and (19), the following is true.

Remark

The identity

$$\begin{aligned} f^2 = \frac{d_{*}^2 /\gamma }{n-2-w} \end{aligned}$$
(24)

specifies the exact relationship between the effect size measures \(f^2\) and \(d_{*}\) from above.
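
The identity is easily confirmed numerically. The following R sketch (simulated data, illustrative names) verifies that the squared t value of the dummy equals the F statistic for dropping it, and that dividing by \(n-2-w\) yields \(f^2\) as defined in (20), in line with (24).

```r
## Numerical check of (24) with simulated data (illustrative only)
set.seed(6)
n1 <- 40; n2 <- 50; n <- n1 + n2; w <- 2
z  <- rep(0:1, times = c(n1, n2))
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * z + 0.3 * x1 - 0.2 * x2 + rnorm(n)

fit  <- lm(y ~ z + x1 + x2)    # full model (5)
fit0 <- lm(y ~ x1 + x2)        # reduced model with beta_1 = 0

t_z <- summary(fit)$coefficients["z", "t value"]
F_z <- anova(fit0, fit)$F[2]                      # F test of H0: beta_1 = 0
all.equal(t_z^2, F_z)                             # squared t equals F

f2 <- (summary(fit)$r.squared - summary(fit0)$r.squared) /
      (1 - summary(fit)$r.squared)                # f^2 from (20)
all.equal(f2, F_z / (n - 2 - w))                  # equals (d_star^2/gamma)/(n-2-w), cf. (24)
```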

Note that in case \(w=0\), the above identity reads

$$\begin{aligned} f^2 = d^{2} \frac{n_{1} n_{2}}{n (n-2)}\; , \end{aligned}$$
(25)

which slightly differs from formula (9.3.5) in Cohen [3] and lacks some of its elegance. Since (25) should coincide with (9.3.5), this reveals a flaw in the latter formula. Formula (25) may also be verified independently by directly assuming model (5) without any additional independent variables \(X_{2}\). In most cases, actual computations with the two formulas in question differ only in a later digit after the decimal point (say the fifth or sixth), so the difference usually has no practical relevance. The correctness of (24) and (25) is additionally confirmed by applications to real data.

5 Data Example

To give a possible outline for applications and to illustrate the previous formulas, we employ a data set available from the UCI machine learning repository, see Dua and Graff [6]. It contains data on student achievement in secondary education at two Portuguese schools, see Cortez and Silva [4]. In the following, computations are carried out with the statistical software R [17].

Fig. 1 Frequency of final results (variable G3) in Portuguese language of \(n=649\) students

As the dependent variable y, we consider the final grade in Portuguese language (variable G3), with integer values ranging between 0 and 20, of \(n_{1} = 383\) female and \(n_{2} = 266\) male students. The dummy variable z takes the value 0 for female and 1 for male students. As also indicated by Figure 1, female students perform better, with an average of \({\overline{y}}_{1} = 12.25326\) compared to \({\overline{y}}_{2} = 11.40602\) for male students. The corresponding equal-variance two-sample test statistic is \(t=3.310938\) with p value 0.0009815287. Although this indicates strong significance, the corresponding effect size from (2) is \(d = 0.264261\), indicating only a slightly more than small effect. This value may also be obtained with the function cohens_d from the R package effectsize, see Ben-Shachar et al. [1].
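
The computations above can be reproduced along the following lines, assuming the Portuguese-language data have been downloaded from the repository as student-por.csv (semicolon-separated, with columns sex and G3 as in the original files); file and column names refer to the repository version and are stated here as assumptions.

```r
## Reproducing d for the data example; assumes the UCI file student-por.csv
## (semicolon-separated) in the working directory
por <- read.table("student-por.csv", sep = ";", header = TRUE)
y   <- por$G3                          # final grade in Portuguese language
z   <- as.numeric(por$sex == "M")      # dummy: 0 = female, 1 = male
n1  <- sum(z == 0); n2 <- sum(z == 1)

tt <- t.test(y[z == 0], y[z == 1], var.equal = TRUE)      # equal-variance t test
d  <- unname(tt$statistic) * sqrt((n1 + n2) / (n1 * n2))  # relation (2)

## the same value via the effectsize package [1]
# library(effectsize)
# cohens_d(y[z == 0], y[z == 1])
```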

As additional independent variables, we consider the father's education (Fedu) as \(x_{1}\) and the travel time from home to school (traveltime) as \(x_{2}\). Both variables are measured on an ordinal scale, with integer values ranging from 0 to 4 and from 1 to 4, respectively, and are included as quantitative variables in our regression approach, implying \(w=2\).

Table 1 Extract from the R output obtained by fitting the complete model (5) with the function lm, where the variable sex is included as a factor, so that ZM represents the dummy z

Least squares estimates of the coefficients with corresponding t statistic values are given in Table 1. The intercept estimate \({\widehat{\beta }}_{0}=11.41385\) is the average of the adjusted final grade \(y_{*}\) of female students, while the dummy variable estimate \({\widehat{\beta }}_{1}=-0.9406209\) is the difference between the average of \(y_{*}\) in the male group and that in the female group, as given in (14). It is also seen that higher father's education is associated with better grades (positive \({\widehat{\beta }}_{2}\)), while longer travel times to school are associated with lower grades (negative \({\widehat{\beta }}_{3}\)), the other variables being held constant. The coefficient of determination from this model is \(R^2 = 0.07238847\), while for the reduced model (omitting sex) it is \(R_{0}^2 = 0.05207054\). Then, \(f^2\) from (20) is \(f^2 = 0.0219035\), implying a slightly more than small effect concerning the difference between female and male final grades, ceteris paribus.
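
Table 1 and the reported coefficients of determination may be obtained as follows; the renaming of sex to Z is an assumption made only to match the label ZM in Table 1, and the values quoted in the comments are those reported in the text.

```r
## Fitting the complete model (5) and computing f^2 from (20)
por   <- read.table("student-por.csv", sep = ";", header = TRUE)
por$Z <- factor(por$sex, levels = c("F", "M"))   # reference level F, dummy "ZM"

fit  <- lm(G3 ~ Z + Fedu + traveltime, data = por)   # complete model (5)
fit0 <- lm(G3 ~ Fedu + traveltime, data = por)       # reduced model without sex
summary(fit)$coefficients                            # cf. Table 1

R2  <- summary(fit)$r.squared     # reported: 0.07238847
R02 <- summary(fit0)$r.squared    # reported: 0.05207054
f2  <- (R2 - R02) / (1 - R2)      # reported: 0.0219035, cf. (20)
```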

The relationship (24) may also be used to infer \(d_{*}\) from \(f^2\). For this, we compute

$$\begin{aligned} (X_{1}^{T} M_{2} X_{1})^{-1} = \begin{pmatrix} 0.019062813 & -0.001591796\\ -0.001591796 & 0.006438624 \end{pmatrix}\; , \end{aligned}$$
(26)

the lower-right element being the scaled variance \(\gamma \) of \({\widehat{\beta }}_{1}\). With \({\widehat{\sigma }} = 3.118756\), it follows that \(d_{*}= 0.3016013\). This indicates a slightly stronger effect when the variables \(x_{1}\) and \(x_{2}\) are held constant than the value \(d = 0.264261\) obtained before without considering any additional independent variables. Alternatively, \(d_{*}\) may be computed by either of the formulas (15) or (4), yielding the very same absolute value.
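
These last steps may be sketched in R as follows, again assuming the file and column names of the repository version; the values quoted in the comments are those reported above.

```r
## gamma, sigma-hat and d_star for the data example
por <- read.table("student-por.csv", sep = ";", header = TRUE)
y   <- por$G3
z   <- as.numeric(por$sex == "M")
X1  <- cbind(1, z)
X2  <- cbind(por$Fedu, por$traveltime)
n   <- length(y); w <- ncol(X2)

M2 <- diag(n) - X2 %*% solve(t(X2) %*% X2) %*% t(X2)
solve(t(X1) %*% M2 %*% X1)                       # matrix (26)
gamma <- solve(t(X1) %*% M2 %*% X1)[2, 2]        # scaled variance of beta_1-hat

fit       <- lm(y ~ z + X2)
sigma_hat <- summary(fit)$sigma                  # reported: 3.118756
d_star    <- unname(abs(coef(fit)["z"])) / sigma_hat   # |d_star| via (15), reported: 0.3016013

## the same value from f^2 via (24)
f2 <- 0.0219035                                  # value reported above
sqrt(f2 * gamma * (n - 2 - w))
```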