1 Introduction

Linear regression is one of the most familiar tools for modelling the linear relationship between a dependent variable and one or more independent variables. In this paper, we consider a linear no-intercept model. An important question is how to measure the fit of the model to a set of observations. It is well known that the centered coefficient of determination, \(R^2\), can assume negative values when the model does not contain an intercept. This happens because the residual sum of squares may exceed the total sum of squares. Thus, the centered coefficient of determination is not meaningful for regressions without a constant term. Several authors have suggested that, in these cases, a more appropriate measure of goodness of fit is the so-called uncentered coefficient of determination (see, for example, Hahn (1977) and Montgomery et al. (2012), p. 48). A description of the situations in which regression through the origin is appropriate is provided by Eisenhauer (2003).

In this paper, we investigate the effect on the uncentered coefficient of determination of adding a constant to the variables when a linear no-intercept model is used. In particular, we consider two cases. First, a constant \(c \in \mathbb {R}\) is added to all observations of the dependent variable. Second, a constant \(c \in \mathbb {R}\) is added to all observations of the dependent variable and to all observations of at least one independent variable. We show that in both cases there is an artificial variation of the uncentered coefficient of determination.

We believe that this result is a new contribution to the field. It is well known that, when a model containing an intercept is considered, the uncentered coefficient of determination is not invariant to changes of measurement units in which a constant is added to all observations of the dependent variable (see, for example, Davidson and MacKinnon (1999)). However, we have not been able to find a reference showing that the same happens when a regression through the origin is considered.

We also show that, when the regressors do not include a constant term, adding a large constant to all observations of the dependent variable makes the uncentered coefficient of determination very close to a limit value that is less than 1. This is another relevant difference with respect to the model with an intercept, for which this limit value is 1.

The rest of the paper is organized as follows. Section 2 introduces the definition of uncentered coefficient of determination. Section 3 presents the main results. Section 4 considers an illustrative example. Section 5 summarizes and offers concluding remarks.

2 The uncentered \(R^2\)

Let \(\mathbf {y}=[y_1,y_2,\ldots ,y_n]^T \in \mathbb {R}^n\) be a vector of “observations” and \(\mathbf {X}=[\mathbf {x}_{1},\ldots ,\mathbf {x}_{k}]\) an \(n\times k\) “data matrix”, each column \(\mathbf {x}_{i}\) of which is a vector in \(\mathbb {R}^{n}\), with \(n>k\). The linear regression model assumes that

$$\begin{aligned} \mathbf {y}=\mathbf {X\pmb {\beta } }+\mathbf {u}, \end{aligned}$$

where \(\pmb {\beta }\in \mathbb {R}^{k}\) is a vector of unknown coefficients, and \(\mathbf {u}\) is an \(n\times 1\) vector of random disturbances.

Let \(\hat{{\textbf {y}}}=[\hat{y}_1,\hat{y}_2,\ldots ,\hat{y}_n]^T=\mathbf {X}(\mathbf {X}^{T}\mathbf {X})^{-1}\mathbf {X}^{T}\mathbf {y}\) be the vector of fitted values from ordinary least squares (OLS). The uncentered coefficient of determination \(R_{uy}^{2}\) is defined as

$$\begin{aligned} R_{uy}^{2}=\frac{\left\| \hat{{\textbf {y}}}\right\| ^{2}}{\left\| \mathbf {y}\right\| ^{2}}, \end{aligned}$$

where \(\left\| \cdot \right\|\) denotes the Euclidean norm. It is well known (see, for example, Triacca and Volodin (2012)) that \(R_{uy}^{2}\) is equal to the square of the cosine of the angle between \({\textbf {y}}\) and \(\hat{\mathbf {y}}\), that is

$$\begin{aligned} R^2_{uy}=\left[ \frac{\left\langle \mathbf {y}, \hat{\mathbf {y}}\right\rangle }{\left\| \mathbf {y} \right\| \left\| \hat{\mathbf {y}}\right\| }\right] ^2. \end{aligned}$$

Thus \(0\le R_{uy}^{2}\le 1\), and it measures how close the vectors \({\textbf {y}}\) and \(\hat{\mathbf {y}}\) are in terms of their directions. When \(R_{uy}^{2}=1\), \({\textbf {y}}\) and \(\hat{\mathbf {y}}\) are collinear, so that \({\textbf {y}}\) must be in the column space of \(\mathbf {X}\), denoted by \(\mathrm {col}(\mathbf {X})\). When \(R_{uy}^{2}=0\), \({\textbf {y}}\) and \(\hat{\mathbf {y}}\) are orthogonal, so that \({\textbf {y}}\) is in \(\mathrm {col}(\mathbf {X})^{\bot }\) (the orthogonal complement of \(\mathrm {col}(\mathbf {X})\)). Some interesting properties of the uncentered \(R^2\) are presented in Triacca and Volodin (2012). A useful discussion of centered vs. uncentered \(R^2\) can be found in Wooldridge (2016, p. 214) and Baltagi (2008, p. 72).
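As an illustrative aside (not part of the original text), the following minimal Python sketch uses simulated data to check numerically that the two expressions for \(R_{uy}^{2}\) coincide; all names and data are ours.

```python
import numpy as np

# Simulated no-intercept regression: the two expressions for the uncentered R^2 coincide.
rng = np.random.default_rng(0)
n, k = 20, 2
X = rng.normal(size=(n, k))            # no column of ones: a no-intercept model
y = X @ np.array([1.5, -0.5]) + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat                   # OLS fitted values

r2_norms = np.dot(y_hat, y_hat) / np.dot(y, y)   # ||y_hat||^2 / ||y||^2
cos_sq = (np.dot(y, y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))) ** 2
print(r2_norms, cos_sq)                # identical up to floating-point error
```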

Now, we introduce the \(n\times n\) matrix defined as

$$\begin{aligned} \mathbf {H}\equiv \mathbf {X}(\mathbf {X}^{T}\mathbf {X})^{-1}\mathbf {X}^{T}. \end{aligned}$$

It is called the ‘hat matrix’. Its typical element is \(h_{ij}\), the element in row i and column j. We note that \(\hat{\mathbf {y}}={\textbf {Hy}}\); thus the hat matrix transforms the observed vector \(\mathbf {y}\) into its OLS fitted value \(\hat{\mathbf {y}}\). It can easily be verified that \(\mathbf {H}\) is symmetric and idempotent. The hat matrix is the orthogonal projector onto \(\mathrm {col}(\mathbf {X})\). The matrix that projects the vector \(\mathbf {y}\) orthogonally onto \(\mathrm {col}(\mathbf {X})^{\perp }\) is

$$\begin{aligned} \mathbf {M}\equiv \mathbf {I}-\mathbf {H}, \end{aligned}$$

where \(\mathbf {I}\) is the \(n\times n\) identity matrix. We have that

$$\begin{aligned} \mathbf {H}+\mathbf {M}=\mathbf {I}. \end{aligned}$$

Thus

$$\begin{aligned} \mathbf {y}=\mathbf {H}\mathbf {y}+\mathbf {M}\mathbf {y} \end{aligned}$$

with \(\mathbf {\mathbf {H}y\perp M}\mathbf {y}\). By the Pythagorean theorem, it follows that

$$\begin{aligned} \left\| \mathbf {y}\right\| ^{2}=\left\| \mathbf {H}\mathbf {y}\right\| ^{2}+\left\| \mathbf {M}\mathbf {y}\right\| ^{2}. \end{aligned}$$
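The following short Python sketch (ours, with simulated data) verifies numerically that \(\mathbf {H}\) is symmetric and idempotent and that the Pythagorean decomposition above holds.

```python
import numpy as np

# Numerical check of the projection properties stated above (simulated data).
rng = np.random.default_rng(1)
n, k = 15, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix: orthogonal projector onto col(X)
M = np.eye(n) - H                      # projector onto the orthogonal complement

assert np.allclose(H, H.T)             # symmetric
assert np.allclose(H @ H, H)           # idempotent
print(np.dot(y, y), np.dot(H @ y, H @ y) + np.dot(M @ y, M @ y))   # Pythagorean identity
```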

Further, we have that

$$\begin{aligned} \left[ \frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n}\right] ^{1/2}&= \left[ \frac{{\textbf {1}}^T{\textbf {H}}{\textbf {1}}}{\left\| {\textbf {1}}\right\| ^{2}}\right] ^{1/2}\\&= \frac{\left\langle {\textbf {1}},{\textbf {H}}{\textbf {1}}\right\rangle }{\left\| {\textbf {1}}\right\| \left\| {\textbf {H}}{\textbf {1}}\right\| }\\&= \cos \theta , \end{aligned}$$

where \(\theta\) is the angle between \({\textbf {1}}=(1,1,\ldots ,1)^T\) and \({\textbf {H}}{} {\textbf {1}}\). Thus

$$\begin{aligned} \frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n}=\cos ^2 \theta \end{aligned}$$

and hence

$$\begin{aligned} 0\le \frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n}\le 1. \end{aligned}$$

Now, we observe that if

$$\begin{aligned} \frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n} = 1, \end{aligned}$$

then \(\left\langle {\textbf {1}},{\textbf {H}}{\textbf {1}}\right\rangle =\left\| {\textbf {1}}\right\| \left\| {\textbf {H}}{\textbf {1}}\right\|\). Since \(\left\langle {\textbf {1}},{\textbf {H}}{\textbf {1}}\right\rangle =\left\| {\textbf {H}}{\textbf {1}}\right\| ^2\), this gives \(\left\| {\textbf {H}}{\textbf {1}}\right\| =\left\| {\textbf {1}}\right\|\) and, by the Pythagorean decomposition, \(\left\| {\textbf {M}}{\textbf {1}}\right\| =0\); hence \({\textbf {1}}={\textbf {H1}} \in \mathrm {col}(\mathbf {X})\). Thus, we can conclude that if \({\textbf {1}} \notin \mathrm {col}(\mathbf {X})\), then

$$\begin{aligned} \frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n}< 1. \end{aligned}$$

We will use this result in the sequel.
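A minimal Python sketch (ours, with simulated data) illustrates the identity \(\sum _{i}\sum _{j} h_{ij}/n=\cos ^2\theta\) and the strict inequality when \({\textbf {1}} \notin \mathrm {col}(\mathbf {X})\); since the columns of a random Gaussian matrix almost surely do not span \({\textbf {1}}\), the computed ratio falls strictly below 1.

```python
import numpy as np

# (1/n) * sum_ij h_ij equals cos^2(theta) and is < 1 when 1 is not in col(X).
rng = np.random.default_rng(2)
n, k = 15, 3
X = rng.normal(size=(n, k))            # with probability one, 1 is not in col(X)
H = X @ np.linalg.inv(X.T @ X) @ X.T
ones = np.ones(n)

ratio = H.sum() / n                    # (1/n) * sum_i sum_j h_ij
cos_theta = ones @ H @ ones / (np.linalg.norm(ones) * np.linalg.norm(H @ ones))
print(ratio, cos_theta**2)             # equal, and strictly below 1 in this case
```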

In terms of the hat matrix, the uncentered coefficient of determination can be rewritten as

$$\begin{aligned} R_{uy}^{2}=\frac{\left\| {\textbf {H}}{} {\textbf {y}}\right\| ^2}{\left\| {\textbf {y}}\right\| ^2}. \end{aligned}$$

3 Main results

It is important to note that if a nonzero constant c is added to all observations of the dependent variable,

$$\begin{aligned} \mathbf {z}=\mathbf {y}+\mathbf {c}=\mathbf {y}+c{\textbf {1}}, \end{aligned}$$

the resulting uncentered coefficient of determination,

$$\begin{aligned} R_{uz}^{2}=\frac{\left\| \hat{{\textbf {z}}}\right\| ^{2}}{\left\| \mathbf {z}\right\| ^{2}}, \end{aligned}$$

changes. This happens because the angle between \({\textbf {z}}\) and \(\hat{{\textbf {z}}}=\mathbf {X}\hat{\pmb {\gamma }}\), where \(\hat{\pmb {\gamma }}=(\mathbf {X}^{T}\mathbf {X})^{-1}\mathbf {X}^{T}\mathbf {z}\), is different from the angle between \({\textbf {y}}\) and \(\hat{{\textbf {y}}}\).

Two interesting questions arise: what is the limit of \(R_{uz}^{2}\) as \(c\rightarrow +\infty\)? What is the limit of the distance between \({\textbf {z}}\) and \(\hat{{\textbf {z}}}\) as \(c\rightarrow +\infty\)?

We also analyze the behavior of \(R_{uz}^2\) in the case in which the constant \(c \in \mathbb {R}\) is also added to all observations of at least one independent variable \({\textbf {x}}_i\).

We observe that the first question has been investigated in the literature when the matrix \(\mathbf {X}\) includes a constant (see, for example, Davidson and MacKinnon (1999, p. 75)). Our framework is different since we consider the case in which a linear no-intercept model is used. In particular, we assume that \({\textbf {1}}\notin \mathrm {col}(\mathbf {X})\).

Consider the regression model

$$\begin{aligned} {\textbf {z}}=\mathbf {y}+{\textbf {c}}=\mathbf {X\pmb {\beta } }+\mathbf {u}. \end{aligned}$$

We have that

$$\begin{aligned} R_{uz}^{2}&= \frac{\left\| {\textbf {H}}{\textbf {z}}\right\| ^2}{\left\| \mathbf {z} \right\| ^2}\\&= \frac{\left\| {\textbf {H}}{\textbf {y}}+{\textbf {H}}{\textbf {c}}\right\| ^2}{\left\| {\textbf {y}}+{\textbf {c}} \right\| ^2}\\&= \frac{\frac{\left\| \hat{{\textbf {y}}}\right\| ^2}{c^2}+\frac{2\sum _{i=1}^n\hat{y}_i}{c}+\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{\frac{\left\| \mathbf {y} \right\| ^2}{c^2}+\frac{2\sum _{i=1}^ny_i}{c}+n}. \end{aligned}$$

Since \({\textbf {1}}\notin \mathrm {col}(\mathbf {X})\), we have that

$$\begin{aligned} 0\le \sum _{i=1}^n\sum _{j=1}^n h_{ij} < n. \end{aligned}$$

Thus

$$\begin{aligned} \lim _{c\rightarrow +\infty }R_{uz}^{2}=\frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n}<1. \end{aligned}$$

Footnote 1: Since

$$\begin{aligned} \frac{\sum _{i=1}^n\sum _{j=1}^n h_{ij}}{n}=\frac{\left\| {\textbf {H}}{} {\textbf {1}}\right\| ^2}{\left\| {\textbf {1}}\right\| ^2}, \end{aligned}$$

we have

$$\begin{aligned} \lim _{c\rightarrow +\infty }R_{uz}^{2}=\frac{\left\| {\textbf {H}}{} {\textbf {1}}\right\| ^2}{\left\| {\textbf {1}}\right\| ^2}. \end{aligned}$$

Now, we observe that the ratio

$$\begin{aligned} R_{u1}^{2}=\frac{\left\| {\textbf {H}}{} {\textbf {1}}\right\| ^2}{\left\| {\textbf {1}}\right\| ^2} \end{aligned}$$

can be interpreted as the uncentered \(R^{2}\) for the no-intercept OLS regression of the constant vector \({\textbf {1}}\) on the variables in \(\mathbf {X}\). Thus, we can conclude that

$$\begin{aligned} \lim _{c\rightarrow +\infty }R_{uz}^{2}=R_{u1}^{2}<1. \end{aligned}$$

This happens because \({\textbf {z}}={\textbf {y}}+{\textbf {c}}\) becomes asymptotically collinear with \({\textbf {1}}\) as \(c\rightarrow +\infty\). In fact, since the cosine of the angle between \({\textbf {z}}\) and \({\textbf {1}}\) is

$$\begin{aligned} \frac{\left\langle \mathbf {z}, \mathbf {1}\right\rangle }{\left\| \mathbf {z} \right\| \left\| \mathbf {1}\right\| }=\frac{\frac{\sum _{i=1}^ny_i}{c}+n}{\sqrt{\left( \frac{\left\| \mathbf {y} \right\| ^2}{c^2}+\frac{2\sum _{i=1}^ny_i}{c}\right) n+n^2}} \end{aligned}$$

it follows that

$$\begin{aligned} \lim _{c\rightarrow +\infty }\frac{\left\langle \mathbf {z}, \mathbf {1}\right\rangle }{\left\| \mathbf {z} \right\| \left\| \mathbf {1}\right\| }=\frac{n}{\sqrt{n^2}}=1. \end{aligned}$$

It is important to note that if there exists a vector \({\textbf {v}} \in \text{ col }({\textbf {X}})\) such that \(\left\| {\textbf {1}}-{\textbf {v}}\right\| \approx 0\), then \(R_{u1}^{2}\approx 1.\) In fact, because

$$\begin{aligned} \left\| {\textbf {1}}-{\textbf {H}}{} {\textbf {1}}\right\| \le \left\| {\textbf {1}}-{\textbf {w}}\right\| \; \; \forall \; {\textbf {w}}\in \text{ col }({\textbf {X}}), \end{aligned}$$

we have that \(\left\| {\textbf {1}}-{\textbf {H}}{} {\textbf {1}}\right\| \approx 0\). This implies \(\left\| {\textbf {1}}\right\| \approx \left\| {\textbf {H}}{} {\textbf {1}}\right\|\) and hence \(R_{u1}^{2}\approx 1.\)
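As a hedged illustration (ours, with simulated data), the sketch below constructs a design matrix whose first column is nearly constant, so that \(\mathrm {col}(\mathbf {X})\) almost contains \({\textbf {1}}\) and the computed \(R_{u1}^{2}\) is close to 1.

```python
import numpy as np

# When one regressor is nearly constant, col(X) almost contains 1 and R^2_u1 is close to 1.
rng = np.random.default_rng(3)
n = 60
x1 = 1.0 + 0.01 * rng.normal(size=n)   # nearly constant column
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])

H = X @ np.linalg.inv(X.T @ X) @ X.T
ones = np.ones(n)
R2_u1 = np.dot(H @ ones, H @ ones) / np.dot(ones, ones)
print(R2_u1)                           # close (but not equal) to 1
```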

Further, we observe that

$$\begin{aligned} \left\| \mathbf {z}-\hat{\mathbf {z}}\right\| =\left\| \mathbf {M}\mathbf {z}\right\| =\left| c\right| \left\| \frac{\mathbf {M}\mathbf {y}}{c}+\mathbf {M}{\textbf {1}}\right\| \end{aligned}$$

and

$$\begin{aligned} \lim _{c\rightarrow +\infty }\left\| \mathbf {z}-\hat{\mathbf {z}}\right\| =\sqrt{{\textbf {1}}^T{\textbf {M}}{} {\textbf {1}}}\lim _{c\rightarrow +\infty }\left| c\right| . \end{aligned}$$

If \({\textbf {1}}\notin \mathrm {col}(\mathbf {X})\), we have that

$$\begin{aligned} {\textbf {1}}^T{\textbf {M}}{} {\textbf {1}}\ne 0 \end{aligned}$$

and hence it follows that

$$\begin{aligned} \lim _{c\rightarrow +\infty }\left\| \mathbf {z}-\hat{\mathbf {z}}\right\| =\infty . \end{aligned}$$

Thus, when the regressors do not include a constant term, adding a large constant to all observations of the dependent variable makes the distance between the vector of observations and its fitted vector very large, but this does not necessarily drive \(R_{uz}^{2}\) toward 0. In this case, \(R_{uz}^{2}\) converges to \(R_{u1}^{2}\) as \(c\rightarrow +\infty\), and \(R_{u1}^{2}\) may be very close to 1. This is a very unsatisfactory feature of the uncentered coefficient of determination: an appropriate measure of fit should not be affected by the location of the dependent variable. To illustrate this point, we consider the following ten \((x, y)\) pairs of hypothetical observations: (5.1, 0.9), (4.2, 1.1), (6.5, −0.7), (5.3, −1.3), (3.1, 0.8), (6.2, 1.5), (5.8, 0.1), (3.2, −0.1), (4.7, 1.4), (2.7, 1.3). We estimate the following model

$$\begin{aligned} z_i=y_i+c= \beta _1x_{i}+w_i \; \; i=1,2,\ldots ,10. \end{aligned}$$

The value of the uncentered \(R^2\) for this model with \(c=0\) is 0.156413, indicating a very poor fit on the original data. However, by adding a constant to all observations of the dependent variable, we can produce an artificial increase in the uncentered \(R^2\). For \(c=5, 20, 100\), we obtain \(R^2_{uz}=0.881614, 0.922237, 0.929368\), respectively. Clearly, \(R_{uz}^{2}\) converges to \(R_{u1}^{2}=0.930829\) as \(c\rightarrow +\infty\).
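The values reported above can be reproduced with a few lines of Python; the sketch below (ours, not the authors' code) uses only the ten hypothetical pairs given in the text and also prints the distance \(\left\| \mathbf {z}-\hat{\mathbf {z}}\right\|\), which grows without bound as c increases.

```python
import numpy as np

# The ten hypothetical (x, y) pairs given in the text; single regressor, no intercept.
x = np.array([5.1, 4.2, 6.5, 5.3, 3.1, 6.2, 5.8, 3.2, 4.7, 2.7])
y = np.array([0.9, 1.1, -0.7, -1.3, 0.8, 1.5, 0.1, -0.1, 1.4, 1.3])
X = x.reshape(-1, 1)
H = X @ np.linalg.inv(X.T @ X) @ X.T

def r2u(z):
    """Uncentered R^2 = ||Hz||^2 / ||z||^2 for the no-intercept OLS fit."""
    zh = H @ z
    return np.dot(zh, zh) / np.dot(z, z)

print("R2_u1 (limit value):", r2u(np.ones_like(y)))
for c in (0, 5, 20, 100):
    z = y + c
    print(f"c = {c:3d}: R2_uz = {r2u(z):.6f}, ||z - z_hat|| = {np.linalg.norm(z - H @ z):.3f}")
```

Running this sketch should reproduce, up to rounding, the values reported in the text.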

Finally, we note that by adding a sufficiently large constant to the dependent variable and to at least one column of \(\mathbf {X}\), we can make \(R_{uz}^{2}\) as close as we wish to 1. In order to show this, without loss of generality, we set

$$\begin{aligned} \pmb {\chi }=[{\textbf {x}}_1+c{\textbf {1}}, {\textbf {x}}_2,\ldots ,{\textbf {x}}_k] \end{aligned}$$

with \(c \in \mathbb {R}\), and we consider the vector \(\pmb {\chi }\pmb {\hat{\lambda }}\), where \(\pmb {\hat{\lambda }}=[\hat{\lambda }_1,\hat{\lambda }_2,\ldots ,\hat{\lambda }_k]^T=(\pmb {\chi }^{T}\pmb {\chi })^{-1}\pmb {\chi }^{T}\mathbf {z}\). We have that

$$\begin{aligned} \pmb {\chi }\pmb {\hat{\lambda }}=\left[ \begin{array}{cccc} x_{11}+c&{} x_{12}&{}\ldots &{} x_{1k}\\ x_{21}+c&{} x_{22}&{}\ldots &{} x_{2k}\\ \vdots &{} \vdots &{}\ddots &{}\vdots \\ x_{n1}+c &{} x_{n2} &{}\ldots &{} x_{nk} \end{array} \right] \left[ \begin{array}{c} \hat{\lambda }_1\\ \hat{\lambda }_2\\ \vdots \\ \hat{\lambda }_k \end{array} \right] =\hat{\lambda }_1\left[ \begin{array}{c} c\\ c\\ \vdots \\ c \end{array} \right] +\left[ \begin{array}{c} \alpha _1\\ \alpha _2\\ \vdots \\ \alpha _n \end{array} \right] =\hat{\lambda }_1{\textbf {c}}+\pmb {\alpha } \end{aligned}$$

where \(\pmb {\alpha }=\mathbf {X}\pmb {\hat{\lambda }}\) collects the terms that do not involve c, and

$$\begin{aligned} \sqrt{R_{uz}^{2}}&= \frac{\left\langle \mathbf {z}, \pmb {\chi }\pmb {\hat{\lambda }}\right\rangle }{\left\| \mathbf {z} \right\| \left\| \pmb {\chi }\pmb {\hat{\lambda }}\right\| }\\&= \frac{\left\langle {\textbf {y}}+{\textbf {c}},\hat{\lambda }_1{\textbf {c}}+\pmb {\alpha }\right\rangle }{\left\| {\textbf {y}}+{\textbf {c}}\right\| \left\| \hat{\lambda }_1{\textbf {c}}+\pmb {\alpha }\right\| }\\&= \frac{\hat{\lambda }_1\frac{\left\langle {\textbf {y}},{\textbf {1}}\right\rangle }{c}+\frac{\left\langle {\textbf {y}},\pmb {\alpha }\right\rangle }{c^2}+\hat{\lambda }_1\left\langle {\textbf {1}},{\textbf {1}}\right\rangle +\frac{\left\langle {\textbf {1}},\pmb {\alpha }\right\rangle }{c}}{\sqrt{\frac{\left\langle {\textbf {y}},{\textbf {y}}\right\rangle }{c^2}+2\frac{\left\langle {\textbf {1}},{\textbf {y}}\right\rangle }{c}+\left\langle {\textbf {1}},{\textbf {1}}\right\rangle }\sqrt{\frac{\left\langle \pmb {\alpha },\pmb {\alpha }\right\rangle }{c^2}+2\hat{\lambda }_1\frac{\left\langle {\textbf {1}},\pmb {\alpha }\right\rangle }{c}+\hat{\lambda }_1^2\left\langle {\textbf {1}},{\textbf {1}}\right\rangle }}. \end{aligned}$$

Thus we can conclude that

$$\begin{aligned} \lim _{c\rightarrow +\infty }R_{uz}^{2}=1. \end{aligned}$$

Thus, by adding a sufficiently large constant to the dependent variable and to at least one independent variable of the model, we can make \(R_{uz}^{2}\) as close as we wish to 1.
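A short Python sketch (ours) illustrates this second case on the same hypothetical data: when the constant is added both to the dependent variable and to the single regressor, \(R_{uz}^{2}\) approaches 1 as c grows.

```python
import numpy as np

# Same hypothetical data, but now the constant is added to the regressor as well.
x = np.array([5.1, 4.2, 6.5, 5.3, 3.1, 6.2, 5.8, 3.2, 4.7, 2.7])
y = np.array([0.9, 1.1, -0.7, -1.3, 0.8, 1.5, 0.1, -0.1, 1.4, 1.3])

for c in (0, 5, 20, 100, 1000):
    Xc = (x + c).reshape(-1, 1)                     # shifted regressor
    z = y + c                                       # shifted dependent variable
    Hc = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
    zh = Hc @ z
    print(f"c = {c:5d}: R2_uz = {np.dot(zh, zh) / np.dot(z, z):.6f}")   # tends to 1
```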

4 An illustrative example

In this section, in order to illustrate the results obtained, we consider an empirical example. The data have been taken from the UK National Weather Service

(https://www.metoffice.gov.uk/pub/data/weather/uk/climate/stationdata/oxforddata.txt).

In particular, we use the following monthly time series:

  • Mean daily maximum temperature (tmax)

  • Mean daily minimum temperature (tmin)

  • Total rainfall (rain)

  • Total sunshine duration (sun)

for a 17-year period (2001–2017) at Oxford (UK). We investigate the relationship between tmax and, respectively, rain, sun, and tmin. In all cases, we estimate a simple linear model without intercept,

$$\begin{aligned} y_i=\beta x_i+u_i. \end{aligned}$$

Our first case concerns the relationship between monthly mean maximum temperature and rainfall. The temperature, \(y_i\), is expressed on the Celsius scale. Fitting a simple linear regression without intercept gives \(R_{uy}^{2}=0.663447\). If we estimate the model using the temperature expressed on the Kelvin scale, that is \(z_i=y_i+273.15\), we obtain \(R_{uz}^{2}=0.766925\) (we note that \(R_{u1}^{2}=0.768007\)). Thus, in this case, by adding a sufficiently large constant to all observations of the dependent variable we obtain an artificial increase in the uncentered \(R^2\). This happens because \(R_{u1}^{2}>R_{uy}^{2}\) and \(R_{uz}^{2} \rightarrow R_{u1}^{2}\) as \(c\rightarrow +\infty\). Of course, if \(R_{u1}^{2}\) had been less than \(R_{uy}^{2}\), a reduction in the uncentered \(R^2\) would have occurred (see the next case).

Next, we regress tmax on sun. We obtain \(R_{uy}^{2}=0.950690\), \(R_{uz}^{2}=0.845403\), and \(R_{u1}^{2}=0.833621\). Here there is an artificial decrease in the uncentered \(R^2\).

The final case concerns the relationship between tmax and tmin. If we estimate the model using the temperature expressed on the Celsius scale, we obtain \(R_{uy}^{2}=0.953874\). If we use a model with both temperatures expressed on the Kelvin scale, that is

$$\begin{aligned} z_i=y_i+273.15=\beta (x_i+273.15)+u_i, \end{aligned}$$

we obtain \(R_{uz}^{2}=0.999959\), a value that, as expected, is very close to 1.
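For readers who wish to replicate this example, the following hedged Python sketch (ours, not the authors' code) computes the relevant quantities from user-supplied Celsius series; downloading and parsing the Met Office station file at the URL above is left to the reader, and the function names are purely illustrative.

```python
import numpy as np

def uncentered_r2(y, X):
    """Uncentered R^2 for a no-intercept OLS fit of y on the columns of X."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    y_hat = H @ y
    return np.dot(y_hat, y_hat) / np.dot(y, y)

def kelvin_shift_effect(tmax_c, tmin_c):
    """tmax_c, tmin_c: monthly temperature series in degrees Celsius,
    e.g. extracted by the reader from the Met Office station file."""
    y = np.asarray(tmax_c, dtype=float)
    x = np.asarray(tmin_c, dtype=float)
    return {
        "R2_uy (Celsius)": uncentered_r2(y, x.reshape(-1, 1)),
        "R2_uz (Kelvin)":  uncentered_r2(y + 273.15, (x + 273.15).reshape(-1, 1)),
        "R2_u1":           uncentered_r2(np.ones_like(y), x.reshape(-1, 1)),
    }
```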

5 Final remarks

In the literature, it is often argued that, in a linear regression model without intercept, the uncentered \(R^2\) is an appropriate measure of goodness of fit. In this paper, we have shown that, when a linear regression through the origin is used, the uncentered \(R^2\) varies artificially if a constant \(c \in \mathbb {R}\) is added to the observations. In particular, we have considered two cases. First, the constant c is added to all observations of the dependent variable. Second, the constant c is added to all observations of the dependent variable and to all observations of at least one independent variable. We have shown that in both cases there is an artificial variation of the uncentered \(R^2\).

In the first case, the uncentered \(R^2\) approaches the limit value \(R_{u1}^{2} < 1\) as the constant c goes to infinity. If \(R_{u1}^{2}>R_{uy}^{2}\), then by adding a sufficiently large constant to all observations of the dependent variable we obtain an artificial increase in the uncentered \(R^2\). If \(R_{u1}^{2}<R_{uy}^{2}\), then a reduction in the uncentered \(R^2\) is obtained.

In the second case, by adding a sufficiently large constant we can make \(R_{uz}^{2}\) as close as we wish to 1. From this point of view, the uncentered \(R^{2}\) does not seem to be an appropriate measure of goodness of fit for linear regression models without intercept. In fact, rather than measuring the fit in terms of the Euclidean distance between \(\mathbf {y}\) and \(\hat{{\textbf {y}}}\), the uncentered coefficient of determination measures how close the vectors \(\mathbf {y}\) and \(\hat{{\textbf {y}}}\) are in terms of their directions.

Several authors have suggested that the uncentered coefficient of determination is a more appropriate measure of goodness of fit than the centered coefficient of determination when a regression without a constant is used. However, in this paper, we have shown that the uncentered \(R^2\) is not invariant under location changes when regression through the origin is considered. For this reason, the use of the uncentered \(R^2\) as a measure of fit should also be treated with caution.