Notes
1. Often these observed variables are mathematically modeled by random variables, which can cause some confusion. We will not make this connection until Chap. 4. For this chapter, we will only have data and no model.
2. Do not take the word measurement too literally here. It does not necessarily involve a scientist reading numbers from a fancy instrument; it can be a person's age or sex, for example.
3. The numbers should be meaningful as numbers rather than as codes for something else. For example, zip codes are not quantitative variables.
4. Sometimes a categorical variable specifies groups that have a natural ordering (e.g. freshman, sophomore, junior, senior), in which case the variable is called ordinal.
5. For simplicity, we will talk as if there is always exactly one response variable. That will indeed be the case throughout this text. The alternative context is called multivariate analysis.
6. When the response variable is categorical, this selection is called classification.
7. The terms least-squares point and location regression are not common, but they are introduced in order to help clarify the coherence of this chapter's topics.
8. If n is left unspecified, do we have to worry about whether the drawings remain valid with n < 3? Not to worry: the drawing remains valid, as the relevant vectors span a subspace that can be embedded within the three-dimensional picture.
9. More precisely, these statistics are unbiased estimates of the variances and covariance of the random variables X and Y, assuming \((x_1, y_1), \ldots, (x_n, y_n)\) were iid draws from the distribution of \((X, Y)\); see Exercise 3.36.
10. Perhaps you would rather use the slope-intercept parameterization y = a + bx in the first place rather than subtracting \(\bar{x}\) from x. I have concluded that there is very good reason to subtract \(\bar{x}\), so bear with me. Either way you parameterize it, you will derive the same least-squares line.
11. Of course, individuals also have a chance to be more exceptional than their parents in any given characteristic, and as a result aggregate population characteristics remain relatively stable.
12. In that section, the centered data matrix was defined by subtracting the empirical mean vector from each row of the data matrix. Convince yourself that the expression given here is equivalent.
13. The least-squares coefficient of that column was identified as \(\hat{b}_0\) in Sect. 2.1.3, while the least-squares coefficients of \(x^{(1)}, \ldots, x^{(m)}\) are exactly the same as the coefficients calculated for their centered versions.
14. The notation is called an indicator function; it takes the value 1 if its condition is true and 0 otherwise.
15. Galton used the letter R for the "regression coefficient" in the least-squares equation for the standardized variables, which we now call the correlation. In simple linear regression, \(R^2\) is exactly the squared correlation, which is why the statistic is called \(R^2\). Note that with more than one explanatory variable, this interpretation no longer works.
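The claim in note 10 — that the slope-intercept parameterization and the centered parameterization \(y = a + b(x - \bar{x})\) produce the same least-squares line — is easy to check numerically. A minimal sketch with NumPy, using invented data (the arrays `x` and `y` are illustrative, not from the text):

```python
import numpy as np

# Hypothetical small data set for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 3.7, 4.2, 5.1])

xbar = x.mean()

# Slope-intercept fit: y ≈ b0 + b1 * x
# np.polyfit returns coefficients from highest degree down: (slope, intercept).
b1, b0 = np.polyfit(x, y, 1)

# Centered fit: y ≈ a + b * (x - xbar)
b, a = np.polyfit(x - xbar, y, 1)

# Both parameterizations give the same fitted line.
assert np.allclose(b0 + b1 * x, a + b * (x - xbar))
assert np.isclose(b1, b)

# A bonus of centering: the intercept a is simply the mean of y,
# because the least-squares line passes through (xbar, ybar).
assert np.isclose(a, y.mean())
```

The last assertion hints at why the book prefers the centered form: the two coefficients decouple, with the intercept estimating the center of y and the slope estimating the linear trend.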
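The centering operation described in note 12 — subtracting the empirical mean vector from each row of the data matrix — is a one-line broadcast in NumPy. A sketch with a made-up matrix of n = 4 observations on m = 2 variables:

```python
import numpy as np

# Hypothetical data matrix: rows are observations, columns are variables.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0],
              [6.0, 40.0]])

# Empirical mean vector: one mean per column (per variable).
mean_vec = X.mean(axis=0)

# Centered data matrix: broadcasting subtracts mean_vec from every row.
Xc = X - mean_vec

# Sanity check: every column of the centered matrix sums to zero.
assert np.allclose(Xc.sum(axis=0), 0.0)
```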
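Note 15's identity — that in simple linear regression \(R^2\) is exactly the squared correlation between x and y — can also be verified directly. A sketch with invented data, computing \(R^2\) as the proportion of variance explained:

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.4, 3.8, 5.1, 5.8])

# Least-squares simple linear regression fit.
slope, intercept = np.polyfit(x, y, 1)
yhat = intercept + slope * x

# R^2 as one minus the ratio of residual to total sum of squares.
ss_res = np.sum((y - yhat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Squared empirical correlation between x and y.
corr = np.corrcoef(x, y)[0, 1]
assert np.isclose(r_squared, corr ** 2)
```

As the note warns, this equivalence is special to the one-explanatory-variable case; with multiple explanatory variables, \(R^2\) no longer equals any single pairwise squared correlation.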
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Brinda, W.D. (2021). Least-Squares Linear Regression. In: Visualizing Linear Models. Springer, Cham. https://doi.org/10.1007/978-3-030-64167-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-64166-5
Online ISBN: 978-3-030-64167-2
eBook Packages: Mathematics and Statistics (R0)