Least-Squares Linear Regression

A chapter in Visualizing Linear Models


Notes

  1. Often these observed variables are mathematically modeled by random variables, which can cause some confusion. We will not make this connection until Chap. 4. For this chapter, we will only have data and no model.

  2. Do not take the word measurement too literally here. It does not necessarily involve a scientist reading numbers from a fancy instrument; it can be a person’s age or sex, for example.

  3. The numbers should be meaningful as numbers rather than as codes for something else. For example, zip codes are not quantitative variables.

  4. Sometimes a categorical variable specifies groups that have a natural ordering (e.g. freshman, sophomore, junior, senior), in which case the variable is called ordinal.

  5. For simplicity, we will talk as if there is always exactly one response variable. That will indeed be the case throughout this text. The alternative context is called multivariate analysis.

  6. When the response variable is categorical, this selection is called classification.

  7. The terms least-squares point and location regression are not common, but they are introduced here to help clarify the coherence of this chapter’s topics.

  8. If n is left unspecified, do we have to worry about whether the drawings remain valid when n < 3? Not to worry: the drawing remains valid, as \(\mathbb {R}^n\) can then be viewed as a subspace embedded within the three-dimensional picture.

  9. More precisely, these statistics are unbiased estimates of the variances of and covariance between the random variables X and Y, assuming \((x_1, y_1), \ldots , (x_n, y_n)\) were iid draws from the distribution of (X, Y); see Exercise 3.36.
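The statistics this note describes can be sketched in plain Python. This is a minimal illustration with hypothetical data; the n − 1 denominators are what make the estimates unbiased:

```python
# Unbiased sample variances and covariance, as described in the note.
# Hypothetical data; the n - 1 denominators make the estimates unbiased.
def sample_stats(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(xs, ys)) / (n - 1)
    return var_x, var_y, cov_xy

var_x, var_y, cov_xy = sample_stats([1.0, 2.0, 3.0, 4.0],
                                    [2.1, 3.9, 6.2, 7.8])
```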

  10. Perhaps you would rather use the slope-intercept parameterization y = a + bx in the first place rather than subtracting \({\bar {x}}\) from x. I have concluded that there is very good reason to subtract \({\bar {x}}\), so bear with me. Either way you parameterize it, you will derive the same least-squares line.
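The claim in this note, that both parameterizations yield the same least-squares line, can be checked numerically. A minimal sketch with hypothetical data and the standard simple-regression formulas:

```python
# Both parameterizations of the least-squares line coincide.
# Hypothetical data; standard simple-regression formulas.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar  # intercept in y = a + b x
c = y_bar              # intercept in y = c + b (x - x_bar)
# The two forms give identical fitted values at every x.
assert all(abs((a + b * x) - (c + b * (x - x_bar))) < 1e-12 for x in xs)
```

Note that in the centered form the intercept is simply \({\bar {y}}\), which is one practical payoff of subtracting \({\bar {x}}\).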

  11. Of course, individuals also have a chance to be more exceptional than their parents in any given characteristic, and as a result aggregate population characteristics remain relatively stable.

  12. In that section, the centered data matrix was defined by subtracting the empirical mean vector from each row of the data matrix. Convince yourself that the expression given here is equivalent.
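The centering operation this note refers to can be sketched as follows, using a hypothetical 3 × 2 data matrix in plain Python:

```python
# Centering a data matrix: subtract the empirical mean vector from each row.
# Hypothetical 3 x 2 data matrix.
X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]
n, m = len(X), len(X[0])
mean = [sum(row[j] for row in X) / n for j in range(m)]
X_centered = [[row[j] - mean[j] for j in range(m)] for row in X]
# Every column of the centered matrix sums to zero.
assert all(abs(sum(row[j] for row in X_centered)) < 1e-12 for j in range(m))
```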

  13. The least-squares coefficient of that column was identified as \({\hat {b}}_0\) in Sect. 2.1.3, while the least-squares coefficients of \(x^{(1)}, \ldots , x^{(m)}\) are exactly the same as the coefficients calculated for their centered versions.

  14. This notation is called an indicator function; it takes the value 1 if its condition is true and 0 otherwise.

  15. Galton used the letter R for the “regression coefficient” in the least-squares equation for the standardized variables, which we now call the correlation. In simple linear regression, \(R^2\) is exactly the squared correlation, which is why the statistic is called \(R^2\). Note that with more than one explanatory variable, this interpretation no longer works.
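The identity in this note, that \(R^2\) equals the squared correlation in simple linear regression, can be verified numerically. A minimal sketch with hypothetical data:

```python
# In simple linear regression, R^2 equals the squared sample correlation.
# Hypothetical data; standard least-squares formulas.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = sxy / sxx
a = y_bar - b * x_bar
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - rss / syy          # coefficient of determination
corr = sxy / (sxx * syy) ** 0.5    # sample correlation
assert abs(r_squared - corr ** 2) < 1e-12
```

With more than one explanatory variable the squared correlation between the response and any single explanatory variable no longer reproduces \(R^2\), consistent with the note's caveat.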


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Brinda, W.D. (2021). Least-Squares Linear Regression. In: Visualizing Linear Models. Springer, Cham. https://doi.org/10.1007/978-3-030-64167-2_2
