Least Squares Estimation

A chapter in Advanced Statistics for the Behavioral Sciences

Abstract

In Chap. 1 we learned how to solve a system of linear equations. All of the systems were square (i.e., the number of equations equaled the number of unknowns) and each system had an exact solution. Systems like these do not characterize most statistical analyses. Instead, we deal with rectangular systems (more equations than unknowns) for which an exact solution does not exist. Such systems are called overdetermined linear systems, and in this chapter you will learn how to solve them and become familiar with their role in a linear regression model.
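
To make the idea concrete, here is a minimal \( \mathrm{\mathcal{R}} \) sketch of solving such a system in the least squares sense; the numbers are invented for illustration and do not come from the chapter.

    # Five equations, two unknowns: no exact solution exists, so we
    # seek the coefficients that minimize the sum of squared errors.
    X <- cbind(1, c(1, 2, 3, 4, 5))   # design matrix with an intercept column
    y <- c(2.1, 3.9, 6.2, 8.1, 9.8)   # response vector

    b_qr <- qr.solve(X, y)            # least squares solution via the QR decomposition
    b_lm <- coef(lm(y ~ X[, 2]))      # lm() returns the same estimates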

Notes

  1. The error term is also referred to as the residuals, and in this chapter we will use the terms interchangeably.

  2. Some textbooks refer to the QR decomposition as the QR factorization. These terms are interchangeable. Unfortunately, there is another technique known as the QR algorithm that is used in a different context. We will discuss this method in Chap. 4, but it should not be confused with the QR decomposition itself.

  3. Some textbooks use the term orthogonal to mean orthonormal, failing to distinguish the two terms. The conflation is mostly harmless because any matrix with orthogonal columns can be normalized so that its columns are orthonormal.

  4. In the section that follows we will learn other ways of calculating R that offer greater numerical stability.

  5. Rearranging terms in Eq. (2.11) shows another way to calculate b from R: \( \mathbf{b}={\mathbf{R}}^{-1}{\mathbf{Q}}^{\prime }\mathbf{y} \). Because this method involves computing an inverse, it is less accurate and efficient than using backward substitution; the sketch below shows both.
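
     A minimal sketch of the contrast, with invented data:

         set.seed(1)
         X <- cbind(1, rnorm(10))                 # toy design matrix
         y <- rnorm(10)
         qrX <- qr(X)
         Q <- qr.Q(qrX); R <- qr.R(qrX)
         b_back <- backsolve(R, crossprod(Q, y))  # backward substitution (preferred)
         b_inv  <- solve(R) %*% crossprod(Q, y)   # explicit inverse (less accurate)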

  6. In Chap. 1 we noted that many statistical packages, including \( \mathrm{\mathcal{R}} \), return an upper triangular matrix when performing the Cholesky decomposition. In part, this is because the Cholesky factor of \( {\mathbf{A}}^{\prime }\mathbf{A} \) is equivalent to the R matrix from a QR decomposition of A.
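
     A sketch of the equivalence, with invented data:

         set.seed(2)
         A <- cbind(1, rnorm(10), rnorm(10))
         U <- chol(crossprod(A))   # upper triangular Cholesky factor of A'A
         R <- qr.R(qr(A))          # R from the QR decomposition of A
         all.equal(abs(U), abs(R), check.attributes = FALSE)   # TRUE: equal up to row signs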

  7. A more sophisticated implementation of this method, known as the modified Gram-Schmidt orthogonalization, can be found in Golub and van Loan (2013).
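
     The sketch below implements the modified scheme for a full-column-rank matrix. It is an illustration in the spirit of Golub and van Loan (2013), not the book's accompanying code.

         mgs <- function(X) {
           n <- ncol(X)
           Q <- X
           R <- matrix(0, n, n)
           for (k in seq_len(n)) {
             R[k, k] <- sqrt(sum(Q[, k]^2))            # norm of the current column
             Q[, k]  <- Q[, k] / R[k, k]               # normalize it
             if (k < n) {
               for (j in (k + 1):n) {
                 R[k, j] <- sum(Q[, k] * Q[, j])       # component along column k
                 Q[, j]  <- Q[, j] - R[k, j] * Q[, k]  # remove it immediately
               }
             }
           }
           list(Q = Q, R = R)
         }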

  8. We can also compute \( \mathbf{Q}=\mathbf{X}{\mathbf{R}}^{-1} \), as shown below.
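
     A quick check with invented data:

         set.seed(3)
         X <- cbind(1, rnorm(8), rnorm(8))
         qrX <- qr(X)
         all.equal(qr.Q(qrX), X %*% solve(qr.R(qrX)))   # TRUE: Q recovered as X R^(-1)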

  9. Other ways to create the rotation have greater numerical stability than the one shown here, and the accompanying \( \mathrm{\mathcal{R}} \) code uses a method from Golub and van Loan (2013); a sketch of one such construction follows.
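
     For illustration, here is the standard overflow-avoiding construction of a Givens rotation, patterned after Golub and van Loan (2013); it is a sketch, not the accompanying code itself.

         givens <- function(a, b) {
           # Return c and s so that the rotation zeroes the second component of (a, b).
           if (b == 0) {
             c <- 1; s <- 0
           } else if (abs(b) > abs(a)) {
             t <- -a / b; s <- 1 / sqrt(1 + t^2); c <- s * t
           } else {
             t <- -b / a; c <- 1 / sqrt(1 + t^2); s <- c * t
           }
           c(c = c, s = s)
         }
         cs <- givens(3, 4)
         G  <- matrix(c(cs["c"], -cs["s"], cs["s"], cs["c"]), 2, 2)
         t(G) %*% c(3, 4)   # second element is (numerically) zero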

  10. The sign of the rotated vectors differs among decomposition methods, but the product QR, and hence the decomposition itself, is unaffected.

  11. Portions of this section are excerpted from Brown (2014).

  12. Nonlinear regression models are discussed later in this text.

  13. In practice, the predictors can be random variables as long as their values are generated by a mechanism that is unrelated to the error term.

  14. Bias and efficiency are known as finite-sample properties because they hold at any fixed sample size; consistency, in contrast, is an asymptotic property that describes how an estimator behaves as the sample size grows without bound.

  15. This definition assumes that we have only one predictor. When we have multiple predictors, we need to further stipulate that they are not linearly dependent.

  16. Notice that as long as the errors have expectation 0, the least squares estimators are unbiased regardless of whether the errors are independent or identically distributed. These latter properties do, however, affect the efficiency of the least squares estimators. The simulation sketch below illustrates the unbiasedness half of the claim.
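
     A small simulation sketch of the unbiasedness claim, with parameters invented for illustration:

         set.seed(4)
         n <- 200; beta <- c(1, 2)
         x <- runif(n)
         est <- replicate(2000, {
           e <- rnorm(n, mean = 0, sd = 0.5 + 2 * x)   # independent but not identically distributed
           y <- beta[1] + beta[2] * x + e
           coef(lm(y ~ x))
         })
         rowMeans(est)   # approximately c(1, 2): the estimators are unbiased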

  17. A more thorough discussion of maximum likelihood estimation can be found in a variety of textbooks, including Brown (2014).

References

  • Brown, J. D. (2014). Linear models in matrix form: A hands-on approach for the behavioral sciences. New York: Springer.

  • Davis, C. H. (1857). Theory of the motion of the heavenly bodies moving about the sun in conic sections: A translation of Gauss’s “Theoria Motus”. Boston: Little, Brown and Company.

  • Gauss, C. F. (1809). Theoria motus corporum coelestium. In Carl Friedrich Gauss – Werke. Königliche Gesellschaft der Wissenschaften zu Göttingen (1906).

  • Golub, G. H., & van Loan, C. F. (2013). Matrix computations (4th ed.). Baltimore: Johns Hopkins University Press.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Brown, J.D. (2018). Least Squares Estimation. In: Advanced Statistics for the Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93549-2_2
