Least Squares Estimation

A chapter in Advanced Statistics for the Behavioral Sciences

Abstract

In Chap. 1 we learned how to solve a system of linear equations. All of the systems were square (i.e., the number of equations equaled the number of unknowns) and each system had an exact solution. Systems like these do not characterize most statistical analyses. Instead, we deal with rectangular systems (more equations than unknowns) for which an exact solution does not exist. Such systems are called overdetermined linear systems, and in this chapter you will learn how to solve them and become familiar with their role in a linear regression model.
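
To make the idea concrete, here is a minimal \( \mathrm{\mathcal{R}} \) sketch of solving such a system in the least squares sense; the numbers are invented for illustration and do not come from the chapter.

    # Five equations, two unknowns: no exact solution exists, so we
    # seek the coefficients that minimize the sum of squared errors.
    X <- cbind(1, c(1, 2, 3, 4, 5))   # design matrix with an intercept column
    y <- c(2.1, 3.9, 6.2, 8.1, 9.8)   # response vector

    b_qr <- qr.solve(X, y)            # least squares solution via the QR decomposition
    b_lm <- coef(lm(y ~ X[, 2]))      # lm() returns the same estimates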

Notes

  1. The error term is also referred to as the residuals, and in this chapter we will use the terms interchangeably.

  2. Some textbooks refer to the QR decomposition as the QR factorization. These terms are interchangeable. Unfortunately, there is another technique known as the QR algorithm that is used in a different context. We will discuss this method in Chap. 4, but it should not be confused with the QR decomposition itself.

  3. Some textbooks use the term orthogonal to mean orthonormal, failing to distinguish the two terms. The conflation is mostly harmless because any matrix with orthogonal columns can be normalized so that its columns are orthonormal.

  4. In the section that follows we will learn other ways of calculating R that offer greater numerical stability.

  5. Rearranging terms in Eq. (2.11) shows another way to calculate b from R: \( \mathbf{b}={\mathbf{R}}^{-1}{\mathbf{Q}}^{\prime }\mathbf{y} \). Because this method involves computing an inverse, it is less accurate and efficient than using backward substitution; the sketch below shows both.
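
     A minimal sketch of the contrast, with invented data:

         set.seed(1)
         X <- cbind(1, rnorm(10))                 # toy design matrix
         y <- rnorm(10)
         qrX <- qr(X)
         Q <- qr.Q(qrX); R <- qr.R(qrX)
         b_back <- backsolve(R, crossprod(Q, y))  # backward substitution (preferred)
         b_inv  <- solve(R) %*% crossprod(Q, y)   # explicit inverse (less accurate)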

  6. In Chap. 1 we noted that many statistical packages, including \( \mathrm{\mathcal{R}} \), return an upper triangular matrix when performing the Cholesky decomposition. In part, this is because the Cholesky factor of \( {\mathbf{A}}^{\prime }\mathbf{A} \) is equivalent to the R matrix from a QR decomposition of A.
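
     A sketch of the equivalence, with invented data:

         set.seed(2)
         A <- cbind(1, rnorm(10), rnorm(10))
         U <- chol(crossprod(A))   # upper triangular Cholesky factor of A'A
         R <- qr.R(qr(A))          # R from the QR decomposition of A
         all.equal(abs(U), abs(R), check.attributes = FALSE)   # TRUE: equal up to row signs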

  7. A more sophisticated implementation of this method, known as the modified Gram-Schmidt orthogonalization, can be found in Golub and van Loan (2013).
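
     The sketch below implements the modified scheme for a full-column-rank matrix. It is an illustration in the spirit of Golub and van Loan (2013), not the book's accompanying code.

         mgs <- function(X) {
           n <- ncol(X)
           Q <- X
           R <- matrix(0, n, n)
           for (k in seq_len(n)) {
             R[k, k] <- sqrt(sum(Q[, k]^2))            # norm of the current column
             Q[, k]  <- Q[, k] / R[k, k]               # normalize it
             if (k < n) {
               for (j in (k + 1):n) {
                 R[k, j] <- sum(Q[, k] * Q[, j])       # component along column k
                 Q[, j]  <- Q[, j] - R[k, j] * Q[, k]  # remove it immediately
               }
             }
           }
           list(Q = Q, R = R)
         }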

  8. We can also compute \( \mathbf{Q}=\mathbf{X}{\mathbf{R}}^{-1} \), as shown below.
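
     A quick check with invented data:

         set.seed(3)
         X <- cbind(1, rnorm(8), rnorm(8))
         qrX <- qr(X)
         all.equal(qr.Q(qrX), X %*% solve(qr.R(qrX)))   # TRUE: Q recovered as X R^(-1)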

  9. Other ways to create the rotation have greater numerical stability than the one shown here, and the accompanying \( \mathrm{\mathcal{R}} \) code uses a method from Golub and van Loan (2013); a sketch of one such construction follows.
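
     For illustration, here is the standard overflow-avoiding construction of a Givens rotation, patterned after Golub and van Loan (2013); it is a sketch, not the accompanying code itself.

         givens <- function(a, b) {
           # Return c and s so that the rotation zeroes the second component of (a, b).
           if (b == 0) {
             c <- 1; s <- 0
           } else if (abs(b) > abs(a)) {
             t <- -a / b; s <- 1 / sqrt(1 + t^2); c <- s * t
           } else {
             t <- -b / a; c <- 1 / sqrt(1 + t^2); s <- c * t
           }
           c(c = c, s = s)
         }
         cs <- givens(3, 4)
         G  <- matrix(c(cs["c"], -cs["s"], cs["s"], cs["c"]), 2, 2)
         t(G) %*% c(3, 4)   # second element is (numerically) zero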

  10. The sign of the rotated vectors differs among decomposition methods, but the product QR, and hence the decomposition itself, is unaffected.

  11. Portions of this section are excerpted from Brown (2014).

  12. Nonlinear regression models are discussed later in this text.

  13. In practice, the predictors can be random variables as long as their values are generated by a mechanism that is unrelated to the error term.

  14. Bias and efficiency are known as finite-sample properties because they hold at any fixed sample size; consistency, in contrast, is an asymptotic property that describes how an estimator behaves as the sample size grows without bound.

  15. This definition assumes that we have only one predictor. When we have multiple predictors, we need to further stipulate that they are not linearly dependent.

  16. Notice that as long as the errors have expectation 0, the least squares estimators are unbiased regardless of whether the errors are independent or identically distributed. These latter properties do, however, affect the efficiency of the least squares estimators. The simulation sketch below illustrates the unbiasedness half of the claim.
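
     A small simulation sketch of the unbiasedness claim, with parameters invented for illustration:

         set.seed(4)
         n <- 200; beta <- c(1, 2)
         x <- runif(n)
         est <- replicate(2000, {
           e <- rnorm(n, mean = 0, sd = 0.5 + 2 * x)   # independent but not identically distributed
           y <- beta[1] + beta[2] * x + e
           coef(lm(y ~ x))
         })
         rowMeans(est)   # approximately c(1, 2): the estimators are unbiased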

  17. A more thorough discussion of maximum likelihood estimation can be found in a variety of textbooks, including Brown (2014).

References

  • Brown, J. D. (2014). Linear models in matrix form: A hands-on approach for the behavioral sciences. New York: Springer.

  • Davis, C. H. (1857). Theory of the motion of the heavenly bodies moving about the sun in conic sections: A translation of Gauss’s “Theoria Motus”. Boston: Little, Brown and Company.

  • Gauss, C. F. (1809). Theoria motus corporum coelestium. In Carl Friedrich Gauss – Werke. Königliche Gesellschaft der Wissenschaften zu Göttingen (1906).

  • Golub, G. H., & van Loan, C. F. (2013). Matrix computations (4th ed.). Baltimore: Johns Hopkins University Press.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Brown, J.D. (2018). Least Squares Estimation. In: Advanced Statistics for the Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93549-2_2
