A Gauss–Newton iteration for Total Least Squares problems

Abstract

The Total Least Squares solution of an overdetermined, approximate linear equation \(Ax \approx b\) minimizes a nonlinear function which characterizes the backward error. We devise a variant of the Gauss–Newton iteration with guaranteed convergence to that solution, under classical well-posedness hypotheses. At each iteration, the proposed method requires the solution of an ordinary least squares problem in which the matrix A is modified by a rank-one term. In exact arithmetic, the method is equivalent to an inverse power iteration for computing the smallest singular value of the complete matrix \((A\mid b)\). Geometric and computational properties of the method are analyzed in detail and illustrated by numerical examples.


Notes

  1. We are grateful to an anonymous referee for having brought this citation to our attention.

References

  1. Björck, Å., Heggernes, P., Matstoms, P.: Methods for large scale total least squares problems. SIAM J. Matrix Anal. Appl. 22, 413–429 (2000)

  2. Daniel, J.W., Gragg, W.B., Kaufman, L., Stewart, G.W.: Reorthogonalization and stable algorithms for updating the Gram–Schmidt QR factorization. Math. Comput. 30, 772–795 (1976)

  3. Golub, G.H.: Some modified matrix eigenvalue problems. SIAM Rev. 15, 318–344 (1973)

  4. Golub, G.H., Van Loan, C.: An analysis of the total least squares problem. SIAM J. Numer. Anal. 17, 883–893 (1980)

  5. Golub, G.H., Van Loan, C.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)

  6. Li, C.K., Liu, X.G., Wang, X.F.: Extension of the total least square problem using general unitarily invariant norms. Linear Multilinear Algebra 55, 71–79 (2007)

  7. Markovsky, I.: Bibliography on total least squares and related methods. Stat. Interface 3, 329–334 (2010)

  8. Markovsky, I., Van Huffel, S., Pintelon, R.: Block Toeplitz/Hankel total least squares. SIAM J. Matrix Anal. Appl. 26, 1083–1099 (2005)

  9. Markovsky, I., Van Huffel, S.: Overview of total least-squares methods. Signal Process. 87, 2283–2302 (2007)

  10. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)

  11. Paige, C., Strakoš, Z.: Scaled total least squares fundamentals. Numer. Math. 91, 117–146 (2002)

  12. Peters, G., Wilkinson, J.H.: Inverse iteration, ill-conditioned equations and Newton’s method. SIAM Rev. 21, 339–360 (1979)

  13. Van Huffel, S., Vandewalle, J.: Analysis and solution of the nongeneric total least squares problem. SIAM J. Matrix Anal. Appl. 9, 360–372 (1988)

  14. Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem: Computational Aspects and Analysis. SIAM, Philadelphia (1991)

Acknowledgements

The first author acknowledges the support received from the Istituto Nazionale di Alta Matematica (GNCS-INdAM, Italy) for his research. The work of the second author has been partly supported by a student research grant from the University of Udine, Italy, and was carried out during a visit to Vrije Universiteit Brussel, Belgium. Both authors thank Prof. I. Markovsky for his hospitality and advice, and two anonymous referees for their pertinent and constructive comments, which led to improvements of the manuscript.

Author information


Corresponding author

Correspondence to Dario Fasino.

Additional information

Communicated by Lars Eldén.

Appendix A: Proof of Theorem 3.1

Before entering into the details of the proof, we rephrase the iteration in Algorithm GN-TLS with optimal step length in a more general setting. The notation adopted hereafter mirrors that of the previous sections, with some minor exceptions.

Let \(C\in \mathbb {R}^{p\times q}\) be a full column rank matrix, let \(\mathscr {S}= \{s \in \mathbb {R}^q : \Vert s\Vert = 1\}\) and \(\mathscr {E}= \{y \in \mathbb {R}^p : y = Cs, s\in \mathscr {S}\}\). Then \(\mathscr {E}\) is a differentiable manifold of \(\mathbb {R}^p\); more precisely, it is an ellipsoid whose (nontrivial) semiaxes are directed along the left singular vectors of C, with the corresponding singular values as their lengths.

For any nonzero vector \(z\in \mathrm {Range}(C)\) there exists a unique vector \(y \in \mathscr {E}\) such that \(z = \alpha y\) for some scalar \(\alpha > 0\); we say that y is the retraction of z onto \(\mathscr {E}\).
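
In computational terms, the retraction is immediate to realize via the pseudoinverse: since \(z = \alpha Cs\) with \(\Vert s\Vert = 1\) and \(\alpha > 0\), the vector s is recovered by normalizing \(C^+z\), and then \(y = Cs\). The following numpy sketch illustrates this on arbitrary data (the matrix C and the vector z are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((6, 3))      # full column rank, p = 6, q = 3
z = C @ rng.standard_normal(3)       # any nonzero vector in Range(C)

s = np.linalg.pinv(C) @ z            # C^+ z is a positive multiple of s
s /= np.linalg.norm(s)               # normalize so that ||s|| = 1
y = C @ s                            # retraction of z onto the ellipsoid E

alpha = np.linalg.norm(z) / np.linalg.norm(y)
print(np.allclose(z, alpha * y))     # True: z = alpha * y with alpha > 0
```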

For any \(f\in \mathscr {E}\) let \(\mathscr {T}_f\) be the tangent space of \(\mathscr {E}\) at f. Hence, \(\mathscr {T}_f\) is an affine \((q-1)\)-dimensional subspace of \(\mathbb {R}^p\) with \(f\in \mathscr {T}_f\). If \(f = Cs\) then it is not difficult to verify that \(\mathscr {T}_f\) admits the following description:

$$\begin{aligned} \mathscr {T}_f = \{ f + Cw,\ s^Tw = 0\} . \end{aligned}$$

In fact, the map \(s\mapsto Cs\) transforms tangent spaces of the unit sphere \(\mathscr {S}\) into tangent spaces of \(\mathscr {E}\).

Consider the following iteration:

  • Choose \(f_0 \in \mathscr {E}\)

  • For \(k = 0,1,2,\ldots \)

    • Let \(z_{k}\) be the minimum norm vector in \(\mathscr {T}_{f_k}\)

    • Let \(f_{k+1}\) be the retraction of \(z_k\) onto \(\mathscr {E}\).

Owing to Lemma 3.1 it is not difficult to recognize that the sequence \(\{f(x_k)\}\) produced by Algorithm GN-TLS with optimal step length fits into the framework of the foregoing iteration.
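
For concreteness, the iteration above can be prototyped directly from its geometric description: the minimum norm vector of the tangent space is obtained from the constrained least squares problem \(\min_{s_k^Tw = 0}\Vert f_k + Cw\Vert\) (solved below through its KKT system), and the result is retracted back onto \(\mathscr{E}\). This is a minimal numerical sketch on illustrative data, not the authors' implementation of Algorithm GN-TLS:

```python
import numpy as np

def ellipsoid_iteration(C, s0, iters=200):
    """Prototype of the iteration on the ellipsoid E = {Cs : ||s|| = 1}."""
    q = C.shape[1]
    s = s0 / np.linalg.norm(s0)
    for _ in range(iters):
        f = C @ s
        # minimum norm vector z_k in T_{f_k}: solve min ||f + Cw|| s.t. s^T w = 0
        # via the KKT system of the equality-constrained least squares problem
        K = np.block([[C.T @ C, s[:, None]], [s[None, :], np.zeros((1, 1))]])
        rhs = np.concatenate([-C.T @ f, [0.0]])
        w = np.linalg.solve(K, rhs)[:q]
        z = f + C @ w
        # retraction of z_k onto E
        s = np.linalg.pinv(C) @ z
        s /= np.linalg.norm(s)
    return s

rng = np.random.default_rng(1)
C = rng.standard_normal((8, 4))
s = ellipsoid_iteration(C, rng.standard_normal(4))
_, _, Vt = np.linalg.svd(C)
print(abs(s @ Vt[-1]))   # close to 1: s aligns with the right singular vector
                         # associated with the smallest singular value of C
```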

Hereafter, we analyze the behavior of the sequence \(\{f_k\}\subset \mathscr {E}\) and of the auxiliary sequence \(\{s_k\}\subset \mathscr {S}\) defined by the equation \(s_k = C^+ f_k\). In particular, we will prove that the sequence \(\{s_k\}\) is produced by a certain power method and converges to a point in \(\mathscr {S}\) corresponding to the smallest semiaxis of \(\mathscr {E}\). To this end, we need the following preliminary result, which characterizes the solution of a least squares problem with a linear constraint.

Lemma 5.1

Let A be a full column rank matrix and let v be a nonzero vector. The solution \(\bar{x}\) of the constrained least squares problem

$$\begin{aligned} \min _{x\, :\, v^Tx = 0}\Vert Ax - b\Vert \end{aligned}$$

is given by \(\bar{x} = P x_{\mathrm {LS}}\) where \(x_{\mathrm {LS}} = A^+b\) is the solution of the unconstrained least squares problem and

$$\begin{aligned} P = I - \frac{1}{v^T(A^TA)^{-1}v}(A^TA)^{-1}vv^T \end{aligned}$$

is the oblique projector onto \(\langle v\rangle ^\perp \) along \((A^TA)^{-1}v\).

Proof

A straightforward computation using Lagrange multipliers (see, e.g., [5, § 6.2.4]) shows that \(\bar{x}\) satisfies the linear equation

$$\begin{aligned} \begin{pmatrix} A^TA &{} v \\ v^T &{} 0 \end{pmatrix} \begin{pmatrix} \bar{x} \\ \lambda \end{pmatrix} = \begin{pmatrix} A^Tb \\ 0 \end{pmatrix} , \end{aligned}$$

for some scalar \(\lambda \). To solve this equation, consider the block triangular factorization

$$\begin{aligned} \begin{pmatrix} A^TA &{} v \\ v^T &{} 0 \end{pmatrix} = \begin{pmatrix} A^TA &{} 0 \\ v^T &{} 1 \end{pmatrix} \begin{pmatrix} I &{} w \\ 0 &{} -v^Tw \end{pmatrix} \end{aligned}$$

where \(w = (A^TA)^{-1}v\). Solving the corresponding block triangular systems we get

$$\begin{aligned} \begin{pmatrix} A^TA &{} 0 \\ v^T &{} 1 \end{pmatrix} \begin{pmatrix} x_{\mathrm {LS}} \\ -v^Tx_{\mathrm {LS}} \end{pmatrix} = \begin{pmatrix} A^Tb \\ 0 \end{pmatrix} , \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} I &{} w \\ 0 &{} -v^Tw \end{pmatrix} \begin{pmatrix} \bar{x} \\ \lambda \end{pmatrix} = \begin{pmatrix} x_{\mathrm {LS}} \\ -v^Tx_{\mathrm {LS}} \end{pmatrix} , \end{aligned}$$

with \(\lambda = -v^Tx_{\mathrm {LS}}/v^Tw\) and

$$\begin{aligned} \bar{x} = x_{\mathrm {LS}} - \lambda w = x_{\mathrm {LS}} - \frac{v^Tx_{\mathrm {LS}}}{v^T(A^TA)^{-1}v} (A^TA)^{-1}v . \end{aligned}$$

The claim follows by rearranging terms in the last formula. \(\square \)
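
A quick numerical check of Lemma 5.1 can be carried out by comparing the projector formula with a direct solve of the KKT system above; the data below are arbitrary and serve only as an illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)
v = rng.standard_normal(3)

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]        # unconstrained solution A^+ b
M = np.linalg.inv(A.T @ A)
P = np.eye(3) - np.outer(M @ v, v) / (v @ M @ v)   # oblique projector of Lemma 5.1
x_bar = P @ x_ls

# reference solution from the KKT system of min ||Ax - b|| s.t. v^T x = 0
K = np.block([[A.T @ A, v[:, None]], [v[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([A.T @ b, [0.0]]))

print(np.allclose(x_bar, sol[:3]))                 # True: both solutions agree
print(np.isclose(v @ x_bar, 0.0, atol=1e-10))      # True: constraint is satisfied
```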

Let \(s_k\in \mathscr {S}\) and let \(f_k = Cs_k\) be the corresponding point on \(\mathscr {E}\). The minimum norm vector in \(\mathscr {T}_{f_k}\) can be expressed as \(z_k = f_k + Cw_k\) where

$$\begin{aligned} w_k = \arg \min _{w \, :\, s_k^Tw = 0}\Vert f_k + Cw \Vert . \end{aligned}$$

A straightforward application of Lemma 5.1 yields the formula

$$\begin{aligned} w_k&= -\left( I - \frac{1}{s_k^T(C^TC)^{-1}s_k}(C^TC)^{-1} s_ks_k^T \right) s_k \\&= \frac{1}{s_k^T(C^TC)^{-1}s_k}(C^TC)^{-1}s_k - s_k . \end{aligned}$$

In fact, the solution of the unconstrained problem \(\min _w \Vert f_k + Cw \Vert \) clearly is \(w_\mathrm {LS}= -s_k\), and \(s_k^Ts_k = 1\). Then, the minimum norm vector in \(\mathscr {T}_{f_k}\) admits the expression

$$\begin{aligned} z_k = C(s_k + w_k) = \alpha _k C (C^TC)^{-1} s_k , \end{aligned}$$

where \(\alpha _k > 0\) is a normalization constant whose exact formula is not relevant. Since \(f_{k+1}\) is the retraction of \(z_k\) onto \(\mathscr {E}\) and \(C^+C = I\), we conclude that \(f_{k+1} = Cs_{k+1}\) with

$$\begin{aligned} s_{k+1} = C^+ f_{k+1} = \beta _k (C^TC)^{-1} s_k , \quad \beta _k = 1/\Vert (C^TC)^{-1} s_k\Vert . \end{aligned}$$
(5.1)

Finally,

$$\begin{aligned} f_{k+1} = \beta _k C(C^TC)^{-1}C^+ f_k = \beta _k (C^+)^TC^+ f_k = \beta _k (CC^T)^+ f_k, \end{aligned}$$

as \((C^+)^TC^+ = (CC^T)^+\). This proves the first part of Theorem 3.1.
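
In code, the update (5.1) is nothing but an inverse power iteration with normalization applied to \(C^TC\); the short sketch below (again on arbitrary data) shows that the limits predicted by Theorem 3.1 are indeed attained:

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.standard_normal((8, 4))
G_inv = np.linalg.inv(C.T @ C)          # (C^T C)^{-1}

s = rng.standard_normal(4)
s /= np.linalg.norm(s)
for _ in range(200):
    s = G_inv @ s                       # update (5.1): power step for (C^T C)^{-1}
    s /= np.linalg.norm(s)              # normalization factor beta_k

_, sing, Vt = np.linalg.svd(C)
print(abs(s @ Vt[-1]))                  # ~1: s_k converges to v_q (up to sign)
print(np.linalg.norm(C @ s), sing[-1])  # ||f_k|| tends to sigma_q
```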

We are now in a position to describe the asymptotic behavior of \(\{s_k\}\). As shown in Eq. (5.1), the sequence \(\{s_k\}\) corresponds to a power method, with normalization, for the matrix \((C^TC)^{-1}\). The spectral decomposition of \((C^TC)^{-1}\) can be readily obtained from the SVD \(C = U\varSigma V^T\),

$$\begin{aligned} (C^TC)^{-1} = V\varLambda V^T , \quad \varLambda = \mathrm {diag}(\sigma _1^{-2},\ldots ,\sigma _q^{-2}) . \end{aligned}$$

By hypothesis, the eigenvalue \(\sigma _q^{-2}\) is simple and dominant, and the angle between the corresponding eigenvector \(v_q\) and the initial vector \(s_0\) is acute. For notational simplicity let \(\rho = \sigma _q^2/\sigma _{q-1}^2\). Noting that \((C^TC)^{-1}\) is symmetric and positive definite, classical results on the convergence of the power method [5, §7.3.1] yield the asymptotic convergence estimates

$$\begin{aligned} \Vert s_k - v_q \Vert= & {} O(\rho ^{k}) , \quad s_k^T (C^TC)^{-1}s_k = \sigma _q^{-2} + O(\rho ^{2k}) ,\\ \Vert (C^TC)^{-1}s_k\Vert= & {} \sigma _q^{-2} + O(\rho ^{2k}) . \end{aligned}$$

Using again (5.1) we get

$$\begin{aligned} \Vert f_k\Vert ^2 = s_{k}^TC^TCs_{k}&= \frac{s_{k-1}^T(C^TC)^{-1}C^TC(C^TC)^{-1}s_{k-1}}{\Vert (C^TC)^{-1}s_{k-1}\Vert ^2} \\&= \frac{s_{k-1}^T(C^TC)^{-1}s_{k-1}}{\Vert (C^TC)^{-1}s_{k-1}\Vert ^2} = \frac{\sigma _q^{-2} + O(\rho ^{2k})}{\sigma _q^{-4} + O(\rho ^{2k})} = \sigma _q^{2} + O(\rho ^{2k}) \end{aligned}$$

and the proof of Theorem 3.1 is now complete.
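
These rates can be observed numerically. With \(\rho = \sigma _q^2/\sigma _{q-1}^2\), the errors \(\Vert s_k - v_q\Vert \) and \(|\,\Vert f_k\Vert ^2 - \sigma _q^2|\) decay like \(\rho ^k\) and \(\rho ^{2k}\), respectively. The following sketch builds a matrix C with prescribed singular values (an arbitrary choice made only to have a visible gap between \(\sigma _{q-1}\) and \(\sigma _q\)) and tabulates the errors against the predicted rates:

```python
import numpy as np

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((10, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sing = np.array([3.0, 2.0, 1.5, 1.0])            # sigma_1 > ... > sigma_q
C = U @ np.diag(sing) @ V.T
v_q = V[:, -1]
rho = (sing[-1] / sing[-2]) ** 2

s = rng.standard_normal(4)
s /= np.linalg.norm(s)
s *= np.sign(s @ v_q)                            # make the angle with v_q acute
G_inv = np.linalg.inv(C.T @ C)

for k in range(1, 13):
    s = G_inv @ s
    s /= np.linalg.norm(s)                       # update (5.1)
    err_s = np.linalg.norm(s - v_q)
    err_f = abs(np.linalg.norm(C @ s) ** 2 - sing[-1] ** 2)
    print(f"k={k:2d}  ||s_k - v_q||={err_s:.2e} (rho^k={rho**k:.2e})"
          f"  | ||f_k||^2 - sigma_q^2 |={err_f:.2e} (rho^(2k)={rho**(2*k):.2e})")
```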


Cite this article

Fasino, D., Fazzi, A.: A Gauss–Newton iteration for Total Least Squares problems. BIT Numer. Math. 58, 281–299 (2018). https://doi.org/10.1007/s10543-017-0678-5
