A Gauss–Newton iteration for Total Least Squares problems

Abstract

The Total Least Squares solution of an overdetermined, approximate linear equation \(Ax \approx b\) minimizes a nonlinear function which characterizes the backward error. We devise a variant of the Gauss–Newton iteration with guaranteed convergence to that solution, under classical well-posedness hypotheses. At each iteration, the proposed method requires the solution of an ordinary least squares problem in which the matrix A is modified by a rank-one term. In exact arithmetic, the method is equivalent to an inverse power iteration for computing the smallest singular value of the complete matrix \((A\mid b)\). Geometric and computational properties of the method are analyzed in detail and illustrated by numerical examples.


Notes

  1. We are grateful to an anonymous referee for having brought this citation to our attention.

References

  1. Björck, Å., Heggernes, P., Matstoms, P.: Methods for large scale total least squares problems. SIAM J. Matrix Anal. Appl. 22, 413–429 (2000)

  2. Daniel, J.W., Gragg, W.B., Kaufman, L., Stewart, G.W.: Reorthogonalization and stable algorithms for updating the Gram–Schmidt QR factorization. Math. Comput. 30, 772–795 (1976)

  3. Golub, G.H.: Some modified matrix eigenvalue problems. SIAM Rev. 15, 318–344 (1973)

  4. Golub, G.H., Van Loan, C.: An analysis of the total least squares problem. SIAM J. Numer. Anal. 17, 883–893 (1980)

  5. Golub, G.H., Van Loan, C.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)

  6. Li, C.K., Liu, X.G., Wang, X.F.: Extension of the total least square problem using general unitarily invariant norms. Linear Multilinear Algebra 55, 71–79 (2007)

  7. Markovsky, I.: Bibliography on total least squares and related methods. Stat. Interface 3, 329–334 (2010)

  8. Markovsky, I., Van Huffel, S., Pintelon, R.: Block Toeplitz/Hankel total least squares. SIAM J. Matrix Anal. Appl. 26, 1083–1099 (2005)

  9. Markovsky, I., Van Huffel, S.: Overview of total least-squares methods. Signal Process. 87, 2283–2302 (2007)

  10. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)

  11. Paige, C., Strakoš, Z.: Scaled total least squares fundamentals. Numer. Math. 91, 117–146 (2002)

  12. Peters, G., Wilkinson, J.H.: Inverse iteration, ill-conditioned equations and Newton’s method. SIAM Rev. 21, 339–360 (1979)

  13. Van Huffel, S., Vandewalle, J.: Analysis and solution of the nongeneric total least squares problem. SIAM J. Matrix Anal. Appl. 9, 360–372 (1988)

  14. Van Huffel, S., Vandewalle, J.: The Total Least Squares Problem: Computational Aspects and Analysis. SIAM, Philadelphia (1991)

Acknowledgements

The first author acknowledges the support received from the Istituto Nazionale di Alta Matematica (GNCS-INdAM, Italy) for his research. The work of the second author has been partly supported by a student research grant from the University of Udine, Italy, and was carried out during a visit to Vrije Universiteit Brussel, Belgium. Both authors thank Prof. I. Markovsky for his hospitality and advice, and two anonymous referees for their pertinent and constructive comments, which led to improvements of the manuscript.

Author information


Corresponding author

Correspondence to Dario Fasino.

Additional information

Communicated by Lars Eldén.

Appendix A: Proof of Theorem 3.1

Before entering into the details of the proof, we rephrase the iteration in Algorithm GN-TLS with optimal step length in a more general setting. The notation adopted hereafter mirrors that of the previous sections, with some minor exceptions.

Let \(C\in \mathbb {R}^{p\times q}\) be a full column rank matrix, let \(\mathscr {S}= \{s \in \mathbb {R}^q : \Vert s\Vert = 1\}\) and \(\mathscr {E}= \{y \in \mathbb {R}^p : y = Cs, s\in \mathscr {S}\}\). Then \(\mathscr {E}\) is a differentiable manifold of \(\mathbb {R}^p\); more precisely, it is an ellipsoid whose (nontrivial) semiaxes are directed along the left singular vectors of C, with the corresponding singular values as their lengths.

For any nonzero vector \(z\in \mathrm {Range}(C)\) there exists a unique vector \(y \in \mathscr {E}\) such that \(z = \alpha y\) for some scalar \(\alpha > 0\); we say that y is the retraction of z onto \(\mathscr {E}\).
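
In computational terms, the retraction is immediate to realize via the pseudoinverse: since \(z = \alpha Cs\) with \(\Vert s\Vert = 1\) and \(\alpha > 0\), the vector s is recovered by normalizing \(C^+z\), and then \(y = Cs\). The following numpy sketch illustrates this on arbitrary data (the matrix C and the vector z are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((6, 3))      # full column rank, p = 6, q = 3
z = C @ rng.standard_normal(3)       # any nonzero vector in Range(C)

s = np.linalg.pinv(C) @ z            # C^+ z is a positive multiple of s
s /= np.linalg.norm(s)               # normalize so that ||s|| = 1
y = C @ s                            # retraction of z onto the ellipsoid E

alpha = np.linalg.norm(z) / np.linalg.norm(y)
print(np.allclose(z, alpha * y))     # True: z = alpha * y with alpha > 0
```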

For any \(f\in \mathscr {E}\) let \(\mathscr {T}_f\) be the tangent space of \(\mathscr {E}\) at f. Hence, \(\mathscr {T}_f\) is an affine \((q-1)\)-dimensional subspace of \(\mathbb {R}^p\) with \(f\in \mathscr {T}_f\). If \(f = Cs\) then it is not difficult to verify that \(\mathscr {T}_f\) admits the following description:

$$\begin{aligned} \mathscr {T}_f = \{ f + Cw,\ s^Tw = 0\} . \end{aligned}$$

In fact, the map \(s\mapsto Cs\) transforms tangent spaces of the unit sphere \(\mathscr {S}\) into tangent spaces of \(\mathscr {E}\).

Consider the following iteration:

  • Choose \(f_0 \in \mathscr {E}\)

  • For \(k = 0,1,2,\ldots \)

    • Let \(z_{k}\) be the minimum norm vector in \(\mathscr {T}_{f_k}\)

    • Let \(f_{k+1}\) be the retraction of \(z_k\) onto \(\mathscr {E}\).

Owing to Lemma 3.1 it is not difficult to recognize that the sequence \(\{f(x_k)\}\) produced by Algorithm GN-TLS with optimal step length fits into the framework of the foregoing iteration.
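
For concreteness, the iteration above can be prototyped directly from its geometric description: the minimum norm vector of the tangent space is obtained from the constrained least squares problem \(\min_{s_k^Tw = 0}\Vert f_k + Cw\Vert\) (solved below through its KKT system), and the result is retracted back onto \(\mathscr{E}\). This is a minimal numerical sketch on illustrative data, not the authors' implementation of Algorithm GN-TLS:

```python
import numpy as np

def ellipsoid_iteration(C, s0, iters=200):
    """Prototype of the iteration on the ellipsoid E = {Cs : ||s|| = 1}."""
    q = C.shape[1]
    s = s0 / np.linalg.norm(s0)
    for _ in range(iters):
        f = C @ s
        # minimum norm vector z_k in T_{f_k}: solve min ||f + Cw|| s.t. s^T w = 0
        # via the KKT system of the equality-constrained least squares problem
        K = np.block([[C.T @ C, s[:, None]], [s[None, :], np.zeros((1, 1))]])
        rhs = np.concatenate([-C.T @ f, [0.0]])
        w = np.linalg.solve(K, rhs)[:q]
        z = f + C @ w
        # retraction of z_k onto E
        s = np.linalg.pinv(C) @ z
        s /= np.linalg.norm(s)
    return s

rng = np.random.default_rng(1)
C = rng.standard_normal((8, 4))
s = ellipsoid_iteration(C, rng.standard_normal(4))
_, _, Vt = np.linalg.svd(C)
print(abs(s @ Vt[-1]))   # close to 1: s aligns with the right singular vector
                         # associated with the smallest singular value of C
```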

Hereafter, we analyze the behavior of the sequence \(\{f_k\}\subset \mathscr {E}\) and of the auxiliary sequence \(\{s_k\}\subset \mathscr {S}\) defined by the equation \(s_k = C^+ f_k\). In particular, we will prove that the sequence \(\{s_k\}\) is produced by a certain power method and converges to a point in \(\mathscr {S}\) corresponding to the smallest semiaxis of \(\mathscr {E}\). To this end, we need the following preliminary result, which characterizes the solution of a least squares problem with a linear constraint.

Lemma 5.1

Let A be a full column rank matrix and let v be a nonzero vector. The solution \(\bar{x}\) of the constrained least squares problem

$$\begin{aligned} \min _{x\, :\, v^Tx = 0}\Vert Ax - b\Vert \end{aligned}$$

is given by \(\bar{x} = P x_{\mathrm {LS}}\) where \(x_{\mathrm {LS}} = A^+b\) is the solution of the unconstrained least squares problem and

$$\begin{aligned} P = I - \frac{1}{v^T(A^TA)^{-1}v}(A^TA)^{-1}vv^T \end{aligned}$$

is the oblique projector onto \(\langle v\rangle ^\perp \) along \((A^TA)^{-1}v\).

Proof

A straightforward computation using Lagrange multipliers (see, e.g., [5, § 6.2.4]) shows that \(\bar{x}\) satisfies the linear equation

$$\begin{aligned} \begin{pmatrix} A^TA &{} v \\ v^T &{} 0 \end{pmatrix} \begin{pmatrix} \bar{x} \\ \lambda \end{pmatrix} = \begin{pmatrix} A^Tb \\ 0 \end{pmatrix} , \end{aligned}$$

for some scalar \(\lambda \). To solve this equation, consider the block triangular factorization

$$\begin{aligned} \begin{pmatrix} A^TA &{} v \\ v^T &{} 0 \end{pmatrix} = \begin{pmatrix} A^TA &{} 0 \\ v^T &{} 1 \end{pmatrix} \begin{pmatrix} I &{} w \\ 0 &{} -v^Tw \end{pmatrix} \end{aligned}$$

where \(w = (A^TA)^{-1}v\). Solving the corresponding block triangular systems we get

$$\begin{aligned} \begin{pmatrix} A^TA &{} 0 \\ v^T &{} 1 \end{pmatrix} \begin{pmatrix} x_{\mathrm {LS}} \\ -v^Tx_{\mathrm {LS}} \end{pmatrix} = \begin{pmatrix} A^Tb \\ 0 \end{pmatrix} , \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} I &{} w \\ 0 &{} -v^Tw \end{pmatrix} \begin{pmatrix} \bar{x} \\ \lambda \end{pmatrix} = \begin{pmatrix} x_{\mathrm {LS}} \\ -v^Tx_{\mathrm {LS}} \end{pmatrix} , \end{aligned}$$

with \(\lambda = -v^Tx_{\mathrm {LS}}/v^Tw\) and

$$\begin{aligned} \bar{x} = x_{\mathrm {LS}} - \lambda w = x_{\mathrm {LS}} - \frac{v^Tx_{\mathrm {LS}}}{v^T(A^TA)^{-1}v} (A^TA)^{-1}v . \end{aligned}$$

The claim follows by rearranging terms in the last formula. \(\square \)
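
A quick numerical check of Lemma 5.1 can be carried out by comparing the projector formula with a direct solve of the KKT system above; the data below are arbitrary and serve only as an illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))
b = rng.standard_normal(7)
v = rng.standard_normal(3)

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]        # unconstrained solution A^+ b
M = np.linalg.inv(A.T @ A)
P = np.eye(3) - np.outer(M @ v, v) / (v @ M @ v)   # oblique projector of Lemma 5.1
x_bar = P @ x_ls

# reference solution from the KKT system of min ||Ax - b|| s.t. v^T x = 0
K = np.block([[A.T @ A, v[:, None]], [v[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([A.T @ b, [0.0]]))

print(np.allclose(x_bar, sol[:3]))                 # True: both solutions agree
print(np.isclose(v @ x_bar, 0.0, atol=1e-10))      # True: constraint is satisfied
```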

Let \(s_k\in \mathscr {S}\) and let \(f_k = Cs_k\) be the corresponding point on \(\mathscr {E}\). The minimum norm vector in \(\mathscr {T}_{f_k}\) can be expressed as \(z_k = f_k + Cw_k\) where

$$\begin{aligned} w_k = \arg \min _{w \, :\, s_k^Tw = 0}\Vert f_k + Cw \Vert . \end{aligned}$$

A straightforward application of Lemma 5.1 yields the formula

$$\begin{aligned} w_k&= -\left( I - \frac{1}{s_k^T(C^TC)^{-1}s_k}(C^TC)^{-1} s_ks_k^T \right) s_k \\&= \frac{1}{s_k^T(C^TC)^{-1}s_k}(C^TC)^{-1}s_k - s_k . \end{aligned}$$

In fact, the solution of the unconstrained problem \(\min _w \Vert f_k + Cw \Vert \) clearly is \(w_\mathrm {LS}= -s_k\), and \(s_k^Ts_k = 1\). Then, the minimum norm vector in \(\mathscr {T}_{f_k}\) admits the expression

$$\begin{aligned} z_k = C(s_k + w_k) = \alpha _k C (C^TC)^{-1} s_k , \end{aligned}$$

where \(\alpha _k > 0\) is a normalization constant whose exact formula is not relevant. Since \(f_{k+1}\) is the retraction of \(z_k\) onto \(\mathscr {E}\) and \(C^+C = I\), we conclude that \(f_{k+1} = Cs_{k+1}\) with

$$\begin{aligned} s_{k+1} = C^+ f_{k+1} = \beta _k (C^TC)^{-1} s_k , \quad \beta _k = 1/\Vert (C^TC)^{-1} s_k\Vert . \end{aligned}$$
(5.1)

Finally,

$$\begin{aligned} f_{k+1} = \beta _k C(C^TC)^{-1}C^+ f_k = \beta _k (C^+)^TC^+ f_k = \beta _k (CC^T)^+ f_k, \end{aligned}$$

as \((C^+)^TC^+ = (CC^T)^+\). This proves the first part of Theorem 3.1.
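
In code, the update (5.1) is nothing but an inverse power iteration with normalization applied to \(C^TC\); the short sketch below (again on arbitrary data) shows that the limits predicted by Theorem 3.1 are indeed attained:

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.standard_normal((8, 4))
G_inv = np.linalg.inv(C.T @ C)          # (C^T C)^{-1}

s = rng.standard_normal(4)
s /= np.linalg.norm(s)
for _ in range(200):
    s = G_inv @ s                       # update (5.1): power step for (C^T C)^{-1}
    s /= np.linalg.norm(s)              # normalization factor beta_k

_, sing, Vt = np.linalg.svd(C)
print(abs(s @ Vt[-1]))                  # ~1: s_k converges to v_q (up to sign)
print(np.linalg.norm(C @ s), sing[-1])  # ||f_k|| tends to sigma_q
```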

We are now in a position to describe the asymptotic behavior of \(\{s_k\}\). As shown in Eq. (5.1), the sequence \(\{s_k\}\) corresponds to a power method, with normalization, for the matrix \((C^TC)^{-1}\). The spectral decomposition of \((C^TC)^{-1}\) can be readily obtained from the SVD \(C = U\varSigma V^T\),

$$\begin{aligned} (C^TC)^{-1} = V\varLambda V^T , \quad \varLambda = \mathrm {diag}(\sigma _1^{-2},\ldots ,\sigma _q^{-2}) . \end{aligned}$$

By hypothesis, the eigenvalue \(\sigma _q^{-2}\) is simple and dominant, and the angle between the corresponding eigenvector \(v_q\) and the initial vector \(s_0\) is acute. For notational simplicity let \(\rho = \sigma _q^2/\sigma _{q-1}^2\). Noting that \((C^TC)^{-1}\) is symmetric and positive definite, classical results on the convergence of the power method [5, §7.3.1] yield the asymptotic convergence estimates

$$\begin{aligned} \Vert s_k - v_q \Vert= & {} O(\rho ^{k}) , \quad s_k^T (C^TC)^{-1}s_k = \sigma _q^{-2} + O(\rho ^{2k}) ,\\ \Vert (C^TC)^{-1}s_k\Vert= & {} \sigma _q^{-2} + O(\rho ^{2k}) . \end{aligned}$$

Using again (5.1) we get

$$\begin{aligned} \Vert f_k\Vert ^2 = s_{k}^TC^TCs_{k}&= \frac{s_{k-1}^T(C^TC)^{-1}C^TC(C^TC)^{-1}s_{k-1}}{\Vert (C^TC)^{-1}s_{k-1}\Vert ^2} \\&= \frac{s_{k-1}^T(C^TC)^{-1}s_{k-1}}{\Vert (C^TC)^{-1}s_{k-1}\Vert ^2} = \frac{\sigma _q^{-2} + O(\rho ^{2k})}{\sigma _q^{-4} + O(\rho ^{2k})} = \sigma _q^{2} + O(\rho ^{2k}) \end{aligned}$$

and the proof of Theorem 3.1 is now complete.
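
These rates can be observed numerically. With \(\rho = \sigma _q^2/\sigma _{q-1}^2\), the errors \(\Vert s_k - v_q\Vert \) and \(|\,\Vert f_k\Vert ^2 - \sigma _q^2|\) decay like \(\rho ^k\) and \(\rho ^{2k}\), respectively. The following sketch builds a matrix C with prescribed singular values (an arbitrary choice made only to have a visible gap between \(\sigma _{q-1}\) and \(\sigma _q\)) and tabulates the errors against the predicted rates:

```python
import numpy as np

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((10, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sing = np.array([3.0, 2.0, 1.5, 1.0])            # sigma_1 > ... > sigma_q
C = U @ np.diag(sing) @ V.T
v_q = V[:, -1]
rho = (sing[-1] / sing[-2]) ** 2

s = rng.standard_normal(4)
s /= np.linalg.norm(s)
s *= np.sign(s @ v_q)                            # make the angle with v_q acute
G_inv = np.linalg.inv(C.T @ C)

for k in range(1, 13):
    s = G_inv @ s
    s /= np.linalg.norm(s)                       # update (5.1)
    err_s = np.linalg.norm(s - v_q)
    err_f = abs(np.linalg.norm(C @ s) ** 2 - sing[-1] ** 2)
    print(f"k={k:2d}  ||s_k - v_q||={err_s:.2e} (rho^k={rho**k:.2e})"
          f"  | ||f_k||^2 - sigma_q^2 |={err_f:.2e} (rho^(2k)={rho**(2*k):.2e})")
```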


Cite this article

Fasino, D., Fazzi, A.: A Gauss–Newton iteration for Total Least Squares problems. BIT Numer. Math. 58, 281–299 (2018). https://doi.org/10.1007/s10543-017-0678-5
