
A convergence study for reduced rank extrapolation on nonlinear systems

Original Paper · Published in Numerical Algorithms

Abstract

Reduced Rank Extrapolation (RRE) is a polynomial-type method used to accelerate the convergence of sequences of vectors {xm}. It has been applied successfully in various disciplines of science and engineering to the solution of large and sparse systems of linear and nonlinear equations. If s is the solution to the system of equations x = f(x), first a vector sequence {xm} is generated via the fixed-point iterative scheme xm+1 = f(xm), m = 0,1,…, and next RRE is applied to this sequence to accelerate its convergence. RRE produces approximations sn,k to s of the form \(\boldsymbol {s}_{n,k}={\sum }_{i=0}^{k} \gamma _{i} \boldsymbol {x}_{n+i}\) for some scalars γi that depend (nonlinearly) on xn, xn+1,…,xn+k+1 and satisfy \({\sum }_{i=0}^{k} \gamma _{i}=1\). The convergence properties of RRE applied in conjunction with linear f(x) have been analyzed in different publications. In this work, we discuss the convergence of the sn,k obtained from RRE with nonlinear f(x) (i) when \(n\to \infty \) with fixed k, and (ii) in two so-called cycling modes.
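The scheme just described can be sketched in a few lines of NumPy. This is only an illustrative implementation of the constrained least-squares definition of sn,k (the constraint Σγi = 1 is eliminated by substitution), run on a small linear fixed-point problem; it is not the numerically stable implementation studied in [27] and [33], and all names below are ours.

```python
import numpy as np

def rre(xs):
    """Reduced Rank Extrapolation from the iterates xs = [x_n, ..., x_{n+k+1}].

    Returns s_{n,k} = sum_i gamma_i * xs[i], where the gamma_i minimize
    ||sum_i gamma_i (xs[i+1] - xs[i])||_2 subject to sum_i gamma_i = 1.
    Writing gamma_0 = 1 - sum_{i>=1} gamma_i turns this into an
    unconstrained least-squares problem.
    """
    X = np.column_stack(xs)        # N x (k+2) matrix of iterates
    U = np.diff(X, axis=1)         # first differences u_0, ..., u_k
    V = U[:, 1:] - U[:, [0]]       # columns u_i - u_0, i = 1, ..., k
    eta, *_ = np.linalg.lstsq(V, -U[:, 0], rcond=None)
    gamma = np.concatenate(([1.0 - eta.sum()], eta))
    return X[:, :-1] @ gamma

# Linear test problem x = T x + d with a known fixed point (illustration only).
t = np.array([0.9, 0.7, 0.5, 0.3, 0.2, 0.1])
T, d = np.diag(t), np.ones(6)
s = d / (1.0 - t)                  # exact solution of (I - T) s = d

xs = [np.zeros(6)]
for _ in range(7):                 # generate x_1, ..., x_7
    xs.append(T @ xs[-1] + d)

err_plain = np.linalg.norm(xs[-1] - s)
err_rre = np.linalg.norm(rre(xs) - s)
print(err_plain, err_rre)          # the extrapolated error is far smaller
```

Here k = 6 equals the degree of the minimal polynomial of T with respect to the initial error, so the extrapolated vector recovers s to roundoff, while the plain iterate x7 is still far from s; this is the linear-case behavior whose nonlinear analogue the paper analyzes.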


Notes

  1. It is clear that the integers n and k are chosen by the user and that M is determined by n, k, and the extrapolation method being used.

  2. The approaches of [18] and [21] to RRE are almost identical, in the sense that \(\boldsymbol {s}_{n,k}={\sum }^{k}_{i=0}\gamma _{i} \boldsymbol {x}_{n+i}\) in [21], while \(\boldsymbol {s}_{n,k}={\sum }^{k}_{i=0}\gamma _{i} \boldsymbol {x}_{n+i+1}\) in [18], the γi being the same for both. The approaches of [11] and [21] are completely different, however; their equivalence was proved in the review paper of Smith, Ford, and Sidi [39].

  3. Note that M = n + k + 1 for MPE, RRE, MMPE, and SVD-MPE, while M = n + 2k for SEA, VEA, and TEA.

  4. Given a nonzero vector \(\boldsymbol {u}\in \mathbb {C}^{N}\), the monic polynomial P(λ) is said to be a minimal polynomial of the matrix \(\boldsymbol {T}\in \mathbb {C}^{N\times N}\) with respect to u if P(T)u = 0 and if P(λ) has smallest degree.

    The polynomial P(λ) exists and is unique. Moreover, if P1(T)u = 0 for some polynomial P1(λ) with \(\deg P_{1} > \deg P\), then P(λ) divides P1(λ). In particular, P(λ) divides the minimal polynomial of T, which in turn divides the characteristic polynomial of T. [Thus, the degree of P(λ) is at most N and its zeros are some or all of the eigenvalues of T.]

  5. It is clear that to apply any of the extrapolation methods in this mode, one needs to know the matrix F(s), for which one also needs to know the solution s.

  6. Note that k is not necessarily fixed in this mode of cycling; it may vary from one cycle to the next. It always satisfies k ≤ N, however.

  7. Quadratic convergence is relevant only when f(x) is nonlinear. When f(x) is linear, that is, f(x) = Tx + d, where T is a fixed N × N matrix and d is a fixed vector, hence F(s) = T, the solution s is obtained already at the end of step MC2 of the first cycle, that is, we have s(1) = s. Therefore, there is nothing to analyze when f(x) is linear.

  8. See also Sidi and Shapira [37] concerning a modified version of restarted GMRES with prior Richardson iterations, which is very closely related to RRE.

  9. Recall that, for any matrix K with rank(K) = r, we have ∥K∥2 ≤ ∥K∥F ≤ √r ∥K∥2. See Golub and Van Loan [13].

  10. Clearly, g(z) = z^k is in \(\tilde {\mathcal {P}}_{k}\) and 𝜃k < 1 since L < 1. Next, in general, the polynomial g(z) that gives the optimum in (5.4) is different from z^k. Thus, generally speaking, 𝜃k < L^k.

  11. For the linear system \(\boldsymbol {x}=\tilde {\boldsymbol {f}}(\boldsymbol {x})\), we have \(\boldsymbol {\epsilon }_{n+1}=\tilde {\boldsymbol {F}}\boldsymbol {\epsilon }_{n}\), n = 0, 1,…, as power iterations. Thus, in some cases, \(\boldsymbol {e}_{\infty }=\lim _{n\to \infty }\boldsymbol {e}_{n}\) exists and is an eigenvector of \(\tilde {\boldsymbol {F}}\), in which case \(\text {rank}(\boldsymbol {S}(\boldsymbol {e}_{\infty }))\) is at most 1. Clearly, this is a problem when rank(S(en)) = k > 1 for n = 0, 1,….
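The norm inequalities recalled in note 9 (∥K∥2 ≤ ∥K∥F ≤ √r ∥K∥2 for a rank-r matrix K) can be verified directly in NumPy; this is an illustrative check of a standard fact, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 3
# Build a 10 x 8 matrix of rank exactly r as a product of thin factors.
K = rng.standard_normal((10, r)) @ rng.standard_normal((r, 8))

two = np.linalg.norm(K, 2)      # spectral norm = largest singular value
fro = np.linalg.norm(K, 'fro')  # Frobenius norm = l2 norm of all singular values
assert two <= fro <= np.sqrt(r) * two + 1e-12
```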

References

  1. Anderson, D.G.: Iterative procedures for nonlinear integral equations. J. ACM 12, 547–560 (1965)

  2. Ben-Israel, A.: On error bounds for generalized inverses. SIAM J. Numer. Anal. 3, 585–592 (1966)

  3. Ben-Israel, A., Greville, T.N.E.: Generalized Inverses: Theory and Applications. CMS Books in Mathematics, 2nd edn. Springer, New York (2003)

  4. Brezinski, C.: Application de l’𝜖-algorithme à la résolution des systèmes non linéaires. C. R. Acad. Sci. Paris 271 A, 1174–1177 (1970)

  5. Brezinski, C.: Sur un algorithme de résolution des systèmes non linéaires. C. R. Acad. Sci. Paris 272 A, 145–148 (1971)

  6. Brezinski, C.: Généralisations de la transformation de Shanks, de la table de Padé, et de l’𝜖-algorithme. Calcolo 12, 317–360 (1975)

  7. Brezinski, C.: Accélération de la Convergence en Analyse Numérique. Number 584 in Lecture Notes in Mathematics. Springer, Berlin (1977)

  8. Brezinski, C., Redivo Zaglia, M.: Extrapolation Methods: Theory and Practice. North-Holland, Amsterdam (1991)

  9. Cabay, S., Jackson, L.W.: A polynomial extrapolation method for finding limits and antilimits of vector sequences. SIAM J. Numer. Anal. 13, 734–752 (1976)

  10. Campbell, S.L., Meyer, C.D. Jr.: Generalized Inverses of Linear Transformations. Dover, New York (1991)

  11. Eddy, R.P.: Extrapolating to the limit of a vector sequence. In: Wang, P.C.C. (ed.) Information Linkage Between Applied Mathematics and Industry, pp. 387–396. Academic Press, New York (1979)

  12. Gekeler, E.: On the solution of systems of equations by the epsilon algorithm of Wynn. Math. Comp. 26, 427–436 (1972)

  13. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)

  14. Graves-Morris, P.R., Saff, E.B.: Row convergence theorems for generalised inverse vector-valued Padé approximants. J. Comp. Appl. Math. 23, 63–85 (1988)

  15. Jbilou, K., Sadok, H.: Some results about vector extrapolation methods and related fixed-point iterations. J. Comp. Appl. Math. 36, 385–398 (1991)

  16. Jbilou, K., Sadok, H.: Analysis of some vector extrapolation methods for linear systems. Numer. Math. 70, 73–89 (1995)

  17. Jbilou, K., Sadok, H.: LU-implementation of the modified minimal polynomial extrapolation method. IMA J. Numer. Anal. 19, 549–561 (1999)

  18. Kaniel, S., Stein, J.: Least-square acceleration of iterative methods for linear equations. J. Optim. Theory Appl. 14, 431–437 (1974)

  19. Laurens, J., Le Ferrand, H.: Fonctions d’itérations vectorielles, itérations rationelles. C. R. Acad. Sci. Paris 321 I, 631–636 (1995)

  20. Le Ferrand, H.: Convergence of the topological 𝜖-algorithm for solving systems of nonlinear equations. Numer. Algorithms 3, 273–283 (1992)

  21. Mešina, M.: Convergence acceleration for the iterative solution of the equations X = AX + f. Comput. Methods Appl. Mech. Engrg. 10, 165–173 (1977)

  22. Ortega, J., Rheinboldt, W.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York (1970)

  23. Pugachev, B.P.: Acceleration of the convergence of iterative processes and a method of solving systems of nonlinear equations. U.S.S.R. Comput. Math. Math. Phys. 17, 199–207 (1978)

  24. Shanks, D.: Nonlinear transformations of divergent and slowly convergent sequences. J. Math. and Phys. 34, 1–42 (1955)

  25. Sidi, A.: Convergence and stability properties of minimal polynomial and reduced rank extrapolation algorithms. SIAM J. Numer. Anal. 23, 197–209 (1986). Originally appeared as NASA TM-83443 (1983)

  26. Sidi, A.: Extrapolation vs. projection methods for linear systems of equations. J. Comp. Appl. Math. 22, 71–88 (1988)

  27. Sidi, A.: Efficient implementation of minimal polynomial and reduced rank extrapolation methods. J. Comp. Appl. Math. 36, 305–337 (1991). Originally appeared as NASA TM-103240 ICOMP-90-20 (1990)

  28. Sidi, A.: Convergence of intermediate rows of minimal polynomial and reduced rank extrapolation tables. Numer. Algorithms 6, 229–244 (1994)

  29. Sidi, A.: Extension and completion of Wynn’s theory on convergence of columns of the epsilon table. J. Approx. Theory 86, 21–40 (1996)

  30. Sidi, A.: Review of two vector extrapolation methods of polynomial type with applications to large-scale problems. J. Comput. Sci. 3, 92–101 (2012)

  31. Sidi, A.: SVD-MPE: An SVD-based vector extrapolation method of polynomial type. Appl. Math. 7, 1260–1278 (2016). Special issue on Applied Iterative Methods

  32. Sidi, A.: Minimal polynomial and reduced rank extrapolation methods are related. Adv. Comput. Math. 43, 151–170 (2017)

  33. Sidi, A.: Vector Extrapolation Methods with Applications. Number 17 in SIAM Series on Computational Science and Engineering. SIAM, Philadelphia (2017)

  34. Sidi, A., Bridger, J.: Convergence and stability analyses for some vector extrapolation methods in the presence of defective iteration matrices. J. Comp. Appl. Math. 22, 35–61 (1988)

  35. Sidi, A., Ford, W.F., Smith, D.A.: Acceleration of convergence of vector sequences. SIAM J. Numer. Anal. 23, 178–196 (1986). Originally appeared as NASA TP-2193 (1983)

  36. Sidi, A., Shapira, Y.: Upper bounds for convergence rates of vector extrapolation methods on linear systems with initial iterations. Technical Report 701, Computer Science Dept., Technion–Israel Institute of Technology (1991). Appeared also as NASA TM-105608 ICOMP-92-09 (1992)

  37. Sidi, A., Shapira, Y.: Upper bounds for convergence rates of acceleration methods with initial iterations. Numer. Algorithms 18, 113–132 (1998)

  38. Skelboe, S.: Computation of the periodic steady-state response of nonlinear networks by extrapolation methods. IEEE Trans. Circuits Syst. 27, 161–175 (1980)

  39. Smith, D.A., Ford, W.F., Sidi, A.: Extrapolation methods for vector sequences. SIAM Rev. 29, 199–233 (1987). Erratum: SIAM Rev. 30, 623–634 (1988)

  40. Stewart, G.W.: On the continuity of the generalized inverse. SIAM J. Appl. Math. 17, 33–45 (1969)

  41. Toth, A., Kelley, C.T.: Convergence analysis for Anderson acceleration. SIAM J. Numer. Anal. 53, 805–819 (2015)

  42. Varga, R.S.: Matrix Iterative Analysis. Number 27 in Springer Series in Computational Mathematics, 2nd edn. Springer, New York (2000)

  43. Walker, H.F., Ni, P.: Anderson acceleration for fixed-point iterations. SIAM J. Numer. Anal. 49, 1715–1735 (2011)

  44. Wedin, P.Å.: Perturbation theory for pseudo-inverses. BIT 13, 217–232 (1973)

  45. Wynn, P.: On a device for computing the em(Sn) transformation. Math. Tables Aids Comput. 10, 91–96 (1956)

  46. Wynn, P.: Acceleration techniques for iterated vector and matrix problems. Math. Comp. 16, 301–322 (1962)

  47. Wynn, P.: On the convergence and stability of the epsilon algorithm. SIAM J. Numer. Anal. 3, 91–122 (1966)


Acknowledgements

The author would like to thank one of the anonymous referees, whose remarks helped to improve the presentation and the results of this work substantially.

Author information


Correspondence to Avram Sidi.


Appendix: Some properties of Moore–Penrose inverses


First, we recall the well-known facts

$$ \boldsymbol{A}\in\mathbb{C}^{m\times n},\quad \text{rank}(\boldsymbol{A})=n\quad\Rightarrow\quad \boldsymbol{A}^{+}=(\boldsymbol{A}^{*}\boldsymbol{A})^{-1}\boldsymbol{A}^{*} \quad\Rightarrow\quad\boldsymbol{A}^{+}\boldsymbol{A}=\boldsymbol{I}_{n\times n}, $$
(A.1)
$$ \boldsymbol{A}\in\mathbb{C}^{m\times n},\quad \text{rank}(\boldsymbol{A})=m\quad\Rightarrow\quad \boldsymbol{A}^{+}=\boldsymbol{A}^{*}(\boldsymbol{A}\boldsymbol{A}^{*})^{-1} \quad\Rightarrow\quad\boldsymbol{A}\boldsymbol{A}^{+}=\boldsymbol{I}_{m\times m}, $$
(A.2)

and

$$ \boldsymbol{A}\in\mathbb{C}^{m\times n},\quad \boldsymbol{B}\in\mathbb{C}^{n\times p},\quad\text{rank}(\boldsymbol{A})=\text{rank}(\boldsymbol{B})=n\quad \Rightarrow\quad (\boldsymbol{A}\boldsymbol{B})^{+}=\boldsymbol{B}^{+}\boldsymbol{A}^{+}. $$
(A.3)

The following theorems on Moore–Penrose inverses of perturbed matrices can be found in Ben-Israel and Greville [3], Wedin [44], and Stewart [40]. Here we give independent proofs of two of them.

Remark 6

For convenience of notation, throughout this appendix only, we will use ∥⋅∥ to denote the l2 norm. (Thus, ∥⋅∥ here does not stand for the G norm we have used in Sections 1–6.)

Theorem A.1

Let \(\boldsymbol {A}\in \mathbb {C}^{m\times n}\), rank(A) = n, let \(\boldsymbol {G}\in \mathbb {C}^{m\times m}\) be nonsingular, and define B = GA. Then rank(B) = n too, and

$$ \|\boldsymbol{B}^{+}\|\leq \|\boldsymbol{G}^{-1}\|\|\boldsymbol{A}^{+}\|. $$

Proof

That rank(B) = n is clear since G is nonsingular. Starting now with A = G− 1B, we first have

$$ \boldsymbol{A}\boldsymbol{x}=\boldsymbol{G}^{-1}(\boldsymbol{B}\boldsymbol{x})\quad \Rightarrow \quad \|\boldsymbol{A}\boldsymbol{x}\| \leq\|\boldsymbol{G}^{-1}\| \|\boldsymbol{B}\boldsymbol{x}\|\quad \forall \boldsymbol{x}\in\mathbb{C}^{n},\quad \|\boldsymbol{x}\|=1. $$

Let \(\boldsymbol {x}^{\prime }\) and \(\boldsymbol {x}^{\prime \prime }\), with \(\|\boldsymbol {x}^{\prime }\|=1\) and \(\|\boldsymbol {x}^{\prime \prime }\|=1\), be such that

$$ \sigma_{\min}(\boldsymbol{A})=\min\limits_{\|\boldsymbol{x}\|=1}\|\boldsymbol{A}\boldsymbol{x}\|=\|\boldsymbol{A}\boldsymbol{x}^{\prime}\|\quad\text{and}\quad \sigma_{\min}(\boldsymbol{B})=\min\limits_{\|\boldsymbol{x}\|=1}\|\boldsymbol{B}\boldsymbol{x}\|=\|\boldsymbol{B}\boldsymbol{x}^{\prime\prime}\|, $$

where \(\sigma _{\min \nolimits }(\boldsymbol {K})\) denotes the smallest singular value of a matrix K. Then

$$ \sigma_{\min}(\boldsymbol{A})=\|\boldsymbol{A}\boldsymbol{x}^{\prime}\|\leq \|\boldsymbol{A}\boldsymbol{x}^{\prime\prime}\|\leq \|\boldsymbol{G}^{-1}\| \|\boldsymbol{B}\boldsymbol{x}^{\prime\prime}\| =\|\boldsymbol{G}^{-1}\| \sigma_{\min}(\boldsymbol{B}). $$

The result follows by recalling that \(\|\boldsymbol {K}^{+}\|=1/\sigma _{\min \nolimits }(\boldsymbol {K}) \) when K has full column rank, which implies that \(\sigma _{\min \nolimits }(\boldsymbol {K})>0\). □

Theorem A.2

Let \(\boldsymbol {A}\in \mathbb {C}^{m\times n}\) and \((\boldsymbol {A}+\boldsymbol {E})\in \mathbb {C}^{m\times n}\), m ≥ n, be such that rank(A) = n and ∥EA+∥ < 1. Then

$$ \|(\boldsymbol{A}+\boldsymbol{E})^{+}\|\leq\frac{\|\boldsymbol{A}^{+}\|}{1-\|\boldsymbol{E}\boldsymbol{A}^{+}\|}. $$

If Δ = ∥E∥∥A+∥ < 1 in addition, then this result can be expressed as

$$ \|(\boldsymbol{A}+\boldsymbol{E})^{+}\|\leq\frac{1}{1-{\Delta}}\|\boldsymbol{A}^{+}\|. $$

Proof

First, because A is of full column rank, we have that \(\boldsymbol {A}^{+}\boldsymbol {A}=\boldsymbol {I}_{n\times n}\). Consequently,

$$ \boldsymbol{A}+\boldsymbol{E}=(\boldsymbol{I}+\boldsymbol{E}\boldsymbol{A}^{+})\boldsymbol{A}. $$

Since ∥EA+∥ < 1 by assumption, the matrix G = I + EA+ is nonsingular. The first result now follows from Theorem A.1 and by the fact that ∥G− 1∥≤ 1/(1 −∥EA+∥). The second result follows by invoking ∥EA+∥≤∥E∥∥A+∥ = Δ and the additional assumption that Δ < 1. □
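The bound of Theorem A.2 is easy to confirm on a random example; the snippet below is an illustrative numerical check under a small perturbation, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))          # full column rank, generically
E = 1e-3 * rng.standard_normal((6, 3))   # small perturbation
Ap = np.linalg.pinv(A)

# Hypothesis of Theorem A.2: ||E A^+|| < 1 in the spectral norm.
assert np.linalg.norm(E @ Ap, 2) < 1.0

lhs = np.linalg.norm(np.linalg.pinv(A + E), 2)
bound = np.linalg.norm(Ap, 2) / (1.0 - np.linalg.norm(E @ Ap, 2))
assert lhs <= bound + 1e-12              # ||(A+E)^+|| <= ||A^+|| / (1 - ||E A^+||)
```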

Theorem A.3

Let A and E be as in Theorem A.2 with Δ = ∥E∥∥A+∥ < 1, and let H = (A + E)+ − A+. Then

$$ \|\boldsymbol{H}\|\leq\sqrt{2}\frac{\Delta}{1-{\Delta}}\|\boldsymbol{A}^{+}\|.$$

Proof

By Wedin [44, Theorem 4.1], there holds

$$ \|\boldsymbol{H}\|\leq\sqrt{2} \|(\boldsymbol{A}+\boldsymbol{E})^{+}\| \|\boldsymbol{A}^{+}\| \|\boldsymbol{E}\|. $$

Invoking now Theorem A.2, the result follows. □
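The bound of Theorem A.3 on H = (A + E)+ − A+ can likewise be checked numerically; again, this is an illustration, not part of the argument.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 4))          # full column rank, generically
E = 1e-4 * rng.standard_normal((8, 4))   # small perturbation
Ap = np.linalg.pinv(A)

delta = np.linalg.norm(E, 2) * np.linalg.norm(Ap, 2)
assert delta < 1.0                       # hypothesis of Theorem A.3

H = np.linalg.pinv(A + E) - Ap
bound = np.sqrt(2) * delta / (1.0 - delta) * np.linalg.norm(Ap, 2)
assert np.linalg.norm(H, 2) <= bound + 1e-12
```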

The following theorem is due to Stewart [40].

Theorem A.4

Let A1, A2,…, and A be such that \(\lim _{n\to \infty }\boldsymbol {A}_{n}=\boldsymbol {A}\). Then \(\lim _{n\to \infty }\boldsymbol {A}_{n}^{+}=\boldsymbol {A}^{+}\) if and only if rank(An) = rank(A) for all n ≥ n0, for some integer n0.
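The rank condition in Theorem A.4 is essential: when the rank drops in the limit, the pseudoinverses diverge even though the matrices converge. The following small example (ours, for illustration) shows this on 2 × 2 diagonal matrices.

```python
import numpy as np

A = np.diag([1.0, 0.0])                  # rank(A) = 1
gaps = []
for n in (10, 100, 1000):
    An = np.diag([1.0, 1.0 / n])         # An -> A, but rank(An) = 2 > rank(A)
    # pinv(An) = diag(1, n), so the distance to pinv(A) = diag(1, 0) grows like n.
    gaps.append(np.linalg.norm(np.linalg.pinv(An) - np.linalg.pinv(A), 2))
print(gaps)                              # grows without bound as An -> A
```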


About this article


Cite this article

Sidi, A. A convergence study for reduced rank extrapolation on nonlinear systems. Numer Algor 84, 957–982 (2020). https://doi.org/10.1007/s11075-019-00788-6
