
Comments on “Anderson Acceleration, Mixing and Extrapolation”

  • Original Paper
  • Numerical Algorithms

Abstract

The Extrapolation Algorithm is a technique devised in 1962 for accelerating the rate of convergence of slowly converging Picard iterations for fixed point problems. Versions of this technique are now called Anderson Acceleration in the applied mathematics community and Anderson Mixing in the physics and chemistry communities, and these are related to several other methods extant in the literature. We seek here to broaden and deepen the conceptual foundations for these methods, and to clarify their relationship to certain iterative methods for root-finding problems. For this purpose, the Extrapolation Algorithm will be reviewed in some detail, and selected papers from the existing literature will be discussed, both from conceptual and implementation perspectives.


References

  1. Anderson, D.G.: Iterative procedures for nonlinear integral equations. J. Assoc. Comput. Mach. 12, 547–560 (1965)

  2. Bierlaire, M., Crittin, F.: Solving noisy, large-scale fixed point problems and systems of nonlinear equations. Transp. Sci. 40, 44–63 (2006)

  3. Björck, Å.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)

  4. Broyden, C.G.: A class of methods for solving nonlinear simultaneous equations. Math. Comput. 19, 577–593 (1965)

  5. Calef, M.T., Fichtl, E.D., Warsa, J.S., Carlson, N.N.: Nonlinear Krylov acceleration applied to a discrete ordinates formulation of the k-eigenvalue problem. J. Comput. Phys. 238, 188–209 (2013)

  6. Carlson, N.N., Miller, K.: Design and application of a gradient-weighted moving finite element code I: one dimension. SIAM J. Sci. Comput. 19, 728–765 (1998)

  7. Eyert, V.: A comparative study on methods for convergence acceleration of iterative vector sequences. J. Comput. Phys. 124, 271–285 (1996)

  8. Fang, H.-r., Saad, Y.: Two classes of multisecant methods for nonlinear acceleration. Numer. Linear Algebra Appl. 16, 197–221 (2009)

  9. Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. The Johns Hopkins University Press, Baltimore (2013)

  10. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1985)

  11. Marks, L.D., Luke, D.R.: Robust mixing for ab initio quantum mechanical calculations. Phys. Rev. B 78, 075114 (2008)

  12. Ni, P.: Anderson Acceleration of Fixed-Point Iteration with Applications to Electronic Structure Computations. Ph.D. thesis, Worcester Polytechnic Institute, Worcester, MA (2009)

  13. Ortega, J.M., Rheinboldt, W.C.: Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, San Diego (1970)

  14. Ostrowski, A.M.: Solution of Equations and Systems of Equations, 2nd edn. Academic Press, San Diego (1966)

  15. Toth, A., Kelley, C.T.: Convergence analysis for Anderson Acceleration. SIAM J. Numer. Anal. 53, 805–819 (2015)

  16. Walker, H.F., Ni, P.: Anderson Acceleration for fixed-point iterations. SIAM J. Numer. Anal. 49, 1715–1735 (2011)

Corresponding author

Correspondence to Donald G. M. Anderson.

Appendix

We shall discuss here supplementary matters not strictly required within the main text, but related and potentially relevant thereto. We adopt the notation and terminology previously introduced, and rely upon essential results argued above. We provide arguments for nonessential results previously stated without proof, and for choices tacitly made without explicit justification.

Affine subspaces and affine independence/dependence are essentially geometric concepts, though it is convenient to describe and manipulate them in algebraic terms. The labeling of \(x^{(\ell - k)}\), \(y^{(\ell - k)} = g(x^{(\ell - k)})\) and \(r^{(\ell - k)} = y^{(\ell - k)} - x^{(\ell - k)}\), 0 ≤ k ≤ m, derives from the iterative process context of interest, the ordering reflecting the “age” of the iterants. The affine span of the affinely independent defining set \(\{r^{(\ell - k)}\}_{k = 0}^{m}\) is an affine subspace of maximal dimension, m. We have chosen above to describe this algebraically using the shift vector \(r^{(\ell )}\) and the linear subspace with the deviation basis \(\left \{r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\). From a geometric perspective, the labeling and ordering of the defining set is irrelevant; any member and the associated deviation basis could equally well have been used. Moreover, any nonzero member of the affine span and the associated deviation basis could be used. It is the questions of how the latter might be accomplished algebraically, and to what advantage, that originally motivated inclusion of this appendix. For the moment, we shall continue with our previous choice of shift vector \(r^{(\ell )}\) and deviation basis \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\), examining this choice and alternatives later.

The affine combination \({\sum }_{k = 0}^{m} {\theta }_{k}^{(\ell )} r^{(\ell - k)}\), with \({\sum }_{k = 0}^{m} {\theta }_{k}^{(\ell )} = 1\), can be written in the form \(r^{(\ell )} + {\sum }_{k = 1}^{m} {\theta }_{k}^{(\ell )} (r^{(\ell - k)} - r^{(\ell )})\), with \({\theta }_{0}^{(\ell )} = 1 - {\sum }_{k = 1}^{m} {\theta }_{k}^{(\ell )}\). If we use shift vector \(r^{(\ell )}\) in representing the affine span of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\), we identify \( \left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) as a spanning set for the associated linear subspace. The dimension is maximal, m, if \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) is linearly independent, thence a basis for the linear subspace. Thus, \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is affinely independent if \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) is linearly independent, and \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is affinely dependent if \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) is linearly dependent.
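
The rank characterization just described is easy to check numerically. The following sketch (NumPy assumed; the vectors are arbitrary illustrative data, not iterants from any particular problem) tests affine independence of a set of columns via the rank of the associated deviation matrix:

```python
import numpy as np

def is_affinely_independent(R):
    # Columns of R are r^(l-k) for k = 0, ..., m (column 0 is r^(l)).
    # The set is affinely independent exactly when the m deviations
    # r^(l-k) - r^(l), 1 <= k <= m, are linearly independent, i.e.
    # when the deviation matrix has full column rank m.
    D = R[:, 1:] - R[:, :1]
    return np.linalg.matrix_rank(D) == D.shape[1]

# Three vectors in R^2 are always linearly dependent, yet this set
# is affinely independent (its two deviations are independent):
R1 = np.array([[1.0, 0.0, -1.0],
               [0.0, 1.0, -1.0]])

# Collinear vectors through 0 are affinely dependent as well:
R2 = np.array([[1.0, 2.0, 3.0],
               [1.0, 2.0, 3.0]])

print(is_affinely_independent(R1))  # True
print(is_affinely_independent(R2))  # False
```

The first example also illustrates the assertion examined below: linear independence of the set is sufficient, but not necessary, for its affine independence.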

Recall the pair of assertions above that linear independence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is a sufficient, but not a necessary, condition for affine independence of \( \left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\), and that linear dependence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is a necessary, but not a sufficient, condition for 0 to be a member of the affine span of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\). We now provide the requisite proofs, in reverse order.

If 0 is an affine combination of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\), there is a nontrivial linear combination of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) which is 0, so \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is linearly dependent. Therefore, linear dependence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is a necessary condition for 0 to be a member of the affine span of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\). If \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is linearly dependent, but all nontrivial linear combinations of \( \left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) which are 0 have the property that the sum of their linear combination coefficients is zero, then there is no affine combination of \( \left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) that is 0. Therefore, linear dependence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is not a sufficient condition for 0 to be a member of the affine span of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\).

Assume that \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is affinely dependent, so \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) is linearly dependent and there is a nontrivial linear combination of \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) which is 0. If the sum of the linear combination coefficients is nonzero, then we can express \(r^{(\ell )}\) as a linear combination of \(\left \{ r^{(\ell - k)} \right \}_{k = 1}^{m}\), so \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is linearly dependent. If the sum of the linear combination coefficients is zero, then the same nontrivial linear combination of \(\left \{ r^{(\ell - k)} \right \}_{k = 1}^{m}\) is zero, so \(\left \{ r^{(\ell - k)} \right \}_{k = 1}^{m}\) is linearly dependent, thence \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is linearly dependent. We conclude that affine dependence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) implies linear dependence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\). Therefore, by contraposition, linear independence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) implies affine independence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\), establishing sufficiency. To establish lack of necessity, we need only identify at least one instance \(\left \{ \tilde {r}^{(\ell - k)} \right \}_{k = 0}^{m}\) in which \( \left \{ \tilde {r}^{(\ell - k)} \right \}_{k = 0}^{m}\) is both linearly dependent and affinely independent. Before doing so, observe that we implicitly established sufficiency earlier during the discussion of constrained minimization in connection with the Ni thesis.

Assume that \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is affinely independent. We have seen earlier that this is a necessary and sufficient condition for there to be a unique affine combination \((\hat {v}^{(\ell )} - \hat {u}^{(\ell )})\) of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) closest to 0. Define \(\tilde {r}^{(\ell -k)} = r^{(\ell - k)} - (\hat {v}^{(\ell )} - \hat {u}^{(\ell )})\), for 0 ≤ k ≤ m. Observing that \( \tilde {r}^{(\ell -k)} - \tilde {r}^{(\ell )} = r^{(\ell - k)} - r^{(\ell )}\), for 1 ≤ k ≤ m, we conclude that \(\left \{ \tilde {r}^{(\ell - k)} \right \}_{k = 0}^{m}\) is also affinely independent. Moreover, we see that any affine combination of \(\left \{ \tilde {r}^{(\ell - k)} \right \}_{k = 0}^{m}\) is just the corresponding affine combination of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) minus \((\hat {v}^{(\ell )} - \hat {u}^{(\ell )})\). Consequently, the same affine combination coefficients will yield the affine combinations of \(\left \{ \tilde {r}^{(\ell - k)} \right \}_{k = 0}^{m}\) and \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) closest to 0, that affine combination of \(\left \{ \tilde {r}^{(\ell - k)}\right \}_{k = 0}^{m}\) being 0, so 0 is in the affine span of \(\left \{ \tilde { r}^{(\ell - k)} \right \}_{k = 0}^{m}\). We infer that \(\left \{ \tilde {r}^{(\ell - k)} \right \}_{k = 0}^{m}\) is both linearly dependent and affinely independent. Therefore, linear independence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is a sufficient, but not a necessary, condition for \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) to be affinely independent. Observe that \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) can be nearly linearly dependent while \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\) is not nearly linearly dependent, so \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) is not nearly affinely dependent. Note the implications for the constrained minimization approach.

Before returning to the choice of the shift vector and associated deviation basis for the affine span of an affinely independent \( \left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\), we shall sort out some issues regarding affine fixed point problems: g(x) = Gx + h. There is a unique fixed point \(\hat {x}\) if (I − G) is nonsingular, with \( \hat {x} = (I-G)^{-1}h\). Sufficient conditions for the Picard iteration to converge to a unique fixed point for any h and any initial iterant \(x^{(0)}\) are ∥G∥ < 1 or ρ(G) < 1; thence, these are also sufficient conditions for nonsingularity of (I − G). Recall that g is invertible if G is nonsingular.
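
For a concrete instance of these sufficient conditions, the sketch below (NumPy assumed; G and h are arbitrary test data with ρ(G) < 1) runs the Picard iteration for an affine g and compares the limit with \(\hat {x} = (I-G)^{-1}h\):

```python
import numpy as np

# Affine fixed point problem g(x) = G x + h; the spectral radius of
# this G is 0.6 < 1, so the Picard iteration converges to the unique
# fixed point x_hat = (I - G)^{-1} h from any initial iterant x^(0).
G = np.array([[0.5, 0.2],
              [0.1, 0.4]])
h = np.array([1.0, 2.0])

x_hat = np.linalg.solve(np.eye(2) - G, h)

x = np.zeros(2)                 # initial iterant x^(0)
for _ in range(200):
    x = G @ x + h               # Picard iteration x^(k+1) = g(x^(k))

print(np.allclose(x, x_hat))    # True
```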

For 1 ≤ k ≤ m, we have

$$g(x^{(\ell - k)}) - g(x^{(\ell )}) = G(x^{(\ell - k)} - x^{(\ell )}) = (y^{(\ell - k)} - y^{(\ell )}). $$

If \(\left \{ x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\) is linearly dependent, there are nontrivial \(\eta _{k}\), 1 ≤ k ≤ m, such that \( {\sum }^{m}_{k = 1} \eta _{k} (x^{(\ell - k)} - x^{(\ell )}) = 0\). We see that

$$G \left[ \sum\limits_{k = 1}^{m} \eta_{k} (x^{(\ell - k)} - x^{(\ell )}) \right] = \sum\limits_{k = 1}^{m} \eta_{k} (y^{(\ell - k)} - y^{(\ell )}) = 0, $$

so linear dependence of \(\left \{ x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\) implies linear dependence of \(\left \{ y^{(\ell - k)} - y^{(\ell )} \right \}_{k = 1}^{m}\). Since the same \(\eta _{k}\), 1 ≤ k ≤ m, are involved for both sets, we also obtain \({\sum }_{k = 1}^{m} \eta _{k} (r^{(\ell - k)} - r^{(\ell )}) = 0\), so linear dependence of \(\left \{x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\) implies linear dependence of \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\). We can rephrase these two inferences as affine dependence of \(\left \{ x^{(\ell - k)} \right \}_{k = 0}^{m}\) implies affine dependence of \(\left \{ y^{(\ell - k)} \right \}_{k = 0}^{m}\), and affine dependence of \(\left \{ x^{(\ell - k)} \right \}_{k = 0}^{m}\) implies affine dependence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\). By contraposition, these two inferences become affine independence of \(\left \{ y^{(\ell - k)} \right \}_{k = 0}^{m}\) implies affine independence of \(\left \{ x^{(\ell - k)} \right \}_{k = 0}^{m}\), and affine independence of \(\left \{ r^{(\ell - k)} \right \}_{k = 0}^{m}\) implies affine independence of \(\left \{ x^{(\ell - k)} \right \}_{k = 0}^{m}\).

The foregoing inferences depend only on the assumption that g is affine. We now assume that g is affine and invertible, so we also have

$$G^{-1} (y^{(\ell - k)} - y^{(\ell )}) = (x^{(\ell - k)} - x^{(\ell )}) , 1 \leq k \leq m . $$

We see that linear dependence of \(\left \{ y^{(\ell - k)} - y^{(\ell )} \right \}_{k = 1}^{m}\) implies linear dependence of \( \left \{ x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\). Combined with the foregoing, we obtain linear dependence of \(\left \{ y^{(\ell - k)} - y^{(\ell )} \right \}_{k = 1}^{m}\) if and only if we have linear dependence of \(\left \{ x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\); thence by contraposition, we obtain linear independence of \(\left \{ y^{(\ell - k)} - y^{(\ell )} \right \}_{k = 1}^{m}\) if and only if we have linear independence of \(\left \{ x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\). This may be rephrased as the statement that we obtain affine dependence or independence of \(\left \{ y^{(\ell - k)} \right \}_{k = 0}^{m}\) if and only if we have affine dependence or independence of \(\left \{ x^{(\ell - k)} \right \}_{k = 0}^{m}\), respectively. In addition, we infer that linear dependence of \(\left \{ y^{(\ell - k)} - y^{(\ell )} \right \}_{k = 1}^{m}\), or equivalently, affine dependence of \(\left \{ y^{(\ell -k)} \right \}_{k = 0}^{m}\), implies linear dependence of \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\), or equivalently, affine dependence of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\). By contraposition, we see that linear independence of \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m}\), or equivalently, affine independence of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\), implies linear independence of \(\left \{ y^{(\ell - k)} - y^{(\ell )} \right \}_{k = 1}^{m}\), or equivalently, affine independence of \(\left \{ y^{(\ell -k)} \right \}_{k = 0}^{m}\).

Note that in the foregoing results affine dependence of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) appears only as a conclusion, and affine independence of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) appears only as a hypothesis. Thus, we identify circumstances in which \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) is affinely dependent, and consequences of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) being affinely independent. However, for 0 ≤ k ≤ m, we have \(r^{(\ell - k)} = (G - I)x^{(\ell - k)} + h\), and, for 1 ≤ k ≤ m, we have \((r^{(\ell - k)} - r^{(\ell )}) = (G - I)(x^{(\ell - k)} - x^{(\ell )})\). If (I − G), thence (G − I), is nonsingular, it follows as above that \(\left \{ r^{(\ell - k)} - r^{(\ell )} \right \}_{k = 1}^{m} \) is linearly dependent (independent) if \(\left \{ x^{(\ell - k)} - x^{(\ell )} \right \}_{k = 1}^{m}\) is linearly dependent (independent), or equivalently, that \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) is affinely dependent (independent) if \( \left \{ x^{(\ell -k)} \right \}_{k = 0}^{m}\) is affinely dependent (independent). Recall that, as a practical matter, near affine (linear) dependence is usually the more salient issue, so the condition number is relevant.
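
The rank argument of the last paragraph can be illustrated directly: multiplying the iterant differences by a nonsingular (G − I) cannot change their rank, so the r-deviations inherit the (in)dependence of the x-deviations. A small sketch (NumPy assumed; G and the difference columns are toy test data, not from any iteration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Upper triangular G with no diagonal entry equal to 1, so (G - I)
# is nonsingular exactly, not merely generically:
G = np.array([[0.5, 0.2, 0.3],
              [0.0, 0.3, 0.1],
              [0.0, 0.0, -0.4]])

X = rng.standard_normal((3, 3))   # columns: x^(l-k) - x^(l)
X[:, 2] = X[:, 0] + X[:, 1]       # force linear dependence (rank 2)

# r^(l-k) - r^(l) = (G - I)(x^(l-k) - x^(l)):
Rdev = (G - np.eye(3)) @ X

# A nonsingular map preserves rank, hence dependence/independence:
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Rdev))  # 2 2
```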

We now return to the choice of the shift vector and associated deviation basis for the affine span of an affinely independent \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\). As with our previous choices, we have no principled basis for choosing without relevant information from the problem context and about the anticipated consequences thereof.

In the nonstationary Extrapolation Algorithm, the one set of iterant data that is immune to being disregarded is that corresponding to the residual chosen as the shift vector. That set ought to be \(x^{(\ell )}\) and \(y^{(\ell )}\), so \(r^{(\ell )}\) should be chosen as the shift vector; thence also, imposition of the constraint \(\hat {\theta }^{(\ell )}_{0} > 0\). The motivating assumption behind the Extrapolation Algorithm is that the underlying Picard iteration is converging and we seek to increase the rate of convergence. We anticipate that the younger residuals will eventually be significantly smaller than the older residuals, so \((\hat {v}^{(\ell )} - \hat {u}^{(\ell )})\) will be close to the younger residuals, whose \(\hat {\theta }^{(\ell )}_{k}\) will dominate.

Having chosen \(r^{(\ell )}\) as the shift vector, the associated deviation basis can be rescaled and reordered for numerical purposes, and the issue of actual or near affine dependence can be addressed. The associated deviation basis could be replaced by the corresponding difference basis, to exploit the natural ordering of the iterants. These matters have been discussed in detail in the main text.

Consider the affine subspace of dimension m defined as the affine span of the affinely independent set \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\). Why might one wish to consider a shift vector other than one of the \(r^{(\ell -k)}\), namely an affine combination of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\)? How could we define, determine, and manipulate an associated deviation basis? How could we use this shift vector and associated deviation basis to find the unique point \( (\hat {v}^{(\ell )} - \hat {u}^{(\ell )}) \) in the affine subspace closest to 0? How could we determine the corresponding unique affine combination coefficients such that

$$(\hat{v}^{(\ell)} - \hat{u}^{(\ell)}) = \sum\limits_{k = 0}^{m} {\hat{\theta}}_{k}^{(\ell)} r^{(\ell-k)} , $$

and thence determine \(\hat {u}^{(\ell )} = {\sum }_{k = 0}^{m} {\hat {\theta }}_{k}^{(\ell )} x^{(\ell -k)}\) and \(\hat {v}^{(\ell )} = {\sum }_{k = 0}^{m} {\hat {\theta }}_{k}^{(\ell )} y^{(\ell -k)}\)? We shall answer these questions hereafter.
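
For the customary choice of shift vector, these quantities can be computed by a small least-squares solve over the deviation basis, in the spirit of the unconstrained formulation of Walker and Ni [16]. A sketch (NumPy assumed; the residual matrix is toy data chosen so the answer is known in closed form):

```python
import numpy as np

def closest_affine_point(R):
    # Columns of R are r^(l-k), k = 0, ..., m (column 0 is r^(l)).
    # With shift vector r^(l) and deviation basis D = [r^(l-k) - r^(l)],
    # minimize || r^(l) + D theta ||_2, then recover the affine
    # combination coefficients, whose first entry is 1 - sum(theta).
    r0 = R[:, 0]
    D = R[:, 1:] - R[:, :1]
    t, *_ = np.linalg.lstsq(D, -r0, rcond=None)
    theta = np.concatenate([[1.0 - t.sum()], t])
    return R @ theta, theta

# Toy data: the affine span of e1, e2, e3 is the plane x + y + z = 1,
# whose closest point to 0 is the centroid (1/3, 1/3, 1/3).
R = np.eye(3)
p, theta = closest_affine_point(R)
print(p)          # approximately [1/3, 1/3, 1/3]
```

The same coefficients \(\hat {\theta }_{k}^{(\ell )}\) applied to the x and y iterant columns then yield \(\hat {u}^{(\ell )}\) and \(\hat {v}^{(\ell )}\).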

We consider as shift vector the affine combination \(s^{(\ell )} = {\sum }_{k = 0}^{m} {\sigma }_{k}^{(\ell )} r^{(\ell -k)}\), with \({\sum }_{k = 0}^{m} {\sigma }_{k}^{(\ell )} = 1\). We shall be primarily concerned with convex combinations: \({\sigma }_{k}^{(\ell )} \geq 0\), 0 ≤ k ≤ m. Of particular interest will be the centroid \( \bar {s}^{(\ell )}\), with \({\bar {\sigma }}_{k}^{(\ell )} = (m + 1)^{-1}\), 0 ≤ k ≤ m. Since the affine span of \(\left \{ s^{(\ell )} \right \} \cup \left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) coincides with that of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\), \(\left \{ s^{(\ell )} \right \} \cup \left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) is affinely dependent. Any member \((v^{(\ell )} - u^{(\ell )})\) of the affine span of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) can be written in the form

$$(v^{(\ell)} - u^{(\ell)}) = \sum\limits_{k = 0}^{m} {\theta}_{k}^{(\ell)} r^{(\ell-k)}, $$

with \({\sum }_{k = 0}^{m} {\theta }_{k}^{(\ell )} = 1\), and can also be written in the form

$$(v^{(\ell)} - u^{(\ell)}) = s^{(\ell)} + \sum\limits_{k = 0}^{m} {\theta}_{k}^{(\ell)} (r^{(\ell-k)} - s^{(\ell)}). $$

We identify \(\left \{r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m}\) as a deviation spanning set associated with shift vector \(s^{(\ell )}\) for the affine span of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\), or equivalently of \(\left \{ s^{(\ell )} \right \} \cup \left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\). However, \(\left \{ r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m}\) is not a deviation basis, because it constitutes a set of m + 1 vectors in an m dimensional linear subspace and must therefore be linearly dependent. Since \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) is affinely independent, the members of the set are nonzero and distinct. If all members of \(\left \{ r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m}\) are nonzero, there is a nontrivial linear combination thereof equal to zero: \({\sum }_{k = 0}^{m} \eta _{k} (r^{(\ell -k)} - s^{(\ell )}) = 0\). There are at least two (and at most m + 1) j such that \(\eta _{j} \neq 0\), so \((r^{(\ell -j)} - s^{(\ell )})\) can be expressed as a linear combination of the other members of \(\left \{r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m}\). There is at most one j such that \((r^{(\ell -j)} - s^{(\ell )}) = 0\). For each j with \(\eta _{j} \neq 0\), the linear span of \(\left \{ r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m} \setminus \left \{ r^{(\ell -j)} - s^{(\ell )} \right \}\) coincides with that of \(\left \{ r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m}\). We identify \(\left \{ r^{(\ell -k)} - s^{(\ell )} \right \}_{k = 0}^{m} \setminus \left \{ r^{(\ell -j)} - s^{(\ell )} \right \}\) as a deviation spanning set associated with shift vector \(s^{(\ell )}\). Since this spanning set consists of m vectors in an m dimensional linear subspace, it must be linearly independent, thence a deviation basis.

When the shift vector is a member of \(\left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\), the customary choice, there is only one associated deviation basis. When the shift vector is not a member of \(\left \{r^{(\ell -k)}\right \}_{k = 0}^{m}\), there are at least two and at most m + 1 deviation bases. For shift vector \(s^{(\ell )} = {\sum }_{k = 0}^{m} {\sigma }_{k}^{(\ell )} r^{(\ell -k)}\), with \({\sum }_{k = 0}^{m} {\sigma }_{k}^{(\ell )} = 1\), we see that there are as many deviation bases as there are nonzero \({\sigma }_{k}^{(\ell )}\), 0 ≤ k ≤ m, because \({\sum }_{k = 0}^{m} {\sigma }_{k}^{(\ell )} (r^{(\ell -k)} - s^{(\ell )}) = 0\). In particular, for \(\bar {s}^{(\ell )}\), with \({\bar {\sigma }}_{k}^{(\ell )} = (m + 1)^{-1}\), 0 ≤ k ≤ m, there are m + 1 deviation bases. The N × (m + 1) matrix with columns \((r^{(\ell -k)} - \bar {s}^{(\ell )})\), 0 ≤ k ≤ m ≤ N, will have rank m, and any subset of m columns constitutes a deviation basis associated with \(\bar {s}^{(\ell )}\). How do we choose among them? If we were to use the standard scaling and pivoting strategies to construct a QR decomposition/factorization of this matrix, we would select a particular deviation basis. Interest in using the centroid \(\bar {s}^{(\ell )}\) as the shift vector arises from the expectation that the resulting deviation basis matrix will have a smaller condition number than that for the deviation basis associated with \(r^{(\ell )}\). “Centering” of this sort is a common stratagem in many contexts. Whether the potential gain would be worth the modest incremental cost remains to be seen and would be problem dependent.
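
The dependence of the centered deviation spanning set, and the fact that deleting any single column leaves a deviation basis when every \({\bar {\sigma }}_{k}^{(\ell )}\) is nonzero, can be checked numerically (NumPy assumed; R is random stand-in data for the residual columns):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 6, 3

R = rng.standard_normal((N, m + 1))      # columns r^(l-k), k = 0..m
s_bar = R.mean(axis=1, keepdims=True)    # centroid shift vector

Dc = R - s_bar                           # centered deviation spanning set
# The m + 1 centered deviations sum to 0, so they span an
# m dimensional subspace but are linearly dependent:
print(np.linalg.matrix_rank(Dc))         # 3

# Every sigma_bar_k = 1/(m+1) is nonzero, so deleting any one
# column leaves m linearly independent vectors (a deviation basis):
ranks = [np.linalg.matrix_rank(np.delete(Dc, j, axis=1))
         for j in range(m + 1)]
print(ranks)                             # [3, 3, 3, 3]
```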

Specifically, suppose that we set out to use the Extrapolation Algorithm as laid out in the main text to seek the unique point \((\hat {v}^{(\ell )} - \hat {u}^{(\ell )})\) closest to 0 in the affine span of \(\left \{ \bar {s}^{(\ell )} \right \} \cup \left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\) using \(\bar {s}^{(\ell )}\) as the shift vector. The standard scaling, pivoting, and narrow regularization strategies will cope with the actual affine dependence and choose an associated deviation basis, characterized by j. Ordering by increasing age and using the standard scaling strategy will ensure that j > 0, so the iterant data x() and y() will not be disregarded in choosing the deviation basis. We assume that the corresponding basic solution will be produced, so the solution can be written in the form

$$(\hat{v}^{(\ell)} - \hat{u}^{(\ell)}) = \left[ 1 - \sum\limits_{k = 0}^{m} \hat{\phi}^{(\ell)}_{k} \right] \bar{s}^{(\ell)} + \sum\limits_{k = 0}^{m} {\hat{\phi}}_{k}^{(\ell)} r^{(\ell-k)}, $$

with the understanding that \({\hat {\phi }}_{j}^{(\ell )} = 0\) for the j characterizing the chosen deviation basis. We may then obtain

$$(\hat{v}^{(\ell)} - \hat{u}^{(\ell)}) = \sum\limits_{k = 0}^{m} {\hat{\theta}}_{k}^{(\ell)} r^{(\ell-k)}, $$

where

$${\hat{\theta}}_{k}^{(\ell)} = {\hat{\phi}}_{k}^{(\ell)} + \left[ 1 - \sum\limits_{i = 0}^{m} {\hat{\phi}}_{i}^{(\ell)} \right] {\bar{\sigma}}_{k}^{(\ell)}, $$

for 0 ≤ k ≤ m. The minimal solution could easily be used instead of the basic solution. Centering is particularly attractive for problems that exhibit oscillatory behavior of the residuals. For problems exhibiting monotonic behavior of the residuals, selection of \(s^{(\ell )}\) closer to the younger residuals may be preferable. The algorithm could also accommodate near affine dependence of \( \left \{ r^{(\ell -k)} \right \}_{k = 0}^{m}\), as discussed in the main text.
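
The coefficient recovery just displayed is a simple bookkeeping identity, and can be verified directly (NumPy assumed; the residual columns and the \(\hat {\phi }\) vector are random stand-ins for the output of the minimization):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 3

R = rng.standard_normal((5, m + 1))          # columns r^(l-k), k = 0..m
sigma_bar = np.full(m + 1, 1.0 / (m + 1))    # centroid weights
s_bar = R @ sigma_bar                        # centroid shift vector

phi = rng.standard_normal(m + 1)
phi[1] = 0.0     # phi_j = 0 for the j of the chosen deviation basis

# The solution expressed against the centroid shift ...
point = (1.0 - phi.sum()) * s_bar + R @ phi
# ... and the recovered affine combination coefficients theta_hat:
theta = phi + (1.0 - phi.sum()) * sigma_bar

# theta sums to 1 and reproduces the same point:
print(np.isclose(theta.sum(), 1.0), np.allclose(R @ theta, point))
```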


Cite this article

Anderson, D.G.M. Comments on “Anderson Acceleration, Mixing and Extrapolation”. Numer Algor 80, 135–234 (2019). https://doi.org/10.1007/s11075-018-0549-4
