Abstract
In many iterative optimization methods, fixed-point theory enables the analysis of the convergence rate via the contraction factor associated with the linear approximation of the fixed-point operator. While this factor characterizes the asymptotic linear rate of convergence, it does not explain the non-linear behavior of these algorithms in the non-asymptotic regime. In this letter, we take into account the effect of the first-order approximation error and present a closed-form bound on the convergence in terms of the number of iterations required for the distance between the iterate and the limit point to reach an arbitrarily small fraction of the initial distance. Our bound includes two terms: one corresponds to the number of iterations required for the linearized version of the fixed-point operator and the other corresponds to the overhead associated with the approximation error. With a focus on the convergence in the scalar case, the tightness of the proposed bound is proven for positively quadratic first-order difference equations.
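To make the two-term structure of the bound concrete, the following minimal numerical sketch (not part of the paper; the values \(\rho =0.9\), \(q=0.5\), \(a_0=0.15\) are arbitrary choices satisfying \(a_0 < (1-\rho )/q\)) iterates the scalar quadratic recurrence \(a_{k+1} = \rho a_k + q a_k^2\) analyzed below and compares the measured \(K(\epsilon )\) with the linearized term \(\log (1/\epsilon )/\log (1/\rho )\); the leftover overhead settles to a constant, as the bound predicts:

```python
# Sketch: measure K(eps) for the quadratic recurrence a_{k+1} = rho*a_k + q*a_k^2
# and compare it with the linear-rate term log(1/eps)/log(1/rho).
# rho, q, a0 are arbitrary choices satisfying a0 < (1-rho)/q.
import math

rho, q, a0 = 0.9, 0.5, 0.15

a, k = a0, 0
for p in (2, 4, 8, 12):
    eps = 10.0 ** (-p)
    while a > eps * a0:          # K(eps): first k with a_k <= eps*a0
        a = rho * a + q * a * a
        k += 1
    linear = math.log(1 / eps) / math.log(1 / rho)
    print(f"eps=1e-{p:02d}: K(eps) = {k:4d}, linear term = {linear:7.1f}, overhead = {k - linear:5.1f}")
```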

Appendices
Appendix A: Proof of Theorem 1
First, we establish a sandwich inequality on \(K(\epsilon )\) in the following lemma:
Lemma 1
For any \(0< \epsilon < 1\), let \(K(\epsilon )\) be the smallest integer such that for all \(k \ge K(\epsilon )\), we have \(a_k \le \epsilon a_0\). Then,
\[ \underline{K}(\epsilon ) \triangleq F(\log (1/\epsilon )) \;\le\; K(\epsilon ) \;\le\; F(\log (1/\epsilon )) + b(\rho ,\tau ) + 1 \triangleq \overline{K}(\epsilon ), \tag{A1} \]
where \(b(\rho ,\tau )\) is defined in (7) and
\[ F(y) \triangleq \int _0^y f(x)\,\mathrm {d}x \quad \text {with} \quad f(x) \triangleq \frac{1}{-\log \bigl (\rho + \tau (1-\rho ) e^{-x}\bigr )}. \tag{A2} \]
The lemma provides an upper bound on \(K(\epsilon )\). Moreover, the bound is tight in the sense that the gap between the lower bound \(\underline{K}(\epsilon )\) and the upper bound \(\overline{K}(\epsilon )\) is independent of \(\epsilon\); in other words, the ratio \(K(\epsilon )/\overline{K}(\epsilon )\) approaches 1 as \(\epsilon \rightarrow 0\). Next, we proceed to obtain a tight closed-form upper bound on \(\overline{K}(\epsilon )\) by upper-bounding \(F(\log (1/\epsilon ))\).
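Numerically, the sandwich in Lemma 1 is easy to observe. The sketch below (assuming the integrand \(f\) reconstructed in (A2); the values \(\rho =0.9\), \(\tau =0.75\) are arbitrary) evaluates \(F(\log (1/\epsilon ))\) by quadrature and compares it against the exact \(K(\epsilon )\) obtained by iterating the surrogate recurrence introduced in Appendix A.1; the gap stays bounded as \(\epsilon \rightarrow 0\):

```python
# Sketch: compare the exact K(eps), obtained by iterating the surrogate
# recurrence d_{k+1} = d_k - log(rho + tau*(1-rho)*exp(-d_k)), with the
# quadrature value of F(log(1/eps)). Assumes the reconstructed f from (A2).
import math
from scipy.integrate import quad

rho, tau = 0.9, 0.75
f = lambda x: -1.0 / math.log(rho + tau * (1 - rho) * math.exp(-x))

d, k = 0.0, 0
for p in (2, 4, 8, 12):
    target = p * math.log(10.0)              # log(1/eps) for eps = 10^-p
    while d < target:                        # K(eps): first k with d_k >= log(1/eps)
        d -= math.log(rho + tau * (1 - rho) * math.exp(-d))
        k += 1
    F_val, _ = quad(f, 0.0, target)          # F(log(1/eps)) by quadrature
    print(f"eps=1e-{p:02d}: F = {F_val:7.2f} <= K(eps) = {k:4d}, gap = {k - F_val:.2f}")
```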
Lemma 2
Consider the function \(F(\cdot )\) given in (A2). For \(0<\epsilon <1\), \(F(\log (1/\epsilon ))\) admits two upper bounds, \(\overline{F}_1(\log (1/\epsilon ))\) in (A3) and \(\overline{F}_2(\log (1/\epsilon ))\) in (A4), as well as a lower bound \(\underline{F}_1(\log (1/\epsilon ))\) in (A5), whose gap from \(F(\log (1/\epsilon ))\) is given by the term \(A(\epsilon )\) defined in (A6).
Lemma 2 offers two upper bounds on \(F(\log (1/\epsilon ))\) and one lower bound. The first bound \(\overline{F}_1(\log (1/\epsilon ))\) approximates well the behavior of \(F(\log (1/\epsilon ))\) for both small and large values of \(\log (1/\epsilon )\). The second bound \(\overline{F}_2 (\log (1/\epsilon ))\) provides a linear bound on \(F(\log (1/\epsilon ))\) in terms of \(\log (1/\epsilon )\). Moreover, the gap between \(F(\log (1/\epsilon ))\) and \(\underline{F}_1(\log (1/\epsilon ))\), given by \(A(\epsilon )\), can be upper-bounded by A(0) since \(A(\cdot )\) is monotonically decreasing on \([0,1)\). While \(F(\cdot )\) asymptotically increases like \(\log (1/\epsilon )/ \log (1/\rho )\), this gap approaches a constant independent of \(\epsilon\). Replacing \(F(\log (1/\epsilon ))\) on the RHS of (A1) by either of the upper bounds in Lemma 2, we obtain two corresponding bounds on \(K(\epsilon )\):
where we note that \(\overline{K}_2(\epsilon )\) has the same expression as in (5). Moreover, the tightness of these two upper bounds can be shown as follows. First, using the first inequality in (A1) and then the lower bound on \(F(\log (1/\epsilon ))\) in (A5), the gap between \(\overline{K}_1(\epsilon )\) and \(K(\epsilon )\) can be bounded by
where the last inequality stems from the monotonicity of \(A(\cdot )\) on [0, 1). Note that the bound in (A8) holds uniformly, independent of \(\epsilon\), implying that \(\overline{K}_1(\epsilon )\) is a tight bound on \(K(\epsilon )\). Second, using (A7), the gap between \(\overline{K}_2(\epsilon )\) and \(K(\epsilon )\) can be represented as
where the last inequality stems from (A8). Furthermore, using the definition of \(\overline{F}_1(\log (1/\epsilon ))\) and \(\overline{F}_2(\log (1/\epsilon ))\) in (A3) and (A4), respectively, we have \(\lim _{\epsilon \rightarrow 0} (\overline{F}_2(\log (1/\epsilon )) - \overline{F}_1(\log (1/\epsilon ))) = 0\). Thus, taking the limit \(\epsilon \rightarrow 0\) on both sides of (A9), we obtain
We note that \(\overline{K}_2(\epsilon )\) is a simple bound that is linear in \(\log (1/\epsilon )\) and approaches the upper bound \(\overline{K}_1(\epsilon )\) in the asymptotic regime (\(\epsilon \rightarrow 0\)). Evaluating A(0) from (A6) and substituting it back into (A10) yields (8), which completes our proof of Theorem 1. Figure 1 (right) depicts the aforementioned bounds on \(K(\epsilon )\). It can be seen from the plot that all four bounds match the asymptotic rate of increase of \(K(\epsilon )\) (for large values of \(1/\epsilon\)). The three bounds \(\underline{K}(\epsilon )\) (red), \(\overline{K}(\epsilon )\) (yellow), and \(\overline{K}_1(\epsilon )\) (purple) closely follow \(K(\epsilon )\) (blue), indicating that the integral function \(F(\cdot )\) effectively estimates the minimum number of iterations required to achieve \(a_k \le \epsilon a_0\) in this setting. The upper bound \(\overline{K}_2(\epsilon )\) (green) is tangent to \(\overline{K}_1(\epsilon )\) as \(1/\epsilon \rightarrow \infty\) (i.e., \(\epsilon \rightarrow 0\)).
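The claim that \(F(\cdot )\) increases like \(\log (1/\epsilon )/\log (1/\rho )\) plus an asymptotically constant offset can also be spot-checked. In the sketch below (same reconstructed \(f\) and arbitrary parameters as before), the printed gap levels off as \(\epsilon \rightarrow 0\):

```python
# Sketch: F(log(1/eps)) grows like log(1/eps)/log(1/rho) plus a constant;
# the printed gap should stabilize as eps -> 0. Assumes the reconstructed f.
import math
from scipy.integrate import quad

rho, tau = 0.9, 0.75
f = lambda x: -1.0 / math.log(rho + tau * (1 - rho) * math.exp(-x))

for p in (2, 4, 8, 16, 32):
    y = p * math.log(10.0)                   # y = log(1/eps)
    F_val, _ = quad(f, 0.0, y)
    print(f"log10(1/eps)={p:2d}: F - y/log(1/rho) = {F_val - y / math.log(1 / rho):.4f}")
```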
Appendix A.1: Proof of Lemma 1
Let \(d_k=\log ({a_0}/{a_k})\) for each \(k \in {\mathbb N}\). Substituting \(a_k = a_0 e^{-d_k}\) into (3), we obtain the surrogate sequence \(\{d_k\}_{k=0}^\infty\):
\[ d_{k+1} = d_k - \log \bigl (\rho + \tau (1-\rho ) e^{-d_k}\bigr ), \tag{A11} \]
where \(d_0=0\) and \(\tau = a_0 q/(1-\rho ) \in (0,1)\). Since \(\{a_k\}_{k=0}^\infty\) is monotonically decreasing to 0 and \(d_k\) is monotonically decreasing as a function of \(a_k\), \(\{d_k\}_{k=0}^\infty\) is a monotonically increasing sequence. Our key steps in this proof are first to tightly bound the index \(K \in \mathbb {N}\) using \(F(d_K)\) as
\[ F(d_K) \;\le\; K \;\le\; F(d_K) + b(\rho ,\tau ), \tag{A12} \]
and then to obtain (A1) from (A12) using the monotonicity of the sequence \(\{d_k\}_{k=0}^\infty\) and of the function \(F(\cdot )\). We proceed with the details of each of the steps in the following.
Step 1: We prove (A12) by showing the lower bound on K first and then showing the upper bound on K. Using (A2), we can rewrite (A11) as \(d_{k+1}=d_k+1/f(d_k)\). Rearranging this equation yields
\[ f(d_k)\,(d_{k+1}-d_k) = 1. \tag{A13} \]
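Both the surrogate construction and the identity (A13) are easy to check numerically. A minimal sketch (using the reconstructed update (A11) and integrand \(f\) from (A2); the parameter values are arbitrary):

```python
# Sketch: two consistency checks on the surrogate sequence, using the
# reconstructed update (A11) and integrand f from (A2); rho, q, a0 arbitrary.
#   1) iterating (A11) reproduces d_k = log(a_0/a_k) for the original sequence;
#   2) the identity (A13), f(d_k)*(d_{k+1} - d_k) = 1, holds at every step.
import math

rho, q, a0 = 0.9, 0.5, 0.15
tau = a0 * q / (1 - rho)                       # tau = 0.75, in (0,1)
f = lambda x: -1.0 / math.log(rho + tau * (1 - rho) * math.exp(-x))

a, d = a0, 0.0
for k in range(50):
    a = rho * a + q * a * a                    # original recurrence (3)
    d_next = d - math.log(rho + tau * (1 - rho) * math.exp(-d))   # (A11)
    assert abs(f(d) * (d_next - d) - 1.0) < 1e-12                 # (A13)
    d = d_next
    assert abs(d - math.log(a0 / a)) < 1e-9    # d_k matches log(a0/a_k)
print("(A11) and (A13) consistency checks passed for 50 iterates")
```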
Since f(x) is monotonically decreasing, we obtain the lower bound on K in (A12) by
\[ F(d_K) = \sum _{k=0}^{K-1} \int _{d_k}^{d_{k+1}} f(x)\,\mathrm {d}x \;\le\; \sum _{k=0}^{K-1} f(d_k)\,(d_{k+1}-d_k) = K, \tag{A14} \]
where the last equality stems from (A13). For the upper bound on K in (A12), we use the convexity of \(f(\cdot )\) to lower-bound \(F(d_K)\) as follows:
\[ F(d_K) = \sum _{k=0}^{K-1} \int _{d_k}^{d_{k+1}} f(x)\,\mathrm {d}x \;\ge\; \sum _{k=0}^{K-1} \Bigl [ f(d_k)\,(d_{k+1}-d_k) + \frac{f'(d_k)}{2}\,(d_{k+1}-d_k)^2 \Bigr ] = K + \frac{1}{2} \sum _{k=0}^{K-1} f'(d_k)\,(d_{k+1}-d_k)^2. \tag{A15} \]
Using (A13) and substituting \(f'(x) = -\bigl (f(x)\bigr )^2 \frac{\tau (1-\rho ) e^{-x}}{\rho +\tau (1-\rho ) e^{-x}}\) into the RHS of (A15), we obtain
\[ K \;\le\; F(d_K) + \frac{1}{2} \sum _{k=0}^{K-1} \frac{\tau (1-\rho ) e^{-d_k}}{\rho +\tau (1-\rho ) e^{-d_k}}. \tag{A16} \]
Note that (A16) already offers an upper bound on K in terms of \(F(d_K)\). To obtain the upper bound on K in (A12) from (A16), it suffices to show that
\[ \frac{1}{2} \sum _{k=0}^{K-1} \frac{\tau (1-\rho ) e^{-d_k}}{\rho +\tau (1-\rho ) e^{-d_k}} \;\le\; b(\rho ,\tau ). \tag{A17} \]
In the following, we prove (A17) by introducing the functions
\[ g(x) \triangleq \frac{\tau (1-\rho ) e^{-x}}{\rho +\tau (1-\rho ) e^{-x}} \cdot \frac{1}{-\log \bigl (\rho +\tau (1-\rho ) e^{-x}\bigr )} \tag{A18} \]
and
\[ G(x) \triangleq \int _0^x g(t)\,\mathrm {d}t. \tag{A19} \]
Note that \(g(\cdot )\) is monotonically decreasing (a product of two decreasing functions) while \(G(\cdot )\) is monotonically increasing (an integral of a non-negative function) on \([0,\infty )\). We have
\[ \frac{1}{2} \sum _{k=0}^{K-1} \frac{\tau (1-\rho ) e^{-d_k}}{\rho +\tau (1-\rho ) e^{-d_k}} = \frac{1}{2} \sum _{k=0}^{K-1} g(d_k)\,(d_{k+1}-d_k). \tag{A20} \]
Lemma 3
For any \(k \in \mathbb {N}\), we have \({g(d_{k+1})}/{g(d_k)} \ge \rho\).
Proof
For \(k\in \mathbb {N}\), let \(t_k = \rho + \tau (1-\rho ) e^{-d_k} \in (\rho ,1)\). From (A11), we have \(t_k = e^{-(d_{k+1}-d_k)}\) and \(t_{k+1} = \rho + \tau (1-\rho ) e^{-d_{k+1}} = \rho + \tau (1-\rho ) e^{-d_{k}} e^{-(d_{k+1}-d_k)} = \rho + (t_k-\rho )t_k\). Substituting \(d_k\) for x in g(x) from (A18) and replacing \(\rho + \tau (1-\rho ) e^{-d_k}\) with \(t_k\) yield \(g(d_k) = \frac{\tau (1-\rho ) e^{-d_{k}}}{t_k} \frac{1}{-\log (t_k)}\). Repeating the same process to obtain \(g(d_{k+1})\) and taking the ratio between \(g(d_{k+1})\) and \(g(d_k)\), we obtain
\[ \frac{g(d_{k+1})}{g(d_k)} = e^{-(d_{k+1}-d_k)} \cdot \frac{t_k}{t_{k+1}} \cdot \frac{\log (t_k)}{\log (t_{k+1})}. \tag{A21} \]
Substituting \(e^{-(d_{k+1}-d_k)} = t_k\) and \(t_{k+1} = \rho + (t_k-\rho )t_k\) into (A21) yields
\[ \frac{g(d_{k+1})}{g(d_k)} = t_k \cdot \frac{t_k}{\rho +(t_k-\rho )t_k} \cdot \frac{\log (t_k)}{\log \bigl (\rho +(t_k-\rho )t_k\bigr )}. \tag{A22} \]
We now continue to bound the ratio \({g(d_{k+1})}/{g(d_k)}\) by bounding the RHS of (A22). Since \(t_k-\rho > 0\) and \(t_k<1\), we have \(t_k-\rho > (t_k-\rho ) t_k\) and hence \({t_k}/{(\rho +(t_k-\rho )t_k)} > 1\). Thus, to conclude that the RHS of (A22) is greater than or equal to \(\rho\), and hence that \(\frac{g(d_{k+1})}{g(d_k)} \ge \rho\), it remains to show that
\[ t_k\, \frac{\log (t_k)}{\log \bigl (\rho +(t_k-\rho )t_k\bigr )} \;\ge\; \rho . \tag{A23} \]
By the concavity of \(\log (\cdot )\), it holds that \(\log (\frac{\rho }{t_k}\cdot 1+\frac{t_k-\rho }{t_k}\, t_k ) \ge \frac{\rho }{t_k} \log (1) + \frac{t_k-\rho }{t_k} \log (t_k) = (1-\frac{\rho }{t_k}) \log (t_k)\). Adding \(\log (t_k)\) to both sides of the last inequality yields \(\log (\rho +(t_k-\rho )t_k ) \ge (2-\frac{\rho }{t_k}) \log (t_k)\). Now using the fact that \((\sqrt{\rho /t_k} - \sqrt{t_k/\rho })^2 \ge 0\), we have \(2-\rho /t_k \le t_k/\rho\). By this inequality and the negativity of \(\log (t_k)\), we have \(\log (\rho +(t_k-\rho )t_k ) \ge \frac{t_k}{\rho } \log (t_k)\). Multiplying both sides by the negative ratio \(\rho /\log (\rho +(t_k-\rho )t_k)\) and adjusting the direction of the inequality yields the inequality in (A23), which completes our proof of the lemma.
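A quick numerical spot-check of Lemma 3 (a sketch; \(g\) is the reconstructed (A18), and the parameter values are arbitrary):

```python
# Sketch: along the surrogate sequence generated by (A11), the ratio
# g(d_{k+1})/g(d_k) stays at or above rho, as Lemma 3 asserts.
import math

rho, tau = 0.9, 0.75
g = lambda x: (tau * (1 - rho) * math.exp(-x)
               / (rho + tau * (1 - rho) * math.exp(-x))
               / -math.log(rho + tau * (1 - rho) * math.exp(-x)))

d = 0.0
for k in range(50):
    d_next = d - math.log(rho + tau * (1 - rho) * math.exp(-d))   # (A11)
    assert g(d_next) / g(d) >= rho
    d = d_next
print("g(d_{k+1})/g(d_k) >= rho held for the first 50 iterates")
```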
Returning to the proof of Lemma 1, applying Lemma 3 to (A20) and substituting \(d_{k+1}-d_k = -\log (\rho + \tau (1-\rho ) e^{-d_k})\) from (A11) and \(g(d_k)\) from (A18), we have
Using the monotonicity of \(G(\cdot )\), we upper-bound \(G(d_K)\) by
Thus, the RHS of (A24) is upper bounded by the RHS of (A25). Dividing the result by \(\rho\), we obtain (A17). This completes our proof of the upper bound on K in (A12) and thereby the first step of the proof.
Step 2: In Step 1, we proved both the lower bound and the upper bound on K in (A12). Next, we proceed to show (A1) using (A12). By the definition of \(K(\epsilon )\), \(a_{K(\epsilon )} \le \epsilon a_0 < a_{K(\epsilon )-1}\). Since \(d_k=\log ({a_0}/{a_k})\), for \(k \in {\mathbb N}\), we have \(d_{K(\epsilon )-1} \le \log (1/\epsilon ) \le d_{K(\epsilon )}\). On the one hand, using the monotonicity of \(F(\cdot )\) and substituting \(K=K(\epsilon )\) into the lower bound on K in (A12) yields
\[ K(\epsilon ) \;\ge\; F(d_{K(\epsilon )}) \;\ge\; F(\log (1/\epsilon )). \tag{A26} \]
On the other hand, substituting \(K=K(\epsilon )-1\) into the upper bound on K in (A12), we obtain
\[ K(\epsilon ) - 1 \;\le\; F(d_{K(\epsilon )-1}) + b(\rho ,\tau ). \tag{A27} \]
Since \(F(\cdot )\) is monotonically increasing and \(d_{K(\epsilon )-1} \le \log (1/\epsilon )\), we have \(F(d_{K(\epsilon )-1}) \le F(\log (1/\epsilon ))\). Therefore, upper-bounding \(F(d_{K(\epsilon )-1})\) on the RHS of (A27) by \(F(\log (1/\epsilon ))\) yields
\[ K(\epsilon ) \;\le\; F(\log (1/\epsilon )) + b(\rho ,\tau ) + 1. \tag{A28} \]
The inequality (A1) follows by combining (A26) and (A28).
Appendix A.2: Proof of Lemma 2
Let \(\nu =\tau (1-\rho ) /\rho\). We represent f(x) in the interval \((0,\log (1/\epsilon ))\) as
\[ f(x) = \frac{1}{\log (1/\rho ) - \log (1+\nu e^{-x})} = \frac{1}{\log (1/\rho )} \left( 1 + \frac{\log (1+\nu e^{-x})}{\log (1/\rho ) - \log (1+\nu e^{-x})} \right) . \]
Then, taking the integral from 0 to \(\log (1/\epsilon )\) yields
\[ F(\log (1/\epsilon )) = \frac{\log (1/\epsilon )}{\log (1/\rho )} + \frac{1}{\log (1/\rho )} \int _0^{\log (1/\epsilon )} \frac{\log (1+\nu e^{-t})}{\log (1/\rho ) - \log (1+\nu e^{-t})} \,\mathrm {d}t. \tag{A29} \]
Using \(\alpha (1-\alpha /2)=\alpha -\alpha ^2/2 \le \log (1+\alpha ) \le \alpha\), for \(\alpha =\nu e^{-t} \ge 0\), on the numerator within the integral in (A29) and changing the integration variable t to \(z = \log (1/\rho )-\log (1+ \nu e^{-t})\), we obtain both an upper bound and a lower bound on the integral on the RHS of (A29)
where \(\underline{z} = -\log (\rho +\tau (1-\rho ))\) and \(\overline{z} = -\log (\rho +\epsilon \tau (1-\rho ))\). Replacing the integral in (A29) by the upper bound and lower bound from (A30), using the definition of the exponential integral, and simplifying, we obtain the upper bound on \(F(\log (1/\epsilon ))\) given by \(\overline{F}_1(\log (1/\epsilon ))\) in (A3) and similarly the lower bound on \(F(\log (1/\epsilon ))\) given by \(\underline{F}_1(\log (1/\epsilon ))\) in (A5). Finally, we prove the second upper bound in (A4) as follows. Since \(E_1(\cdot )\) is monotonically decreasing and \(\frac{1}{\rho +\epsilon \tau (1-\rho )} \le \frac{1}{\rho }\), we have \(E_1(\log \frac{1}{\rho +\epsilon \tau (1-\rho )}) \ge E_1(\log \frac{1}{\rho })\), which implies \(\Delta E_1 (\log \frac{1}{\rho +\tau (1-\rho )} , \log \frac{1}{\rho +\epsilon \tau (1-\rho )}) \le \Delta E_1 (\log \frac{1}{\rho +\tau (1-\rho )} , \log \frac{1}{\rho })\). Combining this with the definition of \(\overline{F}_1(\log (1/\epsilon ))\) and \(\overline{F}_2(\log (1/\epsilon ))\) in (A3) and (A4), respectively, we conclude that \(\overline{F}_1(\log (1/\epsilon )) \le \overline{F}_2(\log (1/\epsilon ))\), thereby completing the proof of the lemma.
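The final monotonicity step is easy to confirm numerically (a sketch; SciPy's exp1 implements the exponential integral \(E_1\), and the parameter values are arbitrary):

```python
# Sketch: E_1 is monotonically decreasing, so shrinking its argument from
# log(1/rho) to log(1/(rho + eps*tau*(1-rho))) can only increase E_1,
# which is the inequality used above to order the two upper bounds.
import math
from scipy.special import exp1

rho, tau = 0.9, 0.75
E1_at_rho = exp1(math.log(1.0 / rho))
for eps in (1e-1, 1e-4, 1e-8):
    val = exp1(math.log(1.0 / (rho + eps * tau * (1 - rho))))
    assert val >= E1_at_rho
    print(f"eps={eps:.0e}: E1 = {val:.6f} >= E1(log(1/rho)) = {E1_at_rho:.6f}")
```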
Appendix B: Proof of Theorem 2
Let \(\tilde{\varvec{\delta }}^{(k)} = \varvec{Q}^{-1} \varvec{\delta }^{(k)}\) be the transformed error vector. Substituting \(\varvec{T}\varvec{\delta }^{(k)} = \varvec{Q} \varvec{\Lambda }\varvec{Q}^{-1} \varvec{\delta }^{(k)}\) into (2) and then left-multiplying both sides by \(\varvec{Q}^{-1}\), we obtain
\[ \tilde{\varvec{\delta }}^{(k+1)} = \varvec{\Lambda }\tilde{\varvec{\delta }}^{(k)} + \tilde{\varvec{q}}\bigl (\tilde{\varvec{\delta }}^{(k)}\bigr ), \tag{B1} \]
where \(\tilde{\varvec{q}}(\tilde{\varvec{\delta }}^{(k)}) = \varvec{Q}^{-1} \varvec{q}(\varvec{Q} \tilde{\varvec{\delta }}^{(k)})\) satisfies \(\Vert \tilde{\varvec{q}}(\tilde{\varvec{\delta }}^{(k)})\Vert \le q \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{Q}\Vert _2^2 \Vert \tilde{\varvec{\delta }}^{(k)}\Vert ^2\). Taking the norm of both sides of (B1) and using the triangle inequality yield
\[ \Vert \tilde{\varvec{\delta }}^{(k+1)}\Vert \;\le\; \Vert \varvec{\Lambda }\Vert _2 \Vert \tilde{\varvec{\delta }}^{(k)}\Vert + q \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{Q}\Vert _2^2 \Vert \tilde{\varvec{\delta }}^{(k)}\Vert ^2. \]
Since \(\Vert \varvec{\Lambda }\Vert _2 = \rho (\mathcal {T})\), the last inequality can be rewritten compactly as
\[ \Vert \tilde{\varvec{\delta }}^{(k+1)}\Vert \;\le\; \rho \Vert \tilde{\varvec{\delta }}^{(k)}\Vert + \tilde{q}\, \Vert \tilde{\varvec{\delta }}^{(k)}\Vert ^2, \tag{B2} \]
where \(\rho =\rho (\mathcal {T})\) and \(\tilde{q} = q \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{Q}\Vert _2^2\).
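Before proceeding, here is a small numerical sketch of (B2) (not part of the proof): the matrix below is an arbitrary diagonalizable choice, and the remainder \(\varvec{q}(\varvec{v}) = q \Vert \varvec{v}\Vert \varvec{v}\) is a hypothetical choice that attains the assumed bound \(\Vert \varvec{q}(\varvec{v})\Vert \le q \Vert \varvec{v}\Vert ^2\) with equality:

```python
# Sketch: verify (B2) along a trajectory. T is an arbitrary diagonalizable
# matrix; the remainder q(v) = q*||v||*v is a hypothetical choice that
# satisfies ||q(v)|| <= q*||v||^2 with equality.
import numpy as np

Q = np.array([[1.0, 0.8], [0.0, 1.0]])         # eigenvector matrix (arbitrary)
Lam = np.diag([0.9, 0.5])                      # eigenvalues; rho(T) = 0.9
Qinv = np.linalg.inv(Q)
T = Q @ Lam @ Qinv
rho, q = 0.9, 0.3
q_tilde = q * np.linalg.norm(Qinv, 2) * np.linalg.norm(Q, 2) ** 2

delta = np.array([0.05, -0.03])
for k in range(30):
    delta_next = T @ delta + q * np.linalg.norm(delta) * delta
    n_now = np.linalg.norm(Qinv @ delta)       # ||d~^(k)||
    n_next = np.linalg.norm(Qinv @ delta_next) # ||d~^(k+1)||
    assert n_next <= rho * n_now + q_tilde * n_now ** 2 + 1e-12
    delta = delta_next
print("(B2) held at every step")
```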
To analyze the convergence of \(\{\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \}_{k=0}^\infty\), let us consider a surrogate sequence \(\{a_k\}_{k=0}^\infty \subset {\mathbb R}\) defined by \(a_{k+1} = \rho a_k + \tilde{q} a_k^2\) with \(a_0=\Vert \tilde{\varvec{\delta }}^{(0)}\Vert\). We show that \(\{a_k\}_{k=0}^\infty\) upper-bounds \(\{\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \}_{k=0}^\infty\), i.e.,
\[ \Vert \tilde{\varvec{\delta }}^{(k)}\Vert \;\le\; a_k \quad \text {for all } k \in \mathbb {N}. \tag{B3} \]
The base case when \(k=0\) holds trivially as \(a_0=\Vert \tilde{\varvec{\delta }}^{(0)}\Vert\). In the induction step, given \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le a_k\) for some integer \(k \ge 0\), we have
\[ \Vert \tilde{\varvec{\delta }}^{(k+1)}\Vert \;\le\; \rho \Vert \tilde{\varvec{\delta }}^{(k)}\Vert + \tilde{q}\, \Vert \tilde{\varvec{\delta }}^{(k)}\Vert ^2 \;\le\; \rho a_k + \tilde{q}\, a_k^2 = a_{k+1}, \tag{B4} \]
where the first inequality is (B2) and the second uses the monotonicity of \(x \mapsto \rho x + \tilde{q} x^2\) on \([0,\infty )\).
By the principle of induction, (B3) holds for all \(k \in \mathbb {N}\). Assuming for now that \(a_0 = \Vert \tilde{\varvec{\delta }}^{(0)}\Vert < (1-\rho )/\tilde{q}\), applying Theorem 1 yields \(a_k \le \tilde{\epsilon } a_0\) for any \(\tilde{\epsilon }>0\) and integer \(k \ge {\log (1/\tilde{\epsilon })}/{\log (1/\rho )} + c(\rho ,\tau )\). Using (B3) and setting \(\tilde{\epsilon } = \epsilon /\kappa (\varvec{Q})\), we further have \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le a_k \le \tilde{\epsilon } a_0 = \epsilon \Vert \tilde{\varvec{\delta }}^{(0)}\Vert / \kappa (\varvec{Q})\) for all
\[ k \;\ge\; \frac{\log (\kappa (\varvec{Q})/\epsilon )}{\log (1/\rho )} + c(\rho ,\tau ). \tag{B5} \]
Now, it remains to prove that (i) the accuracy \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le \tilde{\epsilon } \Vert \tilde{\varvec{\delta }}^{(0)}\Vert\) on the transformed error vector is sufficient for the accuracy \(\Vert {\varvec{\delta }}^{(k)}\Vert \le \epsilon \Vert {\varvec{\delta }}^{(0)}\Vert\) on the original error vector; and (ii) the initial condition \(\Vert {\varvec{\delta }}^{(0)}\Vert < (1-\rho )/(q \kappa (\varvec{Q})^2)\) is sufficient for \(\Vert \tilde{\varvec{\delta }}^{(0)}\Vert < (1-\rho )/\tilde{q}\). To prove (i), using \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le \epsilon \Vert \tilde{\varvec{\delta }}^{(0)}\Vert /\kappa (\varvec{Q})\), we have
\[ \Vert \varvec{\delta }^{(k)}\Vert = \Vert \varvec{Q} \tilde{\varvec{\delta }}^{(k)}\Vert \;\le\; \Vert \varvec{Q}\Vert _2 \Vert \tilde{\varvec{\delta }}^{(k)}\Vert \;\le\; \frac{\epsilon \Vert \varvec{Q}\Vert _2}{\kappa (\varvec{Q})} \Vert \tilde{\varvec{\delta }}^{(0)}\Vert \;\le\; \frac{\epsilon \Vert \varvec{Q}\Vert _2 \Vert \varvec{Q}^{-1}\Vert _2}{\kappa (\varvec{Q})} \Vert \varvec{\delta }^{(0)}\Vert = \epsilon \Vert \varvec{\delta }^{(0)}\Vert , \]
where the last inequality stems from \(\Vert \tilde{\varvec{\delta }}^{(0)}\Vert = \Vert \varvec{Q}^{-1} {\varvec{\delta }}^{(0)}\Vert \le \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{\delta }^{(0)}\Vert\) and \(\kappa (\varvec{Q}) = \Vert \varvec{Q}\Vert _2 \Vert \varvec{Q}^{-1}\Vert _2\). To prove (ii), we use a similar derivation:
\[ \Vert \tilde{\varvec{\delta }}^{(0)}\Vert \;\le\; \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{\delta }^{(0)}\Vert \;<\; \Vert \varvec{Q}^{-1}\Vert _2 \,\frac{1-\rho }{q\, \kappa (\varvec{Q})^2} = \frac{1-\rho }{q\, \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{Q}\Vert _2^2} = \frac{1-\rho }{\tilde{q}}. \]
Finally, the case in which \(\varvec{T}\) is symmetric follows from the fact that \(\varvec{Q}\) can be chosen orthogonal, i.e., \(\varvec{Q}^{-1} = \varvec{Q}^T\) and \(\kappa (\varvec{Q})=1\). Substituting this back into (10) and using the orthogonal invariance of the Euclidean norm, we obtain the simplified version in (11). This completes our proof of Theorem 2.
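As a closing illustration (a sketch, with an arbitrary symmetric matrix): numpy's eigh returns an orthonormal \(\varvec{Q}\), so \(\kappa (\varvec{Q})\) evaluates to 1 up to rounding, which is exactly why the condition-number factors drop out in the symmetric case:

```python
# Sketch of the symmetric special case: for symmetric T, eigh returns an
# orthonormal Q, so kappa(Q) = ||Q||_2 * ||Q^{-1}||_2 = 1 and the condition-
# number factors in Theorem 2 disappear. The matrix below is arbitrary.
import numpy as np

T = np.array([[0.7, 0.2], [0.2, 0.4]])         # symmetric, rho(T) = 0.8
w, Q = np.linalg.eigh(T)
kappa = np.linalg.norm(Q, 2) * np.linalg.norm(np.linalg.inv(Q), 2)
print(f"rho(T) = {max(abs(w)):.3f}, kappa(Q) = {kappa:.12f}")   # kappa ~ 1
```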
Keywords
- Non-linear difference equations
- Asymptotic linear convergence
- Convergence bounds
- Fixed-point iterations