## Abstract

In many iterative optimization methods, fixed-point theory enables the analysis of the convergence rate via the contraction factor associated with the linear approximation of the fixed-point operator. While this factor characterizes the asymptotic linear rate of convergence, it does not explain the non-linear behavior of these algorithms in the non-asymptotic regime. In this letter, we take into account the effect of the first-order approximation error and present a closed-form bound on the convergence in terms of the number of iterations required for the distance between the iterate and the limit point to reach an arbitrarily small fraction of the initial distance. Our bound includes two terms: one corresponds to the number of iterations required for the linearized version of the fixed-point operator and the other corresponds to the overhead associated with the approximation error. With a focus on the convergence in the scalar case, the tightness of the proposed bound is proven for positively quadratic first-order difference equations.



## Ethics declarations

### Conflict of interest

The authors have no competing interests that are relevant to the content of this article. Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.


## Appendices

### Appendix A: Proof of Theorem 1

First, we establish a sandwich inequality on \(K(\epsilon )\) in the following lemma:

### Lemma 1

For any \(0< \epsilon < 1\), let \(K(\epsilon )\) be the smallest integer such that for all \(k \ge K(\epsilon )\), we have \(a_k \le \epsilon a_0\). Then,

where \(b(\rho ,\tau )\) is defined in (7) and

The lemma provides an upper bound on \(K(\epsilon )\). Moreover, it is a tight bound in the sense that the gap between the lower bound \(\underline{K}(\epsilon )\) and the upper bound \(\overline{K}(\epsilon )\) is independent of \(\epsilon\). In other words, the ratio \(K(\epsilon )/\overline{K}(\epsilon )\) approaches 1 as \(\epsilon \rightarrow 0\). Next, we proceed to obtain a tight closed-form upper bound on \(\overline{K}(\epsilon )\) by upper-bounding \(F(\log (1/\epsilon ))\).
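The quantity \(K(\epsilon )\) can also be evaluated numerically by direct simulation of the recurrence. The following Python sketch is ours, not part of the original analysis; the parameter values \(\rho =0.9\), \(q=0.5\), \(a_0=0.1\) are illustrative and satisfy the condition \(a_0 < (1-\rho )/q\):

```python
import math

def iterate_K(rho, q, a0, eps, max_iter=100000):
    """Return the smallest K with a_K <= eps * a0 for the recurrence
    a_{k+1} = rho * a_k + q * a_k**2.  When a0 < (1 - rho) / q the
    sequence is monotonically decreasing, so the first hitting index
    coincides with K(eps)."""
    a, k = a0, 0
    while a > eps * a0:
        a = rho * a + q * a * a
        k += 1
        if k > max_iter:
            raise RuntimeError("sequence did not reach eps * a0")
    return k

# Illustrative parameters satisfying a0 < (1 - rho) / q = 0.2.
rho, q, a0, eps = 0.9, 0.5, 0.1, 1e-6
K = iterate_K(rho, q, a0, eps)
# The asymptotic linear term; the difference K - linear_term is the
# overhead caused by the quadratic approximation error.
linear_term = math.log(1 / eps) / math.log(1 / rho)
```

Since \(a_{k+1} \ge \rho a_k\), the simulated \(K(\epsilon )\) always exceeds the purely linear iteration count \(\log (1/\epsilon )/\log (1/\rho )\); the gap is exactly the overhead that the theorem bounds in closed form.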

### Lemma 2

Consider the function \(F(\cdot )\) given in (A2). For \(0<\epsilon <1\), we have

and

where

Lemma 2 offers two upper bounds on \(F(\log (1/\epsilon ))\) and one lower bound. The first bound \(\overline{F}_1(\log (1/\epsilon ))\) closely approximates the behavior of \(F(\log (1/\epsilon ))\) for both small and large values of \(\log (1/\epsilon )\). The second bound \(\overline{F}_2 (\log (1/\epsilon ))\) provides a linear bound on \(F(\log (1/\epsilon ))\) in terms of \(\log (1/\epsilon )\). Moreover, the gap between \(F(\log (1/\epsilon ))\) and \(\underline{F}_1(\log (1/\epsilon ))\), given by \(A(\epsilon )\), can be upper-bounded by *A*(0) since \(A(\cdot )\) is monotonically decreasing for \(\epsilon \in [0,1)\). While \(F(\cdot )\) asymptotically increases like \(\log (1/\epsilon )/ \log (1/\rho )\), the gap approaches a constant independent of \(\epsilon\). Replacing \(F(\log (1/\epsilon ))\) on the RHS of (A1) by either of the upper bounds in Lemma 2, we obtain two corresponding bounds on \(K(\epsilon )\):

where we note that \(\overline{K}_2(\epsilon )\) has the same expression as in (5). Moreover, the tightness of these two upper bounds can be shown as follows. First, using the first inequality in (A1) and then the lower bound on \(F(\log (1/\epsilon ))\) in (A5), the gap between \(\overline{K}_1(\epsilon )\) and \(K(\epsilon )\) can be bounded by

where the last inequality stems from the monotonicity of \(A(\cdot )\) in [0, 1). Note that the bound in (A8) holds uniformly independent of \(\epsilon\), implying \(\overline{K}_1(\epsilon )\) is a tight bound on \(K(\epsilon )\). Second, using (A7), the gap between \(\overline{K}_2(\epsilon )\) and \(K(\epsilon )\) can be represented as

where the last inequality stems from (A8). Furthermore, using the definition of \(\overline{F}_1(\log (1/\epsilon ))\) and \(\overline{F}_2(\log (1/\epsilon ))\) in (A3) and (A4), respectively, we have \(\lim _{\epsilon \rightarrow 0} (\overline{F}_2(\log (1/\epsilon )) - \overline{F}_1(\log (1/\epsilon ))) = 0\). Thus, taking the limit \(\epsilon \rightarrow 0\) on both sides of (A9), we obtain

We note that \(\overline{K}_2(\epsilon )\) is a simple bound that is linear in \(\log (1/\epsilon )\) and approaches the upper bound \(\overline{K}_1(\epsilon )\) in the asymptotic regime (\(\epsilon \rightarrow 0\)). Evaluating *A*(0) from (A6) and substituting it back into (A10) yields (8), which completes our proof of Theorem 1. Figure 1 (right) depicts the aforementioned bounds on \(K(\epsilon )\). It can be seen from the plot that all four bounds match the asymptotic growth rate of \(K(\epsilon )\) (for large values of \(1/\epsilon\)). The three bounds \(\underline{K}(\epsilon )\) (red), \(\overline{K}(\epsilon )\) (yellow), and \(\overline{K}_1(\epsilon )\) (purple) closely follow \(K(\epsilon )\) (blue), indicating that the integral function \(F(\cdot )\) effectively estimates the minimum number of iterations required to achieve \(a_k \le \epsilon a_0\) in this setting. The upper bound \(\overline{K}_2(\epsilon )\) (green) forms a tangent to \(\overline{K}_1(\epsilon )\) as \(1/\epsilon \rightarrow \infty\) (i.e., \(\epsilon \rightarrow 0\)).

### Appendix A.1: Proof of Lemma 1

Let \(d_k=\log ({a_0}/{a_k})\) for each \(k \in {\mathbb N}\). Substituting \(a_k = a_0 e^{-d_k}\) into (3), we obtain the surrogate sequence \(\{d_k\}_{k=0}^\infty\):

where \(d_0=0\) and \(\tau = a_0 q/(1-\rho ) \in (0,1)\). Since \(\{a_k\}_{k=0}^\infty\) is monotonically decreasing to 0 and \(d_k\) is monotonically decreasing as a function of \(a_k\), \(\{d_k\}_{k=0}^\infty\) is a monotonically increasing sequence. Our key steps in this proof are first to tightly bound the index \(K \in \mathbb {N}\) using \(F(d_K)\)

and then to obtain (A1) from (A12) using the monotonicity of the sequence \(\{d_k\}_{k=0}^\infty\) and of the function \(F(\cdot )\). We proceed with the details of each of the steps in the following.
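As a numerical sanity check (ours, not the paper's), the substitution above can be verified in a few lines of Python: with \(a_k = a_0 e^{-d_k}\) and \(\tau = a_0 q/(1-\rho )\), the recurrence \(a_{k+1} = \rho a_k + q a_k^2\) induces \(d_{k+1} = d_k - \log (\rho + \tau (1-\rho ) e^{-d_k})\), which is the form of the surrogate recurrence, and the two sequences should agree to machine precision. The parameter values are illustrative:

```python
import math

# Illustrative parameters; tau = a0 * q / (1 - rho) lies in (0, 1).
rho, q, a0 = 0.9, 0.5, 0.1
tau = a0 * q / (1 - rho)

# Original sequence a_{k+1} = rho * a_k + q * a_k**2 ...
a = [a0]
for _ in range(50):
    a.append(rho * a[-1] + q * a[-1] ** 2)

# ... and the surrogate d_{k+1} = d_k - log(rho + tau * (1 - rho) * exp(-d_k)),
# which should reproduce d_k = log(a0 / a_k) exactly.
d = [0.0]
for _ in range(50):
    d.append(d[-1] - math.log(rho + tau * (1 - rho) * math.exp(-d[-1])))

max_err = max(abs(dk - math.log(a0 / ak)) for dk, ak in zip(d, a))
```

The surrogate sequence is monotonically increasing, as claimed, because \(\rho + \tau (1-\rho ) e^{-d_k} < 1\) makes each logarithm negative.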

**Step 1:** We prove (A12) by showing the lower bound on *K* first and then showing the upper bound on *K*. Using (A2), we can rewrite (A11) as \(d_{k+1}=d_k+1/f(d_k)\). Rearranging this equation yields

Since *f*(*x*) is monotonically decreasing, we obtain the lower bound on *K* in (A12) by

where the last equality stems from (A13). For the upper bound on *K* in (A12), we use the convexity of \(f(\cdot )\) to lower-bound \(F(d_K)\) as follows

Using (A13) and substituting \(f'(x) = -\bigl (f(x)\bigr )^2 \frac{\tau (1-\rho ) e^{-x}}{\rho +\tau (1-\rho ) e^{-x}}\) into the RHS of (A15), we obtain

Note that (A16) already offers an upper bound on *K* in terms of \(F(d_K)\). To obtain the upper bound on *K* in (A12) from (A16), it suffices to show that

In the following, we prove (A17) by introducing the functions

and

Note that \(g(\cdot )\) is monotonically decreasing (a product of two decreasing functions) while \(G(\cdot )\) is monotonically increasing (an integral of a non-negative function) on \([0,\infty )\). We have

### Lemma 3

For any \(k \in \mathbb {N}\), we have \({g(d_{k+1})}/{g(d_k)} \ge \rho\).

### Proof

For \(k\in \mathbb {N}\), let \(t_k = \rho + \tau (1-\rho ) e^{-d_k} \in (\rho ,1)\). From (A11), we have \(t_k = e^{-(d_{k+1}-d_k)}\) and \(t_{k+1} = \rho + \tau (1-\rho ) e^{-d_{k+1}} = \rho + \tau (1-\rho ) e^{-d_{k}} e^{-(d_{k+1}-d_k)} = \rho + (t_k-\rho )t_k\). Substituting \(d_k\) for *x* in *g*(*x*) from (A18) and replacing \(\rho + \tau (1-\rho ) e^{-d_k}\) with \(t_k\) yield \(g(d_k) = \frac{\tau (1-\rho ) e^{-d_{k}}}{t_k} \frac{1}{-\log (t_k)}\). Repeating the same process to obtain \(g(d_{k+1})\) and taking the ratio between \(g(d_{k+1})\) and \(g(d_k)\), we obtain

Substituting \(e^{-(d_{k+1}-d_k)} = t_k\) and \(t_{k+1} = \rho + (t_k-\rho )t_k\) into (A21) yields

We now continue to bound the ratio \({g(d_{k+1})}/{g(d_k)}\) by bounding the RHS of (A22). Since \(t_k-\rho > 0\) and \(t_k<1\), we have \(t_k-\rho > (t_k-\rho ) t_k\) and hence \({t_k}/{(\rho +(t_k-\rho )t_k)} > 1\). Thus, in order to prove \(\frac{g(d_{k+1})}{g(d_k)} \ge \rho\) by showing that the RHS of (A22) is greater than or equal to \(\rho\), it remains to show that

By the concavity of \(\log (\cdot )\), it holds that \(\log \bigl (\frac{\rho }{t_k} \cdot 1+\frac{t_k-\rho }{t_k} t_k \bigr ) \ge \frac{\rho }{t_k} \log (1) + \frac{t_k-\rho }{t_k} \log (t_k) = (1-\frac{\rho }{t_k}) \log (t_k)\). Adding \(\log (t_k)\) to both sides of the last inequality yields \(\log (\rho +(t_k-\rho )t_k ) \ge (2-\frac{\rho }{t_k}) \log (t_k)\). Now using the fact that \((\sqrt{\rho /t_k} - \sqrt{t_k/\rho })^2 \ge 0\), we have \(2-\rho /t_k \le t_k/\rho\). By this inequality and the negativity of \(\log (t_k)\), we have \(\log (\rho +(t_k-\rho )t_k ) \ge \frac{t_k}{\rho } \log (t_k)\). Multiplying both sides by the negative ratio \(\rho /\log (\rho +(t_k-\rho )t_k)\) and reversing the direction of the inequality yields the inequality in (A23), which completes our proof of the lemma.
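Lemma 3 can be checked numerically as well. The sketch below (our own illustration, with arbitrary admissible values \(\rho =0.9\), \(\tau =0.5\)) generates the surrogate iterates and verifies that the ratio \(g(d_{k+1})/g(d_k)\) never falls below \(\rho\); the ratio also approaches \(\rho\) as \(k\) grows, so the lemma's bound is asymptotically tight:

```python
import math

rho, tau = 0.9, 0.5   # illustrative values with rho in (0, 1) and tau in (0, 1)

def g(x):
    """g(x) from (A18): the product of tau*(1-rho)*e^{-x} / t(x) and
    -1 / log(t(x)), where t(x) = rho + tau*(1-rho)*e^{-x}."""
    t = rho + tau * (1 - rho) * math.exp(-x)
    return (tau * (1 - rho) * math.exp(-x) / t) * (-1.0 / math.log(t))

# Surrogate iterates d_k, starting at d_0 = 0.
d = [0.0]
for _ in range(100):
    d.append(d[-1] - math.log(rho + tau * (1 - rho) * math.exp(-d[-1])))

# Lemma 3 asserts g(d_{k+1}) / g(d_k) >= rho for every k.
ratios = [g(d[k + 1]) / g(d[k]) for k in range(100)]
```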

Back to our proof of Lemma 1, applying Lemma 3 to (A20) and substituting \(d_{k+1}-d_k = -\log (\rho + \tau (1-\rho ) e^{-d_k})\) from (A11) and \(g(d_k)\) from (A18), we have

Using the monotonicity of \(G(\cdot )\), we upper-bound \(G(d_K)\) by

Thus, the RHS of (A24) is upper bounded by the RHS of (A25). Dividing the result by \(\rho\), we obtain (A17). This completes our proof of the upper bound on *K* in (A12) and thereby the first step of the proof.

**Step 2:** We proved both the lower bound and the upper bound on *K* in (A12). Next, we proceed to show (A1) using (A12). By the definition of \(K(\epsilon )\), \(a_{K(\epsilon )} \le \epsilon a_0 < a_{K(\epsilon )-1}\). Since \(d_k=\log ({a_0}/{a_k})\), for \(k \in {\mathbb N}\), we have \(d_{K(\epsilon )-1} \le \log (1/\epsilon ) \le d_{K(\epsilon )}\). On the one hand, using the monotonicity of \(F(\cdot )\) and substituting \(K=K(\epsilon )\) into the lower bound on *K* in (A12) yields

On the other hand, substituting \(K=K(\epsilon )-1\) into the upper bound on *K* in (A12), we obtain

Since \(F(\cdot )\) is monotonically increasing and \(d_{K(\epsilon )-1} \le \log (1/\epsilon )\), we have \(F(d_{K(\epsilon )-1}) \le F(\log (1/\epsilon ))\). Therefore, upper-bounding \(F(d_{K(\epsilon )-1})\) on the RHS of (A27) by \(F(\log (1/\epsilon ))\) yields

The inequality (A1) follows on combining (A26) and (A28).

### Appendix A.2: Proof of Lemma 2

Let \(\nu =\tau (1-\rho ) /\rho\). We represent *f*(*x*) in the interval \((0,\log (1/\epsilon ))\) as

Then, taking the integral from 0 to \(\log (1/\epsilon )\) yields

Using \(\alpha (1-\alpha /2)=\alpha -\alpha ^2/2 \le \log (1+\alpha ) \le \alpha\), for \(\alpha =\nu e^{-t} \ge 0\), on the numerator within the integral in (A29) and changing the integration variable *t* to \(z = \log (1/\rho )-\log (1+ \nu e^{-t})\), we obtain both an upper bound and a lower bound on the integral on the RHS of (A29)

where \(\underline{z} = -\log (\rho +\tau (1-\rho ))\) and \(\overline{z} = -\log (\rho +\epsilon \tau (1-\rho ))\). Replacing the integral in (A29) by the upper bound and lower bound from (A30), using the definition of the exponential integral, and simplifying, we obtain the upper bound on \(F(\log (1/\epsilon ))\) given by \(\overline{F}_1(\log (1/\epsilon ))\) in (A3) and similarly the lower bound on \(F(\log (1/\epsilon ))\) given by \(\underline{F}_1(\log (1/\epsilon ))\) in (A5). Finally, we prove the second upper bound in (A4) as follows. Since \(E_1(\cdot )\) is monotonically decreasing and \(\frac{1}{\rho +\epsilon \tau (1-\rho )} \le \frac{1}{\rho }\), we have \(E_1(\log \frac{1}{\rho +\epsilon \tau (1-\rho )}) \ge E_1(\log \frac{1}{\rho })\), which implies \(\Delta E_1 (\log \frac{1}{\rho +\tau (1-\rho )} , \log \frac{1}{\rho +\epsilon \tau (1-\rho )}) \le \Delta E_1 (\log \frac{1}{\rho +\tau (1-\rho )} , \log \frac{1}{\rho })\). Combining this with the definition of \(\overline{F}_1(\log (1/\epsilon ))\) and \(\overline{F}_2(\log (1/\epsilon ))\) in (A3) and (A4), respectively, we conclude that \(\overline{F}_1(\log (1/\epsilon )) \le \overline{F}_2(\log (1/\epsilon ))\), which completes the proof of the lemma.
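The last step above uses only the fact that the exponential integral \(E_1(x) = \int _x^\infty e^{-t}/t \, dt\) is monotonically decreasing. A stdlib-only Python sketch (the quadrature scheme, truncation point, and sample values are our own choices, not from the paper) confirms this behavior numerically:

```python
import math

def E1(x, tail=60.0, n=100000):
    """Exponential integral E1(x) = integral of e^{-t}/t from x to infinity,
    approximated with a composite trapezoidal rule on [x, x + tail]; the
    truncated tail is below e^{-(x + tail)} and thus negligible here."""
    h = tail / n
    s = 0.5 * (math.exp(-x) / x + math.exp(-(x + tail)) / (x + tail))
    for i in range(1, n):
        t = x + i * h
        s += math.exp(-t) / t
    return h * s

# E1 is monotonically decreasing, which is the only property needed to pass
# from the bound in (A3) to the simpler linear bound in (A4).
vals = [E1(x) for x in (0.1, 0.5, 1.0, 2.0)]
```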

### Appendix B: Proof of Theorem 2

Let \(\tilde{\varvec{\delta }}^{(k)} = \varvec{Q}^{-1} \varvec{\delta }^{(k)}\) be the transformed error vector. Substituting \(\mathcal {T}(\varvec{\delta }^{(k)}) = \varvec{Q} \varvec{\Lambda }\varvec{Q}^{-1} \varvec{\delta }^{(k)}\) into (2) and then left-multiplying both sides by \(\varvec{Q}^{-1}\), we obtain

where \(\tilde{\varvec{q}}(\tilde{\varvec{\delta }}^{(k)}) = \varvec{Q}^{-1} \varvec{q}(\varvec{Q} \tilde{\varvec{\delta }}^{(k)})\) satisfies \(\Vert \tilde{\varvec{q}}(\tilde{\varvec{\delta }}^{(k)})\Vert \le q \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{Q}\Vert _2^2 \Vert \tilde{\varvec{\delta }}^{(k)}\Vert ^2\). Taking the norm of both sides of (B1) and using the triangle inequality yield

Since \(\Vert \varvec{\Lambda }\Vert _2 = \rho (\mathcal {T})\), the last inequality can be rewritten compactly as

where \(\rho =\rho (\mathcal {T})\) and \(\tilde{q} = q \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{Q}\Vert _2^2\).

To analyze the convergence of \(\{\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \}_{k=0}^\infty\), let us consider a surrogate sequence \(\{a_k\}_{k=0}^\infty \subset {\mathbb R}\) defined by \(a_{k+1} = \rho a_k + \tilde{q} a_k^2\) with \(a_0=\Vert \tilde{\varvec{\delta }}^{(0)}\Vert\). We show that \(\{a_k\}_{k=0}^\infty\) upper-bounds \(\{\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \}_{k=0}^\infty\), i.e.,

The base case when \(k=0\) holds trivially as \(a_0=\Vert \tilde{\varvec{\delta }}^{(0)}\Vert\). In the induction step, given \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le a_k\) for some integer \(k \ge 0\), we have

By the principle of induction, (B3) holds for all \(k \in \mathbb {N}\). Assume for now that \(a_0 = \Vert \tilde{\varvec{\delta }}^{(0)}\Vert < (1-\rho )/\tilde{q}\), then applying Theorem 1 yields \(a_k \le \tilde{\epsilon } a_0\) for any \(\tilde{\epsilon }>0\) and integer \(k \ge {\log (1/\tilde{\epsilon })}/{\log (1/\rho )} + c(\rho ,\tau )\). Using (B3) and setting \(\tilde{\epsilon } = \epsilon /\kappa (\varvec{Q})\), we further have \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le a_k \le \tilde{\epsilon } a_0 = \epsilon \Vert \tilde{\varvec{\delta }}^{(0)}\Vert / \kappa (\varvec{Q})\) for all

Now, it remains to prove *(i)* the accuracy on the transformed error vector \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le \tilde{\epsilon } \Vert \tilde{\varvec{\delta }}^{(0)}\Vert\) is sufficient for the accuracy on the original error vector \(\Vert {\varvec{\delta }}^{(k)}\Vert \le \epsilon \Vert {\varvec{\delta }}^{(0)}\Vert\); and *(ii)* the initial condition \(\Vert {\varvec{\delta }}^{(0)}\Vert < (1-\rho )/(q \kappa (\varvec{Q})^2)\) is sufficient for \(\Vert \tilde{\varvec{\delta }}^{(0)}\Vert < (1-\rho )/\tilde{q}\). In order to prove *(i)*, using \(\Vert \tilde{\varvec{\delta }}^{(k)}\Vert \le \epsilon \Vert \tilde{\varvec{\delta }}^{(0)}\Vert /\kappa (\varvec{Q})\), we have

where the last inequality stems from \(\Vert \tilde{\varvec{\delta }}^{(0)}\Vert = \Vert \varvec{Q}^{-1} {\varvec{\delta }}^{(0)}\Vert \le \Vert \varvec{Q}^{-1}\Vert _2 \Vert \varvec{\delta }^{(0)}\Vert\). To prove *(ii)*, we use a similar derivation:

Finally, the case in which \(\varvec{T}\) is symmetric follows from the fact that \(\varvec{Q}\) is orthogonal, i.e., \(\varvec{Q}^{-1} = \varvec{Q}^T\) and \(\kappa (\varvec{Q})=1\). Substituting this back into (10) and using the orthogonal invariance of the norm, we obtain the simplified version in (11). This completes our proof of Theorem 2.
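The induction argument behind (B3) can be illustrated numerically. The sketch below (an illustrative example of our own, not the paper's setting) iterates a two-dimensional map whose linear part is \(\mathrm {diag}(0.9, 0.5)\) and whose perturbation has norm exactly \(c \Vert \varvec{\delta }\Vert ^2\), and checks that the scalar surrogate \(a_{k+1} = \rho a_k + c a_k^2\) dominates \(\Vert \varvec{\delta }^{(k)}\Vert\) at every iteration:

```python
import math

rho, c = 0.9, 0.4   # spectral radius and quadratic-term constant (illustrative)

def step(delta):
    """One fixed-point iteration: a linear part diag(0.9, 0.5) plus a
    quadratic perturbation whose norm is exactly c * ||delta||^2."""
    n2 = delta[0] ** 2 + delta[1] ** 2
    u = (1 / math.sqrt(2), 1 / math.sqrt(2))   # arbitrary unit direction
    return (0.9 * delta[0] + c * n2 * u[0],
            0.5 * delta[1] + c * n2 * u[1])

delta = (0.1, 0.1)
a = math.hypot(*delta)          # a_0 = ||delta^(0)||; note a_0 < (1 - rho) / c
dominated = True
for _ in range(100):
    delta = step(delta)
    a = rho * a + c * a * a     # scalar surrogate a_{k+1} = rho*a_k + c*a_k^2
    dominated = dominated and math.hypot(*delta) <= a + 1e-12
```

The domination holds because the map \(x \mapsto \rho x + c x^2\) is increasing on \(x \ge 0\), so the induction hypothesis \(\Vert \varvec{\delta }^{(k)}\Vert \le a_k\) propagates to step \(k+1\).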

## About this article

### Cite this article

Vu, T., Raich, R. A closed-form bound on the asymptotic linear convergence of iterative methods via fixed point analysis.
*Optim Lett* **17**, 643–656 (2023). https://doi.org/10.1007/s11590-022-01893-7


### Keywords

- Non-linear difference equations
- Asymptotic linear convergence
- Convergence bounds
- Fixed-point iterations