
“FISTA” in Banach spaces with adaptive discretisations


Abstract

FISTA is a popular convex optimisation algorithm which is known to converge at an optimal rate whenever a minimiser is contained in a suitable Hilbert space. We propose a modified algorithm where each iteration is performed in a subset which is allowed to change at every iteration. Sufficient conditions are provided for guaranteed convergence, although at a reduced rate depending on the conditioning of the specific problem. These conditions have a natural interpretation when a minimiser exists in an underlying Banach space. Typical examples are L1-penalised reconstructions where we provide detailed theoretical and numerical analysis.
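
To fix ideas, the following is a minimal sketch (ours, not the authors' Algorithm 1) of the kind of iteration studied here: standard FISTA for a LASSO-type problem in which each iterate is additionally restricted to a subset that changes (here, grows) at every iteration. The refinement rule k = min(d, 2**n) and all names are illustrative assumptions.

    # Minimal illustrative sketch: FISTA for 0.5*||Ax - b||^2 + lam*||x||_1,
    # with each iterate restricted to a growing coordinate subset U^n as a
    # crude stand-in for an adaptive discretisation.
    import numpy as np

    def soft_threshold(x, tau):
        return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

    def subset_fista(A, b, lam, n_iters=200):
        d = A.shape[1]
        L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth gradient
        x = np.zeros(d); y = x.copy(); t = 1.0
        for n in range(1, n_iters + 1):
            k = min(d, 2 ** n)                 # subset U^n: first k coordinates, refined each iteration
            x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)
            x_new[k:] = 0.0                    # project back onto the current subset U^n
            t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
            y = x_new + ((t - 1) / t_new) * (x_new - x)
            x, t = x_new, t_new
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 200))
    x_true = np.zeros(200); x_true[[3, 50, 120]] = 1.0
    x_hat = subset_fista(A, A @ x_true, lam=0.1)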

Data Availability Statement

The synthetic STORM dataset was provided as part of the 2016 SMLM challenge, http://bigwww.epfl.ch/smlm/challenge2016/datasets/MT4.N2.HD/Data/data.html. The remaining examples used in this work can be generated with the supplementary code, https://github.com/robtovey/2020SpatiallyAdaptiveFISTA.

Notes

  1. https://github.com/robtovey/2020SpatiallyAdaptiveFISTA

  2. http://bigwww.epfl.ch/smlm/challenge2016/datasets/MT4.N2.HD/Data/data.html

References

  1. Alamo, T., Limon, D., Krupa, P.: Restart FISTA with global linear convergence. In: 2019 18th European Control Conference (ECC), pp. 1969–1974. IEEE (2019)

  2. Aujol, J.F., Dossal, C.: Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4), 2408–2433 (2015)

  3. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imag. Sci. 2(1), 183–202 (2009)

  4. Bonnefoy, A., Emiya, V., Ralaivola, L., Gribonval, R.: Dynamic screening: accelerating first-order algorithms for the Lasso and group-Lasso. IEEE Trans. Signal Process. 63(19), 5121–5132 (2015)

  5. Boyd, N., Schiebinger, G., Recht, B.: The alternating descent conditional gradient method for sparse inverse problems. SIAM J. Optim. 27(2), 616–639 (2017)

  6. Boyer, C., Chambolle, A., Castro, Y.D., Duval, V., De Gournay, F., Weiss, P.: On representer theorems and convex regularization. SIAM J. Optim. 29(2), 1260–1281 (2019)

  7. Bredies, K., Pikkarainen, H.K.: Inverse problems in spaces of measures. ESAIM: Control Optim. Calc. Var. 19(1), 190–218 (2013)

  8. Castillo, I., Rockova, V.: Multiscale analysis of Bayesian CART. Becker Friedman Institute for Economics Working Paper 2019-127, University of Chicago (2019)

  9. Catala, P., Duval, V., Peyré, G.: A low-rank approach to off-the-grid sparse superresolution. SIAM J. Imag. Sci. 12(3), 1464–1500 (2019)

  10. Chambolle, A., Dossal, C.: On the convergence of the iterates of the “fast iterative shrinkage/thresholding algorithm”. J. Optim. Theory Appl. 166(3), 968–982 (2015)

  11. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vision 40(1), 120–145 (2011)

  12. De Castro, Y., Gamboa, F., Henrion, D., Lasserre, J.B.: Exact solutions to super resolution on semi-algebraic domains in higher dimensions. IEEE Trans. Inf. Theory 63(1), 621–630 (2016)

  13. Denoyelle, Q., Duval, V., Peyré, G., Soubies, E.: The sliding Frank-Wolfe algorithm and its application to super-resolution microscopy. Inverse Probl. 36(1), 014001 (2019)

  14. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)

  15. Duval, V., Peyré, G.: Sparse spikes super-resolution on thin grids I: the LASSO. Inverse Probl. 33(5), 055008 (2017)

  16. Duval, V., Peyré, G.: Sparse spikes super-resolution on thin grids II: the continuous basis pursuit. Inverse Probl. 33(9), 095008 (2017)

  17. Ekanadham, C., Tranchina, D., Simoncelli, E.P.: Recovery of sparse translation-invariant signals with continuous basis pursuit. IEEE Trans. Signal Process. 59(10), 4735–4744 (2011)

  18. El Ghaoui, L., Viallon, V., Rabbani, T.: Safe feature elimination in sparse supervised learning. Tech. Rep. UCB/EECS-2010-126, EECS Department, University of California, Berkeley (2010)

  19. Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods, vol. 305. Springer-Verlag, Berlin, Heidelberg (1993)

  20. Huang, J., Sun, M., Ma, J., Chi, Y.: Super-resolution image reconstruction for high-density three-dimensional single-molecule microscopy. IEEE Trans. Comput. Imag. 3(4), 763–773 (2017)

  21. Jiang, K., Sun, D., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)

  22. Kekkonen, H., Lassas, M., Saksman, E., Siltanen, S.: Random tree Besov priors – towards fractal imaging. http://arxiv.org/abs/2103.00574 (2021)

  23. Liang, J., Fadili, J., Peyré, G.: Activity identification and local linear convergence of forward-backward-type methods. SIAM J. Optim. 27(1), 408–437 (2017)

  24. Liang, J., Schönlieb, C.B.: Improving FISTA: faster, smarter and greedier. http://arxiv.org/abs/1811.01430 (2018)

  25. Ndiaye, E., Fercoq, O., Gramfort, A., Salmon, J.: Gap safe screening rules for sparsity enforcing penalties. J. Mach. Learn. Res. 18(1), 4671–4703 (2017)

  26. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Boston, Dordrecht, London (2004)

  27. Ovesný, M., Křížek, P., Borkovec, J., Švindrych, Z., Hagen, G.M.: ThunderSTORM: a comprehensive ImageJ plug-in for PALM and STORM data analysis and super-resolution imaging. Bioinformatics 30(16), 2389–2390 (2014)

  28. Parpas, P.: A multilevel proximal gradient algorithm for a class of composite optimization problems. SIAM J. Sci. Comput. 39(5), S681–S701 (2017)

  29. Poon, C., Keriven, N., Peyré, G.: The geometry of off-the-grid compressed sensing. Found. Comput. Math. (2021)

  30. Rosset, S., Zhu, J.: Piecewise linear regularized solution paths. Ann. Stat. 35(3), 1012–1030 (2007)

  31. Sage, D., Kirshner, H., Pengo, T., Stuurman, N., Min, J., Manley, S., Unser, M.: Quantitative evaluation of software packages for single-molecule localization microscopy. Nature Methods 12(8), 717–724 (2015)

  32. Sage, D., Pham, T.A., Babcock, H., Lukes, T., Pengo, T., Chao, J., Velmurugan, R., Herbert, A., Agrawal, A., Colabrese, S., et al.: Super-resolution fight club: assessment of 2D and 3D single-molecule localization microscopy software. Nature Methods 16(5), 387–395 (2019)

  33. Schermelleh, L., Ferrand, A., Huser, T., Eggeling, C., Sauer, M., Biehlmaier, O., Drummen, G.P.: Super-resolution microscopy demystified. Nature Cell Biol. 21(1), 72–84 (2019)

  34. Schindelin, J., Arganda-Carreras, I., Frise, E., Kaynig, V., Longair, M., Pietzsch, T., Preibisch, S., Rueden, C., Saalfeld, S., Schmid, B., et al.: Fiji: an open-source platform for biological-image analysis. Nature Methods 9(7), 676–682 (2012)

  35. Schmidt, M., Roux, N.L., Bach, F.R.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)

  36. Strang, G.: Approximation in the finite element method. Numer. Math. 19(1), 81–98 (1972)

  37. Tao, S., Boley, D., Zhang, S.: Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim. 26(1), 313–336 (2016)

  38. Unser, M., Fageot, J., Gupta, H.: Representer theorems for sparsity-promoting \(\ell ^1\) regularization. IEEE Trans. Inf. Theory 62(9), 5167–5180 (2016)

  39. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward-backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  40. Yu, J., Lai, R., Li, W., Osher, S.: A fast proximal gradient method and convergence analysis for dynamic mean field planning. http://arxiv.org/abs/2102.13260 (2021)

Acknowledgements

R.T. acknowledges funding from EPSRC grant EP/L016516/1 for the Cambridge Centre for Analysis, and the ANR CIPRESSI project grant ANR-19-CE48-0017-01 of the French Agence Nationale de la Recherche. Most of this work was done while A.C. was still at CMAP, CNRS and Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France. Both authors would like to thank the anonymous reviewers, who put so much effort into improving this work.

Author information

Corresponding author

Correspondence to Robert Tovey.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Proofs for FISTA convergence

This section contains the statements and proofs of the results in Sect. 4. Recall that the subsets \(\mathbb {U}^n\subset \mathbb {H}\) satisfy (10).

1.1 A.1 Proofs for Step 3

Theorem 5

(Lemma 2) Let be chosen arbitrarily and / be generated by Algorithm 1 for all \(n\in \mathbb {N}\). For all \(n>0\), it holds that

(74)

Proof

Modifying [10, Thm 3.2], for \(n\ge 1\) we apply Lemma 1 with and . By (10), is convex so . This gives

(75)

By the convexity of \({\text {E}}\), this reduces to

(76)

Multiplying through by \(t_n^2\) gives the desired inequality. \(\square \)

Theorem 6

(Theorem 1) Fix a sequence of subsets \((\mathbb {U}^n)_{n\in \mathbb {N}}\) satisfying (10), arbitrary , and FISTA stepsize choice \((t_n)_{n\in \mathbb {N}}\). Let and be generated by Algorithm 1; then, for any choice of and \(N\in \mathbb {N}\), we have

(77)

Proof

Theorem 6 is just a summation of (74) over all \(n=1,\ldots ,N\). To see this: first add and subtract to each term on the left-hand side to convert \({\text {E}}\) to \({\text {E}}_0\), then move to the right-hand side. Now (74) becomes

(78)

Summing this inequality from \(n=1\) to \(n=N\) gives

(79)

The final step is to flip the roles of / in the final inner product term. Re-writing the right-hand side gives

(80)

Noting that , the previous two equations combine to prove the statement of Theorem 6. \(\square \)

The following lemma is used to produce a sharper estimate on sequences \(t_n\).

Lemma 10

If \(\rho _n=t_{n}^2-t_{n+1}^2+t_{n+1}\ge 0\) and \(t_n\ge 1\) for all \(n\in \mathbb {N}\), then \(t_n\le n-1 +t_1\).

Proof

This is trivially true for \(n=1\). Suppose it is true for \(n-1\); then the condition on \(\rho _{n-1}\) gives

$$\begin{aligned} t_n^2 -t_n \le t_{n-1}^2 \le (n-2+t_1)^2 = (n-1+t_1)^2 -2(n-1+t_1) + 1. \end{aligned}$$
(81)

Arguing by contradiction, suppose \(t_n> n-1+t_1\); since \(x\mapsto x^2-x\) is increasing for \(x\ge \frac{1}{2}\), the above equation then forces \(n-1+t_1 < 1\). However, \(t_1\ge 1\), implying that \(n<1\), which completes the contradiction. \(\square \)
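
For the classical FISTA rule \(t_1=1\), \(t_{n+1}=\frac{1+\sqrt{1+4t_n^2}}{2}\), one has \(t_{n+1}^2-t_{n+1}=t_n^2\), so \(\rho _n=0\) and Lemma 10 gives \(t_n\le n\). A quick numerical confirmation of this (our illustration):

    # Check that the classical FISTA stepsize rule satisfies the hypotheses
    # of Lemma 10 (rho_n = t_n^2 - t_{n+1}^2 + t_{n+1} = 0) and the bound
    # t_n <= n - 1 + t_1 with t_1 = 1.
    import math

    t = 1.0  # t_1
    for n in range(1, 1001):
        assert t <= n - 1 + 1.0 + 1e-12                       # Lemma 10 bound
        t_next = (1 + math.sqrt(1 + 4 * t * t)) / 2
        assert abs(t * t - t_next * t_next + t_next) < 1e-8   # rho_n = 0
        t = t_next
    print("Lemma 10 verified for the first 1000 iterations")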

Lemma 11

(Lemma 3) Let , be generated by Algorithm 1 with \((\mathbb {U}^n)_{n\in \mathbb {N}}\) satisfying (10), \((n_k\in \mathbb {N})_{k\in \mathbb {N}}\) be a monotone increasing sequence, and choose

for each \(k\in \mathbb {N}\). If such a sequence exists, then for all \(K\in \mathbb {N}\), \(n_K\le N<n_{K+1}\) we have

(82)

where .

Proof

This is just a telescoping of the right-hand side of (77) with the introduction of \(n_k\) and simplification ,

(83)

By Lemma 10, \(t_n\le n\) so we can further simplify

to get the required bound. \(\square \)

1.2 A.2 Proof for Step 4

Lemma 12

(Lemma 4) Suppose and \(n_k\) satisfy the conditions of Lemma 3 and forms an \((a_{{\text {U}}},a_{{\text {E}}})\)-minimising sequence of \({\text {E}}\) with

If either:

  • \(a_{{\text {U}}}>1\) and \(n_k^2\lesssim a_{{\text {E}}}^ka_{{\text {U}}}^{2k}\),

  • or \(a_{{\text {U}}}=1\), \(\sum _{k=1}^\infty n_k^2a_{{\text {E}}}^{-k}<\infty \), and

then

Proof

Starting from Lemma 11 we have

(84)
(85)

The inductive step now depends on the value of \(a_{{\text {U}}}\).

  • Case \(a_{{\text {U}}}>1\): We simplify the inequality

    (86)
    (87)

    for some \(C_1>C\). Choose such that

    $$\begin{aligned} \frac{1}{2}C_2^2 \ge \frac{C_1}{a_{{\text {U}}}^2-1}(C_2+a_{{\text {U}}}^2). \end{aligned}$$
    (88)

Assume for \(1\le k\le K\) (trivially true for \(K=1\)); then for \(N=n_{K+1}-1\) we have

    (89)
    (90)
    (91)
  • Case \(a_{{\text {U}}}=1\): Denote and note that . We therefore bound

    (92)
    (93)

    for some \(C_1>0\). Choose such that

    (94)

Assume for \(1\le k\le K\) (trivially true for \(K=1\)); then for \(N=n_{K+1}-1\) we have

    (95)

In both cases, the induction on holds for all K, and we have for all \(N<n_K-1\). \(\square \)

1.3 A.3 Proof for Step 5

Lemma 13

(Lemma 5) Suppose and \(n_k\) are sequences satisfying

then

Proof

The proof is a direct computation: note that

$$\begin{aligned} (a_{{\text {E}}}a_{{\text {U}}}^2)^\kappa = \exp \left( \kappa \log (a_{{\text {E}}}a_{{\text {U}}}^2)\right) = \exp (\log a_{{\text {U}}}^2) = a_{{\text {U}}}^2, \end{aligned}$$
(96)

therefore

$$\begin{aligned} a_{{\text {U}}}^{2K} = \left( (a_{{\text {E}}}a_{{\text {U}}}^2)^K\right) ^\kappa \lesssim n_K^{2\kappa } \le N^{2\kappa }, \end{aligned}$$
(97)

so as required. \(\square \)
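
For completeness, the exponent \(\kappa \) is pinned down by reading (96) backwards: it is the value for which \((a_{{\text {E}}}a_{{\text {U}}}^2)^\kappa =a_{{\text {U}}}^2\), i.e.

$$\begin{aligned} \kappa \log \left( a_{{\text {E}}}a_{{\text {U}}}^2\right) = \log a_{{\text {U}}}^2 \qquad \Longleftrightarrow \qquad \kappa = \frac{2\log a_{{\text {U}}}}{\log a_{{\text {E}}}+2\log a_{{\text {U}}}} \in (0,1] \end{aligned}$$

whenever \(a_{{\text {E}}}\ge 1\) and \(a_{{\text {U}}}>1\).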

1.4 A.4 Proofs for Step 6

Theorem 7

(Theorem 3) Let \((\mathbb {U}^n\subset \mathbb {H})_{n\in \mathbb {N}}\) be a sequence of subsets satisfying (10), compute and by Algorithm 1. Suppose that there exists a monotone increasing sequence \(n_k\in \mathbb {N}\) such that

for all \(k\in \mathbb {N}\).

If is an \((a_{{\text {U}}},a_{{\text {E}}})\)-minimising sequence of \({\text {E}}\) with \(a_{{\text {U}}}>1\) and \(n_k^2\lesssim a_{{\text {E}}}^ka_{{\text {U}}}^{2k}\), then

uniformly for \(N\in \mathbb {N}\).

Proof

Let \(C>0\) satisfy \(n_k^2\le Ca_{{\text {E}}}^ka_{{\text {U}}}^{2k}\) for each \(k\in \mathbb {N}\). Fix \(N>C\) and choose k such that \(Ca_{{\text {E}}}^{k-1}a_{{\text {U}}}^{2k-2}\le N<Ca_{{\text {E}}}^ka_{{\text {U}}}^{2k}\). By construction, and using the equality from (96), we have

(98)

as required. \(\square \)

Lemma 14

(Lemma 6) Let be a sequence in \(\mathbb {H}\) with . Suppose and denote . Any of the following conditions are sufficient to show that is an \((a_{{\text {U}}},a_{{\text {E}}})\)-minimising sequence of \({\text {E}}\):

  1. Small continuous gap refinement: for all \(k\in \mathbb {N}\), some .

  2. Small discrete gap refinement: and for all \(k>0\), some .

Otherwise, suppose there exists a Banach space which contains each \(\widetilde{\mathbb {U}}^k\), , and the sublevel sets of \({\text {E}}\) are -bounded. With the subdifferential \(\partial {\text {E}}:\mathbb {U}\rightrightarrows \mathbb {U}^*\), it is also sufficient if either:

  3. Small continuous gradient refinement: for all \(k\in \mathbb {N}\), some .

  4. Small discrete gradient refinement: and for all \(k\in \mathbb {N}\), some , where .

Proof

The conditions for \(a_{{\text {U}}}\) in Definition 1 are already met; it remains to show that for some fixed \(C>0\). For cases (3) and (4), fix \(R>0\) such that both and the sublevel set are contained in the ball of radius R. Any minimising sequence of \({\text {E}}\) in \(\mathbb {U}\) or \(\widetilde{\mathbb {U}}^k\) is then contained in this ball. We can therefore compute C in each case:

  (1) , so suffices.

  (2) , so \(C=(a_{{\text {E}}}+1)\beta \) suffices.

  (3) for any with . Maximising over gives

  (4) for any with , so and .

This completes the requirements of Definition 1. \(\square \)

B Proof of Theorem 4

First we recall the setting of Definition 2. Fix \(p\ge 0\), \(q\in [1,\infty ]\), \(h\in (0,1)\), \(N\in \mathbb {N}\), a connected and bounded domain \(\varOmega \subset \mathbb {R}^d\), and . We assume that \(\mathbb {H}=L^2(\varOmega )\), , and that there exist spaces \((\widetilde{\mathbb {U}}^k)_{k\in \mathbb {N}}\) with \(\widetilde{\mathbb {U}}^k\subset \mathbb {U}\) containing a sequence such that , c.f. (16). Furthermore, there exist a constant \(c_\alpha >0\) and meshes \(\mathbb {M}^k\) such that:

(99)
(100)

In this section, these assumptions will be summarised simply by saying that \(\mathbb {H}\) and \((\widetilde{\mathbb {U}}^k)_{k\in \mathbb {N}}\) satisfy Definition 2. We prove Theorem 4 as a consequence of Lemma 7; namely, we compute exponents \(p',q'\) with \(a_{{\text {U}}}=h^{-q'}\) and \(a_{{\text {E}}}=h^{-p'}\). These values are computed in the following three lemmas. The first, Lemma 15, quantifies the equivalence between \(L^q\) and \(L^2\) norms on general subspaces. Lemma 16 applies this result to finite-element spaces to compute the value of \(q'\). Finally, Lemma 17 performs the computations for \(p'\) depending on the smoothness properties of \({\text {E}}\).

Lemma 15

(Equivalence of norms for fixed k) Suppose \(\mathbb {H}= L^2(\varOmega )\) for some connected, bounded domain \(\varOmega \subset \mathbb {R}^d\) and for some \(q\in [1,\infty ]\), \(C>0\). For any linear subspace \(\widetilde{\mathbb {U}}\subset \mathbb {U}\) and ,

(101)

Proof

The first statement of the result holds by definition: for each we have

Recall . To go further, we use Hölder's inequality. If \(\frac{1}{q}+\frac{1}{q^*}=1\), then for any

(102)

If \(q\ge 2\) we use Hölder’s inequality a second time:

(103)

This confirms the inequality when \(q\ge 2\). If \(q<2\), we can simply upper bound \({\left\Vert \cdot \right\Vert }_{q^*}\le |\varOmega |^{\frac{1}{q^*}}{\left\Vert \cdot \right\Vert }_\infty \) as required. \(\square \)
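
As a concrete illustration of the \(q<2\) branch (our example): take \(q=1\), \(q^*=\infty \), and suppose the subspace satisfies the standard finite-element inverse estimate \({\left\Vert u \right\Vert }_\infty \lesssim h^{-d}{\left\Vert u \right\Vert }_1\) on a quasi-uniform mesh of width h. Then

$$\begin{aligned} {\left\Vert u \right\Vert }_2^2 \le {\left\Vert u \right\Vert }_1{\left\Vert u \right\Vert }_\infty \lesssim h^{-d}{\left\Vert u \right\Vert }_1^2, \qquad \text {i.e.}\quad {\left\Vert u \right\Vert }_2\lesssim h^{-\frac{d}{2}}{\left\Vert u \right\Vert }_1, \end{aligned}$$

which is exactly the scaling \(q'=-\frac{d}{2}\) recovered in Lemma 16 below.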

Lemma 16

Suppose \(\mathbb {H}\) and \((\widetilde{\mathbb {U}}^k)_{k\in \mathbb {N}}\) satisfy Definition 2, then

  1. If \(q\ge 2\), then (i.e. \(q'=0\)).

  2. If \(q<2\) and , then (i.e. \(q'=-\frac{d}{2}\)).

Proof

Most of the conditions of Lemma 15 are already satisfied. Furthermore, observe that . For the case \(q\ge 2\), this is already sufficient to conclude from Lemma 15, as required.

For the case \(q<2\), from Lemma 15 recall that we are required to bound

(104)

However, due to the decomposition property (100), for each \(\omega \in \mathbb {M}^k\) and there exists such that

(105)

Combining these two equations with the assumed bound on confirms as required. \(\square \)

Lemma 17

Suppose \(\mathbb {H}\) and \((\widetilde{\mathbb {U}}^k)_{k\in \mathbb {N}}\) satisfy Definition 2 and is the minimiser of \({\text {E}}\) such that .

  1. If \({\text {E}}\) is -Lipschitz at , then (i.e. \(p'=p\)).

  2. If \(\nabla {\text {E}}\) is -Lipschitz at , then (i.e. \(p'=2p\)).

Proof

Both statements follow directly from the definitions; observe

(106)
(107)

The proof is concluded by using the approximation bounds of in Definition 2. \(\square \)

C Operator norms for numerical examples

Theorem 8

Suppose \(\textsf{A}:\mathbb {H} \rightarrow \mathbb {R}^m\) has kernels \(\psi _j\in L^\infty ([0,1]^d)\) for \(j\in [m]\).

  1. Case 1:

    If \(\psi _j(\vec {x}) = {\left\{ \begin{array}{ll} 1 &{} \vec {x}\in \mathbb {X}_j \\ 0 &{} \text { else}\end{array}\right. }\) for some collection \(\mathbb {X}_j\subset \varOmega \) such that \(\mathbb {X}_i\cap \mathbb {X}_j = \emptyset \) for all \(i\ne j\), then \({\left\Vert \textsf{A} \right\Vert }_{L^2\rightarrow \ell ^2} = \max _{j\in [m]} \sqrt{|\mathbb {X}_j|}.\)

  2. Case 2:

    If for some frequencies with , then

    $$\begin{aligned} {\left\Vert \textsf{A} \right\Vert }_{L^2\rightarrow \ell ^2} \le \sqrt{m}, \qquad |\textsf{A}^*\vec {r}|_{C^k}\le m^{1-\frac{1}{q}}A^k{\left\Vert \vec {r} \right\Vert }_q, \quad \text {and}\quad |\textsf{A}^*|_{\ell ^2\rightarrow C^k}\le \sqrt{m}A^k \end{aligned}$$

    for all \(\vec {r}\in \mathbb {R}^m\) and \(q\in [1,\infty ]\).

  3. Case 3:

Suppose \(\psi _j(\vec {x}) = (2\pi \sigma ^2)^{-\frac{d}{2}}\exp \left( -\frac{|\vec {x}-\vec {x}_j|^2}{2\sigma ^2}\right) \) for some regular mesh \(\vec {x}_j\in [0,1]^d\) and separation \(\varDelta \), i.e.

    $$\begin{aligned} \{\vec {x}_j{{\,\mathrm{\;s.t.\;}\,}}j\in [m]\} = \{\vec {x}_0 + (j_1\varDelta ,\ldots ,j_d\varDelta ){{\,\mathrm{\;s.t.\;}\,}}j_i\in [\widehat{m}]\} \end{aligned}$$

    for some \(\vec {x}_0\in \mathbb {R}^d\), \(\widehat{m}{:}{=}\root d \of {m}\). For all \(\frac{1}{q} + \frac{1}{q^*} = 1\), \(q\in (1,\infty ]\), we have

    $$\begin{aligned} {\left\Vert \textsf{A} \right\Vert }_{L^2\rightarrow \ell ^2}&\le \bigg ((4\pi \sigma ^2)^{-\frac{1}{2}}\sum _{j=-2\widehat{m},\ldots ,2\widehat{m}}\exp (-\tfrac{\varDelta ^2}{4\sigma ^2}j^2)\bigg )^d, \end{aligned}$$
    (108)
    $$\begin{aligned} |\textsf{A}^*\vec {r}|_{C^0}&\le (2\pi \sigma ^2)^{-\frac{d}{2}}\bigg (\sum _{\vec j\in J}\exp \left( -\tfrac{q^*\varDelta ^2}{2\sigma ^2}\max (0,|\vec j|-\delta )^2\right) \bigg )^{\frac{1}{q^*}}{\left\Vert \vec {r} \right\Vert }_q, \end{aligned}$$
    (109)
    $$\begin{aligned} |\textsf{A}^*\vec {r}|_{C^1}&\le \frac{(2\pi \sigma ^2)^{-\frac{d}{2}}}{\sigma }\frac{\varDelta }{\sigma }\bigg (\sum _{\vec j\in J}(|\vec j|+\delta )^{q^*}\exp \left( -\tfrac{q^*\varDelta ^2}{2\sigma ^2}\max (0,|\vec j|-\delta )^2\right) \bigg )^{\frac{1}{q^*}}{\left\Vert \vec {r} \right\Vert }_q, \end{aligned}$$
    (110)
    $$\begin{aligned} |\textsf{A}^*\vec {r}|_{C^2}&\le \frac{(2\pi \sigma ^2)^{-\frac{d}{2}}}{\sigma ^2} \bigg (\sum _{\vec j\in J} \left( 1+\tfrac{\varDelta ^2}{\sigma ^2}(|\vec j|+\delta )^2\right) ^{q^*}\nonumber \\&\qquad \exp \left( -\tfrac{q^*\varDelta ^2}{2\sigma ^2}\max (0,|\vec j|-\delta )^2\right) \bigg )^{\frac{1}{q^*}} {\left\Vert \vec {r} \right\Vert }_q, \end{aligned}$$
    (111)

    where \(\delta = \frac{\sqrt{d}}{2}\) and \(J=\{\vec j\in \mathbb {Z}^d {{\,\mathrm{\;s.t.\;}\,}}{\left\Vert \vec j \right\Vert }_{\ell ^\infty }\le 2\widehat{m}\}\). The case for \(q=1\) can be inferred from the standard limit of \({\left\Vert \cdot \right\Vert }_{{q^*}}\rightarrow {\left\Vert \cdot \right\Vert }_{\infty }\) for \(q^*\rightarrow \infty \).

Proof

(Case 1.) From Lemma 8 we have

$$\begin{aligned} (\textsf{A}\textsf{A}^*)_{i,j} = \left\langle \mathbbm {1}_{\mathbb {X}_i},\mathbbm {1}_{\mathbb {X}_j} \right\rangle = |\mathbb {X}_i\cap \mathbb {X}_j| = {\left\{ \begin{array}{ll} |\mathbb {X}_i| &{} i=j \\ 0 &{} i\ne j\end{array}\right. }. \end{aligned}$$
(112)

Therefore, \(\textsf{A}\textsf{A}^*\) is a diagonal matrix and \({\left\Vert \textsf{A}\textsf{A}^* \right\Vert }_{\ell ^2\rightarrow \ell ^2} = \max _{j\in [m]} |\mathbb {X}_j|\) completes the result. \(\square \)
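
This identity is easy to confirm numerically; the following sketch (ours, with an arbitrary choice of disjoint intervals in \(d=1\)) compares the largest singular value of a discretised \(\textsf{A}\) with \(\max _{j}\sqrt{|\mathbb {X}_j|}\).

    # Discretise [0,1] into N cells; row j of M is psi_j * sqrt(dx), so the
    # spectral norm of M approximates ||A||_{L^2 -> l^2}.
    import numpy as np

    N = 10000
    dx = 1.0 / N
    x = (np.arange(N) + 0.5) * dx
    intervals = [(0.0, 0.15), (0.2, 0.5), (0.6, 0.62)]   # disjoint sets X_j
    M = np.array([np.where((a <= x) & (x < b), 1.0, 0.0) * np.sqrt(dx)
                  for (a, b) in intervals])
    print(np.linalg.norm(M, 2))                          # ~ sqrt(0.3) = 0.5477...
    print(max(np.sqrt(b - a) for (a, b) in intervals))   # predicted operator norm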

Proof

(Case 2.) The \(\psi _j\) are not necessarily orthogonal; however, \(|\left\langle \psi _i,\psi _j \right\rangle |\le 1\), so we can estimate

$$\begin{aligned} {\left\Vert \textsf{A}\textsf{A}^* \right\Vert }_{\ell ^2\rightarrow \ell ^2} \le {\left\Vert \textsf{A}\textsf{A}^* \right\Vert }_{\ell ^\infty \rightarrow \ell ^\infty } \le m. \end{aligned}$$
(113)

Now, looking to apply Lemma 9, note that \({\left\Vert \nabla ^k\psi _j \right\Vert }_\infty \le A^k\); therefore

$$\begin{aligned}{} & {} |\textsf{A}^*\vec {r}|_{C^k}\le A^k m^{\frac{1}{q^*}}{\left\Vert \vec {r} \right\Vert }_q = A^k m^{1-\frac{1}{q}}{\left\Vert \vec {r} \right\Vert }_q \quad \text {and}\quad |\textsf{A}^*|_{\ell ^2\rightarrow C^k} \nonumber \\{} & {} \qquad \le A^k \min _{q\in [1,\infty ]}m^{1-\frac{1}{q}}\sqrt{m}^{\max (0,2-q)} = \sqrt{m}A^k. \end{aligned}$$
(114)

\(\square \)

Proof

(Case 3.) In the Gaussian case, we build our approximations on the observation that sums of Gaussians converge very quickly. The first estimate bounds the operator norm. Computing the inner products gives

$$\begin{aligned} \left\langle \psi _i,\psi _j \right\rangle= & {} (2\pi \sigma ^2)^{-d}\int _{[0,1]^d} \exp \left( -\tfrac{|\vec {x}-\vec {x}_i|^2}{2\sigma ^2}-\tfrac{|\vec {x}-\vec {x}_j|^2}{2\sigma ^2}\right) \mathop {}\!\textrm{d}\vec {x}\nonumber \\\le & {} (2\pi \sigma ^2)^{-d}(\pi \sigma ^2)^{\frac{d}{2}}\exp \left( -\tfrac{|\vec {x}_i-\vec {x}_j|^2}{4\sigma ^2}\right) . \end{aligned}$$
(115)

Estimating the operator norm,

$$\begin{aligned} {\left\Vert \textsf{A}\textsf{A}^* \right\Vert }_{\ell ^2\rightarrow \ell ^2}&\le {\left\Vert \textsf{A}\textsf{A}^* \right\Vert }_{\ell ^\infty \rightarrow \ell ^\infty } = \max _{i\in [m]} \sum _{j=1}^m |\left\langle \psi _i,\psi _j \right\rangle | \end{aligned}$$
(116)
$$\begin{aligned}&= \max _{i\in [m]}(4\pi \sigma ^2)^{-\frac{d}{2}}\sum _{j_1,\ldots ,j_d\in [\widehat{m}]} \exp \left( -\frac{(j_1\varDelta -i_1\varDelta )^2+\ldots +(j_d\varDelta -i_d\varDelta )^2}{4\sigma ^2}\right) \end{aligned}$$
(117)
$$\begin{aligned}&\le (4\pi \sigma ^2)^{-\frac{d}{2}}\sum _{\vec j\in \mathbb {Z}^d\cap [-\widehat{m},\widehat{m}]^d} \exp \left( -\frac{(j_1\varDelta )^2+\ldots +(j_d\varDelta )^2}{4\sigma ^2}\right) \nonumber \\&=\qquad \left[ (4\pi \sigma ^2)^{-\frac{1}{2}}\sum _{j=-\widehat{m}}^{\widehat{m}} \exp \left( -\frac{\varDelta ^2j^2}{4\sigma ^2}\right) \right] ^d. \end{aligned}$$
(118)

This is a nice approximation because it factorises simply over dimensions. Applying the results from Lemma 9, note

$$\begin{aligned}{} & {} |\psi _j(\vec {x})| \displaystyle = \left| \psi _j(\vec {x})\right| \displaystyle = (2\pi \sigma ^2)^{-\frac{d}{2}}\exp \left( -\frac{|\vec {x}-\vec {x}_j|^2}{2\sigma ^2}\right) ,\\{} & {} |\nabla \psi _j(\vec {x})| \displaystyle = \left| \frac{\vec {x}-\vec {x}_j}{\sigma ^2}\psi _j(\vec {x})\right| \displaystyle = \frac{(2\pi \sigma ^2)^{-\frac{d}{2}}}{\sigma }\frac{|\vec {x}-\vec {x}_j|}{\sigma }\exp \left( -\frac{|\vec {x}-\vec {x}_j|^2}{2\sigma ^2}\right) ,\\{} & {} |\nabla ^2\psi _j(\vec {x})| \displaystyle =\left| \frac{1}{\sigma ^2} + \frac{(\vec {x}-\vec {x}_j)(\vec {x}-\vec {x}_j)^\top }{\sigma ^4}\right| \psi _j(\vec {x}) \displaystyle \\{} & {} \qquad = \frac{(2\pi \sigma ^2)^{-\frac{d}{2}}}{\sigma ^2}\left( 1+\frac{|\vec {x}-\vec {x}_j|^2}{\sigma ^2}\right) \exp \left( -\frac{|\vec {x}-\vec {x}_j|^2}{2\sigma ^2}\right) . \end{aligned}$$
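
The derivative formulas above are elementary but easy to get wrong; a short symbolic check in one dimension (our illustration) is:

    # Verify psi' = -(x - x0)/sigma^2 * psi and
    # psi'' = ((x - x0)^2/sigma^4 - 1/sigma^2) * psi for a 1-D Gaussian kernel.
    import sympy as sp

    x, x0 = sp.symbols('x x0', real=True)
    s = sp.symbols('sigma', positive=True)
    psi = (2 * sp.pi * s**2) ** sp.Rational(-1, 2) * sp.exp(-(x - x0)**2 / (2 * s**2))
    print(sp.simplify(sp.diff(psi, x) + (x - x0) / s**2 * psi))                         # 0
    print(sp.simplify(sp.diff(psi, x, 2) - ((x - x0)**2 / s**4 - 1 / s**2) * psi))      # 0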

We now wish to sum these over \(j=1,\ldots ,m\) and produce an upper bound that is independent of \(\vec {x}\). To do so we will use the following lemma.

Lemma 18

Suppose \(q>0\). If the polynomial \(p(|\vec {x}|) = \sum p_k|\vec {x}|^k\) has non-negative coefficients and \(\vec {x}\in [-m,m]^d\), then

$$\begin{aligned} \sum _{{\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le m} p(|\vec {j}-\vec {x}|)\exp \left( -\tfrac{q|\vec {j}-\vec {x}|^2}{2}\right) \le \sum _{{\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le 2m}p(|\vec {j}|+\delta )\exp \left( -\frac{q\max (0,|\vec {j}|-\delta )^2}{2}\right) \end{aligned}$$

where \(\delta {:}{=}\frac{\sqrt{d}}{2}\) and \(\vec {j}\in \mathbb {Z}^d\).

Proof

There exists \(\widehat{\vec {x}}\in [-\tfrac{1}{2},\tfrac{1}{2}]^d\) such that \(\vec {x} + \widehat{\vec {x}}\in \mathbb {Z}^d\), therefore

$$\begin{aligned} \sum _{{\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le m} p(|\vec {j}-\vec {x}|)\exp \left( -\tfrac{q|\vec {j}-\vec {x}|^2}{2}\right)&= \sum _{{\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le m} p(|\vec {j}-(\vec {x}+\widehat{\vec {x}})+\widehat{\vec {x}}|)\exp \left( -\tfrac{q|\vec {j}-(\vec {x}+\widehat{\vec {x}})+\widehat{\vec {x}}|^2}{2}\right) \\&\le \sum _{{\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le 2m} p(|\vec {j}+\widehat{\vec {x}}|)\exp \left( -\tfrac{q|\vec {j}+\widehat{\vec {x}}|^2}{2}\right) \\&\le \sum _{\begin{array}{c} \vec {j}\in \mathbb {Z}^d\\ {\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le 2m \end{array}} p(|\vec {j}|+\delta )\exp \left( -\tfrac{q\max (0,|\vec {j}|-\delta )^2}{2}\right) \end{aligned}$$

as \(|\widehat{\vec {x}}|\le \delta \) and p has non-negative coefficients. \(\square \)

Now, continuing the proof of Theorem 8, for \(\widehat{m}=\root d \of {m}\), \(\delta =\frac{\sqrt{d}}{2}\) and \(J=\{\vec {j}\in \mathbb {Z}^d {{\,\mathrm{\;s.t.\;}\,}}{\left\Vert \vec {j} \right\Vert }_{\ell ^\infty }\le 2\widehat{m}\}\), Lemma 18 bounds

$$\begin{aligned} \sum _{j=1}^m |\psi _j(\vec {x})|^{q^*}&\le (2\pi \sigma ^2)^{-\frac{dq^*}{2}} \left[ \sum _{\vec {j}\in J} \exp \left( -\frac{q^*\varDelta ^2}{2\sigma ^2}\max (0,|\vec {j}|-\delta )^2\right) \right] \\ \sum _{j=1}^m |\nabla \psi _j(\vec {x})|^{q^*}&\le \frac{(2\pi \sigma ^2)^{-\frac{dq^*}{2}}}{\sigma ^{q^*}}\frac{\varDelta ^{q^*}}{\sigma ^{q^*}} \left[ \sum _{\vec {j}\in J} (|\vec {j}|+\delta )^{q^*}\exp \left( -\frac{q^*\varDelta ^2}{2\sigma ^2}\max (0,|\vec {j}|-\delta )^2\right) \right] \\ \sum _{j=1}^m |\nabla ^2\psi _j(\vec {x})|^{q^*}&\le \frac{(2\pi \sigma ^2)^{-\frac{dq^*}{2}}}{\sigma ^{2q^*}} \left[ \sum _{\vec {j}\in J} \left( 1+\frac{\varDelta ^2}{\sigma ^2}(|\vec {j}|+\delta )^2\right) ^{q^*}\right. \\&\qquad \left. \exp \left( -\frac{q^*\varDelta ^2}{2\sigma ^2}\max (0,|\vec {j}|-\delta )^2\right) \right] \end{aligned}$$

for all \(\vec {x}\in \varOmega \). In the worst case, evaluating these bounds has \(O(2^dm)\) time complexity; however, the summands decay faster than exponentially, so the sums converge very quickly. \(\square \)
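
The Gram-matrix estimate (116)-(118) can also be checked numerically; the sketch below (ours, with \(d=1\) and illustrative parameters) compares \({\left\Vert \textsf{A}\textsf{A}^* \right\Vert }_{\ell ^2\rightarrow \ell ^2}\), computed on a fine discretisation of \([0,1]\), against the factorised bound in (118).

    # Gaussian kernels on a regular grid: compare ||A A*|| with the bound (118).
    import numpy as np

    m_hat, sigma = 20, 0.02
    Delta = 1.0 / m_hat
    centres = (np.arange(m_hat) + 0.5) * Delta

    N = 20000
    dx = 1.0 / N
    x = (np.arange(N) + 0.5) * dx
    # rows of M are psi_j * sqrt(dx), so M @ M.T approximates A A*
    M = ((2 * np.pi * sigma**2) ** -0.5
         * np.exp(-(x[None, :] - centres[:, None]) ** 2 / (2 * sigma**2))
         * np.sqrt(dx))
    print(np.linalg.norm(M @ M.T, 2))    # direct Gram-matrix norm

    j = np.arange(-m_hat, m_hat + 1)
    bound = (4 * np.pi * sigma**2) ** -0.5 * np.exp(-(Delta * j) ** 2 / (4 * sigma**2)).sum()
    print(bound)                         # upper bound from (118); should dominate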

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chambolle, A., Tovey, R. “FISTA” in Banach spaces with adaptive discretisations. Comput Optim Appl 83, 845–892 (2022). https://doi.org/10.1007/s10589-022-00418-3

