Abstract
A precise characterization of the extremal points of sublevel sets of nonsmooth penalties provides both detailed information about minimizers, and optimality conditions in general classes of minimization problems involving them. Moreover, it enables the application of fully corrective generalized conditional gradient methods for their efficient solution. In this manuscript, this program is adapted to the minimization of a smooth convex fidelity term which is augmented with an unbalanced transport regularization term given in the form of a generalized Kantorovich–Rubinstein norm for Radon measures. More precisely, we show that the extremal points associated to the latter are given by all Dirac delta functionals supported in the spatial domain as well as certain dipoles, i.e., pairs of Diracs with the same mass but with different signs. Subsequently, this characterization is used to derive precise first-order optimality conditions as well as an efficient solution algorithm for which linear convergence is proved under natural assumptions. This behavior is also reflected in numerical examples for a model problem.
References
F. Angrisani, G. Ascione, L. D’Onofrio, and G. Manzo, Duality and distance formulas in Lipschitz-Hölder spaces, Atti Accad. Naz. Lincei Rend. Lincei Mat. Appl. 31 (2020), no. 2, 401–419.
F. Angrisani, G. Ascione, and G. Manzo, Atomic decomposition of finite signed measures on compacts of \(\mathbb{R}^n\), Ann. Fenn. Math. 46 (2021), no. 2, 643–654.
N. Boyd, G. Schiebinger, and B. Recht, The alternating descent conditional gradient method for sparse inverse problems, SIAM J. Optim. 27 (2017), no. 2, 616–639.
C. Boyer, A. Chambolle, Y. De Castro, V. Duval, F. De Gournay, and P. Weiss, On representer theorems and convex regularization, SIAM J. Optim. 29 (2019), no. 2, 1260–1281.
K. Bredies and M. Carioni, Sparsity of solutions for variational inverse problems with finite-dimensional data, Calc. Var. Partial Differential Equations 59 (2020), no. 1, 1–26.
K. Bredies, M. Carioni, S. Fanzon, and F. Romero, On the extremal points of the ball of the Benamou–Brenier energy, Bull. Lond. Math. Soc. 53 (2021), no. 5, 1436–1452.
K. Bredies, M. Carioni, S. Fanzon, and F. Romero, A generalized conditional gradient method for dynamic inverse problems with optimal transport regularization, Found. Comput. Math. 23 (2023), no. 3, 833–898.
K. Bredies, M. Carioni, S. Fanzon, and D. Walter, Asymptotic linear convergence of fully-corrective generalized conditional gradient methods, Math. Program. (2023), https://doi.org/10.1007/s10107-023-01975-z.
K. Bredies and H. K. Pikkarainen, Inverse problems in spaces of measures, ESAIM Control Optim. Calc. Var. 19 (2013), no. 1, 190–218.
H. Brezis, Functional analysis, Sobolev spaces and partial differential equations, Universitext, New York, NY: Springer, 2011.
E. J. Candès and C. Fernandez-Granda, Towards a mathematical theory of super-resolution, Comm. Pure Appl. Math. 67 (2014), no. 6, 906–956.
E. Casas, C. Clason, and K. Kunisch, Parabolic control problems in measure spaces with sparse solutions, SIAM J. Control Optim. 51 (2013), no. 1, 28–63.
J. C. Dunn, Convergence rates for conditional gradient sequences generated by implicit step length rules, SIAM J. Control Optim. 18 (1980), no. 5, 473–487.
J. C. Dunn and S. Harshbarger, Conditional gradient algorithms with open loop step size rules, J. Math. Anal. Appl. 62 (1978), no. 2, 432–444.
V. Duval and G. Peyré, Exact support recovery for sparse spikes deconvolution, Found. Comput. Math. 15 (2015), no. 5, 1315–1355.
V. Duval and R. Tovey, Dynamical programming for off-the-grid dynamic inverse problems, Preprint arXiv:2112.11378 [math.OC], 2021.
I. Ekeland and R. Témam, Convex analysis and variational problems, Classics in Applied Mathematics, vol. 28, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1999.
M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly 3 (1956), no. 1-2, 95–110.
L. G. Hanin, Kantorovich-Rubinstein norm and its application in the theory of Lipschitz spaces, Proc. Amer. Math. Soc. 115 (1992), no. 2, 345–352.
J. A. Iglesias and D. Walter, Extremal points of total generalized variation balls in 1D: characterization and applications, J. Convex Anal. 29 (2022), no. 4, 1251–1290.
L. V. Kantorovich and G. P. Akilov, Functional analysis, Second ed., Pergamon Press, Oxford-Elmsford, N.Y., 1982.
P.-J. Laurent, Approximation et optimisation, Collection Enseignement des Sciences, No. 13, Hermann, Paris, 1972.
J. Lellmann, D. A. Lorenz, C. Schönlieb, and T. Valkonen, Imaging with Kantorovich-Rubinstein discrepancy, SIAM J. Imaging Sci. 7 (2014), no. 4, 2833–2859.
L. Métivier, R. Brossier, Q. Mérigot, E. Oudet, and J. Virieux, An optimal transport approach for seismic tomography: application to 3D full waveform inversion, Inverse Problems 32 (2016), no. 11, 115008, 36 pp.
P. Pegon, F. Santambrogio, and D. Piazzoli, Full characterization of optimal transport plans for concave costs, Discrete Contin. Dyn. Syst. 35 (2015), no. 12, 6113–6132.
F. Santambrogio, Optimal transport for applied mathematicians, Progress in Nonlinear Differential Equations and their Applications, vol. 87, Birkhäuser/Springer, Cham, 2015.
T. Strömberg, The operation of infimal convolution, Dissertationes Math. (Rozprawy Mat.) 352 (1996), 58 pp.
M. Unser, J. Fageot, and J. P. Ward, Splines are universal solutions of linear inverse problems with generalized TV regularization, SIAM Rev. 59 (2017), no. 4, 769–793.
D. J. Wales and J. P. K. Doye, Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms, J. Phys. Chem. A 101 (1997), no. 28, 5111–5116.
N. Weaver, Lipschitz algebras, World Scientific Publishing Co. Pte. Ltd., Hackensack, NJ, 2018.
Y. Yu, X. Zhang, and D. Schuurmans, Generalized conditional gradient for sparse estimation, J. Mach. Learn. Res. 18 (2017), Paper No. 144, 46 pp.
C. Zălinescu, Convex analysis in general vector spaces, World Scientific Publishing Co., Inc., River Edge, NJ, 2002.
Additional information
Communicated by Martin Burger.
Appendix A: Proofs for Sect. 3.3.2
In this section, we collect the necessary auxiliary results for the proof of Theorem 3.15 by applying the results of [8]. For this purpose, we keep using the notation \(B:=\{\mu \,|\, \Vert \mu \Vert _{{\text {KR}}_p^{\alpha ,\beta }}\le 1\}\) and further introduce \(\mathcal {B}:=\overline{{\text {Ext}}(B)}^*\). Since the predual space \(\mathcal {C}(\Omega )\) is separable, \(\mathcal {B}\) is weak* compact and there exists a metric \(d_\mathcal {B}\) which metrizes the weak* topology on \(\mathcal {B}\), see [10, Theorem 3.29].
Lemma A.1
We have
Proof
By the characterization of \({\text {Ext}}(B)\), we first observe that
Now, let \(\mu _k=(\sigma _k/\alpha ) \delta _{z_k}\), \(\sigma _k\in \{-1,1\}\), \(z_k\in \Omega \), \(k \in \mathbb {N}\), denote a weak* convergent sequence with limit \(\bar{\mu }\). Then, due to the compactness of \(\Omega \), there exists a subsequence, denoted by the same symbol, with
Setting \(\tilde{\mu }=(\bar{\sigma }/\alpha )\delta _{\bar{z}}\), the associated sequence of measures satisfies
Since weak* limits are unique, \(\bar{\mu }={\tilde{\mu }}\) follows.
Similarly, we see that any weak* convergent sequence \(\mu _k=\mathcal {D}_\beta (x_k,y_k)\) with
necessarily satisfies for some \((\bar{x},\bar{y})\in \Omega \times \Omega \) with \(0\le |\bar{x}-\bar{y}|^p\le 2\alpha -\beta \). This finishes the proof. \(\square \)
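The first step of the proof can be checked numerically; the following is a minimal sketch, not part of the argument. The pairing of \((\sigma /\alpha )\delta _z\) with a continuous \(\varphi \) equals \((\sigma /\alpha )\varphi (z)\), so \(z_k \rightarrow \bar{z}\) with a fixed sign forces the pairings to converge. The value of \(\alpha \), the sequence \(z_k=\bar{z}+1/k\), and the three test functions are hypothetical choices made only for illustration.

```python
import math

def pair_dirac(phi, sigma, alpha, z):
    """Duality pairing <phi, (sigma/alpha) * delta_z> = (sigma/alpha) * phi(z)."""
    return (sigma / alpha) * phi(z)

alpha = 2.0                    # hypothetical value of the parameter alpha
z_bar, sigma_bar = 0.5, 1      # hypothetical limit point and limit sign

# A few continuous test functions (illustrative choices only).
test_functions = [math.cos, math.exp, lambda t: abs(t - 0.3)]

def max_gap(k):
    """Largest deviation of the pairings at z_k = z_bar + 1/k from the limit pairings."""
    z_k = z_bar + 1.0 / k
    return max(
        abs(pair_dirac(phi, sigma_bar, alpha, z_k)
            - pair_dirac(phi, sigma_bar, alpha, z_bar))
        for phi in test_functions
    )

gaps = [max_gap(k) for k in (1, 10, 100, 1000)]
print(gaps)  # the deviations shrink as k grows
```

A finite family of test functions cannot verify weak* convergence, of course; the sketch only makes the mechanism behind \(\mu _k \rightharpoonup ^* \tilde{\mu }\) visible.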
In order to apply the abstract convergence result of [8], we have to check some structural assumptions. First, we show that, due to Assumption \((\textbf{B2})\), the linear problem
admits finitely many maximizers and all of them are extremal points.
Lemma A.2
Let Assumption \((\textbf{B2})\) hold. Then, we have
Proof
Define
By assumption, D is nonempty and there holds \(\langle \bar{q}, \mu \rangle =1\) for all \(\mu \in D\). Moreover, since \(\bar{q}\) is the unique dual variable for Problem (\(\mathcal {P}\)) and \( \Vert \cdot \Vert _{{\text {KR}}_p^{\alpha , \beta }}\) is positively one-homogeneous, we conclude
The inverse inclusion follows immediately from Assumption \((\textbf{B2})\) which gives
as well as noting that
\(\square \)
For abbreviation, set
Second, we have to show the existence of \(d_{\mathcal {B}}\)-neighborhoods \(U^1_i\) of \(\bar{\mu }^1_i\) and \(U^2_j\) of \(\bar{\mu }^2_j\) in \(\mathcal {B}\), respectively, as well as of a mapping \(g :{\text {Ext}}(B) \times {\text {Ext}}(B) \rightarrow \mathbb {R}\) and constants \(\theta ,~C_K >0\) with
for all \(j=1, \dots , \bar{N}_k\), \(k=1,2\), and all \(\mu \in U^k_j \cap {\text {Ext}}(B)\). We claim that this is satisfied for
The proof is split into two parts. First, we characterize open \(d_{\mathcal {B}}\)-neighborhoods around the associated extremal points.
Lemma A.3
For \(0< R\) define the sets
as well as
Then, \(U^1_i(R)\) is a \(d_{\mathcal {B}}\)-neighborhood of \(({\text {sign}}(\bar{q}(\bar{z}_i))/\alpha )\delta _{\bar{z}_i}\), \(i=1,\dots ,\bar{N}_1\), and \(U^2_j(R)\) is a \(d_{\mathcal {B}}\)-neighborhood of \(\mathcal {D}_\beta (\bar{x}_j,\bar{y}_j)\), \(j=1,\dots ,\bar{N}_2\). Moreover, for every \(R>0\) small enough, there holds \(U^1_i(R),~U^2_j(R) \subset {\text {Ext}}(B)\).
Proof
Let indices \(i\in \{1, \dots , \bar{N}_1\}\) and \(j\in \{1, \dots , \bar{N}_2\}\) be arbitrary but fixed. We first show the claimed statement for \(U^2_j(R)\). Since \((\mathcal {B},d_{\mathcal {B}})\) is a metric space, it suffices to show that any sequence \(\{\mu _k\}_k \subset \mathcal {B}\) with \(\mu _k \rightharpoonup ^* \mathcal {D}_\beta (\bar{x}_j,\bar{y}_j)\) lies in \(U^2_j(R)\) for all \(k \in \mathbb {N}\) large enough. For this purpose, assume that \(\{\mu _k\}_{k}\) admits a subsequence, denoted by the same symbol, of the form \(\mu _k=(\sigma _k/\alpha ) \delta _{z_k}\) for some \(\sigma _k \in \{-1,1\},~z_k \in \Omega \). Then, by possibly selecting another subsequence, we get \(\sigma _k \rightarrow \bar{\sigma }\) and \(z_k \rightarrow \bar{z}\), and thus \(\mu _k \rightharpoonup ^* (\bar{\sigma }/\alpha )\delta _{\bar{z}}\), for some \(\bar{\sigma } \in \{-1,1\},~\bar{z} \in \Omega \). Since weak* limits are unique and \((\bar{\sigma }/\alpha )\delta _{\bar{z}} \ne \mathcal {D}_\beta (\bar{x}_j, \bar{y}_j)\), this yields a contradiction. In the same way, we exclude the existence of a subsequence with \(\mu _k=0\) for all k. Hence, for all \(k\in \mathbb {N}\) large enough, we have \(\mu _k=\mathcal {D}_\beta (x_k,y_k)\) for some \((x_k,y_k ) \in \Omega \times \Omega \) with \(0< |x_k-y_k|^p\le 2\alpha -\beta \). By a similar contradiction argument, \((x_k,y_k) \rightarrow (\bar{x}_j,\bar{y}_j)\) has to hold. Thus, for every \(k \in \mathbb {N}\) large enough, we have \((x_k,y_k) \in B_{R}(\bar{x}_j,\bar{y}_j)\) and hence \(\mu _k \in U^2_j(R)\), which proves the statement for \(U^2_j(R)\). The statement for \(U^1_i(R)\) follows by a similar argument. In fact, if \(\{\mu _k\}_k \subset \mathcal {B} \) satisfies \(\mu _k \rightharpoonup ^* \bar{\mu }^1_i\),
then \(\mu _k=(\sigma _k/\alpha ) \delta _{z_k}\), \(\sigma _k \in \{-1,1\},~z_k\in \Omega \), for all k large enough since \(\bar{\mu }^1_i \ne \mathcal {D}_\beta (x,y)\) for every \((x,y)\in \Omega \times \Omega \). Moreover, from [8, Lemma 3.16], we get \(\sigma _k={\text {sign}}(\bar{q}(\bar{z}_i))\) for all \(k \in \mathbb {N}\) large enough. Finally, if there is a subsequence of \(\{z_k\}_k\), denoted by the same symbol, with \(z_k \rightarrow \bar{z}\) and \(\bar{z}\ne \bar{z}_i\), then we can choose \(\varphi \in \mathcal {C}(\Omega )\) satisfying \(\varphi (\bar{z})=0\) and \(\varphi (\bar{z}_i)=1\). For the corresponding subsequence of measures \(\mu _k\), we then obtain
yielding a contradiction and thus \(\bar{z}=\bar{z}_i\). \(\square \)
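The separation argument at the end of the proof can be illustrated as follows; this is a sketch under our own assumptions, not the construction used in the proof. On the hypothetical one-dimensional domain \(\Omega =[0,1]\), the piecewise linear function below is continuous with \(\varphi (\bar{z})=0\) and \(\varphi (\bar{z}_i)=1\). The values of \(\alpha \), \(\bar{z}_i\) and \(\bar{z}\) are illustrative only.

```python
alpha = 2.0                  # hypothetical value of the parameter alpha
z_i, z_lim = 0.25, 0.75      # hypothetical bar{z}_i and a different limit point bar{z}

def phi(z):
    """Continuous test function with phi(z_lim) = 0 and phi(z_i) = 1."""
    return abs(z - z_lim) / abs(z_i - z_lim)

def pair_dirac(sigma, z):
    """Duality pairing <phi, (sigma/alpha) * delta_z>."""
    return (sigma / alpha) * phi(z)

sigma = 1
target = pair_dirac(sigma, z_i)   # pairing with the candidate limit measure at z_i

# Pairings along a sequence z_k -> z_lim: they tend to 0, not to `target`.
along_sequence = [pair_dirac(sigma, z_lim + 1.0 / k) for k in (10, 100, 1000)]
print(target, along_sequence)
```

The pairings along the sequence approach \(0\) while the pairing with the measure supported at \(\bar{z}_i\) equals \(1/\alpha \), so the two weak* limits cannot coincide.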
Next we prove the Lipschitz and quadratic growth properties from (40).
Lemma A.4
There are \(R_1,C_K>0\) with
for all \(\mu \in U^\ell _j(R_1)\), \(j=1,\dots ,\bar{N}_\ell \), \(\ell =1,2\).
Proof
By assumption, \(K_* :Y \rightarrow {\text {Lip}}(\Omega )\) is continuous. As a consequence, we immediately get
for all \(z\in \Omega \). For \(\mathcal {D}_\beta (\bar{x}_j, \bar{y}_j)\) we can argue similarly. For this purpose, if \(R_1>0\) is small enough, we have
for all \((\bar{x}_i,\bar{y}_i) \in B_R(\bar{x}_j) \times B_R(\bar{y}_j)\) since \(|\bar{x}_i-\bar{y}_i|>0\). As a consequence, we get
where we abbreviate
as well as
The claimed statement then follows by definition of \(U^1_i(R_1)\) and \(U^2_j(R_1)\) from Lemma A.3 and noting that
as well as
Since all involved constants are independent of i and j, respectively, we conclude. \(\square \)
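A toy computation may help to see where the Lipschitz-type estimate comes from; it is a sketch under our own assumptions and not the setting of the paper. We take \(\Omega =[0,1]\), \(\alpha =1\), equal signs, and a hypothetical forward operator that maps \(\delta _z\) to the vector of three Gaussian kernel values. The distance \(|z-\bar{z}|\) stands in for \(g\), which is an assumption made for this illustration.

```python
import math

centers = [0.2, 0.5, 0.8]   # hypothetical kernel centers

def K_dirac(z):
    """Hypothetical forward operator applied to delta_z: (k_1(z), ..., k_m(z))."""
    return [math.exp(-(z - c) ** 2) for c in centers]

def norm_diff(z, z_bar):
    """Euclidean norm of K(delta_z - delta_{z_bar})."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(K_dirac(z), K_dirac(z_bar))))

# Each kernel t -> exp(-t^2) has derivative bounded by sqrt(2/e) in absolute value,
# so sqrt(m) * sqrt(2/e) is a valid (not necessarily sharp) Lipschitz constant.
C_K = math.sqrt(len(centers)) * math.sqrt(2.0 / math.e)

z_bar = 0.4
grid = [i / 500 for i in range(501)]   # sample points in [0, 1]
lipschitz_ok = all(
    norm_diff(z, z_bar) <= C_K * abs(z - z_bar) + 1e-15 for z in grid
)
print(lipschitz_ok)
```

The constant depends only on the kernels and not on the chosen point \(\bar{z}\), which mirrors the remark that the constants are independent of the indices.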
Proposition A.5
Let Assumption \((\textbf{B3})\) hold. Then, there are \(\theta >0 \) and a radius \(0<R_2\) with
and \(j=1,\dots ,\bar{N}_\ell \), \(\ell =1,2\).
Proof
Since \(\bar{z}_i \in {\text {int}}\Omega \) is a global extremum of \(\bar{q}\) and \((\bar{x}_j,\bar{y}_j) \in {\text {int}}\Omega \times {\text {int}}\Omega \) is a global maximum of \(\Psi _{\bar{q}}\), we have \(\nabla \bar{q}(\bar{z}_i)=0\) and \(\nabla \Psi _{\bar{q}}(\bar{x}_j, \bar{y}_j)=0\), respectively. Using the non-degeneracy of the associated Hessians, see Assumption \((\textbf{B3})\), and the continuity of \(\bar{q}\), we conclude the existence of \(R_2>0\) as well as of \(\theta >0\) with
as well as
by Taylor’s expansion. This implies
as well as
for all
By Lemma A.3, all elements of \(U^1_i(R_2)\) and \(U^2_i(R_2)\), respectively, are of the form (41), thus finishing the proof. \(\square \)
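The quadratic growth obtained from Taylor's expansion can be observed on a one-dimensional toy example; this is a sketch under our own assumptions. We use the hypothetical dual variable \(q(z)=\alpha \cos (z)\) with \(\alpha =1\), which has a nondegenerate global maximum at \(\bar{z}=0\), and check \(1-q(z)/\alpha \ge \theta |z-\bar{z}|^2\) on a grid in \(B_R(\bar{z})\). The values \(R=1\) and \(\theta =0.4\) are illustrative choices.

```python
import math

alpha = 1.0   # hypothetical value of the parameter alpha

def q_bar(z):
    """Hypothetical dual variable with nondegenerate maximum alpha at z = 0."""
    return alpha * math.cos(z)

z_bar, R = 0.0, 1.0
grid = [z_bar - R + 2.0 * R * i / 2000 for i in range(2001)]

def growth_holds(theta):
    """Check 1 - <q_bar, delta_z / alpha> >= theta * |z - z_bar|^2 on the grid."""
    return all(
        1.0 - q_bar(z) / alpha >= theta * (z - z_bar) ** 2 - 1e-15 for z in grid
    )

print(growth_holds(0.4))   # a constant below half the curvature works on this ball
print(growth_holds(0.5))   # exactly half the curvature is already too large here
```

In this example the second derivative at the maximum is \(-1\), so any admissible \(\theta \) has to stay strictly below \(1/2\), and the admissible range shrinks as \(R\) grows.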
Summarizing the previous observations, we conclude Theorem 3.15 using the results of [8]:
Proof of Theorem 3.15
Summarizing our previous observations, we have that:
- The function F is strongly convex around the optimal observation \(\bar{y}\), see Assumption \((\textbf{B2})\).
- According to Lemma A.2, there exists \(\{\bar{\mu }_j\}^{\bar{N}}_{j=1} \subset {\text {Ext}}(B)\) with \(\operatorname*{arg\,max}_{\mu \in \mathcal {B}}\langle \bar{q}, \mu \rangle =\{\bar{\mu }_j\}^{\bar{N}}_{j=1}\).
- The set \(\{\bar{\mu }_j\}^{\bar{N}}_{j=1}\) is linearly independent, see Assumption \((\textbf{B4})\).
- The unique solution \(\bar{u}=\sum ^{\bar{N}}_{j=1} \bar{\gamma }_j \bar{\mu }_j\) satisfies \(\bar{\gamma }_j>0\), see Assumption \((\textbf{B5})\).
- There are \(d_{\mathcal {B}}\)-neighborhoods \(U_j\) of \(\bar{\mu }_j\) for \(j=1,\dots ,\bar{N}\), a function \(g :{\text {Ext}}(B) \times {\text {Ext}}(B) \rightarrow \mathbb {R}\) and \(C_K,\theta >0\) with
$$\begin{aligned} \Vert K(\mu -\bar{\mu }_j)\Vert _Y \le C_K\, g(\mu , \bar{\mu }_j), \quad 1-\langle \bar{q}, \mu \rangle \ge \theta \, g(\mu ,\bar{\mu }_j)^2 \quad \text {for all}~\mu \in U_j \cap {\text {Ext}}(B). \end{aligned}$$
Consequently, the assumptions of [8, Theorem 3.8] are satisfied, and applying it we conclude the linear convergence of Theorem 3.15. \(\square \)
About this article
Cite this article
Carioni, M., Iglesias, J.A. & Walter, D. Extremal Points and Sparse Optimization for Generalized Kantorovich–Rubinstein Norms. Found Comput Math (2023). https://doi.org/10.1007/s10208-023-09634-7
DOI: https://doi.org/10.1007/s10208-023-09634-7
Keywords
- Extremal points
- Unbalanced optimal transport
- Sparse optimization
- Conditional gradient methods
- Inverse problems
- Optimal design
- Kantorovich–Rubinstein duality