1 Introduction

Let \(({\mathcal {X}}, d)\) be a metric space, let \(({\mathcal {Y}}, \Vert \cdot \Vert )\) be a non-trivial, strictly convex real Banach space, and consider the Banach space of Lipschitz functions [25] from \({\mathcal {X}}\) to \({\mathcal {Y}}\) that vanish at a distinguished base point \(x_0\in {\mathcal {X}}\). Let \(L\ge 0\). We define the set

$$\begin{aligned} \begin{aligned} {\text {Lip}}^L_0 =\{f:{\mathcal {X}}\rightarrow {\mathcal {Y}} \mid f(x_0)=0 \text { and } \Vert f(x)-f(y)\Vert&\le Ld(x, y)\\&\text { for all } x, y\in {\mathcal {X}}\}. \end{aligned} \end{aligned}$$

In this paper, we study certain structural properties of the special case \({\text {Lip}}_0^1\) (all other cases can be treated analogously). In particular, we are interested in characterizing its set of extreme points. The set of extreme points of the above space \({\text {Lip}}^1_0\) has been widely studied in the past decades for real-valued functions \(f:{\mathcal {X}}\rightarrow \mathbb {R}\) (see [11, 13, 17,18,19, 22]), and more recently in [1, 9], but no description of the set \({\text {ext}}({\text {Lip}}^1_0)\) has been available in the case \({\mathcal {Y}}\ne \mathbb {R}\). In this paper, we show that this set can be characterized when \({\mathcal {X}}=\{x_0,\ldots ,x_n\}\) is a finite metric space whose points \(x_0,\ldots , x_n\), \(n\ge 1\), are distinct.

The name representer theorem was introduced in the field of machine learning, in particular in the context of kernel methods [21]. In a few words, these results show that any element of a space can be expressed as a linear combination of finitely many specific points. Recently, the study of representation results has gained popularity in the setting of variational inverse problems [3, 4, 23]. Moreover, motivated by the so-called Minkowski–Carathéodory theorem [14, Theorem III.2.3.4], which yields representations in terms of extreme points in the finite-dimensional setting, it has been observed that there is a natural connection between extreme points and representation results [12]. As shown in [7, 8], a proper characterization of extreme points may lead to efficient optimization algorithms. For these reasons, there is increasing recent interest in characterizing the extreme points associated with various regularizers, see [4, 6, 8] as well as [2, 5, 10, 15, 24]. Finally, we mention that Lipschitz-type constraints have also recently become of interest in the context of plug-and-play regularization [20] and monotone splitting algorithms [16]. Hence, a proper characterization of the extreme points of the Lipschitz unit ball may have a considerable impact, and in this paper we aim to partially fill this gap.

We present the characterization result in Theorem 2.2. Finally, in Theorem 2.3 we provide a representer theorem for the space \({\text {Lip}}_0^1 \) that improves on the Minkowski–Carathéodory theorem, in the sense that the number of required extreme points is independent of the dimension of \({\mathcal {Y}}\).

2 Extreme points and representer theorems

We recall that, given a convex subset C of a real vector space, an extreme point of C is a point \(y\in C\) such that, if \(y=\lambda y^1 + (1-\lambda )y^2\) with \(y^1, y^2\in C\) and \(\lambda \in (0,1)\), then \(y^1=y^2=y\). In other words, an extreme point of a convex set C is a point y in C such that \(C\setminus \{y\}\) is convex. We remind the reader of the well-known fact that it suffices to check the extreme point condition for \(\lambda =1/2\). We denote by \({\text {ext}}(C)\) the set of extreme points of C.

For later convenience, we consider the following description of the set \({\text {Lip}}_0^1\).

$$\begin{aligned} \begin{aligned} {\text {Lip}}^1_0:=\{y=(y_0,\dots , \, y_n)\in {\mathcal {Y}}^{n+1} \,: \, y_0=0 \text { and }&\Vert y_i-y_j\Vert \le d(x_i,x_j)\\&\text { for every } i,j=0,\ldots , n\}. \end{aligned} \end{aligned}$$

Note that the two definitions of the set \({\text {Lip}}_0^1 \) agree since, in this case, we only consider the images of the finite set \({\mathcal {X}}=\{x_0,\ldots ,x_n\}\) under functions f mapping to \({\mathcal {Y}}\). We now provide a preliminary lemma.
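The finite-dimensional description above is straightforward to test numerically. The following minimal sketch assumes a hypothetical example: \({\mathcal {X}}\) given by four Euclidean points in the plane (with \(x_0\) the origin) and \({\mathcal {Y}}=\mathbb {R}^2\) with the Euclidean norm, which is strictly convex; the names `X` and `in_lip01` are ours.

```python
import itertools
import math

# Hypothetical finite metric space: four Euclidean points in the plane,
# X[0] playing the role of the base point x_0.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]

def in_lip01(y, tol=1e-9):
    """Check y = (y_0, ..., y_n) in Y^{n+1} against the definition:
    y_0 = 0 and ||y_i - y_j|| <= d(x_i, x_j) for all i, j = 0, ..., n."""
    if math.hypot(*y[0]) > tol:
        return False
    return all(math.dist(y[i], y[j]) <= math.dist(X[i], X[j]) + tol
               for i, j in itertools.combinations(range(len(y)), 2))

# The assignment y_i = x_i is 1-Lipschitz; scaling it by 2 is not.
assert in_lip01(list(X))
assert not in_lip01([(2*a, 2*b) for (a, b) in X])
```

Here the identity-like assignment saturates every constraint, while any scaling by a factor greater than 1 violates them.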

Lemma 2.1

Let \(y\in {\text {Lip}}^1_0 \). For every \(i=1,\ldots , n\), there exist indices \(0=i_0, \, i_1,\ldots , i_k=i\), \(k\ge 1\), such that \(\Vert y_{i_{j+1}}-y_{i_j}\Vert =d(x_{i_{j+1}},x_{i_j})\) for every \(j=0,\ldots , k-1\), if and only if there does not exist a non-empty subset \(S\subset \{1,\ldots ,n\}\) such that \(\Vert y_i-y_j\Vert <d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\).

Proof

First, we proceed by contradiction: let \(S\subset \{1,\ldots , n\}\), \(S\ne \emptyset \), be such that \(\Vert y_i-y_j\Vert <d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\), and let \(i\in S\). By hypothesis, we can choose \(k\ge 1\) and indices \(0=i_0,\ldots , i_k=i\) such that \(\Vert y_{i_{j+1}}-y_{i_j}\Vert =d(x_{i_j}, x_{i_{j+1}})\) for every \(j=0,\ldots , k-1\). As \(i_0=0\in S^c\) and \(i_k=i\in S\), there must exist \(j\in \{0,\ldots , k-1\}\) such that \(i_{j+1}\in S\) and \(i_j\in S^c\). It follows that \(\Vert y_{i_{j+1}}-y_{i_j}\Vert <d(x_{i_{j+1}}, x_{i_{j}})\), which contradicts the equality above and concludes the first part of the proof.

Conversely, let us consider the set

$$\begin{aligned} T=\{i\in \{1,\ldots , n\} \, \mid \ {}&\text {there exists } \ 0=i_0,\ldots , i_k=i \, \text { such that }\\&\Vert y_{i_{\ell +1}}-y_{i_{\ell }}\Vert =d(x_{i_{\ell +1}},x_{i_{\ell }}), \, \ell =0,\ldots , k-1\}, \end{aligned}$$

and suppose that \(T\ne \{1,\ldots , n\}\). Define the set \(S:=T^c\subset \{1,\ldots , n\}\), and observe that \(S\ne \emptyset \). It is left to prove that \(\Vert y_i-y_j\Vert < d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\). Suppose, to the contrary, that there exist \(i\in S\) and \(j\in S^c\) such that \(\Vert y_i-y_j\Vert =d(x_i,x_j)\). Since \(j\in S^c\), there exist \(0=i_0,\ldots , i_k=j\), \(k\ge 1\), such that \(\Vert y_{i_{\ell +1}}-y_{i_{\ell }}\Vert =d(x_{i_{\ell }},x_{i_{\ell +1}})\) for \(\ell =0,\ldots , k-1\). Since \(\Vert y_i-y_j\Vert =d(x_i,x_j)\), defining \(i_{k+1}:=i\) we obtain a path from 0 to i satisfying the equalities for \(\ell =0,\ldots , k\). This implies that \(i\in T=S^c\), a contradiction. We have therefore found a non-empty set \(S\subset \{1,\ldots , n\}\) such that \(\Vert y_i-y_j\Vert <d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\). \(\square \)

Define now the set

$$\begin{aligned} \begin{aligned} {\mathcal {E}}:=\displaystyle \{&y\in {\mathcal {Y}}^{n+1} \, \mid \, y_0=0 \text { and, for every } i =1,\ldots , n, \text { there exists } \\&0=i_0,\ldots , i_k=i, \ k\ge 1: \displaystyle \Vert y_{i_{j+1}}\!-\!y_{i_j}\Vert \!=\! d(x_{i_j}, x_{i_{j+1}}), \ j=0,\ldots , k\!-\!1\}. \end{aligned} \end{aligned}$$

Observe that the definition of \({\mathcal {E}}\) is motivated by the previous lemma since every point \(y\in {\mathcal {E}}\) satisfies the first condition of Lemma 2.1. We are now ready to characterize the extreme points of the set \({\text {Lip}}_0^1 \).
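By Lemma 2.1, membership of a point \(y\) in \({\mathcal {E}}\) amounts to a reachability question in the graph on \(\{0,\ldots ,n\}\) whose edges are the "tight" pairs \(\Vert y_i-y_j\Vert =d(x_i,x_j)\). The sketch below performs this check by breadth-first search from the base index 0; it assumes a hypothetical Euclidean setup (four points in the plane as the metric space, \({\mathcal {Y}}=\mathbb {R}^2\)), and the name `in_E` is ours.

```python
import math
from collections import deque

# Hypothetical finite metric space (Euclidean points, X[0] the base point).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 0.0)]

def in_E(y, tol=1e-9):
    """Test the path condition defining E: y_0 = 0 and every index i is
    reachable from 0 through edges with ||y_i - y_j|| = d(x_i, x_j)."""
    n = len(y) - 1
    if math.hypot(*y[0]) > tol:
        return False
    # adjacency lists of the tight-edge graph
    tight = [[j for j in range(n + 1) if j != i and
              abs(math.dist(y[i], y[j]) - math.dist(X[i], X[j])) <= tol]
             for i in range(n + 1)]
    # breadth-first search from the base point
    seen, queue = {0}, deque([0])
    while queue:
        for w in tight[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n + 1

assert in_E(list(X))                                # every edge tight
assert not in_E([(0.5*a, 0.5*b) for (a, b) in X])   # every edge slack
```

The second assertion matches Lemma 2.1: for the halved point, the slack cut \(S=\{1,\ldots ,n\}\) exists, so no tight path from 0 can reach any index.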

Theorem 2.2

We have that \({\text {ext}}({\text {Lip}}_0^1 )={\mathcal {E}}\).

Proof

First, we will prove that, if \(y\notin {\mathcal {E}}\), then \(y\notin {\text {ext}}({\text {Lip}}_0^1 )\). Let \(y\in {\text {Lip}}_0^1 \) be such that \(y\notin {\mathcal {E}}\). By the previous lemma, we get that there exists \(S\subset \{1,\ldots , n\}\), \(S\ne \emptyset \), such that \(\Vert y_i-y_j\Vert < d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\). Choose now

$$\begin{aligned} \varepsilon =\min _{i\in S, \, j\in S^c} \left( d(x_i,x_j)-\Vert y_i-y_j\Vert \right) , \end{aligned}$$

and observe that \(\varepsilon >0\). Moreover, choose \(v\in {\mathcal {Y}}\) such that \(\Vert v\Vert =1\) (which exists since \({\mathcal {Y}}\) is non-trivial) and set

$$\begin{aligned} y_i^1:= {\left\{ \begin{array}{ll} y_i+\varepsilon v &{} \text { if } i\in S, \\ y_i &{} \,\,\text {else}, \end{array}\right. } \quad y_i^2:= {\left\{ \begin{array}{ll} y_i-\varepsilon v &{} \text { if } i\in S, \\ y_i &{} \,\,\text {else}. \end{array}\right. } \end{aligned}$$

If we define \(y^k:=(y^k_0, y^k_1,\ldots , y^k_n)\), \(k=1, 2 \), then \(y^1\ne y^2\). Moreover, observe that

$$\begin{aligned} \Vert y^k_i-y^k_j\Vert =\Vert y_i-y_j\Vert \le d(x_i,x_j) \quad \text {for every } i, j\in S \text { or } i, j\in S^c, \, k=1, 2, \end{aligned}$$

since \(y\in {\text {Lip}}_0^1\) and

$$\begin{aligned} \Vert y^k_i-y^k_j\Vert&=\Vert y_i \pm \varepsilon v-y_j\Vert \le \Vert y_i-y_j\Vert +\varepsilon \\&\le \Vert y_i-y_j\Vert +d(x_i,x_j)-\Vert y_i-y_j\Vert \\&=d(x_i,x_j)\quad \text {for } i\in S, \, j\in S^c, \, k=1, 2. \end{aligned}$$

Therefore, \(y^k\in {\text {Lip}}_0^1\) for \(k=1, 2\), and \(y=\frac{1}{2} y^1+\frac{1}{2} y^2\) with \(y^1\ne y^2\), so that \(y\notin {\text {ext}}({\text {Lip}}_0^1)\). Hence, \({\text {ext}}({\text {Lip}}_0^1)\subset {\mathcal {E}}\).

We would like to prove now that \({\mathcal {E}}\subset {\text {ext}}({\text {Lip}}_0^1 )\). Let \(y\in {\text {Lip}}_0^1 {\setminus }{\text {ext}}({\text {Lip}}_0^1 )\). We will prove that there exists \(S\subset \{1,\ldots , n\}\), \(S\ne \emptyset \), such that \(\Vert y_i-y_j\Vert <d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\). If so, by the previous lemma, this would mean that \(y\notin {\mathcal {E}}\). Since \(y\notin {\text {ext}}({\text {Lip}}_0^1 )\), there exist \(y^1, y^2\in {\text {Lip}}_0^1\), \(y^1\ne y^2\), such that \(y=\frac{1}{2} y^1 +\frac{1}{2} y^2\). Now, define the set \(S=\{i\in \{1,\ldots , n\} \, \mid \, y_i^1\ne y_i^2\}\) and observe that it is non-empty since \(y^1\ne y^2\) by hypothesis. Now, let \(i\in S\), \(j\in S^c\). Then,

$$\begin{aligned} \left\| y_i-y_j\right\| =\left\| \frac{1}{2} y_i^1 -\frac{1}{2} y_j^1+\frac{1}{2} y_i^2-\frac{1}{2} y_j^2\right\| =\left\| \frac{1}{2} y_i^1 -\frac{1}{2} y_j+\frac{1}{2} y_i^2-\frac{1}{2} y_j\right\| . \end{aligned}$$

In order to finish the proof, define \(a:= y_i^1 - y_j\), \(b:=y_i^2- y_j\), and observe that \(a\ne b\). Now, we distinguish two cases: if a is not proportional to b, we get

$$\begin{aligned} \left\| y_i-y_j\right\| < \frac{1}{2} \Vert a\Vert + \frac{1}{2} \Vert b\Vert \le d(x_i,x_j) \end{aligned}$$

since we assumed that \({\mathcal {Y}}\) is a strictly convex space. If they are proportional then, possibly after interchanging a and b so that \(\Vert b\Vert \le \Vert a\Vert \), we have \(b=\lambda a\) for some \(\lambda \ne 1\) with \(-1\le \lambda < 1\), and obtain that

$$\begin{aligned} \left\| y_i-y_j\right\| =\left\| \frac{a}{2} +\frac{\lambda a}{2}\right\| \le \frac{|1+\lambda |}{2}\Vert a\Vert <d(x_i,x_j). \end{aligned}$$

The result immediately follows. \(\square \)
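The perturbation used in the first half of the proof can be traced numerically. In the sketch below (a hypothetical example with three Euclidean points in the plane and \({\mathcal {Y}}=\mathbb {R}^2\); all names are ours), the set \(S\) is found as the complement of the indices reachable from 0 through tight edges, and both perturbed points are verified to lie in \({\text {Lip}}_0^1\) and to average back to \(y\).

```python
import itertools
import math
from collections import deque

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]       # finite metric space, n = 2
y = [(0.0, 0.0), (1.0, 0.0), (0.3, 0.3)]       # index 2 has only slack edges
n, tol = len(y) - 1, 1e-9

def in_lip01(z):
    return math.hypot(*z[0]) <= tol and all(
        math.dist(z[i], z[j]) <= math.dist(X[i], X[j]) + tol
        for i, j in itertools.combinations(range(n + 1), 2))

# Indices reachable from 0 through tight edges; S is their complement.
seen, queue = {0}, deque([0])
while queue:
    u = queue.popleft()
    for j in range(n + 1):
        if j not in seen and \
           abs(math.dist(y[u], y[j]) - math.dist(X[u], X[j])) <= tol:
            seen.add(j)
            queue.append(j)
S = [i for i in range(1, n + 1) if i not in seen]
assert S == [2]

# eps as in the proof, then the two perturbations y +/- eps*v on S.
eps = min(math.dist(X[i], X[j]) - math.dist(y[i], y[j])
          for i in S for j in range(n + 1) if j not in S)
v = (1.0, 0.0)                                  # any unit vector of Y
y1 = [(a + eps*v[0], b + eps*v[1]) if i in S else (a, b)
      for i, (a, b) in enumerate(y)]
y2 = [(a - eps*v[0], b - eps*v[1]) if i in S else (a, b)
      for i, (a, b) in enumerate(y)]

assert in_lip01(y1) and in_lip01(y2) and y1 != y2
assert all(abs((p + q)/2 - c) <= tol
           for u1, u2, w in zip(y1, y2, y) for p, q, c in zip(u1, u2, w))
```

This exhibits \(y=\frac{1}{2}y^1+\frac{1}{2}y^2\) with distinct \(y^1, y^2\in {\text {Lip}}_0^1\), confirming that this particular \(y\notin {\mathcal {E}}\) is not extreme.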

We are now ready to state the representer theorem for the space \({\text {Lip}}_0^1\). In the case \({\mathcal {Y}}=\mathbb {R}^d\), the Minkowski–Carathéodory theorem implies that every function in \({\text {Lip}}_0^1 \) can be represented as a convex combination of at most \(nd+1\) extreme points. We improve this bound to \(n+1\) extreme points, which is independent of d and also covers the infinite-dimensional case.

Theorem 2.3

For every \(y\in {\text {Lip}}_0^1 \), there exist \(k\le n+1\), \(y^1,\ldots , y^k \in {\text {ext}}({\text {Lip}}_0^1 )\), and scalars \(\lambda _1,\ldots , \lambda _k\ge 0\) with \(\sum _{i=1}^k\lambda _i=1\) such that \(y=\sum _{i=1}^k\lambda _iy^i\).

Proof

Let \(y\in {\text {Lip}}_0^1\) and recall that it is of the form \(y=(y_0,\ldots , y_n)\) with \(y_0=0\). Choose \(v\in {\mathcal {Y}}\) such that \(\Vert v\Vert =1\). Define the set

$$\begin{aligned} D=\left\{ t=(t_0,\ldots , t_n)\in \mathbb {R}^{n+1} \, \mid \, y+tv\in {\text {Lip}}_0^1\right\} , \end{aligned}$$

where \((y+tv)_i:=y_i+t_iv\) for every \(i=0,\ldots ,n\). Moreover, observe that \(t_0=0\) for every \(t\in D\) since, if \(t_0\ne 0\), then \((y+tv)_0\ne 0\). Now, we claim that, if \(t\in {\text {ext}}(D)\), then \(y+tv\in {\text {ext}}({\text {Lip}}_0^1 )\). Indeed, if \(t\in D\) and \(y+tv\notin {\text {ext}}({\text {Lip}}_0^1)\), then, by Theorem 2.2 and Lemma 2.1, there exists a subset \(S\subset \{1,\ldots , n\}\), \(S\ne \emptyset \), such that \(\Vert y_i-y_j+(t_i-t_j)v\Vert <d(x_i,x_j)\) for every \(i\in S\), \(j\in S^c\). Choose

$$\begin{aligned} \varepsilon =\min _{i\in S, \, j\in S^c} \left( d(x_i,x_j)-\Vert y_i-y_j+(t_i-t_j)v\Vert \right) , \end{aligned}$$

and observe that \(\varepsilon >0\). Moreover, define

$$\begin{aligned} t_i^1:= {\left\{ \begin{array}{ll} t_i+\varepsilon &{} \text { if } i\in S, \\ t_i &{} \,\,\text {else}, \end{array}\right. } \quad t_i^2:= {\left\{ \begin{array}{ll} t_i-\varepsilon &{} \text { if } i\in S, \\ t_i &{} \,\,\text {else}. \end{array}\right. } \end{aligned}$$

With such definitions, observe that \(t^1\ne t^2\). Now, \(y+t^kv\in {\text {Lip}}_0^1\), for \(k=1, 2\), because

$$\begin{aligned} \begin{aligned} \Vert y_i-y_j+(t^k_i-t^k_j)v\Vert&=\Vert y_i-y_j+(t_i-t_j)v\Vert \\&\le d(x_i,x_j) \quad \text {for every } i, j\in S \text { or } i, j\in S^c, \end{aligned} \end{aligned}$$

since \(t\in D\) and

$$\begin{aligned} \Vert y_i-y_j+(t^k_i-t^k_j)v\Vert&\le \Vert y_i-y_j+(t_i-t_j)v\Vert +\varepsilon \Vert v\Vert \\&\le \Vert y_i-y_j+(t_i-t_j)v\Vert +d(x_i,x_j) \\&\quad - \Vert y_i-y_j+(t_i-t_j)v\Vert \\&=d(x_i,x_j) \quad \text {for } i\in S, \, j\in S^c, \, k=1, 2. \end{aligned}$$

Then, \(t^1, t^2\in D\) and \(t=\frac{1}{2} t^1+\frac{1}{2} t^2\), \(t^1\ne t^2\), which implies that \(t\notin {\text {ext}}(D)\). Consequently, \(t\in {\text {ext}}(D)\) implies \(y+tv\in {\text {ext}}({\text {Lip}}_0^1 )\). Now, we show that D is a non-empty, convex, compact subset of \(\mathbb {R}^{n+1}\). First, note that \(0\in D\) and that convexity follows from the fact that D is the preimage of the convex set \({\text {Lip}}_0^1 \) through the affine mapping \(t\mapsto y+tv\). Moreover, boundedness follows because, for every \(t\in D\), we have

$$\begin{aligned} d(x_i,x_0)\ge \Vert (y+tv)_i-(y+tv)_0\Vert =\Vert y_i-y_0+t_iv\Vert \ge |t_i|-\Vert y_i-y_0\Vert \end{aligned}$$

and so, for every \(i=1,\dots ,n\), we have that

$$\begin{aligned} |t_i|\le d(x_i,x_0) + \Vert y_i-y_0\Vert \le 2 d(x_i,x_0). \end{aligned}$$

It is only left to prove that D is closed. Let \((t^k)_{k\in \mathbb {N}}\) be a sequence in D converging to some \(t\in \mathbb {R}^{n+1}\). We have that

$$\begin{aligned} \Vert y_i-y_j+(t_i-t_j)v\Vert&= \Vert y_i-y_j+(t_i^k-t_j^k)v-(t_i^k-t_j^k)v+(t_i-t_j)v\Vert \\&\le \Vert y_i-y_j+(t_i^k-t_j^k)v\Vert + \Vert (t_i-t_j)v-(t_i^k-t_j^k)v\Vert \\&\le d(x_i,x_j)+|t_i-t_i^k|+|t_j-t_j^k| \end{aligned}$$

for every \(i,j=0,\dots ,n\). We obtain the result by letting \(k\rightarrow \infty \). By the Krein–Milman theorem, we know that \(D=\overline{\text {conv}}({\text {ext}}(D))\). Moreover, since \(t_0=0\) for every \(t\in D\), we have \(\textrm{span} \ D\subset \{0\}\times \mathbb {R}^n\) and hence \(\textrm{dim} \ \textrm{span} \ D\le n\), so we can apply the Minkowski–Carathéodory theorem to the point \(0\in D\): there exist \(k\le n+1\), \(t^1,\ldots , t^k\in {\text {ext}}(D)\), and scalars \(\lambda _1,\ldots , \lambda _k\ge 0\) with \(\sum _{i=1}^k\lambda _i=1\) such that \(0=\sum _{i=1}^k\lambda _it^i\). Finally, by the previous claim, for every \(i=1,\ldots , k\), defining \(y^i:= y+t^iv\) gives \(y^i\in {\text {ext}}({\text {Lip}}_0^1 )\) and, hence,

$$\begin{aligned} \sum _{i=1}^k \lambda _iy^i=\sum _{i=1}^k \lambda _i (y+t^i v)=y+\left( \sum _{i=1}^k\lambda _it^i\right) v=y, \end{aligned}$$

concluding the proof. \(\square \)
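The construction in the proof can be followed explicitly in the smallest case \(n=1\) with \({\mathcal {Y}}=\mathbb {R}^2\): taking \(v\) aligned with \(y_1\), the set \(D\) reduces to a compact interval whose endpoints are its extreme points, and writing \(0\) as their convex combination decomposes \(y\) into \(k=2\le n+1\) extreme points of \({\text {Lip}}_0^1\). The numbers below are a hypothetical illustration.

```python
import math

d10 = 1.0                                  # d(x_1, x_0)
y1 = (0.3, 0.4)                            # y = (0, y_1), ||y_1|| = 0.5 < 1
r = math.hypot(*y1)
v = (y1[0]/r, y1[1]/r)                     # unit vector aligned with y_1

# Along v, ||y_1 + t v|| = |r + t|, so the t_1-coordinate of D is the
# interval [t_lo, t_hi]; its endpoints are exactly ext(D).
t_lo, t_hi = -d10 - r, d10 - r
lam = t_hi / (t_hi - t_lo)                 # 0 = lam*t_lo + (1 - lam)*t_hi
assert 0 <= lam <= 1
assert abs(lam*t_lo + (1 - lam)*t_hi) < 1e-9

# Both candidates y + t v land on the sphere of radius d(x_1, x_0), i.e.
# the single edge (0, 1) is tight, so they lie in ext(Lip_0^1) by Thm 2.2.
p_lo = (y1[0] + t_lo*v[0], y1[1] + t_lo*v[1])
p_hi = (y1[0] + t_hi*v[0], y1[1] + t_hi*v[1])
for p in (p_lo, p_hi):
    assert abs(math.hypot(*p) - d10) < 1e-9

# The convex combination recovers y_1, with k = 2 <= n + 1 = 2 points.
rec = (lam*p_lo[0] + (1 - lam)*p_hi[0], lam*p_lo[1] + (1 - lam)*p_hi[1])
assert max(abs(rec[0] - y1[0]), abs(rec[1] - y1[1])) < 1e-9
```

For \({\mathcal {Y}}=\mathbb {R}^d\) with large d, the Minkowski–Carathéodory bound for this example would be \(nd+1=d+1\) points, while two extreme points always suffice, illustrating the improvement of Theorem 2.3.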