## Abstract

We consider a general regularised interpolation problem for learning a parameter vector from data. The well-known representer theorem says that under certain conditions on the regulariser there exists a solution in the linear span of the data points. This is at the core of kernel methods in machine learning as it makes the problem computationally tractable. Most literature deals only with sufficient conditions for representer theorems in Hilbert spaces and shows that the regulariser being norm-based is sufficient for the existence of a representer theorem. We prove necessary and sufficient conditions for the existence of representer theorems in reflexive Banach spaces and show that any regulariser has to be essentially norm-based for a representer theorem to exist. Moreover, we illustrate why in a sense reflexivity is the minimal requirement on the function space. We further show that if the learning relies on the linear representer theorem, then the solution is independent of the regulariser and in fact determined by the function space alone. This in particular shows the value of generalising Hilbert space learning theory to Banach spaces.

## References

Argyriou, A., Micchelli, C.A., Pontil, M.: When is there a representer theorem? Vector versus matrix regularizers. J. Mach. Learn. Res. **10**, 2507–2529 (2009)

Asplund, E.: Positivity of duality mappings. Bull. Amer. Math. Soc. **73**(2), 200–203 (1967)

Blažek, J.: Some remarks on the duality mapping. Acta Univ. Carolinae Math. Phys. **23**(2), 15–19 (1982)

Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York (2011). https://doi.org/10.1007/978-0-387-70914-7

Browder, F.E.: Multi-valued monotone nonlinear mappings and duality mappings in Banach spaces. Trans. Am. Math. Soc. **118**, 338–351 (1965)

Cox, D.D., O’Sullivan, F.: Asymptotic analysis of penalized likelihood and related estimators. Ann. Statist. **18**(4), 1676–1695 (1990). https://doi.org/10.1214/aos/1176347872

Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. **39**(1), 1–49 (2001)

Dragomir, S.S.: Semi-inner Products and Applications. Nova Science Publishers (2004)

Kimeldorf, G., Wahba, G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. **33**(1), 82–95 (1971). https://doi.org/10.1016/0022-247X(71)90184-3

Lindenstrauss, J., Preiss, D., Tišer, J.: Fréchet Differentiability of Lipschitz Functions and Porous Sets in Banach Spaces. Princeton University Press (2012)

Micchelli, C.A., Pontil, M.: A function representation for learning in Banach spaces. In: Learning Theory. COLT 2004, pp. 255–269 (2004)

Micchelli, C.A., Pontil, M.: Learning the kernel function via regularization. J. Mach. Learn. Res. **6**, 1099–1125 (2005)

Phelps, R.: Convex Functions, Monotone Operators and Differentiability. Lecture Notes in Mathematics. Springer, Berlin (1993)

Schlegel, K.: When is there a representer theorem? Nondifferentiable regularisers and Banach spaces. J. Glob. Optim. (2019). https://doi.org/10.1007/s10898-019-00767-0

Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press (2002)

Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: Computational Learning Theory, pp. 416–426 (2001)

Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press (2004). https://doi.org/10.1017/CBO9780511809682

Smola, A.J., Schölkopf, B.: On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica **22**(1), 211–231 (1998). https://doi.org/10.1007/PL00013831

Tropp, J.A.: Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inf. Theory **52**(3), 1030–1051 (2006). https://doi.org/10.1109/TIT.2005.864420

Xu, Y., Ye, Q.: Generalized Mercer Kernels and Reproducing Kernel Banach Spaces. Memoirs of the American Mathematical Society. American Mathematical Society (2019). https://books.google.co.uk/books?id=rd2RDwAAQBAJ

Zhang, H., Zhang, J.: Regularized learning in Banach spaces as an optimization problem: representer theorems. J. Glob. Optim. **54**(2), 235–250 (2012). https://doi.org/10.1007/s10898-010-9575-z

Zhang, H., Xu, Y., Zhang, J.: Reproducing kernel Banach spaces for machine learning. J. Mach. Learn. Res. **10**, 2741–2775 (2009)


## Additional information

Communicated by: Russell Luke

### Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Appendices

### A Appendix

### A.1 Regularised Interpolation

The proof of theorem 1 is largely identical to the one presented in [1] but requires a few minor adjustments to hold in the generality of reflexive Banach spaces. We present the full proof here.

### Proof of theorem 1

To prove that *Ω* is admissible for the regularised interpolation problem (2) we are going to show that *Ω* is tangentially nondecreasing in the sense of lemma 1 depending on the properties of the space \({\mathscr{B}}\).

Fix \(0 \neq f\in {\mathscr{B}}\) and *L* ∈ *J*(*f*) and let *a*_{0} be the unique nonzero minimiser of \(\min \limits \{\mathcal {E}(a\nu ,y) : a\in \mathbb {R}\}\). For every *λ* > 0 consider the regularisation problem

By assumption there exist solutions \(f_{\lambda }\in {\mathscr{B}}\) such that

i.e. there exist \(c_{\lambda }\in \mathbb {R}\) such that *c*_{λ}*L* ∈ *J*(*f*_{λ}).
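Throughout, membership in the duality mapping is used via its defining property. For reference, a standard formulation of the normalized duality mapping of a Banach space \({\mathscr{B}}\) (as in e.g. [4]) is:

```latex
% Normalized duality mapping (standard definition)
J(f) \;=\; \bigl\{\, L \in \mathscr{B}^{\ast} \;:\; L(f) = \|L\|\,\|f\|,\ \ \|L\| = \|f\| \,\bigr\},
```

so that in particular \(L \in J(g)\) immediately gives \(L(g) = \|g\|\,\|L\| = \|L\|^{2}\), the identity used in the estimates below.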

Now fix any \(g\in {\mathscr{B}}\) such that *L* ∈ *J*(*g*) which exists as \({\mathscr{B}}\) is reflexive so *J* is surjective. We then obtain

where the first inequality follows from *a*_{0} minimising \(\mathcal {E}(a\nu ,y)\) and the second inequality from *L*(*g*) = ∥*L*∥^{2}. This shows that *Ω*(*f*_{λ}) ≤*Ω*(*g*) for all *λ* and so by hypothesis the set {*f*_{λ} : *λ* > 0} is bounded. Hence there exists a weakly convergent subsequence \((f_{\lambda _{l}})_{l\in \mathbb {N}}\) such that \(\lambda _{l}\underset {l\rightarrow \infty }{\longrightarrow }0\) and \(f_{\lambda _{l}}\rightharpoonup \overline {f}\) as \(l\rightarrow \infty \). Taking the limit inferior as \(l\rightarrow \infty \) on the right-hand side of inequality 10 we obtain

Since *a*_{0} is by assumption the unique, nonzero minimiser this means that

But then since \(L(\overline {f}) \leq \|{L}\|\cdot \|{\overline {f}}\|\) we have \(\|{L}\|\leq \|{\overline {f}}\|\).

Moreover since *J*(*f*_{λ}) ∩span{*L*}≠*∅* we have \(\|{L}\|\cdot \|{f_{\lambda }}\| = L(f_{\lambda }) \rightarrow \|{L}\|^{2}\) and thus \(\|{f_{\lambda }}\|\rightarrow \|{L}\|\). Since \(\|{\overline {f}}\| \leq \liminf \|{f_{\lambda }}\|=\|{L}\|\) (e.g. [4], Proposition 3.5 (iii)) we have \(\|{\overline {f}}\|= \|{L}\|\) and thus \(L\in J(\overline {f})\).
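Spelled out, the squeeze argument of the preceding step reads:

```latex
\|\overline{f}\|
\;\le\; \liminf_{l\to\infty}\,\|f_{\lambda_{l}}\|
\;=\; \|L\|
\;\le\; \|\overline{f}\|
\qquad\Longrightarrow\qquad
\|\overline{f}\| = \|L\|,
```

which together with the inequality \(\|L\|^{2} \leq L(\overline{f}) \leq \|L\|\,\|\overline{f}\|\) obtained above yields \(L\in J(\overline{f})\).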

Since the *f*_{λ} are minimisers of the regularisation problem we have for all \(g\in {\mathscr{B}}\) such that *L*(*g*) = ∥*L*∥^{2}

Since *a*_{0} is the minimiser this implies in particular that

and taking the limit inferior again we obtain that \(\overline {f}\) is in fact a solution of the interpolation problem

Now this means that \({\varOmega }(\overline {f}+f_{T}) \geq {\varOmega }(\overline {f})\) for all \(f_{T}\in \ker (L)\) and if \(\overline {f} = f\) we are clearly done. If \(\overline {f}\neq f\) we know that *f* and \(\overline {f}\) are in the same face as *L* ∈ *J*(*f*) and \(L\in J(\overline {f})\). They thus have the same error \({\mathcal {E}}\). If \({\varOmega }(f) = {\varOmega }(\overline {f})\) then both are equivalent minimisers and it is clear that both satisfy the tangential bound. If \({\varOmega }(f) > {\varOmega }(\overline {f})\) then *f* is not admissible and does not need to satisfy the tangential bound.

Finally note that the claim is trivially true for *L* = 0 as in that case \(\mathcal {E}\) is independent of *f* and for every *λ* the minimiser *f*_{λ} has to be zero to satisfy *J*(*f*_{λ}) ∩{0}≠*∅*. This means *Ω* is minimised at 0. □

### A.2 Duality mappings

The proof of theorem 2 crucially relies on the following connection of the duality mapping with subgradients (cf. [2, 3]).

### Proposition 4

For a normed linear space *V* with duality mapping *J* with gauge function *μ* define \(M:V \rightarrow \mathbb {R}\) by

Then \(x^{\ast }\in \partial M(x) \subset V^{\ast }\), the subdifferential of *M*, if and only if

For any \(0\neq x \in V\) we have that \(\partial M(x) = J_{\mu }(x)\).
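For completeness, in the standard formulation going back to Asplund [2] the functional *M* is the primitive of the gauge function, so that proposition 4 takes the form:

```latex
M(x) \;=\; \int_{0}^{\|x\|} \mu(t)\,\mathrm{d}t,
\qquad
\partial M(x) \;=\; J_{\mu}(x) \quad \text{for } x \neq 0.
```

This is the classical statement; the displayed formula in the published version should agree with it.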

We now give a proof of theorem 2 which follows the ideas of the one presented by [3] but corrects the issue in that paper.

### Proof of theorem 2

Using the functional *M* from proposition 4 define a functional \(F \colon V\rightarrow \mathbb {R}\) by

Since *M* is continuous and convex with strictly increasing derivative and *L*_{0} is linear, *F* is clearly continuous, convex and coercive. This means that *F* attains its minimum on the reflexive subspace *W* in at least one point, \(\overline {z}\) say.
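To see coercivity explicitly, write *M* in its standard integral form (proposition 4) and assume *F* has the form \(F(x) = M(x-x_{0}) - L_{0}(x)\), which is consistent with the identity \(F(x) = M(x-x_{0})\) on *W* used below when \(L_{0}\big |_{W} = 0\). Since the gauge *μ* is strictly increasing with \(\mu (t)\rightarrow \infty \),

```latex
F(x) \;\ge\; \int_{0}^{\|x-x_{0}\|} \mu(t)\,\mathrm{d}t \;-\; \|L_{0}\|\,\|x\|
\;\longrightarrow\; \infty
\qquad \text{as } \|x\| \to \infty,
```

because the integral grows faster than any linear function of \(\|x\|\).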

Hence for all *y* ∈ *W*

By proposition 4 this means that \(L_{0}\big |_{W} \in \partial M\big |_{W}(\overline {z} - x_{0}) = J_{\mu }\big |_{W}(\overline {z} - x_{0})\). For simplicity we write *L*_{0}|_{W} = *L*_{W}.

Note that if *x*_{0} ∈ *W* and *L*_{W} = 0 we have that *F*(*x*) = *M*(*x* − *x*_{0}) on *W* so \(\overline {z} = x_{0}\) and we trivially have *J*_{μ}(*x*_{0} − *x*_{0}) = {0} = {−*L*_{0} + *L*_{0}}⊂ *W*^{⊥} + *L*_{0}. So we can without loss of generality assume that not both *x*_{0} ∈ *W* and *L*_{W} = 0.

In case *x*_{0} ∈ *W* it is clear that *M* is minimised at *x*_{0}. If *L*_{W}≠ 0 then *L*_{W} attains its norm on *W* at a point *z*, say. Thus it is clear that there exists a minimiser for *F* of the form \(\overline {z} = z + x_{0}\). More precisely, *F* is minimised where an element of \(\partial M\) and ∇*L*_{0} are equal. Since \(\partial M(x-x_{0})=\mu (\|{x-x_{0}}\|)\frac {L_{x}}{\mu (\|{x-x_{0}}\|)}\) for *L*_{x} ∈ *J*_{μ}(*x* − *x*_{0}) we get that the minimiser \(\overline {z}=z+x_{0}\) is such that \(\|{L_{W}}\|_{W^{\ast }}=\mu (\|{\overline {z}-x_{0}}\|)\).

If on the other hand *x*_{0}∉*W* then we note that \(\overline {z}\) being the minimum for *F* on *W* implies that *L*_{z}(*y*) ≥ 0 for all \(L_{z} \in \partial F(\overline {z})\) and all *y* ∈ *W*. But this means that

for every \(L_{z}\in J(\overline {z}-x_{0})\). But since \(\frac {L_{z}}{\mu (\|{\overline {z}-x_{0}}\|)}\) is of norm 1 this means that

for all *y* ∈ *W*. Thus \(\|{L_{W}}\|_{W^{\ast }}=\|{L_{0}\big |_{W}}\|_{W^{\ast }}\leq \mu (\|{\overline {z}-x_{0}}\|)\).

Now denote by \(\overline {W}\) the space generated by *W* and *x*_{0} and note that this space is still reflexive. Extend *L*_{W} to \(L_{\overline {W}}\) on \(\overline {W}\) by setting

Then

so \(\|{L_{\overline {W}}}\|_{\overline {W}^{\ast }} \geq \mu (\|{\overline {z}-x_{0}}\|)\).

Further \(L_{\overline {W}}(y)=L_{W}(y)\leq \mu (\|{\overline {z}-x_{0}}\|)\cdot \|{y}\|\) for all *y* ∈ *W*, so \(\|{L_{\overline {W}}}\| > \mu (\|{\overline {z}-x_{0}}\|)\) can only happen if the norm is attained at some point *λ**y* + *ν**x*_{0} with *y* ∈ *W*, *ν*≠ 0, or equivalently, dividing through by *ν*, at a point *y* + *x*_{0} for some *y* ∈ *W*. But for those points we have

and thus \(\|{L_{\overline {W}}}\| = \mu (\|{\overline {z}-x_{0}}\|)\) and \(L_{\overline {W}}(\overline {z}-x_{0}) = \|{L_{\overline {W}}}\|\cdot \|{\overline {z}-x_{0}}\|\).

Since for *x*_{0} ∈ *W* we have \(\overline {W}=W\), in either case we have obtained a functional \(L_{\overline {W}}\) such that \(L_{\overline {W}}=L_{0}\big |_{W}\), \(\|{L_{\overline {W}}}\|=\mu (\|{\overline {z}-x_{0}}\|)\) and \(L_{\overline {W}}(\overline {z}-x_{0}) = \|{L_{\overline {W}}}\|\cdot \|{\overline {z}-x_{0}}\|\).
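In display form, the three properties obtained in both cases are:

```latex
L_{\overline{W}} = L_{0}\big|_{W} \text{ on } W,
\qquad
\|L_{\overline{W}}\|_{\overline{W}^{\ast}} = \mu\bigl(\|\overline{z}-x_{0}\|\bigr),
\qquad
L_{\overline{W}}(\overline{z}-x_{0}) = \|L_{\overline{W}}\|\cdot\|\overline{z}-x_{0}\|.
```

The last two together say precisely that \(L_{\overline {W}}\) belongs to the gauged duality mapping of \(\overline {W}\) at \(\overline {z}-x_{0}\).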

Now extend \(L_{\overline {W}}\) by Hahn-Banach to *L*_{V} on *V* such that

and \(L_{V}\big |_{\overline {W}}=L_{\overline {W}}\). Hence (*L*_{V} − *L*_{0})|_{W} = 0 so *L*_{V} ∈ *W*^{⊥} + *L*_{0}.
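The extension step invokes the norm-preserving form of the Hahn-Banach theorem; stated for completeness, *L*_{V} can be chosen so that

```latex
L_{V}\big|_{\overline{W}} = L_{\overline{W}},
\qquad
\|L_{V}\|_{V^{\ast}} = \|L_{\overline{W}}\|_{\overline{W}^{\ast}} = \mu\bigl(\|\overline{z}-x_{0}\|\bigr),
```

which is the norm condition used in the remainder of the proof.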

It remains to show that \(L_{V}\in J_{\mu }(\overline {z}-x_{0})\) by showing 12 holds for *L*_{V} and every *y* ∈ *V*. Notice first that

But

and further

so the left-hand side of 15 is always at least as big as the left-hand side of 14. We can thus add the left-hand side of 14 to the right-hand side of 12 and the left-hand side of 15 to the left-hand side of 12 while preserving the inequality. Inequality 12 holds in particular for \(\overline {z}\), and in that case also for \(L_{\overline {W}}\) as it agrees with *L*_{0} on \(\overline {z}\) and *x*_{0}, i.e.

Thus by adding the left-hand sides of 14 and 15 as described we obtain

for all *y* ∈ *V*. But since *L*_{V} also agrees with \(L_{\overline {W}}\) on \(\overline {z}\) and *x*_{0} this together with 13 implies that

for all *y* ∈ *V* which is what we wanted to prove. Thus indeed \(L_{V}\in J_{\mu }(\overline {z}-x_{0})\) as claimed. By homogeneity of *J*_{μ} clearly − *L*_{V} with \(-\overline {z}\in W\) is as in the statement of the theorem. □

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Schlegel, K. When is there a representer theorem?
*Adv Comput Math* **47**, 54 (2021). https://doi.org/10.1007/s10444-021-09877-4


### Keywords

- Representer theorem
- Regularised interpolation
- Regularisation
- Kernel methods
- Reproducing kernel Banach spaces

### Mathematics Subject Classification (2010)

- 68T05