# When is there a representer theorem?


## Abstract

We consider a general regularised interpolation problem for learning a parameter vector from data. The well known representer theorem says that under certain conditions on the regulariser there exists a solution in the linear span of the data points. This is at the core of kernel methods in machine learning as it makes the problem computationally tractable. Necessary and sufficient conditions for differentiable regularisers on Hilbert spaces to admit a representer theorem have been proved. We extend those results to nondifferentiable regularisers on uniformly convex and uniformly smooth Banach spaces. This gives a (more) complete answer to the question when there is a representer theorem. We then note that for regularised interpolation in fact the solution is determined by the function space alone and independent of the regulariser, making the extension to Banach spaces even more valuable.

## Keywords

Representer theorem · Regularised interpolation · Regularisation · Semi-inner product spaces · Kernel methods

## Mathematics Subject Classification

68T05

## 1 Introduction

Regularisation is often described as a process of adding additional information or using prior knowledge about the solution to solve an ill-posed problem or to prevent an algorithm from overfitting to the given data. This makes it a very important method for learning a function, out of very large classes of functions, from empirical data. Intuitively its purpose is to pick, from all the functions that may explain the data, the function which is the simplest in some suitable sense. Hence regularisation appears in various disciplines wherever empirical data is produced and has to be explained by a function. This has motivated the study of regularisation problems in mathematics, statistics and computer science, and in particular in machine learning theory (Cucker and Smale [4], Shawe-Taylor and Cristianini [16], Micchelli and Pontil [13]).

In particular, regularisation in Hilbert spaces has been studied in the literature for various reasons. First of all, the existence of an inner product allows for the design of algorithms with very clear geometric intuitions, often based on orthogonal projections or on the fact that the inner product can be seen as a kind of similarity measure.

But in fact crucial for the success of regularisation methods in Hilbert spaces is the well known *representer theorem* which states that for certain regularisers there is always a solution in the linear span of the data points (Kimeldorf and Wahba [8], Cox and O’Sullivan [3], Schölkopf and Smola [14, 17]). This means that the problem reduces to finding a function in a finite dimensional subspace of the original function space which is often infinite dimensional. It is this dimension reduction that makes the problem computationally tractable.
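To make the dimension reduction concrete, here is a minimal numerical sketch in a reproducing kernel Hilbert space. The kernel, data points and values are illustrative choices, not taken from the paper; the point is that the representer theorem reduces interpolation in an infinite dimensional function space to an \(m\times m\) linear system for the Gram matrix.

```python
import numpy as np

# Illustrative kernel and data (not from the paper): Gaussian kernel on R.
def k(s, t, gamma=1.0):
    """Gaussian kernel, symmetric and positive definite."""
    return np.exp(-gamma * (s - t) ** 2)

X = np.array([-1.0, 0.0, 0.5, 2.0])   # data points x_1, ..., x_m
y = np.array([1.0, -1.0, 0.3, 0.7])   # target values y_1, ..., y_m

# Gram matrix K_ij = k(x_i, x_j): all the inner products the problem needs.
K = k(X[:, None], X[None, :])

# Representer theorem: a solution has the form f = sum_i c_i k(x_i, .),
# so the interpolation constraints become the m x m linear system K c = y.
c = np.linalg.solve(K, y)

def f(s):
    """The interpolant, a finite expansion over the data points."""
    return np.dot(c, k(X, s))

# The constraints f(x_i) = y_i hold up to floating point error.
max_err = max(abs(f(s) - t) for s, t in zip(X, y))
```

Although the hypothesis space of the Gaussian kernel is infinite dimensional, only the \(4\times 4\) system above ever has to be solved.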

Another reason for Hilbert space regularisation finding a variety of applications is the *kernel trick* which allows for any algorithm which is formulated in terms of inner products to be modified to yield a new algorithm based on a different symmetric, positive semidefinite kernel leading to learning in *reproducing kernel Hilbert spaces* (Schölkopf and Smola [15], Shawe-Taylor and Cristianini [16]). This way nonlinearities can be introduced in the otherwise linear setup. Furthermore kernels can be defined on input sets which a priori do not have a mathematical structure by embeddings into a Hilbert space.

A classical instance of this paradigm is *Tikhonov regularisation*, i.e. an optimisation problem of the form

$$\begin{aligned} \min \limits _{f\in \mathcal {H}}\ \mathcal {E}\left( \left( \left<{f},{x_i}\right>_{\mathcal {H}}\right) _{i\in \mathbb {N}_m}, \left( y_i\right) _{i\in \mathbb {N}_m}\right) + \lambda \varOmega (f) \end{aligned}$$

where \(\mathcal {E}\) is an *error function*, \(\varOmega \,:\mathcal {H}\rightarrow \mathbb {R}\) a *regulariser* and \(\lambda >0\) is a *regularisation parameter*. Argyriou et al. [1] show that under very mild conditions this regularisation problem admits a linear representer theorem if and only if the regularised interpolation problem

$$\begin{aligned} \min \left\{ \varOmega (f) \,:\,f\in \mathcal {H},\ \left<{f},{x_i}\right>_{\mathcal {H}} = y_i \ \forall i\in \mathbb {N}_m\right\} \end{aligned}$$
(1)

does.

We will be interested in regularisation not only in Hilbert spaces as stated above but *extend the theory to uniformly convex, uniformly smooth Banach spaces*, allowing for learning in a much larger variety of spaces. While any two Hilbert spaces of the same dimension are linearly isometrically isomorphic this is far from true for Banach spaces so they exhibit much richer geometric variety which may be exploited in learning algorithms. Furthermore we may encounter applications where the data has some intrinsic structure so that it cannot be embedded into a Hilbert space. Having a large amount of Banach spaces for potential embeddings may help to overcome this problem. Analogous to learning in reproducing kernel Hilbert spaces the generalisation to Banach spaces allows for learning in *reproducing kernel Banach spaces* which have been introduced by Zhang et al. [18]. Our results regarding the existence of representer theorems are in line with Zhang and Zhang’s work on representer theorems for reproducing kernel Banach spaces [19].

But as we will show at the end of this paper the variety of spaces to pose the problem in is of even greater importance. It is often said that the regulariser favours solutions with a certain desirable property. We will show that in fact for regularised interpolation when we rely on the linear representer theorem it is essentially *the choice of the space*, and only the choice of the space not the choice of the regulariser, which *determines the solution*.

It is well known that non-decreasing functions of the Hilbert space norm admit a linear representer theorem. Argyriou et al. [1] showed that this condition is not just sufficient but, for differentiable regularisers, also necessary. In this paper we *remove the differentiability condition* and show that any regulariser on a uniformly convex and uniformly smooth Banach space that admits a linear representer theorem is in fact very close to being radially symmetric, thus giving a (more) complete answer to the question when there is a representer theorem. Before presenting those results we develop the necessary theory of semi-inner products to generalise the Hilbert space setting considered by Argyriou, Micchelli and Pontil to Banach spaces.

In Sect. 2 we will introduce the notion of *semi-inner products* as defined by Lumer [11] and later extended by Giles [6]. We will state the results without proofs as they mostly are not difficult and can be found in the original papers. Another extensive reference about semi-inner products and their properties is the work by Dragomir [5].

After introducing the relevant theory we will present the generalised regularised interpolation problem in Sect. 3, replacing the inner product in Eq. (1) by a semi-inner product. We then state one of the main results of the paper that regularisers that admit a representer theorem are almost radially symmetric in a way that will be made precise in the statement. Before giving the proof of the theorem we state and prove two essential lemmas capturing most of the important structure of the problem to prove the theorem. We finish the section by giving the proof of the main result.

Finally in Sect. 4 we prove that in fact for admissible regularisers there is a unique solution of the regularised interpolation problem in the linear span of the data and it is independent of the regulariser. This in particular means that we may choose the regulariser which is most suitable for our task at hand without changing the solution.

### 1.1 Notation

Before the main sections we briefly introduce some notation used throughout the paper. We use \(\mathbb {N}_m\) as a shorthand notation for the set \(\{1,\ldots ,m\}\subset \mathbb {N}\). We will assume we have *m* data points \(\left\{ (x_i,y_i) \,:\,i\in \mathbb {N}_m\right\} \subset \mathcal {B}\times Y\), where \(\mathcal {B}\) will always denote a uniformly convex, uniformly smooth real Banach space and \(Y\subseteq \mathbb {R}\). Typical examples of *Y* are finite sets of integers for classification problems, e.g. \(\{-1,1\}\) for binary classification, or the whole of \(\mathbb {R}\) for regression.

We briefly recall the definitions of a Banach space being uniformly convex and uniformly smooth; further details can be found in [2, 9, 10].

### Definition 1

(*Uniformly convex Banach space*) A normed vector space *V* is said to be uniformly convex if for every \(\varepsilon >0\) there exists a \(\delta >0\) such that if \(x,y\in V\) with \({||}{x}{||}_V={||}{y}{||}_V=1\) and \({||}{x-y}{||}_V > \varepsilon \) then \({||}{\frac{x+y}{2}}{||}_V < 1 - \delta \).

### Definition 2

(*Uniformly smooth Banach space*) A normed vector space *V* is said to be uniformly smooth if for every \(\varepsilon >0\) there exists \(\delta >0\) such that if \(x,y\in V\) with \({||}{x}{||}_V=1, {||}{y}{||}_V\le \delta \) then \({||}{x+y}{||}_V+{||}{x-y}{||}_V\le 2+\varepsilon {||}{y}{||}_V\).

### Remark 1

- (i) The modulus of smoothness of the space *V* is defined as
$$\begin{aligned} \rho _V(\delta ) = \sup \left\{ \frac{{||}{x+y}{||}_V+{||}{x-y}{||}_V}{2} - 1 \,:\,{||}{x}{||}_V=1,\ {||}{y}{||}_V=\delta \right\} \end{aligned}$$
(2)
Now *V* is uniformly smooth if and only if \(\frac{\rho _V(\delta )}{\delta }\underset{{\delta }\rightarrow {0}}{\longrightarrow }0\).
- (ii) The norm on *V* is said to be uniformly Fréchet differentiable if the limit
$$\begin{aligned} \lim \limits _{t\rightarrow 0}\frac{{||}{x+t\cdot y}{||}_V-{||}{x}{||}_V}{t} \end{aligned}$$
exists uniformly for all \(x,y\in V\) with \({||}{x}{||}_V={||}{y}{||}_V= 1\). The space *V* is uniformly smooth if and only if its norm is uniformly Fréchet differentiable.
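The modulus of smoothness in Eq. (2) can be estimated numerically. The sketch below (illustrative values of *p* and \(\delta \); a crude grid search over directions in \(\mathbb {R}^2\), not part of the paper) shows that for the Euclidean norm \(\rho (\delta )/\delta \) is small for small \(\delta \), while for the \(\ell ^1\) norm, which is not uniformly smooth, the ratio stays bounded away from zero.

```python
import numpy as np

def lp_norm(v, p):
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def modulus_of_smoothness(p, delta, n=180):
    """Crude estimate of rho_V(delta) for V = (R^2, ||.||_p) by a grid search
    over unit vectors x and vectors y with ||y||_p = delta, cf. Eq. (2)."""
    thetas = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    best = 0.0
    for a in thetas:
        x = np.array([np.cos(a), np.sin(a)])
        x /= lp_norm(x, p)                      # ||x||_p = 1
        for b in thetas:
            y = np.array([np.cos(b), np.sin(b)])
            y *= delta / lp_norm(y, p)          # ||y||_p = delta
            val = (lp_norm(x + y, p) + lp_norm(x - y, p)) / 2.0 - 1.0
            best = max(best, val)
    return best

delta = 1e-2
ratio_l2 = modulus_of_smoothness(2.0, delta) / delta  # roughly delta/2, small
ratio_l1 = modulus_of_smoothness(1.0, delta) / delta  # stays around 1
```

For the Euclidean norm \(\rho (\delta )=\sqrt{1+\delta ^2}-1\approx \delta ^2/2\), matching the small estimated ratio; for \(\ell ^1\) the pair \(x=(1,0)\), \(y=(0,\delta )\) already gives \(\rho (\delta )\ge \delta \).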

We always write \(\mathcal {H}\) to denote a Hilbert space and for the first part of Sect. 2 we will be speaking of general normed linear spaces denoted by *V*. Once we have seen the reasons to require the space to be a uniformly convex and uniformly smooth Banach space the remainder of Sect. 2 and the paper will consider such spaces denoted by \(\mathcal {B}\). When only the norm \({||}{\cdot }{||}_\mathcal {B}\) on \(\mathcal {B}\) is considered the subscript will often be omitted for simplicity. Throughout we will denote the inner product on a Hilbert space by \(\left<{\cdot },{\cdot }\right>_{\mathcal {H}}\) and a semi-inner product on a normed linear space by \(\left[ {\cdot },{\cdot }\right] _{V}\).

## 2 Semi-inner product spaces

There are various definitions of semi-inner products aiming to generalise Hilbert space methods to more general cases. The notion of semi-inner products we are going to use was first introduced by Lumer [11] and further developed by Giles [6]. In comparison to inner products the assumption of symmetry, or equivalently additivity in the second argument, is dropped. Following Giles we assume homogeneity in the second argument. Moreover we need to assume that the Cauchy–Schwarz inequality holds, as it is crucial for the semi-inner products to have inner-product like behaviour.

In this section we will give a brief summary of the most important results. More details can be found in the two papers referenced above. Moreover an extensive overview of the theory of this and other notions of semi-inner products can be found in Dragomir [5]. While all results in this section can be given for complex vector spaces we will only present them for the real case as our results only apply to real vector spaces.

### Definition 3

(*Semi-inner product*) A semi-inner product (s.i.p.) on a real vector space *V* is a map \(\left[ {\cdot },{\cdot }\right] _{V}:V\times V\rightarrow \mathbb {R}\) which is linear in the first argument, homogeneous in the second argument, positive definite, and satisfies the Cauchy–Schwarz inequality.

With these properties a semi-inner product \(\left[ {\cdot },{\cdot }\right] _{V}\) induces a norm \({||}{x}{||}_V=\left[ {x},{x}\right] _{V}^{1/2}\) on *V*. Conversely every norm \({||}{\cdot }{||}_V\) on a linear space *V* is induced by at least one semi-inner product, i.e. there exists at least one semi-inner product \(\left[ {\cdot },{\cdot }\right] _{V}\) such that \({||}{x}{||}_V^2=\left[ {x},{x}\right] _{V}\). This means that every normed linear space is a s.i.p. space, but the semi-inner product inducing the norm is in general not unique. We do have uniqueness for uniformly smooth spaces though.
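As a concrete example, for the sequence spaces \(\ell ^p\) with \(1<p<\infty \) the semi-inner product compatible with the norm is, following Giles [6], \(\left[ {x},{y}\right] = \sum _i x_i\,{{\,\mathrm{sgn}\,}}(y_i)|y_i|^{p-1}\big /{||}{y}{||}_p^{p-2}\). A short numerical check of the defining properties (the choice of *p* and the random vectors are purely illustrative):

```python
import numpy as np

def lp_norm(v, p):
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def sip(x, y, p):
    """Giles' semi-inner product on l^p (1 < p < infinity):
    [x, y] = sum_i x_i sgn(y_i) |y_i|^(p-1) / ||y||_p^(p-2)."""
    return float(np.sum(x * np.sign(y) * np.abs(y) ** (p - 1))
                 / lp_norm(y, p) ** (p - 2))

p = 3.0                                # illustrative choice
rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 5))

# [y, y] = ||y||^2: the s.i.p. induces the norm.
induces_norm = abs(sip(y, y, p) - lp_norm(y, p) ** 2) < 1e-10
# Linearity in the first argument.
linear_first = abs(sip(2.0 * x + z, y, p)
                   - (2.0 * sip(x, y, p) + sip(z, y, p))) < 1e-10
# Homogeneity in the second argument, including negative scalars.
homog_second = abs(sip(x, -1.7 * y, p) - (-1.7) * sip(x, y, p)) < 1e-10
# Cauchy-Schwarz: |[x, y]| <= ||x|| ||y||, here a consequence of Hoelder.
cauchy_schwarz = abs(sip(x, y, p)) <= lp_norm(x, p) * lp_norm(y, p) + 1e-12
```

Note that symmetry fails for \(p\ne 2\): in general \(\left[ {x},{y}\right] \ne \left[ {y},{x}\right] \).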

### Proposition 1

(*Unique semi-inner product*) If the norm of *V* is uniformly Fréchet differentiable as defined in Remark 1, then there is a unique semi-inner product inducing the norm, and the differential of the norm for \(x\ne 0\) is given by

$$\begin{aligned} \lim \limits _{t\rightarrow 0}\frac{{||}{x+t\cdot y}{||}_V-{||}{x}{||}_V}{t} = \frac{\left[ {y},{x}\right] _{V}}{{||}{x}{||}_V}. \end{aligned}$$

The existence of a semi-inner product allows us to define a notion of orthogonality analogous to orthogonality in Hilbert spaces by requiring the semi-inner product to be zero. It turns out that this is equivalent to a generalisation of orthogonality to normed linear spaces introduced by James [7]. James in particular points out that his definition is closely related to linear functionals and hyperplanes. This is essential for our applications as we will see in the main part of the paper.

The lack of symmetry of the semi-inner product means that our notion of orthogonality is not symmetric in general and *x* being normal to *y* does not imply that *y* is normal to *x*.

### Definition 4

(*Orthogonality*) Let *V* be a s.i.p. space. For \(x,y\in V\) we say *x* is normal to *y* if \(\left[ {y},{x}\right] _{V}=0\).

For a subspace \(U\subseteq V\) we say *x* is normal to *U* if *x* is normal to all \(y\in U\). Thus the orthogonal complement of *U* is defined as

$$\begin{aligned} U^\perp = \left\{ x\in V \,:\,\left[ {y},{x}\right] _{V}=0 \text { for all } y\in U\right\} . \end{aligned}$$

The relation to James orthogonality also helps to get a geometric understanding of what orthogonality means in a s.i.p. space. From Definition 4 it is immediately clear that *x* being normal to *y* means that the vector *y* is tangent to the ball \(B(0, {||}{x}{||})\) at the point *x*, where \(B(0,{||}{x}{||})\) is the ball of radius \({||}{x}{||}\) centred at the origin.

If the space is a uniformly convex Banach space there exists a unique orthogonal decomposition for every \(x\in V\). This is because in a uniformly convex space there is a unique closest point in a closed linear subspace and one easily checks that this immediately leads to a unique orthogonal decomposition.

For uniformly convex, uniformly smooth Banach spaces we are also able to establish a Riesz representation theorem using the semi-inner product.

### Theorem 1

(*Riesz representation theorem*) Let *V* be a uniformly convex, uniformly smooth s.i.p. space. Then for every \(f\in V^*\) there exists a unique vector \(y\in V\) such that \(f(x)=\left[ {x},{y}\right] _{V}\) for all \(x\in V\) and \({||}{y}{||}_V = {||}{f}{||}_{V^*}\).
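For \(\ell ^p\) the representer of Theorem 1 can be written down explicitly: a functional with coefficient sequence \(f\in \ell ^q\) (where \(1/p+1/q=1\)) is represented by \(y_i = {||}{f}{||}_q^{2-q}\,{{\,\mathrm{sgn}\,}}(f_i)|f_i|^{q-1}\), as one can check using \((p-1)(q-1)=1\). A numerical sanity check with illustrative *p* and random vectors:

```python
import numpy as np

p = 1.5                      # illustrative choice
q = p / (p - 1.0)            # conjugate exponent, 1/p + 1/q = 1

def lp_norm(v, r):
    return float(np.sum(np.abs(v) ** r) ** (1.0 / r))

def sip(x, y):
    """Giles' semi-inner product on l^p."""
    return float(np.sum(x * np.sign(y) * np.abs(y) ** (p - 1))
                 / lp_norm(y, p) ** (p - 2))

rng = np.random.default_rng(1)
f = rng.normal(size=4)       # a functional on l^p, identified with l^q

# Candidate representer y with f(x) = [x, y] and ||y||_p = ||f||_q.
y = lp_norm(f, q) ** (2.0 - q) * np.sign(f) * np.abs(f) ** (q - 1.0)

x = rng.normal(size=4)
represents = abs(float(np.dot(f, x)) - sip(x, y)) < 1e-10
isometric = abs(lp_norm(y, p) - lp_norm(f, q)) < 1e-10
```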

Summarising the above results we see that a necessary structure to have a unique semi-inner product inducing the norm and allowing for a Riesz representation theorem is that the space is a uniformly convex and uniformly Fréchet differentiable Banach space. For simplicity we will be calling such spaces uniform.

### Definition 5

(*Uniform Banach space*) We say a space *V* is uniform if it is a uniformly convex and uniformly Fréchet differentiable Banach space.

It is well known that the dual space of a uniform Banach space is itself uniform. Lastly we note that the duality mapping of a uniform Banach space is norm-to-norm continuous. The proof of this is standard and can be found in the appendix.

### Proposition 2

The duality map \(*:\mathcal {B}\rightarrow \mathcal {B}^*, x\mapsto x^*\) is norm-to-norm continuous. In particular \(\left[ {z},{x+ty}\right] _{\mathcal {B}} \rightarrow \left[ {z},{x}\right] _{\mathcal {B}}\) for all \(x,y,z\in \mathcal {B}\) and \(t\in \mathbb {R}\).

Thus the dual map is a homeomorphism from \(\mathcal {B}\) to \(\mathcal {B}^*\) with the norm topologies. It is linear if and only if \(\mathcal {B}\) is a Hilbert space.
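The nonlinearity for \(p\ne 2\) is easy to observe numerically: the duality map on \(\ell ^p\) is homogeneous but not additive. A small sketch (illustrative *p* and vectors):

```python
import numpy as np

p = 4.0                      # illustrative choice; any p != 2 works

def dual(x):
    """Duality map on l^p: the coordinates of x* are
    sgn(x_i)|x_i|^(p-1) / ||x||_p^(p-2), so that x*(z) = [z, x]."""
    nx = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return np.sign(x) * np.abs(x) ** (p - 1) / nx ** (p - 2)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

# Homogeneous: (a x)* = a x* ...
homogeneous = np.allclose(dual(3.0 * x), 3.0 * dual(x))
# ... but not additive: (x + y)* != x* + y* for p != 2.
additive_gap = float(np.max(np.abs(dual(x + y) - (dual(x) + dual(y)))))
```

Here \({{\,\mathrm{dual}\,}}(x)+{{\,\mathrm{dual}\,}}(y)=(1,1)\) while \({{\,\mathrm{dual}\,}}(x+y)=(2^{-1/2},2^{-1/2})\), so the gap is strictly positive.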

## 3 Existence of representer theorems

We consider the regularised interpolation problem obtained from Eq. (1) by replacing the inner product with the semi-inner product of the uniform Banach space \(\mathcal {B}\),

$$\begin{aligned} \min \left\{ \varOmega (f) \,:\,f\in \mathcal {B},\ \left[ {f},{x_i}\right] _{\mathcal {B}} = y_i \ \forall i\in \mathbb {N}_m\right\} . \end{aligned}$$
(3)

### Definition 6

(*Admissible regulariser*) We say a function \(\varOmega :\mathcal {B}\rightarrow \mathbb {R}\) is admissible if for any \(m\in \mathbb {N}\) and any given data \(\left\{ (x_i,y_i) \,:\,i\in \mathbb {N}_m\right\} \subset \mathcal {B}\times Y\) such that the interpolation constraints can be satisfied the regularised interpolation problem Eq. (3) admits a solution \(f_0\) such that its dual element is of the form

$$\begin{aligned} f_0^* = \sum \limits _{i=1}^m c_i x_i^* \end{aligned}$$

for some \(c_i\in \mathbb {R}\), i.e. a solution whose dual element lies in the span of the dual elements of the data points.

With this definition at hand it is now our goal to classify all admissible regularisers. It is well known that being a non-decreasing function of the norm on a Hilbert space is a sufficient condition for the regulariser to be admissible. By a Hahn–Banach argument similar to the one given by e.g. Zhang and Zhang [19] this generalises to our case of uniform Banach spaces. Below we show that this condition is already almost necessary, in the sense that admissible regularisers cannot be very far from being radially symmetric.

### Theorem 2

Let \(\varOmega :\mathcal {B}\rightarrow \mathbb {R}\) be admissible. Then there exists a non-decreasing function \(h:[0,\infty )\rightarrow \mathbb {R}\) such that \(\varOmega (f)=h\left( {||}{f}{||}\right) \) whenever \({||}{f}{||} \ne r\) for \(r\in \mathcal {R}\). Here \(\mathcal {R}\) is an at most countable set of radii where *h* has a jump discontinuity. For any *f* with \({||}{f}{||}=r\in \mathcal {R}\) the value \(\varOmega (f)\) is only constrained by the monotonicity property, i.e. it has to lie in between \(\lim \limits _{t\nearrow r}h(t)\) and \(\lim \limits _{t\searrow r}h(t)\).

In other words, \(\varOmega \) is radially non-decreasing and radially symmetric except for at most countably many circular jump discontinuities. In those discontinuities the function value is only limited by its monotonicity property.

In [1] Argyriou et al. show that any admissible regulariser on a Hilbert space is non-decreasing in orthogonal directions. An analogous result is true for uniform Banach spaces, but since orthogonality is no longer symmetric, and with the intuition gained from the equivalence with James orthogonality, we see that it is in fact the tangential directions in which the regulariser is non-decreasing. This also becomes clear from the proofs in [1], in particular when proving radial symmetry.

Before we can prove the analogous result for uniform Banach spaces we need to show that this tangential bound extends considerably: a function that is non-decreasing in tangential directions is in fact non-decreasing in norm, as is made precise in the following Lemma 1.

### Lemma 1

If \(\varOmega (f)\le \varOmega (f+f_T)\) for all \(f,f_T\in \mathcal {B}\) such that \(\left[ {f_T},{f}\right] _{\mathcal {B}}=0\) then for any fixed \(\hat{f}\) we have that \(\varOmega (\hat{f})\le \varOmega (f)\) for all *f* such that \({||}{\hat{f}}{||}<{||}{f}{||}\).

### Proof

*Part 1: (Bound \(\varOmega \) on the half space given by the tangent through \(\hat{f}\))*

We first establish the bound along the ray through \(\hat{f}\), so let \(f=\lambda \hat{f}\) for some \(\lambda >1\). Choose any tangential direction \(f_T\), i.e. \(\left[ {f_T},{\hat{f}}\right] _{\mathcal {B}}=0\), and set \(f_t = \hat{f} + t\cdot f_T\). By assumption \(\varOmega (f_t)\ge \varOmega (\hat{f})\) for all *t*.

Consider the function

$$\begin{aligned} \varphi (t) = \left[ {f-f_t},{f_t}\right] _{\mathcal {B}} = \left[ {f},{f_t}\right] _{\mathcal {B}} - {||}{f_t}{||}^2 \end{aligned}$$

which is continuous in *t* by Proposition 2. At \(t=0\) we have \(\varphi (0)=\left[ {f-\hat{f}},{\hat{f}}\right] _{\mathcal {B}}=(\lambda -1){||}{\hat{f}}{||}^2>0\), so by continuity, for small *t* for which \({||}{f_t}{||}<{||}{f}{||}\) we must have that \(\varphi (t)>0\). On the other hand, by the Cauchy–Schwarz inequality \(\left[ {f},{f_t}\right] _{\mathcal {B}}\le {||}{f}{||}\,{||}{f_t}{||}\), and \({||}{f_t}{||}\rightarrow \infty \) as \(t\rightarrow \infty \), so for *t* big enough so that \({||}{f_t}{||}>{||}{f}{||}\) we thus must have \(\varphi (t)\le {||}{f_t}{||}\left( {||}{f}{||}-{||}{f_t}{||}\right) <0\).

So \(\varphi \) is a continuous function of *t* which by the above discussion is positive for small *t* and negative for large *t*. By the intermediate value theorem there exists \(t_0\) such that

$$\begin{aligned} \left[ {f-f_{t_0}},{f_{t_0}}\right] _{\mathcal {B}} = 0, \end{aligned}$$

i.e. \(f-f_{t_0}\) is a tangential direction at \(f_{t_0}\), and hence \(\varOmega (f)\ge \varOmega (f_{t_0})\ge \varOmega (\hat{f})\).

Hence we have the bound along the entire ray \(\lambda \cdot \hat{f}\) for \(1<\lambda \in \mathbb {R}\) which extends along all tangents through those points to the half space given by the tangent through \(\hat{f}\), i.e. the shaded region in Fig. 1.

*Part 2: (Extend the bound around the circle)*

Next we note that we can actually extend the bound further to apply all the way around the circle, namely \(\varOmega (f)\ge \varOmega (\hat{f})\) for all *f* such that \({||}{f}{||}>{||}{\hat{f}}{||}\). This is done by considering \(f_t=\hat{f}+t\cdot f_T\) as before but then instead of following the tangent into the half space just considered we follow the tangent in the opposite direction around the circle, as shown in Fig. 2. We fix another point along that tangent and repeat the process, moving around the circle. We claim that by making the step size along each tangent small enough we can this way move around the circle while staying arbitrarily close to it.

Combining both arguments this proves that we can reach any point with norm greater than \({||}{\hat{f}}{||}\) from \(\hat{f}\) only by moving along tangents giving the claimed bound. \(\square \)
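The "moving along tangents" mechanism of this proof can be illustrated numerically in the two-dimensional \(\ell ^p\) space: at each point *f* the direction obtained by rotating the coordinates of the dual element is tangential, and by James orthogonality each tangential step can only increase the norm. The sketch below (illustrative *p* and step size) steps around the circle this way while the norm stays close to its initial value:

```python
import numpy as np

p = 3.0                                      # illustrative choice
lp_norm = lambda v: float(np.sum(np.abs(v) ** p) ** (1.0 / p))

def tangent(f):
    """A direction f_T with [f_T, f] = 0 in (R^2, ||.||_p): rotate the
    'dual coordinates' sgn(f_i)|f_i|^(p-1) by ninety degrees."""
    g = np.sign(f) * np.abs(f) ** (p - 1)
    return np.array([-g[1], g[0]])

f = np.array([1.0, 0.0])
step = 1e-3
norms, quadrants = [lp_norm(f)], set()
for _ in range(8000):                        # path length ~8 > circumference
    d = tangent(f)
    f = f + step * d / lp_norm(d)            # small tangential step
    norms.append(lp_norm(f))
    quadrants.add((f[0] > 0.0, f[1] > 0.0))

norms = np.array(norms)
never_decreases = bool(np.all(np.diff(norms) >= -1e-12))
stays_close = bool(norms[-1] < 1.02)         # growth is only second order in step
went_around = len(quadrants) == 4            # all four quadrants were visited
```

Each step increases the norm only to second order in the step size, which is exactly why the bound can be carried all the way around the circle while staying arbitrarily close to it.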

Having proved this lemma we are now in the position to prove that indeed any admissible regulariser on a uniform Banach space is non-decreasing in tangential directions. Note that the previous lemma will also play a crucial role in removing the differentiability assumption when establishing the closed form representation of the regulariser in Theorem 2.

### Lemma 2

The regulariser \(\varOmega \) is admissible if and only if \(\varOmega (f)\le \varOmega (f+f_T)\) for all \(f,f_T\in \mathcal {B}\) such that \(\left[ {f_T},{f}\right] _{\mathcal {B}}=0\). In particular, for any \(\hat{f}\) and *f* such that \({||}{\hat{f}}{||} < {||}{f}{||}\) we have \(\varOmega (\hat{f})\le \varOmega (f)\).

### Proof

*Part 1: (\(\varOmega \) admissible \(\Rightarrow \) nondecreasing along tangential directions)*

Fix \(f\in \mathcal {B}\), \(f\ne 0\), and consider the regularised interpolation problem for the single data point \((x_1,y_1)=\left( f,{||}{f}{||}^2\right) \), i.e. with the constraint \(\left[ {g},{f}\right] _{\mathcal {B}}={||}{f}{||}^2\). Since \(\varOmega \) is admissible there is a solution \(f_0\) with dual element \(f_0^*=c\cdot f^*\), i.e. \(f_0=c\cdot f\), and the constraint forces \(c=1\), so the only such solution is *f* itself. But if \(f_T\) is such that \(\left[ {f_T},{f}\right] _{\mathcal {B}}=0\) then \(\left[ {f+f_T},{f}\right] _{\mathcal {B}}=\left[ {f},{f}\right] _{\mathcal {B}}\) so \(f+f_T\) also satisfies the constraints and hence necessarily \(\varOmega (f+f_T)\ge \varOmega (f)\) as claimed. The second claim follows immediately from Lemma 1.

*Part 2: (Nondecreasing along tangential directions \(\Rightarrow \) \(\varOmega \) admissible)*

Now suppose \(\varOmega \) is non-decreasing along tangential directions, let data \(\left\{ (x_i,y_i) \,:\,i\in \mathbb {N}_m\right\} \) be given such that the interpolation constraints can be satisfied and let \(f_0\) be any solution of the regularised interpolation problem. Denote by

$$\begin{aligned} Z = \left\{ g\in \mathcal {B}\,:\,\left[ {g},{x_i}\right] _{\mathcal {B}}=0 \text { for all } i\in \mathbb {N}_m\right\} \end{aligned}$$

the subspace of functions satisfying the homogeneous interpolation constraints, so that the codimension of *Z* is at most *m*. Without loss of generality we can assume that not all \(y_i\) are zero as otherwise \(f_0=f_0^*=0\) is a trivial solution in the span of the data points. Since not all \(y_i\) are zero \(f_0\not \in Z\) and thus \({{\,\mathrm{span}\,}}\{f_0,Z\}\) is a subspace of codimension at most \(m-1\).

Further let \(X=\left\{ f\in \mathcal {B}\,:\,f^*\in {{\,\mathrm{span}\,}}\{x_1^*,\ldots ,x_m^*\}\right\} \) be the preimage of the span of the dual elements of the data points under the duality map. Since the duality map is a norm-to-norm homeomorphism, *X* is homeomorphic to a linear space of dimension *m*. This means that \(X\cap {{\,\mathrm{span}\,}}\{f_0,Z\}\) is homeomorphic to a space of dimension at least one and hence in particular contains a nonzero element *f*.

Note that \(X\cap Z=\{0\}\): for \(f\in X\cap Z\) we have \({||}{f}{||}^2=\left[ {f},{f}\right] _{\mathcal {B}}=f^*(f)=\sum \nolimits _i c_i\left[ {f},{x_i}\right] _{\mathcal {B}}=0\). Hence, writing the nonzero element as \(f=\lambda f_0 + \mu g\) with \(g\in Z\), we must have \(\lambda \ne 0\), as otherwise \(f\in X\cap Z\). If \(\mu =0\) then \(f_0\in X\) and we are done, so assume \(\lambda ,\mu \ne 0\). By homogeneity of the dual map \(\lambda \cdot X=X\) and so \(f_0+\frac{\mu }{\lambda }g\in X\).

This means we have constructed an \(\overline{f_0}=f_0 + f_T\) with dual element in the span of the data points and \(f_T\in Z\) which means by definition of *Z* that \(\overline{f_0}\) satisfies the interpolation constraints. It remains to show that in fact \(\overline{f_0}\) is in norm at most as large as \(f_0\).

Since \(f_T\in Z\) and \(\overline{f_0}^*\in {{\,\mathrm{span}\,}}\{x_1^*,\ldots ,x_m^*\}\) we have \(\left[ {f_T},{\overline{f_0}}\right] _{\mathcal {B}}=0\), so by James orthogonality \({||}{f_0}{||}={||}{\overline{f_0}-f_T}{||}\ge {||}{\overline{f_0}}{||}\). But by Lemma 1 we know that a function which is non-decreasing along tangential directions is non-decreasing in norm, so \({||}{\overline{f_0}}{||}\le {||}{f_0}{||}\) implies that \(\varOmega (\overline{f_0})\le \varOmega (f_0)\) and so we have found a solution with dual element in the span of the data points as claimed. \(\square \)

Using those two results we can now give the proof that admissible regularisers are almost radially symmetric in the sense of Theorem 2.

### Proof of Theorem 2

*Part 1: (\(\varOmega \) continuous in radial direction implies \(\varOmega \) radially symmetric)*

We now show that instead of differentiability, the assumption that \(\varOmega \) is continuous in radial direction is sufficient to conclude that it has to be radially symmetric. We prove this by contradiction. Assume \(\varOmega \) is admissible but not radially symmetric. Then there exists a radius *r* so that \(\varOmega \) is not constant on the circle with radius *r* and hence there are two points *f* and *g* so that, without loss of generality, \(\varOmega (f)>\varOmega (g)\).

But then by Lemma 1, for all \(1<\lambda \in \mathbb {R}\) we have \(\varOmega (\lambda g)\ge \varOmega (f)\) and thus \(\varOmega (\lambda g)-\varOmega (g)\ge \varOmega (f)-\varOmega (g)>0\), contradicting radial continuity of \(\varOmega \) at *g*. Hence \(\varOmega \) has to be constant along every circle as claimed.

*Part 2: (Radial mollification preserves being nondecreasing in tangential directions)*

The observation in part 1 is useful as we can easily radially mollify a given \(\varOmega \) so that the property of being non-decreasing along tangential directions is preserved. For a mollifier \(\varphi \ge 0\) supported on \([-\varepsilon ,0]\) with \(0<\varepsilon <1\), define the radial mollification

$$\begin{aligned} \widetilde{\varOmega }(f) = \int _{-\varepsilon }^{0} \varOmega \left( (1+t)f\right) \varphi (t)\,\mathrm {d}t \end{aligned}$$

which is continuous in radial direction. For \(\widetilde{\varOmega }\) to be non-decreasing in tangential directions we need the integral estimate

$$\begin{aligned} \int _{-\varepsilon }^{0} \varOmega \left( (1+t)(f+f_T)\right) \varphi (t)\,\mathrm {d}t \ge \int _{-\varepsilon }^{0} \varOmega \left( (1+t)f\right) \varphi (t)\,\mathrm {d}t \end{aligned}$$
(7)

whenever \(\left[ {f_T},{f}\right] _{\mathcal {B}}=0\). It suffices that the integrands compare pointwise, and by the monotonicity established in Lemma 1 this is the case whenever \(|1+t|\,{||}{f+f_T}{||}\ge |1+t|\,{||}{f}{||}\). As *t* is non-positive we can drop the modulus to obtain that this happens if \({||}{f+f_T}{||}\ge {||}{f}{||}\) which is just James orthogonality and thus follows from the fact that \(\left[ {f_T},{f}\right] _{\mathcal {B}}=0\). This proves that the integral estimate Eq. (7) indeed holds and hence the radially mollified \(\widetilde{\varOmega }\) is indeed non-decreasing in tangential directions.

*Part 3: (\(\varOmega \) is as claimed)*

Putting these two observations together we obtain the result. By parts 1 and 2, \(\widetilde{\varOmega }\) is of the form \(\widetilde{\varOmega }(f)=h\left( \left[ {f},{f}\right] _{\mathcal {B}}\right) \) for some continuous, non-decreasing *h*. But if we write \(\varOmega \) along any two distinct, fixed directions given by \(f_1,f_2\in \mathcal {B}\), \(f_1\ne f_2\), \({||}{f_1}{||}={||}{f_2}{||}=1\) as \(\varOmega (t\cdot f_i) = h_{f_i}\left( \left[ {t\cdot f_i},{t\cdot f_i}\right] _{\mathcal {B}}\right) \), then the mollifications of both \(h_{f_1}\) and \(h_{f_2}\) must equal *h*, so \(h_{f_1}=h_{f_2}\) almost everywhere. Further, by continuity of *h* they can only differ at points of discontinuity of \(h_{f_1}\) and \(h_{f_2}\). As each \(h_{f_i}\) is a monotone function on the positive real line it can only have countably many points of discontinuity. Since the above bounds only make statements about values outside a given circle and *h* is itself monotone, each \(h_{f_i}\) is free to attain any value within the monotonicity constraint at those points of discontinuity. This shows that \(\varOmega \) is of the claimed form. \(\square \)

### Remark 2

We see that everything we say about \(\varOmega \) in this section relies crucially on the observation that it being admissible is a statement about its behaviour along tangents as stated in Lemma 2. But there is in fact no tangent into the complex plane, i.e. for fixed \(\hat{f}\) there is no tangent that intersects the ray \(\{t\cdot e^{i\theta }\cdot \hat{f}\,:\,t\in \mathbb {R}\}\) for any \(\theta \). Likewise it is not possible to reach any point along said ray via an “out and back” argument as in part 1 of the proof of Lemma 1. For this reason it is currently not clear whether one can say anything about the situation in complex vector spaces.

## 4 The solution is determined by the space

First of all, while it has been known that for regularisers which are a strictly increasing function of the norm every solution is within the linear span of the data, the proofs in Sect. 3 show immediately that something stronger can be said. For a regularised interpolation problem with an admissible regulariser to have a solution which is not in the linear span of the data the regulariser must have a flat region and the solution then has to lie within the flat region.

But there is more to be said, in fact it turns out that for admissible regularisers the set of solutions in the linear span is independent of the regulariser.

In [12] Micchelli and Pontil study the minimal norm interpolation problem

$$\begin{aligned} \min \left\{ {||}{f}{||}_X \,:\,f\in X,\ L_i(f)=y_i \ \forall i\in \mathbb {N}_m\right\} \end{aligned}$$

where *X* is a Banach space and the \(L_i\) are continuous linear functionals on *X*. Hence this agrees with Eq. (3) for \(h(t)=\sqrt{t}\), i.e. \(\varOmega (f)={\left( \left[ {f},{f}\right] _{\mathcal {B}}\right) }^{\frac{1}{2}}={||}{f}{||}_{\mathcal {B}}\), and \(X=\mathcal {B}\) a uniformly convex, uniformly smooth Banach space with \(L_i=x_i^*\), giving the minimal norm interpolation problem

$$\begin{aligned} \min \left\{ {||}{f}{||}_{\mathcal {B}} \,:\,f\in \mathcal {B},\ \left[ {f},{x_i}\right] _{\mathcal {B}}=y_i \ \forall i\in \mathbb {N}_m\right\} . \end{aligned}$$
(8)

### Theorem 3

Let \(\varOmega \) be admissible. Then any \(f_0\) which is such that \(f^*_0 = \sum \nolimits _{i=1}^m c_i x^*_i\) is a solution of Eq. (3) if and only if it is a solution of Eq. (8).

The proof of this result relies on the following result which was proved by Micchelli and Pontil in [12].

### Proposition 3

A function \(f\in X\) satisfying the interpolation constraints is a solution of the minimal norm interpolation problem Eq. (8) if and only if there exists a linear functional \(L=\sum \nolimits _{i=1}^m c_i L_i\), not all \(c_i\) zero, such that \(L(f)={||}{L}{||}_{X^*}{||}{f}{||}_X\).

Using this result it is easy to prove Theorem 3.

### Proof of Theorem 3

*Part 1: (A solution of Eq.* (3) *is a solution of Eq.* (8))

Assume \(f_0\) with \(f_0^*=\sum \nolimits _{i=1}^m c_i x_i^*\) is a solution of Eq. (3). Then \(f_0\) satisfies the interpolation constraints and for \(L=f_0^*\in {{\,\mathrm{span}\,}}\{x_1^*,\ldots ,x_m^*\}\) we have

$$\begin{aligned} L(f_0) = \left[ {f_0},{f_0}\right] _{\mathcal {B}} = {||}{f_0}{||}_{\mathcal {B}}^2 = {||}{f_0^*}{||}_{\mathcal {B}^*}{||}{f_0}{||}_{\mathcal {B}} \end{aligned}$$

so by Proposition 3 \(f_0\) is a solution of Eq. (8).

*Part 2: (A solution of Eq.* (8) *is a solution of Eq.* (3))

Assume \(f_0\) with \(f_0^*=\sum \nolimits _{i=1}^m c_i x_i^*\) is a solution of Eq. (8). Since \(\varOmega \) is admissible, by Definition 6 there exists a solution \(\overline{f_0}\) of Eq. (3) with \(\overline{f_0}^*\in {{\,\mathrm{span}\,}}\{x_1^*,\ldots ,x_m^*\}\). By part 1, \(\overline{f_0}\) is also a solution of Eq. (8). But as \(\mathcal {B}\) is uniformly convex the minimal norm interpolation problem Eq. (8) has a unique solution, so \(f_0=\overline{f_0}\) and hence \(f_0\) is a solution of Eq. (3). \(\square \)
This result shows that for any admissible regulariser on a uniformly convex and uniformly smooth Banach space the regularised interpolation problem has a unique solution with dual element in the linear span of the data, and this solution is the same for every admissible regulariser. This in particular means that it is the choice of the function space, and only the choice of the space, which determines the solution of the problem. We are thus free to work with whichever regulariser is most convenient for the application. Computationally in many cases this is likely going to be \(\frac{1}{2}{||}{\cdot }{||}^2\); for theoretical results other regularisers may be more suitable, as in the aforementioned paper [12] which heavily relies on a duality between the norm of the space and its continuous linear functionals.
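In the Hilbert space case this independence of the regulariser is easy to verify numerically: the solution with dual element in the span of the data is the minimal norm interpolant \(f_0 = X^\top (XX^\top )^{-1}y\), and for any radially non-decreasing regulariser every other feasible point \(f_0+z\), with *z* in the orthogonal complement of the data, has a regulariser value at least as large. A sketch with illustrative data and two choices of *h*:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative Hilbert-space instance: B = R^5, m = 2 constraints <f, x_i> = y_i.
Xd = rng.normal(size=(2, 5))                  # rows are the data points x_i
y = np.array([1.0, -2.0])

# Solution with dual element in the span of the data: f0 = Xd^T c with G c = y.
G = Xd @ Xd.T                                 # Gram matrix <x_i, x_j>
c = np.linalg.solve(G, y)
f0 = Xd.T @ c

# Two different admissible regularisers, both non-decreasing functions of ||f||.
h1 = lambda t: t ** 2
h2 = lambda t: np.sqrt(t)                     # cf. Eq. (8)

# Every feasible point is f0 + z with z in Z = span{x_i}^perp, and both
# regularisers are minimised over the feasible set at the same point f0.
ok = True
for _ in range(200):
    z = rng.normal(size=5)
    z -= Xd.T @ np.linalg.solve(G, Xd @ z)    # project onto Z
    g = f0 + z                                # still satisfies the constraints
    ok &= bool(np.allclose(Xd @ g, y))
    ok &= h1(np.linalg.norm(g)) >= h1(np.linalg.norm(f0)) - 1e-9
    ok &= h2(np.linalg.norm(g)) >= h2(np.linalg.norm(f0)) - 1e-9
```

Since \({||}{f_0+z}{||}^2={||}{f_0}{||}^2+{||}{z}{||}^2\) for \(z\perp {{\,\mathrm{span}\,}}\{x_i\}\), any non-decreasing *h* of the norm is minimised at the same point \(f_0\), regardless of *h*.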


## References

- 1. Argyriou, A., Micchelli, C.A., Pontil, M.: When is there a representer theorem? Vector versus matrix regularizers. J. Mach. Learn. Res. **10**, 2507–2529 (2009)
- 2. Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext, 1st edn. Springer, New York (2011)
- 3. Cox, D.D., O'Sullivan, F.: Asymptotic analysis of penalized likelihood and related estimators. Ann. Stat. **18**(4), 1676–1695 (1990). https://doi.org/10.1214/aos/1176347872
- 4. Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. **39**(1), 1–49 (2001)
- 5. Dragomir, S.S.: Semi-inner Products and Applications. Nova Science Publishers, Hauppauge (2004)
- 6. Giles, J.R.: Classes of semi-inner-product spaces. Trans. Am. Math. Soc. **129**(3), 436–446 (1967)
- 7. James, R.C.: Orthogonality and linear functionals in normed linear spaces. Trans. Am. Math. Soc. **61**(2), 265–292 (1947)
- 8. Kimeldorf, G., Wahba, G.: Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. **33**(1), 82–95 (1971). https://doi.org/10.1016/0022-247X(71)90184-3
- 9. Köthe, G.: Topological Vector Spaces I, Grundlehren der mathematischen Wissenschaften, vol. 159. Springer, Berlin (1983)
- 10. Lindenstrauss, J., Tzafriri, L.: Classical Banach Spaces II: Function Spaces, Ergebnisse der Mathematik und ihrer Grenzgebiete, vol. 97. Springer, Berlin (1979)
- 11. Lumer, G.: Semi-inner-product spaces. Trans. Am. Math. Soc. **100**(1), 29–43 (1961)
- 12. Micchelli, C.A., Pontil, M.: A function representation for learning in Banach spaces. In: Shawe-Taylor, J., Singer, Y. (eds.) Learning Theory, pp. 255–269. Springer, Berlin (2004)
- 13. Micchelli, C.A., Pontil, M.: Learning the kernel function via regularization. J. Mach. Learn. Res. **6**, 1099–1125 (2005)
- 14. Schölkopf, B., Herbrich, R., Smola, A.J.: A generalized representer theorem. In: Helmbold, D., Williamson, B. (eds.) Computational Learning Theory, pp. 416–426. Springer, Berlin (2001)
- 15. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)
- 16. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004). https://doi.org/10.1017/CBO9780511809682
- 17. Smola, A.J., Schölkopf, B.: On a kernel-based method for pattern recognition, regression, approximation, and operator inversion. Algorithmica **22**(1), 211–231 (1998). https://doi.org/10.1007/PL00013831
- 18. Zhang, H., Xu, Y., Zhang, J.: Reproducing kernel Banach spaces for machine learning. J. Mach. Learn. Res. **10**, 2741–2775 (2009)
- 19. Zhang, H., Zhang, J.: Regularized learning in Banach spaces as an optimization problem: representer theorems. J. Glob. Optim. **54**(2), 235–250 (2012). https://doi.org/10.1007/s10898-010-9575-z

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.