Volume 54, Issue 3, pp 963–1007

Data assimilation and sampling in Banach spaces

  • Ronald DeVore
  • Guergana Petrova
  • Przemyslaw Wojtaszczyk


This paper studies the problem of approximating a function f in a Banach space \(\mathcal{X}\) from measurements \(l_j(f)\), \(j=1,\ldots ,m\), where the \(l_j\) are linear functionals from \(\mathcal{X}^*\). Quantitative results for such recovery problems require additional information about the sought after function f. These additional assumptions take the form of assuming that f is in a certain model class \(K\subset \mathcal{X}\). Since there are generally infinitely many functions in K which share these same measurements, the best approximation is the center of the smallest ball B, called the Chebyshev ball, which contains the set \(\bar{K}\) of all f in K with these measurements. Therefore, the problem is reduced to analytically or numerically approximating this Chebyshev ball. Most results study this problem for classical Banach spaces \(\mathcal{X}\) such as the \(L_p\) spaces, \(1\le p\le \infty \), and for K the unit ball of a smoothness space in \(\mathcal{X}\). Our interest in this paper is in the model classes \(K=\mathcal{K}(\varepsilon ,V)\), with \(\varepsilon >0\) and V a finite dimensional subspace of \(\mathcal{X}\), which consists of all \(f\in \mathcal{X}\) such that \(\mathrm{dist}(f,V)_\mathcal{X}\le \varepsilon \). These model classes, called approximation sets, arise naturally in application domains such as parametric partial differential equations, uncertainty quantification, and signal processing. A general theory for the recovery of approximation sets in a Banach space is given. This theory includes tight a priori bounds on optimal performance and algorithms for finding near optimal approximations. It builds on the initial analysis given in Maday et al. (Int J Numer Method Eng 102:933–965, 2015) for the case when \(\mathcal{X}\) is a Hilbert space, and further studied in Binev et al. (SIAM UQ, 2015). 
It is shown how the recovery problem for approximation sets is connected with well-studied concepts in Banach space theory such as liftings and the angle between spaces. Examples are given that show how this theory can be used to recover several recent results on sampling and data assimilation.


Keywords

Optimal recovery · Reduced modeling · Data assimilation · Sampling

Mathematics Subject Classification

46B99 41A65 94A20 

1 Introduction

One of the most ubiquitous problems in science is to approximate an unknown function f from given data observations of f. Problems of this type go under the terminology of optimal recovery, data assimilation or the more colloquial terminology of data fitting. To prove quantitative results about the accuracy of such recovery requires not only the data observations, but also additional information about the sought after function f. Such additional information takes the form of assuming that f is in a prescribed model class \(K\subset \mathcal{X}\).

The classical setting for such problems (see, for example, [9, 25, 26, 35]) is that one has a bounded set K in a Banach space \(\mathcal{X}\) and a finite collection of linear functionals \(l_j\), \(j=1,\ldots ,m\), from \(\mathcal{X}^*\). Given a function f which is known to be in K and to have known measurements \(l_j(f)=w_j\), \(j=1,\ldots ,m\), the optimal recovery problem is to construct the best approximation to f from this information. Since there are generally infinitely many functions in K which share these same measurements, the best approximation is the center of the smallest ball B, called the Chebyshev ball, which contains the set \(\bar{K}\) of all f in K with these measurements. The best error of approximation is then the radius of this Chebyshev ball.

Most results in optimal recovery study this problem for classical Banach spaces \(\mathcal{X}\) such as the \(L_p\) spaces, \(1\le p\le \infty \), and for K the unit ball of a smoothness space in \(\mathcal{X}\). Our interest in this paper is in certain other model classes K, called approximation sets, that arise in various applications. As a motivating example, consider the analysis of complex physical systems from data observations. In such settings the sought after functions satisfy a (system of) parametric partial differential equation(s) with unknown parameters and hence lie in the solution manifold \(\mathcal{M}\) of the parametric model. There may also be uncertainty in the parametric model. Problems of this type fall into the general paradigm of uncertainty quantification. The solution manifold of a parametric partial differential equation (pde) is a complex object and information about the manifold is usually only known through approximation results on how well the elements in the manifold can be approximated by certain low dimensional linear spaces or by sparse Taylor (or other) polynomial expansions (see [11]). For this reason, the manifold \(\mathcal{M}\) is often replaced, as a model class, by the set \(\mathcal{K}(\varepsilon ,V)\) consisting of all elements in \(\mathcal{X}\) that can be approximated by the linear space \(V=V_n\) of dimension n to accuracy \(\varepsilon =\varepsilon _n\). We call these model classes \(\mathcal{K}(\varepsilon ,V)\) approximation sets and they are the main focus of this paper. Approximation sets also arise naturally as model classes in other settings such as signal processing where the problem is to construct an approximation to a signal from samples (see e.g. [1, 2, 3, 12, 36] and the papers cited therein as representative), although this terminology is not in common use in that setting.

Optimal recovery in this new setting of approximation sets as the model class was formulated and analyzed in [24] when \(\mathcal{X}\) is a Hilbert space, and further studied in [8]. In particular, it was shown in the latter paper that a certain numerical algorithm proposed in [24], based on least squares approximation, is optimal.

The purpose of the present paper is to provide a general theory for the optimal or near optimal recovery of approximation sets in a general Banach space \(\mathcal{X}\). While, as noted in the abstract, the optimal recovery has a simple theoretical description as the center of the Chebyshev ball and the optimal performance, i.e., the best error, is given by the radius of the Chebyshev ball, this is far from a satisfactory solution to the problem since it is not clear how to find the center and the radius of the Chebyshev ball. This leads to the two fundamental problems studied in the paper. The first centers on building numerically executable algorithms which are optimal or perhaps only near optimal, i.e., they either determine the Chebyshev center or approximate it sufficiently well. The second problem is to give sharp a priori bounds for the best error, i.e. the Chebyshev radius, in terms of easily computed quantities. We show how these two problems are connected with well-studied concepts in Banach space theory such as liftings and the angle between spaces. Our main results determine a priori bounds for optimal algorithms and give numerical recipes for obtaining optimal or near optimal algorithms.

1.1 Notation and problem formulation

Let \(\mathcal{X}\) be a Banach space with norm \(\Vert \cdot \Vert =\Vert \cdot \Vert _\mathcal{X}\) and let \(S\subset \mathcal{X}\) be any subset of \(\mathcal{X}\). We assume we are given measurement functionals \(l_1,\ldots ,l_m\in \mathcal{X}^*\) that are linearly independent. We study the general question of how best to recover a function f from the information that \(f\in S\) and f has the known measurements \(M(f):= M_m(f):=(l_1(f), \ldots ,l_m(f))=(w_1,\ldots ,w_m)\in \mathbb {R}^m\). In general, many functions \(f\in S\) may share the same data. We denote this collection by
$$\begin{aligned} S_w:= \{f\in S:\ l_j(f)=w_j, \quad \ j=1,\ldots ,m\}. \end{aligned}$$
An algorithm A for this recovery problem is a mapping which when given the measurement data \(w=(w_1,\ldots ,w_m)\) assigns an element \( A(w)\in \mathcal{X}\) as the approximation to f. Thus, an algorithm is a possibly nonlinear mapping
$$\begin{aligned} A: \ \mathbb {R}^m \mapsto \mathcal{X}. \end{aligned}$$
In this paper, we shall consider theoretical algorithms, which may or may not be easy to implement numerically. In some cases, we discuss the numerical issues in making these executable numerical algorithms.
A pointwise optimal algorithm \( A^*\) (if it exists) is one which minimizes the worst error for each w:
$$\begin{aligned} A^*(w):=\mathop {\mathrm{argmin}}\limits _{g\in \mathcal{X}}\sup _{f\in S_w}\Vert f-g\Vert . \end{aligned}$$
This optimal algorithm has a simple geometrical description that is well known (see e.g. [25]). For a given w, we consider all balls \(B(a,r)\subset \mathcal{X}\) which contain \(S_w\). The smallest ball B(a(w), r(w)), if it exists, is called the Chebyshev ball and its radius is called the Chebyshev radius of \(S_w\). We postpone the discussion of existence, uniqueness, and properties of the smallest ball to the next section. For now, we remark that when the Chebyshev ball B(a(w), r(w)) exists for each measurement vector \(w\in \mathbb {R}^m\), then the pointwise optimal algorithm is the mapping \(A^*:\ w\rightarrow a(w)\) and the pointwise optimal error for this recovery problem is given by
$$\begin{aligned} \mathrm{rad}(S_w):=\sup _{f\in S_w}\Vert f-a(w)\Vert = \inf _{a\in \mathcal{X}} \sup _{f\in S_w}\Vert f-a\Vert = \inf _{a\in \mathcal{X}} \inf \{ r:\ S_w\subset B(a,r)\}. \end{aligned}$$
In summary, the smallest error that any algorithm for the recovery of \(S_w\) can attain is \(\mathrm{rad}(S_w)\), and it is attained by taking the center of the Chebyshev ball of \(S_w\).
Pointwise near optimal algorithm. We say that an algorithm A is pointwise near optimal with constant C for the set S if
$$\begin{aligned} \sup _{f\in S_w}\Vert f-A(w)\Vert \le C\mathrm{rad}(S_w),\quad \forall w\in \mathbb {R}^m. \end{aligned}$$
In applications, one typically knows the linear functionals \(l_j\), \(j=1,\ldots ,m\), (the measurement devices) but has no a priori knowledge of the measurement values \(w_j\), \(j=1,\ldots ,m\), that will arise. Therefore, a second meaningful measure of performance is
$$\begin{aligned} R(S):=\sup _{w\in \mathbb {R}^m}\mathrm{rad}(S_w). \end{aligned}$$
Note that an algorithm which achieves the bound R(S) will generally not be optimal for each \(w\in \mathbb {R}^m\). We refer to the second type of estimates as global and a global optimal algorithm would be one that achieved the bound R(S).
Global near optimal algorithm. We say that an algorithm A is a global near optimal algorithm with constant C for the set S, if
$$\begin{aligned} \sup _{w\in \mathbb {R}^m} \sup _{f\in S_w}\Vert f-A(w)\Vert \le C R(S). \end{aligned}$$

Note that if an algorithm is near optimal for each of the sets \(S_w\) with a constant C, independent of w, then it is a global near optimal algorithm with the same constant C.

The above description in terms of \(\mathrm{rad}(S_w)\) provides a nice simple geometrical description of the optimal recovery problem. However, it is not a practical solution for a given set S, since the problem of finding the Chebyshev center and radius of \(S_w\) is essentially the same as the original optimal recovery problem, and moreover, is known, in general, to be NP-hard (see [19]). Nevertheless, it provides some guide to the construction of optimal or near optimal algorithms.

We will consider general Banach spaces \(\mathcal{X}\) and sometimes we will impose more structure on these spaces. Recall that the unit ball \(U:=U(\mathcal{X})\) of the Banach space \(\mathcal{X}\) is always convex. A Banach space \(\mathcal{X}\) is said to be strictly convex if
$$\begin{aligned} \Vert f+g\Vert < 2, \quad \forall f,g\in U, \quad f\ne g. \end{aligned}$$
A stronger property of \(\mathcal{X}\) is the uniform convexity. To describe this property, we introduce the modulus of convexity of \(\mathcal{X}\) defined by
$$\begin{aligned} \delta (\varepsilon ):=\delta _\mathcal{X}(\varepsilon ):=\inf \left\{ 1 - \left\| \frac{f + g}{2} \right\| \,:\, f, g \in U, \Vert f - g \Vert \ge \varepsilon \right\} ,\quad 0<\varepsilon \le 2. \end{aligned}$$
The space \(\mathcal{X}\) is called uniformly convex if \(\delta (\varepsilon )>0\) for all \(0< \varepsilon \le 2\). For uniformly convex spaces \(\mathcal{X}\), it is known that \(\delta \) is a strictly increasing function taking values in [0, 1] as \(\varepsilon \) varies in [0, 2] (see, for example, [4, Th. 2.3.7]).
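As an illustration of the modulus of convexity, the following sketch compares a grid estimate of \(\delta (\varepsilon )\) in the Euclidean plane with the closed form \(\delta (\varepsilon )=1-\sqrt{1-\varepsilon ^2/4}\) that the parallelogram law gives for any Hilbert space. The function names, the grid size, and the restriction to pairs on the unit circle with \(f=(1,0)\) (justified here by rotation invariance) are assumptions made for this example and are not part of the paper.

```python
import math

def delta_hilbert(eps):
    """Closed form of the modulus of convexity of a Hilbert space."""
    return 1.0 - math.sqrt(1.0 - eps * eps / 4.0)

def delta_numeric(eps, steps=20000):
    """Grid estimate using pairs f = (1, 0), g = (cos t, sin t), 0 <= t <= pi.

    For such pairs ||f - g|| = 2 sin(t/2) and ||(f + g)/2|| = cos(t/2).
    """
    best = float("inf")
    for k in range(steps + 1):
        t = math.pi * k / steps
        if 2.0 * math.sin(t / 2.0) >= eps:              # constraint ||f - g|| >= eps
            best = min(best, 1.0 - math.cos(t / 2.0))   # value 1 - ||(f + g)/2||
    return best

for eps in (0.5, 1.0, 1.5):
    print(eps, delta_numeric(eps), delta_hilbert(eps))
```

Both columns should agree up to the grid resolution, and both are positive for every \(\varepsilon >0\), consistent with uniform convexity of Hilbert spaces.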

Uniform convexity implies strict convexity and it also implies that \(\mathcal{X}\) is reflexive (see [Prop. 1.e.3] in [23]), i.e. \(\mathcal{X}^{**}=\mathcal{X}\), where \(\mathcal{X}^*\) denotes the dual space of \(\mathcal{X}\). If \(\mathcal{X}\) is uniformly convex, then there is quite a similarity between the results we obtain and those in the Hilbert space case. This covers, for example, the case when \(\mathcal{X}\) is an \(L_p\) or \(\ell _p\) space for \(1<p<\infty \), or one of the Sobolev spaces for these \(L_p\). The cases \(p=1,\infty \), as well as the case of a general Banach space \(\mathcal{X}\), add some new wrinkles and the theory is not as complete.

Of central interest to us are special model classes, which we call approximation sets.

Approximation set. We call the set \(\mathcal{K}=\mathcal{K}(\varepsilon ,V)\) an approximation set if
$$\begin{aligned} \mathcal{K}=\{f\in \mathcal{X}\,:\,\,\mathrm{dist}(f, V)_\mathcal{X}\le \varepsilon \}, \end{aligned}$$
where \(V\subset \mathcal{X}\) is a known finite dimensional space, and \(\mathrm{dist}(f, V)_\mathcal{X}:=\inf _{g\in V}\Vert f-g\Vert _{\mathcal{X}}\).

We denote the dependence of \(\mathcal{K}\) on \(\varepsilon \) and V only when this is not clear from the context.

1.2 Summary of results

The main contribution of the present paper is to describe near optimal algorithms for the recovery of approximation sets in a general Banach space \(\mathcal{X}\).

In the first part of this paper, namely §2 and §3, we use classical ideas and techniques of Banach space theory (see, for example, [6, 23, 31, 37]), to provide results on the optimal recovery of the sets \(S_w\) for any set \(S\subset \mathcal{X}\), \(w\in \mathbb {R}^m\). Much of the material in these two sections is known or easily derived from known results, but we recount this for the benefit of the reader and to ease the exposition of this paper. Not surprisingly, the form of these results depends very much on the structure of the Banach space. The determination of optimal or near optimal algorithms and their performance is connected to liftings (see §3), and the angle between the space V and the null space \(\mathcal{N}\) of the measurements (see §4). These concepts allow us to describe a general procedure for constructing recovery algorithms A for approximation sets which are pointwise near optimal (with constant 2) and hence are also globally near optimal (see §5).

The near optimal algorithms \(A: \mathbb {R}^m\mapsto \mathcal{X}\) that we construct, see Theorem 5.1, satisfy the performance bound
$$\begin{aligned} \Vert f-A(M(f))\Vert _\mathcal{X}\le C \mu (\mathcal{N},V)\varepsilon ,\quad f\in \mathcal{K}(\varepsilon ,V), \end{aligned}$$
where \(\mu (\mathcal{N},V)\) is the reciprocal of the angle \(\theta (\mathcal{N},V)\) between V and the null space \(\mathcal{N}\) of the measurement map M and C is any constant larger than 4. We prove that this estimate is near optimal in the sense that, for any recovery algorithm A, we have
$$\begin{aligned} \sup _{f\in \mathcal{K}(\varepsilon ,V)} \Vert f-A(M(f))\Vert _\mathcal{X}\ge \mu (\mathcal{N},V)\varepsilon , \end{aligned}$$
and so the only possible improvement that can be made in our algorithm is to reduce the size of the constant C. Thus, the constant \(\mu (\mathcal{N},V)\), respectively the angle \(\theta (\mathcal{N},V)\), determines how well we can recover approximation sets and quantifies the compatibility between the measurements and V. As we have already noted, this result is not new for the Hilbert space setting.
An important fact concerning our reconstruction algorithm A that we exploit in this paper is that it does not depend on \(\varepsilon \) (and, in fact, does not require the knowledge of \(\varepsilon \)). This means that we have the performance bound
$$\begin{aligned} \Vert f-A(M(f))\Vert _\mathcal{X}\le C \mu (\mathcal{N},V) \mathrm{dist}(f,V)_\mathcal{X},\quad f\in \mathcal{X}, \end{aligned}$$
and the algorithm only requires knowledge of the measurements \(M(f)\) and the space V.
In §6, we discuss the main issues that could arise in the numerical implementation of the proposed algorithms A. In §7, we give examples of how to implement our near optimal algorithm in concrete settings when \(\mathcal{X}=L_p\), or \(\mathcal{X}\) is the space of continuous functions on a domain \(D\subset \mathbb {R}^d\), \(\mathcal{X}=C(D)\). As a representative example of the results in that section, consider the case \(\mathcal{X}=C(D)\) with D a domain in \(\mathbb {R}^d\), and suppose that the measurement functionals are \(l_j(f)=f(P_j)\), \(j=1,\ldots ,m\), where the \(P_j\) are points in D. Then, we prove that
$$\begin{aligned} \frac{1}{2} \mu (\mathcal{N},V) \le \sup _{v\in V}\frac{\Vert v\Vert _{C(D)}}{\displaystyle {\max _{1\le j\le m}|v(P_j)|}} \le 2 \mu (\mathcal{N},V), \end{aligned}$$
and hence, the performance of this data fitting problem is controlled by the ratio of the continuous and discrete norms on V. Results of this type are well-known, via Lebesgue constants, in the case of interpolation (when \(m=n\)).
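To make the ratio of continuous and discrete norms concrete, here is a minimal sketch for the interpolation case \(m=n\) with \(D=[-1,1]\) and V the polynomials of degree less than n. In that case the standard Lagrange-basis argument shows that \(\sup _{v\in V}\Vert v\Vert _{C(D)}/\max _j|v(P_j)|\) equals the Lebesgue constant \(\max _{x\in D}\sum _j|L_j(x)|\). The choice of domain, the two node families, the grid size, and all function names are assumptions made for this illustration; the grid maximum is only a lower estimate of the true maximum.

```python
import math

def lebesgue_constant(nodes, grid_size=2001):
    """Grid estimate of max over x in [-1, 1] of sum_j |L_j(x)|,
    where L_j are the Lagrange basis polynomials for `nodes`."""
    best = 0.0
    for k in range(grid_size):
        x = -1.0 + 2.0 * k / (grid_size - 1)
        total = 0.0
        for j, p_j in enumerate(nodes):
            l_j = 1.0
            for i, p_i in enumerate(nodes):
                if i != j:
                    l_j *= (x - p_i) / (p_j - p_i)
            total += abs(l_j)
        best = max(best, total)
    return best

n = 11  # number of sample points (hypothetical choice)
equispaced = [-1.0 + 2.0 * j / (n - 1) for j in range(n)]
chebyshev = [math.cos((2 * j + 1) * math.pi / (2 * n)) for j in range(n)]

lam_equi = lebesgue_constant(equispaced)
lam_cheb = lebesgue_constant(chebyshev)
print("equispaced:", lam_equi, "Chebyshev:", lam_cheb)
```

The much smaller value for Chebyshev points reflects a better compatibility between the point evaluations and V, in the sense quantified by \(\mu (\mathcal{N},V)\).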

In §8, we discuss how our results are related to generalized sampling in a Banach space and touch upon how several recent results in sampling can be obtained from our approach. Finally, in §9, we discuss good choices for where to take measurements if this option is available to us.

2 Preliminary remarks

In this section, we recall some standard concepts in Banach spaces and relate them to the optimal recovery problem of interest to us. We work in the setting that S is any set (not necessarily an approximation set). The results of this section are essentially known and are given only to orient the reader.

2.1 The Chebyshev ball

Note that, in general, the center of the Chebyshev ball may not come from S. This can even occur in a finite dimensional setting (see Example 2.4 given below). However, it may be desired, or even required in certain applications, that the recovery for S be a point from S. The description of optimal algorithms with this requirement is connected with what we call the restricted Chebyshev ball of S. To explain this, we introduce some further geometrical concepts.

For a subset S in a Banach space \(\mathcal{X}\), we define the following quantities:
  • The diameter of S is defined by \(\mathrm{diam}(S):= \displaystyle {\sup _{f,g\in S}\Vert f-g\Vert }\).

  • The restricted Chebyshev radius of S is defined by
    $$\begin{aligned} {\mathrm{rad}}_{ R}(S):=\inf _{a\in S}\inf \{ r:\ S \subset B(a,r)\}=\inf _{a\in S}\sup _{f\in S} \Vert f-a\Vert . \end{aligned}$$
  • The Chebyshev radius of S was already defined as
    $$\begin{aligned} \mathrm{rad}(S):=\inf _{a\in \mathcal{X}}\inf \{ r:\ S \subset B(a,r)\}=\inf _{a\in \mathcal{X}}\sup _{f\in S} \Vert f-a\Vert . \end{aligned}$$
It is clear that for every \(S\subset \mathcal{X}\), we have
$$\begin{aligned} \mathrm{diam}(S)\ge {\mathrm{rad}}_{ R}(S)\ge \mathrm{rad}(S)\ge \tfrac{1}{2} \mathrm{diam}(S). \end{aligned}\tag{2.1}$$
Let us start with the following simple theorem, whose proof we omit, that tells us that we can construct near optimal algorithms for the recovery of the set S if we can simply find a point \(a\in S\).

Theorem 2.1

Let S be any subset of \(\mathcal{X}\). If \(a\in S\), then
$$\begin{aligned} {\mathrm{rad}} (S)\le {\mathrm{rad}}_{ R}(S)\le \sup _{f\in S} \Vert f-a\Vert \le 2 \mathrm{rad}(S), \end{aligned}\tag{2.2}$$
and therefore a is, up to the constant 2, an optimal recovery of S.

We say that an \(a\in \mathcal{X}\) which recovers S with the accuracy of (2.2) provides a near optimal recovery with constant 2. We shall use this theorem in our construction of algorithms for the recovery of the sets \(\mathcal{K}_w \), when \(\mathcal{K}\) is an approximation set. The relevance of the above theorem and (2.1) for our recovery problem is that if we determine the diameter or restricted Chebyshev radius of \(\mathcal{K}_w\), we will determine the optimal error \(\mathrm{rad}(\mathcal{K}_w)\) in the recovery problem, but only up to the factor two.
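Theorem 2.1 can be checked numerically on a small example. The sketch below draws a random finite set S in \(\ell _\infty (5)\) and verifies that every \(a\in S\) recovers S to within twice the Chebyshev radius. It relies on the fact that, in the max norm, the Chebyshev radius of a finite set is half of its largest coordinate range (the midpoint of the bounding box is a center). The dimension, sample size, seed, and helper names are assumptions made for this illustration.

```python
import random

def dist_inf(x, y):
    """Distance in the max norm."""
    return max(abs(a - b) for a, b in zip(x, y))

def chebyshev_radius_inf(points):
    """rad(S) for a finite S in l_infty(d): half the largest coordinate range."""
    return max((max(c) - min(c)) / 2 for c in zip(*points))

def worst_error(a, points):
    """sup over f in S of ||f - a||."""
    return max(dist_inf(a, f) for f in points)

random.seed(0)  # fixed seed so the experiment is reproducible
S = [tuple(random.uniform(-1, 1) for _ in range(5)) for _ in range(40)]

rad = chebyshev_radius_inf(S)
rad_R = min(worst_error(a, S) for a in S)            # restricted Chebyshev radius
worst_ratio = max(worst_error(a, S) for a in S) / rad
print(rad, rad_R, worst_ratio)                        # worst_ratio never exceeds 2
```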

2.2 Is \(\mathrm{rad}(S)\) assumed?

In view of the discussion preceding (1.1), the best pointwise error we can achieve by any recovery algorithm for S is given by \(\mathrm{rad}(S)\). Unfortunately, in general, for arbitrary bounded sets S in a general infinite dimensional Banach space \(\mathcal{X}\), the radius \(\mathrm{rad}(S)\) may not be assumed. The first such example was given in [17], where \(\mathcal{X}=\{f\in C[-1,1] :\intop \limits _{-1}^1 f(t)\, dt =0\}\) with the uniform norm and \(S\subset \mathcal{X}\) is a set consisting of three functions. Another, simpler example of a set S in this same space was given in [33]. In [21], it is shown that each nonreflexive space admits an equivalent norm for which such examples also exist. If we place more structure on the Banach space \(\mathcal{X}\), then we can show that the radius of any bounded subset \(S\subset \mathcal{X}\) is assumed. We present the following special case of an old result of Garkavi (see [17, Th. II]).

Lemma 2.2

If the Banach space \(\mathcal{X}\) is reflexive (in particular, if it is finite dimensional), then for any bounded set \(S\subset \mathcal{X}\), \(\mathrm{rad}(S)\) is assumed in the sense that there is a ball B(a, r) with \(r=\mathrm{rad}(S)\) which contains S. If, in addition, \(\mathcal{X}\) is uniformly convex, then this ball is unique.


Proof

Let \(B(a_n,r_n)\) be balls which contain S and for which \(r_n\rightarrow \mathrm{rad}(S)=:r\). Since S is bounded, the \(a_n\) are bounded and hence, without loss of generality, we can assume that \(a_n\) converges weakly to \(a\in \mathcal{X}\) (since every bounded sequence in a reflexive Banach space has a weakly converging subsequence). Now let f be any element in S. Then, there is a norming functional \(l\in \mathcal{X}^*\) of norm one for which \(l(f-a)=\Vert f-a\Vert \). Therefore
$$\begin{aligned} \Vert f-a\Vert =l(f-a)=\lim _{n\rightarrow \infty } l(f-a_n)\le \lim _{n\rightarrow \infty }\Vert f-a_n\Vert \le \lim _{n\rightarrow \infty } r_n=r. \end{aligned}$$
This shows that B(a, r) contains S and so the radius is attained. Now suppose that \(\mathcal{X}\) is uniformly convex and that there are two balls, centered at a and \(a'\), respectively, with \(a\ne a'\), each of radius r, which contain S. Since \(\varepsilon :=\Vert a-a'\Vert >0\) and \(\mathcal{X}\) is uniformly convex, for every \(f\in S\) and for \(\bar{a}:=\frac{1}{2}(a+a')\), we have
$$\begin{aligned} \Vert f-\bar{a}\Vert =\left\| \frac{f-a}{2}+\frac{f-a'}{2}\right\| \le r- r\delta (\varepsilon /r)<r, \end{aligned}$$
which contradicts the fact that \(\mathrm{rad}(S)=r\). \(\square \)

2.3 Some examples

In this section, we will show that for centrally symmetric (i.e. if \(f\in S\) then also \(-f\in S\)), convex sets S, we have a very explicit relationship between the \(\mathrm{diam}(S)\), \(\mathrm{rad}(S)\), and \({\mathrm{rad}}_{ R}(S)\). We also give some examples showing that for general sets S the situation is more involved and the only relationship between the latter quantities is the one given in (2.1).

Proposition 2.3

Let \(S\subset \mathcal{X}\) be a centrally symmetric, convex set in a Banach space \(\mathcal{X}\). Then, we have

(i) the smallest ball containing S is centered at 0 and
$$\begin{aligned} \mathrm{diam}(S)=2\sup _{f\in S} \Vert f\Vert =2 \mathrm{rad}(S)=2 {\mathrm{rad}}_{ R}(S). \end{aligned}$$
(ii) for any \(w\in \mathbb {R}^m\), we have \(\mathrm{diam}(S_w)\le \mathrm{diam}(S_0)\), where \(0\in \mathbb {R}^m\).


Proof

(i) We need only consider the case when \(r:=\sup _{f\in S} \Vert f\Vert <\infty \). Clearly \(0\in S\), \(S\subset B(0,r)\), and thus \({\mathrm{rad}}_{ R}(S)\le r\). In view of (2.1), \(\mathrm{diam}(S)\le 2\mathrm{rad}(S)\le 2{\mathrm{rad}}_{ R}(S)\le 2r\). Therefore, we need only show that \(\mathrm{diam}(S)\ge 2r\). For any \(\varepsilon >0\), let \(f_\varepsilon \in S\) be such that \(\Vert f_\varepsilon \Vert \ge r-\varepsilon \). Since S is centrally symmetric, \(-f_\varepsilon \in S\) and \(\mathrm{diam}(S)\ge \Vert f_\varepsilon -(-f_\varepsilon )\Vert \ge 2r-2\varepsilon .\) Since \(\varepsilon >0\) is arbitrary, \(\mathrm{diam}(S)\ge 2r\), as desired.

(ii) We need only consider the case \(\mathrm{diam}(S_0)<\infty \). Let \( g,h\in S_w\). From the convexity and central symmetry of S, we know that \(\frac{1}{2}( g-h)\) and \(\frac{1}{2}( h-g)\) are both in \(S_0\). Therefore
$$\begin{aligned} \mathrm{diam}(S_0)\ge \left\| \frac{1}{2}( g-h)-\frac{1}{2}( h-g)\right\| =\Vert g-h\Vert . \end{aligned}$$
Since g and h were arbitrary, we get \(\mathrm{diam}(S_0)\ge \mathrm{diam}(S_w)\). \(\square \)
In what follows, we denote by \(\ell _p(\mathbb {N})\) the set of all real sequences x, such that
$$\begin{aligned}&\ell _p(\mathbb {N}):=\left\{ x=(x_1,x_2,\ldots ):\,\Vert x\Vert _{\ell _p(\mathbb {N})}:=\left( \sum _{j=1}^\infty |x_j|^p\right) ^{1/p}<\infty \right\} , \quad 1\le p<\infty ,\\&\ell _\infty (\mathbb {N}):=\left\{ x=(x_1,x_2,\ldots ):\,\Vert x\Vert _{\ell _\infty (\mathbb {N})}:=\sup _j|x_j|<\infty \right\} {,} \end{aligned}$$
$$\begin{aligned} c_0:=\left\{ x=(x_1,x_2,\ldots ):\lim _{j\rightarrow \infty } x_j=0\right\} {,} \quad \Vert x\Vert _{c_0}=\Vert x\Vert _{\ell _\infty (\mathbb {N})}. \end{aligned}$$
By \(\ell _p(m)\), \(1\le p<\infty \), and \(\ell _\infty (m)\) we will denote the linear space \(\mathbb {R}^m\) with the above norms. The following examples illustrate that, except for (2.1), very little can be said about relations between \(\mathrm{diam}(S)\), \(\mathrm{rad}(S)\), \({\mathrm{rad}}_{R}(S)\).
  • For \(\mathcal{X}=c_0\), \(S:=\{x\ :\ x_j\ge 0 \text{ and } \sum _{j=1}^\infty x_j=1\}\), we have \(\mathrm{rad}(S)={\mathrm{rad}}_{R}(S)=\mathrm{diam}(S)=1.\)

  • For \(\mathcal{X}=\ell _p(\mathbb {N})\), \(1<p<\infty \), and \(S:=\{x\ :\ x_j\ge 0 \text{ and } \sum _{j=1}^\infty x_j=1\}\), one computes that \(\mathrm{diam}(S)=2^{1/p}\), and \({\mathrm{rad}}_{R}(S)=\mathrm{rad}(S)=1\).

  • For \(\mathcal{X}=L_1([0,1])\), \(S:=\{f\in L_1([0,1]) :\ f\ge 0 \ \text{ and } \ \intop \limits _0^1|f|=1\}\), we have \(\mathrm{diam}(S)={\mathrm{rad}}_{R}(S)=2\), but \(\mathrm{rad}(S)=1\).

Example 2.4

For \(\mathcal{X}:={\ell _\infty (3)}\), consider the simplex T with vertices (1, 1, 0), (1, 0, 1), and (0, 1, 1),
$$\begin{aligned} T:=\{x=(x_1,x_2,x_3)\ :\ \Vert x\Vert _\infty \le 1 \text{ and } x_1+x_2+x_3=2\}. \end{aligned}$$
We have \( \mathrm{diam}(T)=1, \quad \mathrm{rad}(T)=\frac{1}{2}, \quad {\mathrm{rad}}_{ R}(T)=\frac{2}{3}. \)

Indeed, since T is the convex hull of its vertices, any point in T has coordinates in [0, 1], and hence the distance between any two such points is at most one. Since the vertices are at distance one from each other, we have that \(\mathrm{diam}(T)=1\). It follows from (2.1) that \(\mathrm{rad}(T)\ge 1/2\). Note that the ball with center \((\frac{1}{2},\frac{1}{2},\frac{1}{2})\) and radius 1/2 contains T, and so \(\mathrm{rad}(T)=1/2\). Given any point \(z\in T\) which is a potential center of the restricted Chebyshev ball for T, at least one of the coordinates of z is at least 2/3 (because \(z_1+z_2+z_3=2\)), and thus z has distance at least 2/3 from one of the vertices of T. On the other hand, the ball with center \((\frac{2}{3}, \frac{2}{3}, \frac{2}{3})\in T\) and radius \(\frac{2}{3}\) contains T.
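The three quantities in Example 2.4 can be reproduced with a short computation. The sketch below uses the fact that the distance to a fixed center is a convex function, so its supremum over T is attained at a vertex. The unrestricted center is taken as the midpoint of the bounding box, which is a Chebyshev center in the max norm, and the restricted radius is found by a brute-force search over a barycentric grid on T. The grid resolution and all names are assumptions made for this illustration.

```python
from itertools import combinations

def dist_inf(x, y):
    """Distance in the max norm of l_infty(3)."""
    return max(abs(a - b) for a, b in zip(x, y))

# Vertices of the simplex T from Example 2.4.
VERTICES = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

def worst_error(center):
    """sup over T of the distance to `center`, evaluated at the vertices."""
    return max(dist_inf(center, v) for v in VERTICES)

# diam(T): largest distance between two vertices.
diam = max(dist_inf(u, v) for u, v in combinations(VERTICES, 2))

# rad(T): worst error from the midpoint of the bounding box of T.
box_mid = tuple((min(c) + max(c)) / 2 for c in zip(*VERTICES))
rad = worst_error(box_mid)

def grid_points(n):
    """Points of T with barycentric coordinates that are multiples of 1/n."""
    for i in range(n + 1):
        for j in range(n + 1 - i):
            lam = (i / n, j / n, (n - i - j) / n)
            yield tuple(sum(l * v[k] for l, v in zip(lam, VERTICES))
                        for k in range(3))

N = 60  # hypothetical resolution, divisible by 3 so the centroid is on the grid
rad_R = min(worst_error(a) for a in grid_points(N))

print(diam, rad, rad_R)
```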

2.4 Connection to approximation sets and measurements

The discussion in this section is directed at showing that the behavior, observed in §2.3, can occur even when the sets S are described through measurements. The next example is a modification of Example 2.4, and the set under consideration is of the form \(\mathcal{K}_w\), where \(\mathcal{K}\) is an approximation set.

Example 2.5

We take \(\mathcal{X}:={\ell _\infty (4)}\), and define \(V\subset \mathcal{X}\) as the one dimensional subspace spanned by \(e_1:=(1,0,0,0)\). We consider the approximation set
$$\begin{aligned} \mathcal{K}=\mathcal{K}(1,V):=\{x\in \mathbb {R}^4: \mathrm{dist}(x,V)_{ \mathcal{X}}\le 1\}, \end{aligned}$$
and the measurement operator \(M(x_1,x_2,x_3,x_4)=(x_1,x_2+x_3+x_4)\). Let us now take the measurement \(w=(0,2)\in \mathbb {R}^2\) and look at \(\mathcal{K}_w\). Since
$$\begin{aligned} \mathcal{K}= \{(t,x_2,x_3,x_4):\ t\in \mathbb {R},\ \max _{2\le j\le 4}|x_j|\le 1\}, \quad \mathcal{X}_w= \{(0,x_2,x_3,x_4): \ x_2+x_3+x_4=2\}, \end{aligned}$$
we infer that \(\mathcal{K}_w=\mathcal{X}_w\cap \mathcal{K}=\{ (0, x) \ : \ x\in \ T\}\), where T is the set from Example 2.4. Thus, we have
$$\begin{aligned} \mathrm{diam}(\mathcal{K}_w)=1, \ \ \ \ \ \ \ \mathrm{rad}(\mathcal{K}_w)=\tfrac{1}{2},\ \ \ \ \ \ \ \ {\mathrm{rad}}_{ R}( \mathcal{K}_w)=\tfrac{2}{3}.\end{aligned}$$
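The identification of \(\mathcal{K}_w\) with the simplex T in Example 2.5 can be sanity-checked in a few lines. The sketch below encodes the distance to \(V=\mathrm{span}\{e_1\}\) in \(\ell _\infty (4)\), which equals \(\max _{2\le j\le 4}|x_j|\) because the first coordinate can be matched exactly, together with the measurement map M, and then tests membership in \(\mathcal{K}_w\) for \(w=(0,2)\). The function names and the tolerance are assumptions made for this illustration.

```python
def dist_to_V(x):
    """dist(x, V) in l_infty(4) for V = span{e_1}: the best t cancels x_1."""
    return max(abs(c) for c in x[1:])

def M(x):
    """Measurement map of Example 2.5."""
    return (x[0], x[1] + x[2] + x[3])

def in_K_w(x, w=(0.0, 2.0), eps=1.0, tol=1e-12):
    """True if x lies in K(eps, V) and has measurements w (up to tol)."""
    mx = M(x)
    return (dist_to_V(x) <= eps + tol
            and abs(mx[0] - w[0]) <= tol
            and abs(mx[1] - w[1]) <= tol)

# Vertices of T from Example 2.4, lifted to (0, x) in R^4.
T_VERTICES = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
lifted = [(0.0,) + v for v in T_VERTICES]

print([in_K_w(x) for x in lifted])          # all vertices of (0, T) lie in K_w
print(in_K_w((0.0, 2.0, 0.0, 0.0)))         # right measurements, but dist to V is 2
```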

The following theorem shows that any example for general sets S can be transferred to the setting of interest to us, where the sets are of the form \(\mathcal{K}_w\) with \(\mathcal{K}\) being an approximation set.

Theorem 2.6

Suppose X is a Banach space and \(K\subset X\) is a non-empty, closed and convex subset of the closed unit ball U of X. Then, there exists a Banach space \(\mathcal{X}\), a finite dimensional subspace \(V\subset \mathcal{X}\), a measurement operator M, and a measurement w, such that for the approximation set \(\mathcal{K}:=\mathcal{K}(1,V)\), we have
$$\begin{aligned} \mathrm{diam}(\mathcal{K}_w)=\mathrm{diam}(K), \quad \mathrm{rad}(\mathcal{K}_w)=\mathrm{rad}(K), \quad {\mathrm{rad}}_{ R}(\mathcal{K}_w)={\mathrm{rad}}_{ R}(K). \end{aligned}$$


Proof

Given X, we first define \(Z:=X\oplus \mathbb {R}:=\{( f,\alpha ):\ f\in X, \ \alpha \in \mathbb {R}\}.\) Any norm on Z is determined by describing its unit ball, which can be taken as any closed, bounded, centrally symmetric convex set. We take the set \(\Omega \) to be the convex hull of the set \((U,0)\cup (K,1)\cup (-K,-1)\). Since \(K\subset U\), it follows that a point of the form (f, 0) is in \( \Omega \) if and only if \(\Vert f\Vert _X\le 1\). Therefore, for any \( f\in X\),
$$\begin{aligned} \Vert ( f,0)\Vert _Z=\Vert f\Vert _X. \end{aligned}\tag{2.3}$$
Note also that for any point \( ( f,\alpha )\in \Omega \), we have \(\max \{\Vert f\Vert _X,|\alpha |\}\le 1\), and thus
$$\begin{aligned} \max \{\Vert f\Vert _X,|\alpha |\}\le \Vert ( f,\alpha )\Vert _Z. \end{aligned}\tag{2.4}$$
It follows from (2.3) that for any \( f_1, f_2\in X\), we have \(\Vert ( f_1,1)-( f_2,1)\Vert _Z=\Vert f_1- f_2\Vert _X\). Now we define \(\tilde{K}:=(K,1)\subset Z\). Then, we have
$$\begin{aligned}\mathrm{diam}(\tilde{K})_Z=\mathrm{diam}(K)_X, \quad {\mathrm{rad}}_{ R}(\tilde{K})_Z={\mathrm{rad}}_{ R}(K)_X. \end{aligned}$$
Clearly, \(\mathrm{rad}(\tilde{K})_Z\le \mathrm{rad}(K)_X\). On the other hand, for each \(( f',\alpha )\in Z\), we have
$$\begin{aligned} \sup _{( f,1)\in \tilde{K}}\Vert ( f,1)-( f',\alpha )\Vert _Z =\sup _{ f\in K}\Vert ( f- f',1-\alpha )\Vert _Z\ge \sup _{ f\in K}\Vert f- f'\Vert _X \ge \mathrm{rad}(K)_X, \end{aligned}$$
where the next to last inequality uses (2.4). Therefore, we have \(\mathrm{rad}(K)_X=\mathrm{rad}(\tilde{K})_Z\). Next, we consider the functional \(\Phi \in Z^*\), defined by
$$\begin{aligned} \Phi ( f,\alpha )=\alpha . \end{aligned}$$
It follows from (2.4) that it has norm one and
$$\begin{aligned} \{z\in Z : \Phi (z)=1,\ \Vert z\Vert _Z\le 1\}=\{( f,1)\in \Omega \} =\{( f,1): \ f\in K\}=\tilde{K}, \end{aligned}\tag{2.5}$$
where the next to the last equality uses the fact that a point of the form (f, 1) is in \(\Omega \) if and only if \( f\in K\). We next define the space \(\mathcal{X}=Z \oplus \mathbb {R}:=\{(z,\beta ):\ z\in Z, \beta \in \mathbb {R}\}\), with the norm
$$\begin{aligned} \Vert (z,\beta )\Vert _\mathcal{X}:=\ \max \{\Vert z\Vert _Z,|\beta |\}. \end{aligned}$$
Consider the subspace \(V=\{ (0,t) :\ t\in \mathbb {R}\}\subset \mathcal{X}\). If we take \(\varepsilon =1\), then the approximation set \(\mathcal{K}=\mathcal{K}(1,V)\subset \mathcal{X}\) is \(\mathcal{K}=\{(z,t):\ t\in \mathbb {R}, \ \Vert z\Vert _Z\le 1\}\). We now take the measurement operator \(M(z,\beta )= (\beta ,\Phi (z))\in \mathbb {R}^2\) and the measurement \(w=(0,1)\) which gives \(\mathcal{X}_w=\{(z,0)\ :\ \Phi (z)=1\} \). Then, because of (2.5), we have
$$\begin{aligned} \mathcal{K}_w= \{(z,0): \Phi (z)=1,\ \Vert z\Vert _Z\le 1\} = (\tilde{K},0). \end{aligned}$$
As above, we prove that
$$\begin{aligned} \mathrm{diam}((\tilde{K},0))_\mathcal{X}=\mathrm{diam}(\tilde{K})_Z, \quad {\mathrm{rad}}_{ R}((\tilde{K},0))_\mathcal{X}={\mathrm{rad}}_{ R}(\tilde{K})_Z, \quad \mathrm{rad}((\tilde{K},0))_\mathcal{X}=\mathrm{rad}(\tilde{K})_Z, \end{aligned}$$
which completes the proof of the theorem. \(\square \)

3 A description of algorithms via liftings

In this section, we show that algorithms for the optimal recovery problem can be described by what are called liftings in the theory of Banach spaces. We place ourselves in the setting that S is any subset of \(\mathcal{X}\), and we wish to recover the elements in \(S_w\) for each measurement \(w\in \mathbb {R}^m\). That is, at this stage, we do not require that S is an approximation set. Recall that given the measurement functionals \(l_1,\ldots ,l_m\) in \(\mathcal{X}^*\), the linear operator \(M:\mathcal{X}\rightarrow \mathbb {R}^m\) is defined as
$$\begin{aligned} M(f):=(l_1(f),\ldots ,l_m(f)),\quad f\in \mathcal{X}. \end{aligned}$$
Associated to M we have the null space
$$\begin{aligned} \mathcal{N}:=\ker M=\{f\in \mathcal{X}: \ M(f)=0\}\subset \mathcal{X}, \end{aligned}$$
and the sets
$$\begin{aligned} \mathcal{X}_w:= M^{-1}(w):=\{f\in \mathcal{X}: M(f)=w\}. \end{aligned}$$
Therefore \(\mathcal{X}_0=\mathcal{N}\). Our goal is to recover the elements in \(S_w=\mathcal{X}_w\cap S\).

Remark 3.1

Let us note that if in place of \(l_1,\ldots , l_m\), we use functionals \(l_1',\ldots ,l_m'\) which span the same space L in \(\mathcal{X}^*\), then the information about f contained in M(f) and \(M'(f)\) is exactly the same, and so the recovery problem is identical. For this reason, we can choose any spanning set of linearly independent functionals in defining M and obtain exactly the same recovery problem. Note that, since these functionals are linearly independent, M is a linear mapping from \(\mathcal{X}\) onto \(\mathbb {R}^m\).

We begin by analyzing the measurement operator M. We introduce the following norm on \(\mathbb {R}^m\) induced by M
$$\begin{aligned} \Vert w\Vert _M=\inf _{ f\in \mathcal{X}_w}\Vert f\Vert , \end{aligned}$$
and consider the quotient space \(\mathcal{X}/\mathcal{N}\). Each element in \(\mathcal{X}/\mathcal{N}\) is a coset \(\mathcal{X}_w\), \(w\in \mathbb {R}^m\). The quotient norm on this space is given by
$$\begin{aligned} \Vert \mathcal{X}_w\Vert _{\mathcal{X}/\mathcal{N}}=\Vert w\Vert _M.\end{aligned}$$
The mapping M can be interpreted as mapping \(\mathcal{X}_w\rightarrow w\) and, in view of (3.2), is an isometry from \(\mathcal{X}/\mathcal{N}\) onto \(\mathbb {R}^m\) under the norm \(\Vert \cdot \Vert _M\).

Lifting operator

A lifting operator \(\Delta \) is a mapping from \(\mathbb {R}^m\) to \(\mathcal{X}\) which assigns to each \(w\in \mathbb {R}^m\) an element from the coset \(\mathcal{X}_w\), i.e., a representer of the coset.

Recall that any algorithm A is a mapping from \(\mathbb {R}^m\) into \(\mathcal{X}\). If \(S\subset \mathcal{X}\) is any subset of \(\mathcal{X}\), we would like the mapping A for our recovery problem to send w to an element of \(S_w\), provided \(S_w\ne \emptyset \), since then we would know that A is nearly optimal (see Theorem 2.1) up to the constant 2. So, in going further, we consider only algorithms A which take w to an element of \(\mathcal{X}_w\) for all \(w\in \mathbb {R}^m\). At this stage we are not yet invoking our desire that A actually maps into \(S_w\), only that it maps into \(\mathcal{X}_w\).

Admissible algorithm

We say that an algorithm \(A:\mathbb {R}^m \rightarrow \mathcal{X}\) is admissible if, for each \(w\in \mathbb {R}^m\), \(A(w)\in \mathcal{X}_w\).

We are interested in lifting operators because any admissible algorithm A is a lifting \(\Delta \), and the performance of such an A is related to the norm of \(\Delta \). A natural lifting, and the one with minimal norm 1, would be one which maps w into an element of minimal norm in \(\mathcal{X}_w\). Unfortunately, in general, no such minimal norm element exists, as the following illustrative example shows.

Example 3.2

We consider the space \(\mathcal{X}=\ell _1(\mathbb {N})\) with the \(\Vert \cdot \Vert _{\ell _1(\mathbb {N})}\) norm, and a collection of vectors \(h_j\in \mathbb {R}^2\), \(j=1,2,\ldots \), with \(\Vert h_j\Vert ^2_{\ell _2(2)}=\langle h_j, h_j\rangle =1\), which are dense on the unit circle. We define the measurement operator M as
$$\begin{aligned} M(x):=\sum _{j=1}^\infty x_jh_j \in \mathbb {R}^2. \end{aligned}$$
It follows from the definition of M that for every x such that \(M(x)=w\), we have \(\Vert w\Vert _{\ell _2(2)}\le \Vert x\Vert _{\ell _1(\mathbb {N})}\), and thus \(\Vert w\Vert _{\ell _2(2)}\le \Vert w\Vert _M\). In particular, for every \(i=1, 2, \ldots \),
$$\begin{aligned} 1=\Vert h_i\Vert _{\ell _2(2)}\le \Vert h_i\Vert _M\le \Vert e_i\Vert _{\ell _1(\mathbb {N})}=1, \end{aligned}$$
since \(M(e_i)=h_i\), where \(e_i\) is the i-th coordinate vector in \(\ell _1(\mathbb {N})\). So, we have that \(\Vert h_i\Vert _{\ell _2(2)}=\Vert h_i\Vert _M=1\). Since the \(h_i\)’s are dense on the unit circle, every w with Euclidean norm one satisfies \(\Vert w\Vert _M=1\). Next, we consider any \(w\in \mathbb {R}^2\), such that \(\Vert w\Vert _{\ell _2(2)}=1\), \(w\ne h_j\), \(j=1,2,\ldots \). If \(w=M(x)=\sum _{j=1}^\infty x_jh_j\), then
$$\begin{aligned} 1=\langle w, w\rangle =\sum _{j=1}^\infty x_j\langle w,h_j\rangle . \end{aligned}$$
Since \(|\langle w,h_j\rangle |<1\) for every j, we must have \(\Vert x\Vert _{\ell _1(\mathbb {N})}>1\). Hence, \(\Vert w\Vert _M=1\) is not attained by any element x in the coset \(\mathcal{X}_w\). This also shows that there is no lifting \(\Delta \) from \(\mathbb {R}^2\) to \(\mathcal{X}\) with norm one.
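
The behaviour described in Example 3.2 can be observed numerically. The following is a minimal sketch, not part of the construction above: the helper name `l1_cost_two_neighbors` and the choice of two symmetric neighbours are assumptions made for illustration. It represents a unit vector w at angle \(\theta \) using only two unit vectors at angles \(\theta \pm \delta \), and reports the \(\ell _1\) norm of the coefficients, which equals \(1/\cos \delta \). This cost exceeds one for every \(\delta >0\), but tends to one as the neighbours approach w, in line with \(\Vert w\Vert _M=1\) not being attained.

```python
import math

def l1_cost_two_neighbors(theta, delta):
    """l1 norm of the coefficients (x_a, x_b) with x_a*h_a + x_b*h_b = w,
    where w, h_a, h_b are unit vectors at angles theta, theta-delta, theta+delta."""
    w = (math.cos(theta), math.sin(theta))
    ha = (math.cos(theta - delta), math.sin(theta - delta))
    hb = (math.cos(theta + delta), math.sin(theta + delta))
    # Solve the 2x2 system [h_a h_b] (x_a, x_b)^T = w by Cramer's rule.
    det = ha[0] * hb[1] - ha[1] * hb[0]
    xa = (w[0] * hb[1] - w[1] * hb[0]) / det
    xb = (ha[0] * w[1] - ha[1] * w[0]) / det
    return abs(xa) + abs(xb)

# Costs for neighbours that move closer and closer to w (theta = 0.7 is arbitrary).
costs = [l1_cost_two_neighbors(0.7, d) for d in (0.5, 0.1, 0.01, 0.001)]
```

Only two-term representations are examined here; the argument in the example covers all representations.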

While the above example shows that norm one liftings may not exist for a general Banach space \(\mathcal{X}\), there is a classical theorem of Bartle-Graves which states that there are continuous liftings \(\Delta \) with norm \(\Vert \Delta \Vert \) as close to one as we wish (see [5, 6, 30]). In our setting, this theorem can be stated as follows.

Theorem 3.3

(Bartle-Graves) Let \(M:\mathcal{X}\rightarrow \mathbb {R}^m\) be a measurement operator. For every \(\eta >0\), there exists a map \(\Delta :\mathbb {R}^m\rightarrow \mathcal{X}\), such that
  • \(\Delta \) is continuous.

  • \(\Delta (w)\in \mathcal{X}_w,\quad w\in \mathbb {R}^m\).

  • for every \(\lambda >0\), we have \(\Delta (\lambda w)=\lambda \Delta (w)\).

  • \(\Vert \Delta (w)\Vert _\mathcal{X}\le (1+\eta )\Vert w\Vert _M,\quad w\in \mathbb {R}^m\).

Unfortunately, a numerical recipe for constructing a lifting operator with norm as close to one as we wish cannot be provided for a general Banach space. Note that if we put more structure on \(\mathcal{X}\), then we can guarantee the existence of a continuous lifting with norm one (see [6, Lemma 2.2.5]).

Theorem 3.4

If the Banach space \(\mathcal{X}\) is uniformly convex, then for each \(w\in \mathbb {R}^m\), there is a unique \( f(w)\in \mathcal{X}_w\), such that
$$\begin{aligned} \Vert f(w)\Vert _\mathcal{X}= \inf _{ f\in \mathcal{X}_w}\Vert f\Vert =:\Vert w\Vert _M.\end{aligned}$$
The mapping \(\Delta :w\rightarrow f(w)\) is a continuous lifting of norm one.

Proof


Fix \(w\in \mathbb {R}^m\) and let \( f_j\in \mathcal{X}_w\), \(j\ge 1\), be such that \(\Vert f_j\Vert \rightarrow \Vert w\Vert _M\). Since \(\mathcal{X}\) is uniformly convex, by weak compactness, there is a subsequence of \(\{ f_j\}\) which, without loss of generality, we can take as \(\{ f_j\}\) such that \( f_j\rightarrow f \in \mathcal{X}\) weakly. It follows that \(\lim _{j\rightarrow \infty }l( f_j)=l( f)\) for all \(l\in \mathcal{X}^*\). Hence \(M( f)=w\), and therefore \( f\in \mathcal{X}_w\). Also, if l is a norming functional for f, i.e. \(\Vert l\Vert _{\mathcal{X}^*}=1\) and \(l( f)=\Vert f\Vert \), then
$$\begin{aligned} \Vert f\Vert =l( f)=\lim _{j\rightarrow \infty }l( f_j)\le \lim _{j\rightarrow \infty }\Vert f_j\Vert =\Vert w\Vert _M, \end{aligned}$$
which shows the existence in (3.3). To see that \( f= f(w)\) is unique, we assume \( f'\in \mathcal{X}_w\) is another element with \(\Vert f'\Vert =\Vert w\Vert _M\). Then \(z:=\frac{1}{2}( f+ f')\in \mathcal{X}_w\), and by uniform convexity \(\Vert z\Vert <\Vert w\Vert _M\), which is an obvious contradiction. This shows that there is an \( f= f(w)\) satisfying (3.3), and it is unique.
To see that \(\Delta \) is continuous, let \(w_j\rightarrow w\) in \(\mathbb {R}^m\) and let \( f_j:=\Delta (w_j)\) and \( f:=\Delta (w)\). Since we also have that \(\Vert w_j\Vert _M\rightarrow \Vert w\Vert _M\), it follows from the minimality of \(\Delta (w_j)\) that \(\Vert f_j\Vert \rightarrow \Vert f\Vert \). If \(w=0\), we have \( f=0\), and thus we have convergence in norm. In what follows, we assume that \(w\ne 0\). Using weak compactness (passing to a subsequence), we can assume that \( f_j\) converges weakly to some \(\bar{f}\). So, we have \(w_j=M( f_j)\rightarrow M(\bar{f})\), which gives \(M(\bar{f})=w\). Let \(\bar{l}\in \mathcal{X}^*\) be a norming functional for \(\bar{f}\). Then, we have that
$$\begin{aligned} \Vert \bar{f}\Vert =\bar{l}(\bar{f})=\lim _{j\rightarrow \infty } \bar{l}( f_j)\le \lim _{j\rightarrow \infty } \Vert f_j\Vert =\Vert f\Vert , \end{aligned}$$
and therefore \(\bar{f}= f\) because of the definition of \(\Delta \). We want to show that \( f_j\rightarrow f\) in norm. If this is not the case, we can find a subsequence, which we again denote by \(\{ f_j\}\), such that \(\Vert f_j- f\Vert \ge \varepsilon >0\), \(j=1,2,\ldots \), for some \(\varepsilon >0\). It follows from the uniform convexity that \(\Vert \frac{1}{2}( f_j+ f)\Vert \le \max \{\Vert f_j\Vert ,\Vert f\Vert \}\alpha \) for all j, with \(\alpha <1\) a fixed constant. Now, let \(l\in \mathcal{X}^*\) be a norm one functional, such that \(l( f)=\Vert f\Vert \). Then, we have
$$\begin{aligned} 2\Vert f\Vert = 2l( f)=\lim _{j\rightarrow \infty } l( f_j+ f) \le \lim _{j\rightarrow \infty } \Vert f_j+ f\Vert \le 2 \Vert f\Vert \alpha , \end{aligned}$$
which gives \(\alpha \ge 1\) and is the desired contradiction. \(\square \)
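
To make the minimal norm lifting of Theorem 3.4 concrete, consider the following sketch under assumptions that are ours and not taken from the text: \(\mathcal{X}=\ell _p(3)\) with \(1<p<\infty \) (a uniformly convex space) and a single measurement functional \(l(f)=\sum _i a_if_i\), so \(m=1\). The equality case of Hölder's inequality then gives the minimizer in closed form, \(f_i=w\,\mathrm{sgn}(a_i)|a_i|^{q-1}/\Vert a\Vert _{\ell _q}^q\) with \(q=p/(p-1)\), and \(\Vert w\Vert _M=|w|/\Vert a\Vert _{\ell _q}\). The function names are hypothetical.

```python
def sign(t):
    return (t > 0) - (t < 0)

def lp_norm(f, p):
    return sum(abs(fi) ** p for fi in f) ** (1.0 / p)

def min_norm_lifting(a, w, p):
    """Minimal l_p-norm solution f of sum_i a_i * f_i = w, for 1 < p < infinity.
    Closed form from the equality case of Hoelder's inequality (assumed setting)."""
    q = p / (p - 1.0)                          # conjugate exponent
    a_q_pow = sum(abs(ai) ** q for ai in a)    # ||a||_q^q
    c = w / a_q_pow
    return [c * sign(ai) * abs(ai) ** (q - 1.0) for ai in a]

# Illustrative data: one functional on a three-dimensional space.
a, w, p = [1.0, 2.0, -2.0], 2.0, 3.0
q = p / (p - 1.0)
f = min_norm_lifting(a, w, p)
norm_w_M = abs(w) / lp_norm(a, q)              # value of ||w||_M in this setting
```

Adding any nonzero multiple of a null vector, such as \((2,-1,0)\), to `f` keeps the measurement but strictly increases the \(\ell _p\) norm, which reflects the uniqueness statement of the theorem.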

The latter theorem does not hold under the slightly milder assumption that \(\mathcal{X}\) is strictly convex and reflexive (in place of uniform convexity), as shown in [10]. In that paper, the author gives an example of a strictly convex, reflexive Banach space \(\mathcal{X}\) and a measurement map \(M:\mathcal{X}\rightarrow \mathbb {R}^2\), for which there is no continuous norm one lifting \(\Delta \).

We conclude this section with the observation that linear liftings are closely related to projections. A linear lifting \(\Delta :\mathbb {R}^m\rightarrow \mathcal{X}\) with norm \(\le C\) exists if and only if there exists a linear projector P from \(\mathcal{X}\) onto a subspace \(Y\subset \mathcal{X}\) with \(\ker (P)=\mathcal{N}\) and \(\Vert P\Vert \le C\). Indeed, if \(\Delta \) is such a lifting then its range Y is a finite dimensional subspace and \(P({f}):=\Delta (M({f}))\) defines a projection from \(\mathcal{X}\) onto Y with the mentioned properties. On the other hand, given such a P and Y, notice that any two elements in \(M^{-1}(w)\) have the same image under P, since the kernel of P is \(\mathcal{N}\). Therefore, we can define the lifting \(\Delta (w):=P(M^{-1}(w))\), \(w\in \mathbb {R}^m\), which has norm at most C.
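
The correspondence between linear liftings and projections can be checked on a small example. The sketch below assumes \(\mathcal{X}=\mathbb {R}^3\), a hypothetical measurement matrix `M`, and one particular linear lifting `Delta` (a right inverse of `M`); none of these come from the text. It forms \(P=\Delta M\) and verifies that P is a projection whose kernel contains the null space of M.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical measurement operator M: R^3 -> R^2.
M = [[1, 0, 1],
     [0, 1, 1]]

# A linear lifting: M * Delta is the identity on R^2.
Delta = [[1, 0],
         [0, 1],
         [0, 0]]

P = matmul(Delta, M)          # P(f) := Delta(M(f))
eta = [[1], [1], [-1]]        # column vector spanning the null space of M

is_lifting = matmul(M, Delta) == [[1, 0], [0, 1]]
is_projection = matmul(P, P) == P
kills_null_space = matmul(P, eta) == [[0], [0], [0]]
```

Since `Delta` is injective, the kernel of P is exactly the null space of M. The norm of P depends on the norm chosen on \(\mathbb {R}^3\), which this sketch does not compute.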

4 A priori estimates for the radius of \(\mathcal{K}_w \)

In this section, we discuss estimates for the radius of \(\mathcal{K}_w \) when \(\mathcal{K}=\mathcal{K}(\varepsilon ,V)\) is an approximation set. The main result we shall obtain is that the global optimal recovery error \(R(\mathcal{K})\) is determined a priori (up to a constant factor 2) by the angle between the null space \(\mathcal{N}\) of the measurement map M and the approximating space V [see (iii) of Theorem 4.4 below].

Note the following simple observations:
(i) If \( \mathcal{N}\cap V\ne \{0\}\), then for any \(0\ne \eta \in \mathcal{N}\cap V\), and any \(f\in \mathcal{K}_w\), the line \(f+t \eta \), \(t\in \mathbb {R}\), is contained in \(\mathcal{K}_w\), and therefore there is no finite ball \(B(a,r)\) which contains \(\mathcal{K}_w\). Hence \(\mathrm{rad}(\mathcal{K}_w)=\infty \).

(ii) If \(\mathcal{N}\cap V=\{0\}\), then \(n=\dim V \le \mathrm{codim}\,\mathcal{N}=\mathrm{rank}\,M=m\), and therefore \(n\le m\). In this case \(\mathrm{rad}(\mathcal{K}_w)\) is finite for all \(w\in \mathbb {R}^m\).

Standing assumption

In view of these observations, the only interesting case is (ii), and therefore we assume that \(\mathcal{N}\cap V=\{0\}\) for the remainder of this paper.

For (arbitrary) subspaces X and Y of a given Banach space \(\mathcal{X}\), we recall the angle \(\Theta \) between X and Y, defined as
$$\begin{aligned} \Theta (X,Y):= \inf _{ f\in X} \frac{ \mathrm{dist}( f,Y)}{\Vert f\Vert }. \end{aligned}$$
We are more interested in \(\Theta (X,Y)^{-1}\), and so accordingly, we define
$$\begin{aligned} \mu (X,Y):= \Theta (X,Y)^{-1}=\sup _{ f\in X}\frac{\Vert f\Vert }{\mathrm{dist}( f,Y)}= \sup _{ f\in X, g\in Y}\frac{\Vert f\Vert }{\Vert f- g\Vert }.\end{aligned}$$
Notice that \(\mu (X,Y)\ge 1\).

Remark 4.1

Since V is a finite dimensional space and \(\mathcal{N}\cap V=\{0\}\), we have \(\Theta (\mathcal{N},V)>0\). Indeed, otherwise there exists a sequence \(\{\eta _k\}_{k\ge 1}\) from \(\mathcal{N}\) with \(\Vert \eta _k\Vert =1\) and a sequence \(\{v_k\}_{k\ge 1}\) from V, such that \(\Vert \eta _k-v_k\Vert \rightarrow 0\), \(k\rightarrow \infty \). We can assume \(v_k\) converges to \(v_\infty \), but then also \(\eta _k\) converges to \(v_\infty \), so \(v_\infty \in \mathcal{N}\cap V\) and \(\Vert v_\infty \Vert =1\), which is the desired contradiction to \(\mathcal{N}\cap V=\{0\}\).

Note that, in general, \(\mu \) is not symmetric, i.e., \(\mu (Y,X)\ne \mu (X,Y)\). However, we do have the following comparison.

Lemma 4.2

For arbitrary subspaces X and Y of a given Banach space \(\mathcal{X}\), such that \(X\cap Y=\{0\}\), we have
$$\begin{aligned} \mu (X,Y)\le 1+\mu (Y,X)\le 2 \mu (Y,X). \end{aligned}$$

Proof


For each \( f\in X\) and \( g\in Y\) with \( f\ne 0\), \( f\ne g\), we have
$$\begin{aligned} \frac{\Vert f\Vert }{\Vert f- g\Vert }\le \frac{\Vert f- g\Vert +\Vert g\Vert }{\Vert f- g\Vert }=1+ \frac{\Vert g\Vert }{\Vert f- g\Vert }\le 1+ \mu (Y,X). \end{aligned}$$
Taking a supremum over \( f\in X, g\in Y\), we arrive at the first inequality in (4.2). The second inequality follows because \(\mu (Y,X)\ge 1\). \(\square \)
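
The lack of symmetry of \(\mu \) and the bound of Lemma 4.2 can be seen already for two lines in the plane. The sketch below assumes \(\mathcal{X}=\mathbb {R}^2\) with the \(\ell _1\) norm and the one-dimensional subspaces spanned by \((1,0)\) and \((1,1)\); these choices and the function names are ours. For one-dimensional X, the quotient \(\Vert f\Vert /\mathrm{dist}(f,Y)\) is the same for every nonzero \(f\in X\), so it suffices to evaluate it at a spanning vector.

```python
def dist_l1_to_line(a, b):
    """l1 distance from a to span{b}: min over t of sum_i |a_i - t*b_i|.
    The objective is convex and piecewise linear in t, so a minimum is
    attained at one of the breakpoints t = a_i / b_i (with b_i != 0)."""
    breakpoints = [ai / bi for ai, bi in zip(a, b) if bi != 0]
    return min(sum(abs(ai - t * bi) for ai, bi in zip(a, b))
               for t in breakpoints)

def mu_lines(a, b):
    """mu(span{a}, span{b}) = ||a||_1 / dist(a, span{b}) in the l1 norm."""
    return sum(abs(ai) for ai in a) / dist_l1_to_line(a, b)

x, y = (1.0, 0.0), (1.0, 1.0)
mu_xy = mu_lines(x, y)   # mu(X, Y)
mu_yx = mu_lines(y, x)   # mu(Y, X)
```

In this example \(\mu (X,Y)=1\) while \(\mu (Y,X)=2\), so the inequality \(\mu (Y,X)\le 1+\mu (X,Y)\) holds with equality.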

The following lemma records some properties of \(\mu \) for our setting in which \(Y=V\) and \(X=\mathcal{N}\) is the null space of M.

Lemma 4.3

Let \(\mathcal{X}\) be any Banach space, V be any finite dimensional subspace of \( \mathcal{X}\) with \(\dim (V)\le m\), and \(M:\mathcal{X}\rightarrow \mathbb {R}^m\) be any measurement operator. Then, for the null space \(\mathcal{N}\) of M, we have the following.
(i) \( \mu (V,\mathcal{N})=\Vert M_V^{-1}\Vert , \)

(ii) \( \mu (\mathcal{N},V) \le 1+\mu (V,\mathcal{N})=1+\Vert M_V^{-1}\Vert \le 2\Vert M_V^{-1}\Vert , \)

where \(M_V\) is the restriction of the measurement operator M on V and \(M_V^{-1}\) is its inverse considered as a map from \(M(V)\subset \mathbb {R}^m\) onto V.

Proof


Statement (ii) follows from (i) and Lemma 4.2. To prove (i), we note that the definition of \(\Vert \cdot \Vert _M\) given in (3.1) yields \(\Vert M_V(v)\Vert _M=\mathrm{dist}(v,\mathcal{N})\) for each \(v\in V\), and hence
$$\begin{aligned} \nonumber \Vert M_V^{-1}\Vert = \sup _{v\in V} \frac{\Vert v\Vert }{\Vert M_V(v)\Vert _M} =\sup _{v\in V} \frac{\Vert v\Vert }{\mathrm{dist}(v,\mathcal{N})}=\mu (V,\mathcal{N}), \end{aligned}$$
as desired. \(\square \)
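
As an illustration of (i), the quantity \(\Vert M_V^{-1}\Vert \) can be computed by hand in a simple case. The sketch below assumes \(\mathcal{X}=C([0,1])\), V the space of linear functions \(v(t)=c_0+c_1t\), and two point evaluations at \(p_1<p_2\); for point evaluations the norm \(\Vert \cdot \Vert _M\) is the \(\ell _\infty \) norm, as computed in Sect. 6.1. The function names are hypothetical. Since \(w\mapsto \Vert M_V^{-1}(w)\Vert \) is convex, its supremum over the unit ball of \(\ell _\infty (2)\) is attained at a vertex.

```python
from itertools import product

def interpolate_line(p1, p2, w1, w2):
    """Coefficients (c0, c1) of the unique v(t) = c0 + c1*t with
    v(p1) = w1 and v(p2) = w2, i.e. M_V^{-1}(w)."""
    c1 = (w2 - w1) / (p2 - p1)
    return (w1 - c1 * p1, c1)

def sup_norm_on_unit_interval(c0, c1):
    # A linear function attains its maximum modulus on [0, 1] at an endpoint.
    return max(abs(c0), abs(c0 + c1))

def norm_inverse_MV(p1, p2):
    """||M_V^{-1}|| as the maximum over the four vertices (+-1, +-1)."""
    return max(sup_norm_on_unit_interval(*interpolate_line(p1, p2, w1, w2))
               for w1, w2 in product((-1.0, 1.0), repeat=2))

mu_V_N = norm_inverse_MV(0.25, 0.75)    # measurement points 1/4 and 3/4
mu_endpoints = norm_inverse_MV(0.0, 1.0)
```

For the points 1/4 and 3/4 the extremal function is \(v(t)=2-4t\), which takes the values \(\pm 1\) at the measurement points and has norm 2; measuring at the endpoints gives the smallest possible value 1.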

We have the following simple, but important theorem.

Theorem 4.4

Let \(\mathcal{X}\) be any Banach space, V be any finite dimensional subspace of \( \mathcal{X}\), \(\varepsilon >0\), and \(M:\mathcal{X}\rightarrow \mathbb {R}^m\) be any measurement operator. Then, for the set \(\mathcal{K}=\mathcal{K}(\varepsilon ,V)\), we have the following

(i) For any \(w\in \mathbb {R}^m\), such that \(w=M(v)\) with \(v\in V\), we have
$$\begin{aligned} \mathrm{rad}(\mathcal{K}_w)=\varepsilon \mu (\mathcal{N},V). \end{aligned}$$
(ii) For any \(w\in \mathbb {R}^m\), we have
$$\begin{aligned} \mathrm{rad}(\mathcal{K}_w)\le 2\varepsilon \mu (\mathcal{N},V). \end{aligned}$$
(iii) We have
$$\begin{aligned} \varepsilon \mu (\mathcal{N},V)\le R(\mathcal{K})\le 2\varepsilon \mu (\mathcal{N},V). \end{aligned}$$

Proof


First, note that \(\mathcal{K}_0=\mathcal{K}\cap \mathcal{N}\) is centrally symmetric and convex. Hence, from Proposition 2.3, we have that the smallest ball containing this set is centered at 0 and has radius
$$\begin{aligned} \mathrm{rad}(\mathcal{K}_0)= \sup \{ \Vert z\Vert :\ z\in \mathcal{N}, \mathrm{dist}(z,V)\le \varepsilon \}= \varepsilon \mu (\mathcal{N},V). \end{aligned}$$
Suppose now that \(w=M(v)\) with \(v\in V\). Any \( f\in \mathcal{X}_w\) can be written as \( f=v+\eta \) with \(\eta \in \mathcal{N}\), and, since \(v\in V\), we have \(\mathrm{dist}( f,V)=\mathrm{dist}(\eta ,V)\). Thus \( f\in \mathcal{K}_w\) if and only if \(\mathrm{dist}(\eta ,V)\le \varepsilon \). Hence \(\mathcal{K}_w=v+\mathcal{K}_0\) and (i) follows.
For the proof of (ii), let \( f_0\) be any point in \(\mathcal{K}_w\). Then, any other \( f\in \mathcal{K}_w\) can be written as \( f= f_0+\eta \) with \(\eta \in \mathcal{N}\). Since \(\mathrm{dist}( f,V)\le \varepsilon \) and \(\mathrm{dist}( f_0,V)\le \varepsilon \), the triangle inequality gives \(\mathrm{dist}(\eta ,V)\le 2\varepsilon \). Hence
$$\begin{aligned} \mathcal{K}_w \subset \, f_0+\mathcal{K}_0(2\varepsilon ,V), \end{aligned}$$
which from (4.3) has radius \(2\varepsilon \mu (\mathcal{N},V)\). Therefore, we have proven (ii). Statement (iii) follows from the definition of \(R(\mathcal{K})\) given in (1.2). \(\square \)

Let us make a few comments about Theorem 4.4 vis-à-vis the results in [8] (see Theorem 2.8 and Remark 2.15 of that paper) for the case when \(\mathcal{X}\) is a Hilbert space. In the latter case, it was shown in [8] that the same result as (i) holds, but in the case of (ii), an exact computation of \(\mathrm{rad}(\mathcal{K}_w)\) was given with the constant 2 replaced by a number (depending on w) which is less than one and for some w can be very small. It is probably impossible to have an exact formula for \(\mathrm{rad}(\mathcal{K}_w)\) in the case of a general Banach space. However, we show in the appendix that when \(\mathcal{X}\) is uniformly convex and uniformly smooth, we can improve on the constant appearing in (ii) of Theorem 4.4.

5 Near optimal algorithms

In this section, we discuss admissible algorithms for optimal recovery, and expose what properties these algorithms need to exhibit in order to be optimal or near optimal on the sets \(\mathcal{K}_w\) when \(\mathcal{K}=\mathcal{K}(\varepsilon ,V)\). We also discuss when \(A(w)\in \mathcal{K}_w\) for each \(w\in \mathbb {R}^m\) for which \(\mathcal{K}_w\ne \emptyset \), as this may be important in some applications. In the latter case Theorem 2.1 guarantees that the algorithmic error is
$$\begin{aligned} \sup _{ f\in \mathcal{K}_w}\Vert f-A(M( f))\Vert \le 2\mathrm{rad}(\mathcal{K}_w), \end{aligned}$$
and hence the algorithm, up to the factor 2, is optimal. In this section, we shall not be concerned about computational issues that arise in the numerical implementation of the algorithms we put forward. Numerical implementation issues will be discussed in the section that follows.
Recall that by \(M_V\) we denoted the restriction of M to the space V. By our Standing Assumption, \(M_V\) is invertible, and hence \(Z:=M(V)=M_V(V)\) is an n-dimensional subspace of \(\mathbb {R}^m\). Given \(w\in \mathbb {R}^m\), we consider its error of best approximation from Z in \(\Vert \cdot \Vert _M\), defined by
$$\begin{aligned} E(w):=\inf _{z\in Z}\Vert w-z\Vert _M. \end{aligned}$$
Notice that whenever \(w=M( f)\), from the definition of the norm \(\Vert \cdot \Vert _M\), we have
$$\begin{aligned} E(w)=\mathrm{dist}(w,Z)_M=\mathrm{dist}( f,V\oplus \mathcal{N})_\mathcal{X}\le \mathrm{dist}( f,V)_\mathcal{X}.\end{aligned}$$
While there is always a best approximation \(z^*=z^*(w)\in Z\) to w, and it is unique when the norm is strictly convex, we also consider other, non-best approximation maps for possible ease of numerical implementation. We say that a mapping \(\Lambda :\mathbb {R}^m\rightarrow Z\) is near best with constant \(\lambda \ge 1\) if
$$\begin{aligned} \Vert w-\Lambda (w)\Vert _M\le \lambda E(w),\quad w\in \mathbb {R}^m.\end{aligned}$$
Of course, if \(\lambda =1\), then \(\Lambda \) maps w into a best approximation of w from Z.
Now, given any lifting \(\Delta \) and any near best approximation map \(\Lambda \), we consider the mapping
$$\begin{aligned} A(w):= M_V^{-1}(\Lambda (w))+\Delta (w-\Lambda (w)),\quad w\in \mathbb {R}^m.\end{aligned}$$
Clearly, A maps \(\mathbb {R}^m\) into \(\mathcal{X}\), so that it is an algorithm. It also has the property that \(A(w)\in \mathcal{X}_w\), which means that it is an admissible algorithm. Finally, by our construction, whenever \(w=M(v)\) for some \(v\in V\), then \(\Lambda (w)=w\), and so \(A(w)=v\). Let us note some important properties of such an algorithm A.

Theorem 5.1

Let \(\mathcal{X}\) be a Banach space, V be any finite dimensional subspace of \( \mathcal{X}\), \(\varepsilon >0\), \(M:\mathcal{X}\rightarrow \mathbb {R}^m\) be any measurement operator with a null space \(\mathcal{N}\), and \(\mathcal{K}=\mathcal{K}(\varepsilon ,V)\) be an approximation set. Then, for any lifting \(\Delta \) and any near best approximation map \(\Lambda \) with constant \(\lambda \ge 1\), the algorithm A, defined in (5.4), has the following properties:

(i)  \(A(w)\in \mathcal{X}_w,\quad w\in \mathbb {R}^m\), i.e. A is admissible.

(ii)  \(\mathrm{dist}(A(M( f)),V)_{ \mathcal{X}}\le \lambda \Vert \Delta \Vert \mathrm{dist}( f,V)_\mathcal{X},\quad f\in \mathcal{X}.\)

(iii) the algorithm A has the a priori performance bound
$$\begin{aligned} \sup _{w\in \mathbb {R}^m}\sup _{ f\in \mathcal{K}_w} \Vert f-A(M( f))\Vert \le 4\varepsilon \lambda \Vert \Delta \Vert \mu (\mathcal{N},V). \end{aligned}$$
(iv)   if \(\Vert \Delta \Vert =1\) and \(\lambda =1\), then \(A(M( f))\in \mathcal{K}_w\), whenever \( f\in \mathcal{K}_w\).
(v)   if \(\Vert \Delta \Vert =1\) and \(\lambda =1\), then the algorithm A is pointwise near optimal with constant 2, i.e. for any \(w\in \mathbb {R}^m\),
$$\begin{aligned} \sup _{f\in \mathcal{K}_w}\Vert f-A(M( f))\Vert \le 2 \mathrm{rad}(\mathcal{K}_w). \end{aligned}$$
(vi)    if \(\Vert \Delta \Vert =1\) and \(\lambda =1\), then the algorithm A is also global near optimal with constant 2 i.e.
$$\begin{aligned} \sup _{w\in \mathbb {R}^m}\sup _{ f\in \mathcal{K}_w} \Vert f-A(M( f))\Vert \le 2R(\mathcal{K}). \end{aligned}$$

Proof


We have already noted that (i) holds. To prove (ii), let f be any element in \(\mathcal{X}\). Then \(M_V^{-1}(\Lambda (M( f)))\in V\), and therefore
$$\begin{aligned} \mathrm{dist}(A(M( f)),V)_\mathcal{X}\le \Vert \Delta \Vert \Vert M( f)-\Lambda (M( f))\Vert _M\le \Vert \Delta \Vert \lambda E(M( f))\le \Vert \Delta \Vert \lambda \mathrm{dist}( f,V), \end{aligned}$$
where the first inequality uses (5.4), the second inequality uses (5.3), and the last inequality uses (5.2).
Next, we prove (iii). It follows from (i) and (ii) that for any \( f\in \mathcal{X}\) with \(M( f)=w\) and \(\mathrm{dist}( f,V)\le \varepsilon \)
$$\begin{aligned} A(M( f))\in \mathcal{K}_w(\lambda \Vert \Delta \Vert \varepsilon , V). \end{aligned}$$
Since \( f\in \mathcal{K}_w(\varepsilon ,V)\subset \mathcal{K}_w(\lambda \Vert \Delta \Vert \varepsilon ,V)\) we infer that
$$\begin{aligned} \Vert f-A(M( f))\Vert \le 2\mathrm{rad}\left( \mathcal{K}_w(\lambda \Vert \Delta \Vert \varepsilon ,V)\right) , \end{aligned}$$
and from Theorem 4.4 we get (5.5). Statement (iv) follows from (i) and (ii), since whenever \( f\in \mathcal{K}_w\), then \(\mathrm{dist}( f,V)_\mathcal{X}\le \varepsilon \). Statement (v) follows from (iv) because of Theorem 2.1. The estimate (vi) follows from (v) and the definition (1.2) of \(R(\mathcal{K})\). \(\square \)
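
The structure of the algorithm (5.4) can be sketched in a toy setting. We assume here, purely for illustration, that \(\mathcal{X}=\mathbb {R}^3\) with the Euclidean norm, that M returns the first two coordinates, and that V is spanned by \((1,0,1)\). Under these assumptions \(\Vert \cdot \Vert _M\) is the Euclidean norm on \(\mathbb {R}^2\), \(Z=M(V)\) is spanned by \((1,0)\), the best approximation map \(\Lambda \) is the orthogonal projection onto Z (so \(\lambda =1\)), and padding with a zero is a lifting of norm one.

```python
V_BASIS = (1.0, 0.0, 1.0)      # V = span{v}; hypothetical choice for illustration

def M(f):
    """Measurement operator: the first two coordinates of f."""
    return (f[0], f[1])

def Lambda(w):
    """Best approximation to w from Z = M(V) = span{(1, 0)} in the Euclidean norm."""
    return (w[0], 0.0)

def M_V_inverse(z):
    """Inverse of M restricted to V; defined for z in Z only."""
    return tuple(z[0] * vi for vi in V_BASIS)

def Delta(w):
    """Minimal-norm lifting for this M: pad the measurement with a zero."""
    return (w[0], w[1], 0.0)

def A(w):
    """The algorithm (5.4): A(w) = M_V^{-1}(Lambda(w)) + Delta(w - Lambda(w))."""
    z = Lambda(w)
    residual = (w[0] - z[0], w[1] - z[1])
    return tuple(a + b for a, b in zip(M_V_inverse(z), Delta(residual)))

recovered = A((3.0, -1.5))     # equals (3.0, -1.5, 3.0)
```

The output reproduces the measurements, so A is admissible, and elements of V are recovered exactly, for instance `A(M((2.0, 0.0, 2.0)))` returns `(2.0, 0.0, 2.0)`.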

It follows from the above theorem that the best choice for A, from a theoretical point of view, is to choose \(\Delta \) with \(\Vert \Delta \Vert =1\) and \(\Lambda \) with constant \(\lambda =1\). When \(\mathcal{X}\) is uniformly convex, we can always accomplish this theoretically, but there may be issues in the numerical implementation. If \(\mathcal{X}\) is a general Banach space, we can choose \(\lambda =1\) and \( \Vert \Delta \Vert \) arbitrarily close to one, but, as in the previous case, problems in the numerical implementation may also arise. In the next section, we discuss some of the numerical considerations in implementing an algorithm A of the form (5.4). In the case that \(\lambda \Vert \Delta \Vert >1\), we only know (5.5), which is only slightly worse than the a priori bound \(4\varepsilon \mu (\mathcal{N},V)\) that we obtain when we know that A(w) is in \(\mathcal{K}_w(\varepsilon ,V)\). In this case, the algorithm A is globally near optimal with the constant \(4\lambda \Vert \Delta \Vert \). Note that we do not have pointwise near optimality because when \(\lambda \Vert \Delta \Vert >1\), \(\mathrm{rad}(\mathcal{K}_w(\lambda \Vert \Delta \Vert \varepsilon ,V))\) may be much bigger than \(\mathrm{rad}(\mathcal{K}_w(\varepsilon ,V))\); this can be easily seen in the Hilbert space case or from Proposition 10.4.

5.1 Noisy measurements

In this section, we discuss the issue of noisy measurements. Suppose that in place of the exact measurements \(w_i=l_i(f)\), where \(M(f)=(w_1,\ldots ,w_m)\), we receive the noisy data \(w_i+\delta _i\), \(i=1,\ldots ,m\). Without loss of generality, we can assume that \(\Vert l_i\Vert =1\), \(i=1,\ldots ,m\). Let us denote \(\delta :=(\delta _1,\ldots ,\delta _m)\). Then, the following theorem holds.

Theorem 5.2

The application of algorithm A, defined in (5.4), to the noisy measurements \(w+\delta \), where \(w=M(f)\), gives the error bound
$$\begin{aligned} \Vert f-A(w+\delta )\Vert _\mathcal{X}\le 5\lambda \Vert \Delta \Vert \mu (\mathcal{N},V)\left[ \mathrm{dist}(f,V)_\mathcal{X}+\Vert \delta \Vert _M\right] . \end{aligned}$$

Proof


First, let us observe that we can always find an element \(g\in \mathcal{X}\), such that \(M(g)=\delta \), since the linear functionals \(l_i\), \(i=1,\ldots ,m\), are linearly independent. It follows from the definition of \(\Vert \delta \Vert _M\) that for every \(\epsilon >0\), we can find \(g_\epsilon \), such that \(M(g_\epsilon )=\delta \) and \(\Vert g_\epsilon \Vert \le \Vert \delta \Vert _M+\epsilon \). Note that
$$\begin{aligned} \mathrm{dist}(f+g_\epsilon ,V)\le \mathrm{dist}(f,V)_\mathcal{X}+\Vert g_\epsilon \Vert _\mathcal{X}\le \mathrm{dist}(f,V)_\mathcal{X}+\Vert \delta \Vert _M+\epsilon . \end{aligned}$$
Since \(M(f+g_\epsilon )=w+\delta \), the application of algorithm (5.4) to the noisy measurement \(w+\delta \) and Theorem 5.1 lead to the recovery bound
$$\begin{aligned} \Vert f-A(w+\delta )\Vert _\mathcal{X}\le & {} \Vert g_\epsilon \Vert _\mathcal{X}+ \Vert f+g_\epsilon -A(M(f+g_\epsilon ))\Vert \nonumber \\\le & {} \Vert \delta \Vert _M+\epsilon + 4\lambda \Vert \Delta \Vert \mu (\mathcal{N},V)\mathrm{dist}(f+g_\epsilon ,V)_\mathcal{X}\nonumber \\\le & {} \Vert \delta \Vert _M+\epsilon + 4\lambda \Vert \Delta \Vert \mu (\mathcal{N},V)(\mathrm{dist}(f,V)_\mathcal{X}+\Vert \delta \Vert _M+\epsilon ), \nonumber \end{aligned}$$
where in the last inequality we have used (5.8). We let \(\epsilon \rightarrow 0\) and use the fact that \(\lambda \Vert \Delta \Vert \mu (\mathcal{N},V)\ge 1\) to obtain
$$\begin{aligned} \Vert f-A(w+\delta )\Vert _\mathcal{X}\le 5\lambda \Vert \Delta \Vert \mu (\mathcal{N},V)(\mathrm{dist}(f,V)_\mathcal{X}+\Vert \delta \Vert _M), \end{aligned}$$
and the proof is completed. \(\square \)

6 Numerical issues in implementing the algorithms A

In this section, we address the main numerical issues in implementing algorithms of the form (5.4). These are
  • How to compute \(\Vert \cdot \Vert _M\) on \(\mathbb {R}^m\)?

  • How to numerically construct near best approximation maps \(\Lambda \) for approximating the elements in \(\mathbb {R}^m\) by the elements of \(Z=M(V)\) in the norm \(\Vert \cdot \Vert _M\)?

  • How to numerically construct lifting operators \(\Delta \) with a controllable norm \(\Vert \Delta \Vert \)?

Of course, the resolution of each of these issues depends very much on the Banach space \(\mathcal{X}\), the subspace V, and the measurement functionals \(l_j\), \(j=1,\ldots ,m\). In this section, we will consider general principles and see how these principles are implemented in three examples.

Example 1

\(\mathcal{X}=C(D)\), where D is a domain in \(\mathbb {R}^d\), V is any n dimensional subspace of \(\mathcal{X}\), and \(M=(l_1,\ldots ,l_m)\) consists of m point evaluation functionals at distinct points \(P_1,\ldots ,P_m\in D\), i.e., \(M(f)=(f(P_1),\ldots , f(P_m))\).

Example 2

\(\mathcal{X}=L_p(D)\), \(1\le p\le \infty \), where D is a domain in \(\mathbb {R}^d\), V is any n dimensional subspace of \(\mathcal{X}\) and M consists of the m functionals
$$\begin{aligned} l_j(f):=\intop \limits _D f(x)g_j(x)\,dx, \quad j=1,\ldots ,m, \end{aligned}$$
where the functions \(g_j\) have disjoint supports, \(g_j\in L_{p'}\), \(p'=\frac{p}{p-1}\), and \(\Vert g_j\Vert _{L_{p'}}=1\).

Example 3

\(\mathcal{X}=L_1([0,1])\), V is any n dimensional subspace of \(\mathcal{X}\), and M consists of the m functionals
$$\begin{aligned} l_j(f):=\intop \limits _0^1 f(t)r_j(t)\,dt, \quad j=1,\ldots ,m, \end{aligned}$$
where the functions \(r_j\) are the Rademacher functions
$$\begin{aligned} r_j(t):= \mathrm{sgn } ( \sin 2^{j+1} \pi t), \quad t\in [0,1], \quad j\ge 0.\end{aligned}$$
The functions \(r_j\) oscillate and have full support. This example is not so important in typical data fitting scenarios, but it is important theoretically since, as we shall see, it has interesting features with regard to liftings.

6.1 Computing \(\Vert \cdot \Vert _M\)

We assume that the measurement functionals \(l_j\), \(j=1,\ldots ,m\), are given explicitly and are linearly independent. Let \(L:=\mathrm{span}(l_j)_{j=1}^m\subset \mathcal{X}^*\). Our strategy is to first compute the dual norm \(\Vert \cdot \Vert _M^*\) on \(\mathbb {R}^m\) by using the fact that the functionals \(l_j\) are available to us. Let \(\alpha =(\alpha _1,\ldots ,\alpha _m)\in \mathbb {R}^m\) and consider its action as a linear functional. We have that
$$\begin{aligned} \Vert \alpha \Vert _M^*=\sup _{\Vert w\Vert _M=1}\left| \sum _{j=1}^m\alpha _jw_j\right| =\sup _{\Vert f\Vert _\mathcal{X}= 1} \left| \sum _{j=1}^m\alpha _j l_j( f)\right| =\left\| \sum _{j=1}^m\alpha _j l_j\right\| _{\mathcal{X}^*}{,} \end{aligned}$$
where we have used that if \(\Vert w\Vert _M=1\), there exists \( f\in \mathcal{X}\), such that \(M( f)=w\) and its norm is arbitrarily close to one. Therefore, we can express \(\Vert \cdot \Vert _M\) as
$$\begin{aligned} \Vert w\Vert _M=\sup \left\{ \left| \sum _{j=1}^m w_j\alpha _j\right| \ \ :\ \ \Vert (\alpha _j)\Vert _M^*\le 1\right\} {.} \end{aligned}$$
We consider these norms to be available to us since the space \(\mathcal{X}\) and the functionals \(l_j\) are known to us. Let us illustrate this in our three examples. In Example 1, for any \(\alpha \in \mathbb {R}^m\), we have
$$\begin{aligned} \Vert \alpha \Vert _M^*=\sum _{j=1}^m|\alpha _j|=\Vert \alpha \Vert _{\ell _1(m)}, \quad \Vert w\Vert _M=\max _{1\le j\le m}|w_j|=\Vert w\Vert _{\ell _\infty (m)}. \end{aligned}$$
In Example 2, we have \(\mathcal{X}=L_p\), and
$$\begin{aligned} \Vert \alpha \Vert _M^*= \Vert \alpha \Vert _{\ell _{p'}(m)}, \quad \Vert w\Vert _M= \Vert w\Vert _{\ell _p(m)}. \end{aligned}$$
In Example 3, we have \(\mathcal{X}^*=L_\infty ([0,1])\), and from (6.2) we infer that \(\Vert \alpha \Vert _M^*=\Vert \sum _{j=1}^m \alpha _j r_j\Vert _{L_\infty ([0,1])}\). From the definition (6.1), we see that the sum \(\sum _{j=1}^k \alpha _j r_j\) is constant on each interval of the form \((s2^{-(k+1)}, (s+1)2^{-(k+1)})\) when s is an integer. On the other hand, on such an interval, \(r_{k+1}\) takes on both of the values 1 and \(-1\). Therefore, by induction on k, we get \(\Vert \alpha \Vert _M^*= \sum _{j=1}^m|\alpha _j|\). Hence, we have
$$\begin{aligned} \Vert \alpha \Vert _M^*=\sum _{j=1}^m|\alpha _j|=\Vert \alpha \Vert _{\ell _1(m)}, \quad \Vert w\Vert _M=\max _{1\le j\le m} |w_j|= \Vert w\Vert _{\ell _\infty (m)}. \end{aligned}$$
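The identity for Example 3 can be checked numerically. The following is a minimal sketch, not part of the argument above. The helper names `rademacher` and `dual_norm` are ours, and we evaluate \(r_j\) through the closed form \((-1)^{\lfloor 2^{j+1}t\rfloor }\), which agrees with (6.1) away from the jump points. Since the sum is piecewise constant on dyadic cells, sampling one midpoint per cell gives its exact sup norm.

```python
from math import floor

def rademacher(j, t):
    """r_j(t) = sgn(sin(2^(j+1) * pi * t)), evaluated away from its jump points."""
    return -1.0 if floor(2 ** (j + 1) * t) % 2 else 1.0

def dual_norm(alpha):
    """Sup norm of sum_j alpha_j r_j (j = 1..m), sampled at dyadic cell midpoints."""
    m = len(alpha)
    cells = 2 ** (m + 1)          # the sum is constant on each cell of length 2^-(m+1)
    best = 0.0
    for s in range(cells):
        t = (s + 0.5) / cells     # midpoint of the s-th cell
        value = sum(a * rademacher(j, t) for j, a in enumerate(alpha, start=1))
        best = max(best, abs(value))
    return best

alpha = [0.5, -2.0, 1.25]         # hypothetical coefficients
print(dual_norm(alpha), sum(abs(a) for a in alpha))
```

The two printed numbers agree because every sign pattern \((\pm 1,\ldots ,\pm 1)\) is attained by \((r_1(t),\ldots ,r_m(t))\) on some cell.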

6.2 Approximation maps

Once the norm \(\Vert \cdot \Vert _M\) is numerically computable, the problem of finding a best or near best approximation map \(\Lambda (w)\) to w in this norm becomes a standard problem in convex minimization. For instance, in the examples from the previous subsection, the minimization is done in \(\Vert \cdot \Vert _{\ell _p(m)}\). Of course, in general, the performance of algorithms for such minimization depends on the properties of the unit ball of \(\Vert \cdot \Vert _M\). This ball is always convex, but in some cases it is uniformly convex, and this leads to faster convergence of the iterative minimization algorithms and guarantees a unique minimum.

6.3 Numerical liftings

Given a prescribed null space \(\mathcal{N}\), a standard way to find linear liftings from \(\mathbb {R}^m\) to \(\mathcal{X}\) is to find a linear projection \(P_Y\) from \(\mathcal{X}\) to a subspace \(Y\subset \mathcal{X}\) of dimension m which has \(\mathcal{N}\) as its kernel. We can find all Y that can be used in this fashion as follows. We take elements \(\psi _1,\ldots ,\psi _m\) from \(\mathcal{X}\), such that
$$\begin{aligned} l_i(\psi _j)=\delta _{i,j},\quad 1\le i,j\le m, \end{aligned}$$
where \(\delta _{i,j} \) is the usual Kronecker symbol. In other words, \(\psi _j\), \(j=1,\ldots ,m\), is a dual basis to \(l_1,\ldots ,l_m\). Then, for \(Y:=\mathrm{span}\{\psi _1,\ldots ,\psi _m\}\), the projection
$$\begin{aligned} P_Y( f)=\sum _{j=1}^ml_j( f)\psi _j, \quad f\in \mathcal{X}, \end{aligned}$$
has kernel \(\mathcal{N}\). We get a lifting corresponding to \(P_Y\) by defining
$$\begin{aligned} \Delta (w):=\Delta _Y(w):=\sum _{j=1}^mw_j\psi _j. \end{aligned}$$
This lifting is linear and hence continuous. The important issue for us is its norm. We see that
$$\begin{aligned} \Vert \Delta \Vert= & {} \sup _{\Vert w\Vert _M=1}\Vert \Delta (w)\Vert _\mathcal{X}=\sup _{\Vert w\Vert _M=1}\left\| \sum _{j=1}^mw_j\psi _j\right\| _\mathcal{X}\\= & {} \sup _{\Vert f\Vert _\mathcal{X}=1}\left\| \sum _{j=1}^ml_j( f) \psi _j\right\| _\mathcal{X}=\Vert P_Y\Vert . \end{aligned}$$
Here, we have used the fact that if \(\Vert w\Vert _M=1\), then there is an \( f\in \mathcal{X}\) with norm as close to one as we wish with \(M( f)=w\).

It follows from the Kadec-Snobar theorem that we can always choose a Y such that \(\Vert P_Y\Vert \le \sqrt{m}\). In general, the \(\sqrt{m}\) cannot be replaced by a smaller power of m. However, if \(\mathcal{X}=L_p\), then \(\sqrt{m}\) can be replaced by \(m^{|1/2-1/p|}\). We refer the reader to Chapter III.B of [37] for a discussion of these facts.

In many settings, the situation is more favorable. In the case of Example 1, we can take for Y the span of any norm one functions \(\psi _j\), \( j=1,\ldots , m\), such that \(l_i(\psi _j)=\delta _{i,j}\), \(1\le i,j\le m\). We can always take the \(\psi _j\) to have disjoint supports, and thereby get that \(\Vert P_Y\Vert =1\). Thus, we get a linear lifting \(\Delta \) with \(\Vert \Delta \Vert =1\) (see (6.4)). This same discussion also applies to Example 2.
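To make the construction for Example 1 concrete, here is a small sketch under our own assumptions: \(D=[0,1]\), three hypothetical sample points, and piecewise linear hat functions as the \(\psi _j\). The names `hat` and `make_lifting` are illustrative only. The hats have disjoint supports and sup norm one, so the lifted function interpolates the data and its sup norm equals \(\max _j|w_j|\).

```python
def hat(center, half_width):
    """Piecewise linear bump: value 1 at `center`, 0 outside (center - h, center + h)."""
    def psi(t):
        return max(0.0, 1.0 - abs(t - center) / half_width)
    return psi

def make_lifting(points):
    """Build Delta(w) = sum_j w_j psi_j with disjointly supported hats at sorted points."""
    gaps = [b - a for a, b in zip(points, points[1:])]
    half_width = min(gaps) / 2.0          # supports of neighbouring hats do not overlap
    psis = [hat(p, half_width) for p in points]
    def lift(w):
        return lambda t: sum(wj * psi(t) for wj, psi in zip(w, psis))
    return psis, lift

points = [0.125, 0.5, 0.75]               # hypothetical sample points P_j in D = [0, 1]
psis, lift = make_lifting(points)
w = [2.0, -3.0, 0.5]                      # hypothetical data
f = lift(w)
grid = [i / 1024 for i in range(1025)]
sup_norm = max(abs(f(t)) for t in grid)
print([f(p) for p in points], sup_norm)
```

On this grid the lifted function reproduces w at the sample points and has sup norm \(\max _j|w_j|\), which is the statement \(\Vert \Delta \Vert =1\) for this choice of \(\psi _j\).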

Example 3 is far more illustrative. Let us first consider linear liftings \(\Delta :\mathbb {R}^m\rightarrow L_1([0,1])\). It is well known (see e.g. [37, III.A, III.B]) that we must have \(\Vert \Delta \Vert \ge c\sqrt{m}\). A well known, self-contained argument to prove this is the following. Let \(e_j\), \(j=1,\ldots ,m\), be the usual coordinate vectors in \(\mathbb {R}^m\). Then, the function \( \Delta (e_j)=:f_j\in L_1([0,1])\) and \(\Vert f_j\Vert _{L_1([0,1])}\ge \Vert e_j\Vert _M=\Vert e_j\Vert _{\ell _\infty (m)}=1\). Next, for each fixed \(\eta \in [0,1]\), we consider
$$\begin{aligned} \Delta (( r_1(\eta ), \ldots ,r_m(\eta )))=\Delta \left( \sum _{j=1}^mr_j(\eta )e_j\right) =\sum _{j=1}^m r_j(\eta ) f_j(t). \end{aligned}$$
Clearly, \(\Vert \Delta \Vert \ge \Vert \sum _{j=1}^m r_j(\eta ) f_j\Vert _{L_1([0,1])}\) for each \(\eta \in [0,1]\). Therefore, integrating this inequality over [0, 1] and using Khintchine’s inequality with the best constant (see [34]), we find
$$\begin{aligned} \Vert \Delta \Vert\ge & {} \displaystyle \intop \limits _0^1 \left\| \sum _{j=1}^m r_j(\eta ) f_j\right\| _{L_1([0,1])}\, d\eta =\intop \limits _0^1 \intop \limits _0^1\left| \sum _{j=1}^m r_j(\eta ) f_j(t)\right| \,d\eta \, dt\\\ge & {} \frac{1}{\sqrt{2}}\displaystyle \intop \limits _0^1\left( \sum _{j=1}^m f_j(t)^2\right) ^{1/2} \, dt \ge \frac{1}{\sqrt{2}} \intop \limits _0^1\frac{1}{\sqrt{m}}\sum _{j=1}^m|f_j(t)|\, dt\\= & {} \frac{1}{\sqrt{2}} \frac{1}{\sqrt{m}}\sum _{j=1}^m \Vert f_j\Vert _{L_1([0,1])}\ge \frac{\sqrt{m}}{\sqrt{2}}, \end{aligned}$$
where the next to last inequality uses the Cauchy-Schwarz inequality.
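As an illustration of this lower bound, consider the particular linear lifting \(\Delta (w)=\sum _{j=1}^m w_jr_j\), which is admissible because the Rademacher functions are orthonormal, so \(l_i(r_j)=\delta _{i,j}\). The sketch below computes its norm for \(m=4\). It relies on two assumptions of ours: the closed form for \(r_j\) away from jump points, and the fact that a convex function on the cube \(\Vert w\Vert _{\ell _\infty }\le 1\) attains its maximum at a vertex. The function names are hypothetical.

```python
from itertools import product
from math import floor, sqrt

def rademacher(j, t):
    """r_j(t) = sgn(sin(2^(j+1) * pi * t)), evaluated away from its jump points."""
    return -1.0 if floor(2 ** (j + 1) * t) % 2 else 1.0

def l1_norm_of_combination(w):
    """L_1([0,1]) norm of sum_j w_j r_j (j = 1..m), exact via dyadic cell midpoints."""
    m = len(w)
    cells = 2 ** (m + 1)
    total = 0.0
    for s in range(cells):
        t = (s + 0.5) / cells
        total += abs(sum(wj * rademacher(j, t) for j, wj in enumerate(w, start=1)))
    return total / cells

def linear_lifting_norm(m):
    """Norm of w -> sum_j w_j r_j from (R^m, l_inf) to L_1, maximised over cube vertices."""
    return max(l1_norm_of_combination(v) for v in product((-1.0, 1.0), repeat=m))

m = 4
print(linear_lifting_norm(m), sqrt(m / 2))
```

For \(m=4\) the computed norm is at least \(\sqrt{m/2}\), in agreement with the estimate above.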
Even though linear liftings in Example 3 can never have a norm smaller than \( \sqrt{m/2}\), we can construct nonlinear liftings which have norm one. To see this, we define such a lifting for any \(w\in \mathbb {R}^m\) with \(\Vert w\Vert _M=\max _{1\le j\le m}|w_j|=1\), using the classical Riesz product construction. Namely, for such w, we define
$$\begin{aligned} \Delta (w):=\prod _{j=1}^m (1+w_jr_j(t))=\sum _{A\subset \{1,\ldots ,m\}} \prod _{j\in A}w_j r_j(t), \end{aligned}$$
where we use the convention that \( \prod _{j\in A}w_j r_j(t)=1\) when \(A=\emptyset \). Note that if \(A\ne \emptyset \), then
$$\begin{aligned} \intop \limits _0^1 \prod _{j\in A} r_j(t)\, dt=0. \end{aligned}$$
Therefore, \(\intop \limits _0^1\Delta (w)\, dt =1=\Vert \Delta (w)\Vert _{L_1([0,1])}\), because \(\Delta (w)\) is a nonnegative function. To check that \(M(\Delta (w))=w\), we first observe that
$$\begin{aligned} \left( \prod _{j\in A} r_j(t)\right) r_k(t)={\left\{ \begin{array}{ll}\prod _{j\in A\cup \{k\}} r_j,&{} \text{ when } k\notin A,\\ \\ \prod _{j\in A\setminus \{k\}} r_j, &{} \text{ when } k\in A. \end{array}\right. } \end{aligned}$$
Hence, from (6.6) we see that the only A for which the integral of the left hand side of (6.7) is nonzero is when \(A=\{k\}\). This observation, together with (6.5) gives
$$\begin{aligned} l_k(\Delta (w))=\intop \limits _0^1\Delta (w)r_k(t)\,dt = w_k, \quad 1\le k\le m, \end{aligned}$$
and therefore \(M(\Delta (w))=w\). We now define \(\Delta (w)\) when \(\Vert w\Vert _{\ell _\infty (m)} \ne 1\) by
$$\begin{aligned} \Delta (w)=\Vert w\Vert _{\ell _\infty }\Delta (w/\Vert w\Vert _{\ell _\infty }), \quad \Delta (0)=0. \end{aligned}$$
We have therefore proved that \(\Delta \) is a lifting of norm one.
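The Riesz product lifting is easy to test numerically. The following sketch uses hypothetical helper names (`riesz_lifting`, `integrate`) and the closed form for \(r_j\) away from jump points. It evaluates \(\Delta (w)\) on dyadic cells, where it is piecewise constant, and checks both \(\Vert \Delta (w)\Vert _{L_1([0,1])}=\Vert w\Vert _{\ell _\infty }\) and \(M(\Delta (w))=w\) for one choice of w.

```python
from math import floor

def rademacher(j, t):
    """r_j(t) = sgn(sin(2^(j+1) * pi * t)), evaluated away from its jump points."""
    return -1.0 if floor(2 ** (j + 1) * t) % 2 else 1.0

def riesz_lifting(w):
    """Return Delta(w) as a function of t, extended to all w by positive homogeneity."""
    scale = max(abs(wj) for wj in w)
    if scale == 0.0:
        return lambda t: 0.0
    u = [wj / scale for wj in w]            # normalised so that max_j |u_j| = 1
    def delta(t):
        value = scale
        for j, uj in enumerate(u, start=1):
            value *= 1.0 + uj * rademacher(j, t)
        return value
    return delta

def integrate(g, m):
    """Integral over [0,1] of a function constant on dyadic cells of length 2^-(m+1)."""
    cells = 2 ** (m + 1)
    return sum(g((s + 0.5) / cells) for s in range(cells)) / cells

w = [1.0, -0.5, 0.25]                       # hypothetical data with max_j |w_j| = 1
m = len(w)
delta = riesz_lifting(w)
l1_norm = integrate(lambda t: abs(delta(t)), m)
measurements = [integrate(lambda t, k=k: delta(t) * rademacher(k, t), m)
                for k in range(1, m + 1)]
print(l1_norm, measurements)
```

The lifting is nonlinear in w because of the product, which is what allows it to avoid the \(\sqrt{m/2}\) barrier for linear liftings.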

The above situation, in which a nonlinear lifting can have a better norm than any linear lifting, is not restricted to Example 3, as the following remark shows.

Remark 6.1

Let \(\mathcal{X}=C([-\pi ,\pi ])\), but now the measurements are given as lacunary Fourier coefficients, i.e.,
$$\begin{aligned} M(f)=(\hat{f}(2^0), \ldots ,\hat{f}(2^m)). \end{aligned}$$
It follows from (6.2) that \(\Vert \alpha \Vert _M^*=\frac{1}{2\pi } \intop \limits _{-\pi }^{\pi }|\sum _{j=0}^m \alpha _j e^{i2^j t}|\, dt\). Using a well known analog of Khintchine’s inequality valid for lacunary trigonometric polynomials (see, e.g. [38, ch 5, Th.8.20]), we derive
$$\begin{aligned} \sum _{j=0}^m|w_j|^2\le \Vert w\Vert _M^2\le C\sum _{j=0}^m|w_j|^2, \end{aligned}$$
for some constant C. If \(\Delta :\mathbb {R}^{m+1}\rightarrow C([-\pi ,\pi ])\) is any linear lifting, using [37, III.B.5 and III.B.16], we obtain
$$\begin{aligned} \Vert \Delta \Vert =\Vert \Delta \Vert \cdot \Vert M\Vert \ge \frac{\sqrt{\pi }}{2C}\sqrt{m+1}. \end{aligned}$$
On the other hand, there exists a constructive, nonlinear lifting \(\Delta _F:\mathbb {R}^{m+1}\rightarrow C([-\pi ,\pi ])\) with \(\Vert \Delta _F\Vert \le \sqrt{e}\), see [16].

7 Performance estimates for the examples

In this section, we consider the examples from Sect. 6. In particular, we determine \(\mu (\mathcal{N},V)\), which allows us to give the global performance error for near optimal algorithms for these examples. We begin with the optimal algorithms in a Hilbert space, which is not one of our three examples, but is easy to describe.

7.1 The case when \(\mathcal{X}\) is a Hilbert space \(\mathcal{H}\)

This case was completely analyzed in [8]. We summarize the results of that paper here in order to point out that our algorithm is a direct extension of the Hilbert space case to the Banach space situation, and to compare this case with our examples in which \(\mathcal{X}\) is not a Hilbert space. In the case \(\mathcal{X}\) is a Hilbert space, the measurement functionals \(l_j\) have the representation \(l_j(f)=\langle f,\phi _j\rangle \), where \(\phi _1,\ldots ,\phi _m\in \mathcal{H}\). Therefore, \(M(f)=(\langle f,\phi _1\rangle ,\ldots ,\langle f,\phi _m\rangle )\in \mathbb {R}^m\). We let \(W:=\mathrm{span}\{\phi _j\}_{j=1}^m\), which is an m dimensional subspace of \(\mathcal{H}\). We can always perform a Gram-Schmidt orthogonalization and assume therefore that \(\phi _1,\ldots ,\phi _m\in \mathcal{H}\) is an orthonormal basis for W (see Remark 3.1). We have \(\mathcal{N}=W^\perp \). From (6.2) and (6.3) we infer that \(\Vert \cdot \Vert _M \) on \(\mathbb {R}^m\) is the \(\ell _2(m)\) norm. Therefore, the approximation map is simple least squares fitting. Namely, to our data w, we find the element \( z^*(w)\in Z\), where \(Z:=M(V)\), such that
$$\begin{aligned} z^*(w):=\mathop {\mathrm{argmin}}\limits _{z\in Z} \sum _{j=1}^m |w_j-z_j|^2. \end{aligned}$$
The element \(v^*(w)=M_V^{-1}(z^*(w))\) is the standard least squares fit to the data \((f(P_1),\ldots , f(P_m))\) by vectors \((v(P_1),\ldots ,v(P_m))\) with \(v\in V\), and is found by the usual matrix inversion in least squares. This gives the best approximation to w in \(\Vert \cdot \Vert _M\) by the elements of Z, and hence \(\lambda =1\). The lifting \(\Delta (w_1,\ldots ,w_m):=\sum _{j=1}^m w_j \phi _j\) is linear and \(\Vert \Delta \Vert =1\). Hence, we have the algorithm
$$\begin{aligned} A(w) = M_V^{-1}(z^*(w))+\Delta (w-z^*(w))= v^*(w)+\sum _{j=1}^m [w_j-z_j^* (w)]\phi _j, \end{aligned}$$
which is the algorithm presented in [24] and further studied in [8]. The sum in (7.1) is a correction so that \(A(w)\in \mathcal{K}_w\), i.e., \(M(A(w))=w\).
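A finite dimensional toy instance may help fix ideas. In the sketch below, which is ours and not taken from [24] or [8], we take \(\mathcal{H}=\mathbb {R}^3\) with the Euclidean inner product, orthonormal \(\phi _1,\phi _2\), and a hypothetical one dimensional space \(V=\mathrm{span}\{v\}\) with \(M(v)\ne 0\). The function `recover` computes the least squares fit and then adds the correction term so that the output reproduces the data.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def measure(f, phis):
    """M(f) = (<f, phi_1>, ..., <f, phi_m>)."""
    return [dot(f, phi) for phi in phis]

def recover(w, v, phis):
    """A(w) = v*(w) + sum_j (w_j - z*_j(w)) phi_j for V = span{v}, orthonormal phi_j."""
    z = measure(v, phis)                       # M(v) spans Z = M(V); assumed nonzero
    c = dot(w, z) / dot(z, z)                  # least squares coefficient
    v_star = [c * x for x in v]
    residual = [wj - c * zj for wj, zj in zip(w, z)]
    correction = [sum(r * phi[i] for r, phi in zip(residual, phis))
                  for i in range(len(v))]
    return [a + b for a, b in zip(v_star, correction)]

phis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # orthonormal basis of W in R^3
v = [1.0, 1.0, 1.0]                            # hypothetical basis vector of V
w = [3.0, 1.0]                                 # hypothetical data
a = recover(w, v, phis)
print(a, measure(a, phis))
```

Here the least squares fit alone would not reproduce the data, and the correction term changes only the components of the output that lie in W.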

Note that our general theory states that the above algorithm is near optimal with constant 2 for recovering \(\mathcal{K}_w\). Moreover, it is shown in [7] that it is actually an optimal algorithm. The reason for this is that the sets \(\mathcal{K}_w\) in this Hilbert space setting have a center of symmetry, so Proposition 2.3 can be applied. Furthermore, it was shown in [8] that the calculations can be streamlined by choosing at the beginning certain favorable bases for V and W. In particular, the quantity \(\mu (\mathcal{N},V)\) can be immediately computed from the cross-Grammian of the favorable bases.

7.2 Example 1

In this section, we summarize how the algorithm works for Example 1. Given \(P_j\in D\), \(j=1,\ldots ,m\), \(P_i\ne P_j\), and the data \(w=M(f)=(f(P_1),\cdots ,f(P_m))\), the first step is to find the min-max approximation to w from the space \(Z:=M(V)\subset \mathbb {R}^m\). In other words, we find
$$\begin{aligned} z^*(w):=\mathop {\mathrm{argmin}}\limits _{z\in Z} \max _{1\le j\le m}|f(P_j)-z_j|=\mathop {\mathrm{argmin}}\limits _{v\in V} \max _{1\le j\le m}|f(P_j)-v(P_j)|. \end{aligned}$$
Note that for general M(V) the point \(z^*(w)\) is not necessarily unique. For certain V, however, we have uniqueness.
Let us consider the case when \(D=[0,1]\) and V is a Chebyshev space on D, i.e., for any n points \(Q_1,\ldots ,Q_n\in D\), and any data \(y_1,\ldots ,y_n\), there is a unique function \(v\in V\) which satisfies \(v(Q_i)=y_i\), \(1\le i\le n\). In this case, when \(m=n\), problem (7.2) has a unique solution
$$\begin{aligned} z^*(w)=w=M(v^*(w))=(v^*(P_1), \ldots ,v^*(P_m)), \end{aligned}$$
where \(v^*\in V\) is the unique interpolant to the data \((f(P_1),\ldots ,f(P_m))\) at the points \(P_1, \ldots ,P_m\). For \(m\ge n+1\), let us denote by \(V_m\) the restriction of V to the point set \(\Omega :=\{P_1,\ldots ,P_m\}\). Clearly, \(V_m\) is a Chebyshev space on \(C(\Omega )\) as well, and therefore there is a unique point \(z^*(w):=(\tilde{v}(P_1), \ldots ,\tilde{v}(P_m))\in V_m\), coming from the evaluation of a unique \(\tilde{v}\in V\), which is the best approximant from \(V_m\) to f on \(\Omega \). The point \(z^*(w)\) is characterized by an oscillation property. Various algorithms for finding \(\tilde{v}\) are known and go under the name Remez algorithms.
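For very small problems, the discrete min-max fit can be found without a Remez exchange. The sketch below is a brute-force substitute, not a Remez algorithm. It assumes that V is the space of linear polynomials (a Chebyshev space of dimension \(n=2\)) and uses the classical fact that the min-max error on a finite point set equals the largest levelled error over subsets of \(n+1=3\) points. The function names and the data are hypothetical.

```python
from itertools import combinations

def levelled_line(pts):
    """Line a + b*x with errors +h, -h, +h at three points given as sorted (x, y) pairs."""
    (x0, y0), (x1, y1), (x2, y2) = pts
    b = (y2 - y0) / (x2 - x0)
    c, d = y0 - b * x0, y1 - b * x1       # c = a + h and d = a - h
    return (c + d) / 2.0, b, (c - d) / 2.0

def minimax_line(xs, ys):
    """Best uniform line fit on a finite set: take the triple with the largest |h|."""
    data = sorted(zip(xs, ys))
    a, b, h = max((levelled_line(t) for t in combinations(data, 3)),
                  key=lambda r: abs(r[2]))
    error = max(abs(y - (a + b * x)) for x, y in data)
    return a, b, error

xs = [0.0, 0.25, 0.5, 0.75, 1.0]          # hypothetical sample points P_j
ys = [x * x for x in xs]                  # data w_j = f(P_j) for f(t) = t^2
a, b, error = minimax_line(xs, ys)
print(a, b, error)
```

The cost grows like \(\binom{m}{n+1}\), so this enumeration is only reasonable for small m and n. For larger problems an exchange method or a general convex solver is the natural choice.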
In the general case where V is not necessarily a Chebyshev space, a minimizer \(z^*(w)\) can still be found by convex minimization, and the approximation mapping \(\Lambda \) maps w to a \(z^*(w)\). Moreover, \(z^*(w)=M(v^*(w))\) for some \(v^*(w)\in V\), where \(v^*(w) \) is characterized by solving the minimization
$$\begin{aligned} v^*(w)=\mathop {\mathrm{argmin}}\limits _{v\in V}\Vert w-M(v)\Vert _M= \mathop {\mathrm{argmin}}\limits _{v\in V}\inf _{g:\,M(g)=w}\Vert g-v\Vert _{\mathcal{X}}=\mathop {\mathrm{argmin}}\limits _{v\in V} \mathrm{dist}(v,\mathcal{X}_w). \end{aligned}$$
We have seen that the lifting in this case is simple. We may take functions \(\psi _j\in C(D)\), with disjoint supports and of norm one, such that \(\psi _i(P_j)=\delta _{i,j}\). Then, we can take our lifting to be the operator that maps \(w\in \mathbb {R}^m\) into the function \(\sum _{j=1}^mw_j\psi _j\). This is a linear lifting with norm one. Then, the algorithm A is given by
$$\begin{aligned} A(w):= M_V^{-1}(z^*(w))+\sum _{j=1}^m(w_j-z_j^*(w))\psi _j=v^*(w)+\sum _{j=1}^m(w_j-z_j^*(w))\psi _j,\quad w\in \mathbb {R}^m.\end{aligned}$$
The sum in (7.3) is a correction to \(v^*(w)\) to satisfy the data. From (5.6), we know that for each \(w\in \mathbb {R}^m\), we have
$$\begin{aligned} \sup _{f\in \mathcal{K}_w} \Vert f-A(w)\Vert \le 2\mathrm{rad}(\mathcal{K}_w), \end{aligned}$$
and so the algorithm is near optimal with constant 2 for each of the classes \(\mathcal{K}_w\).

To give an a priori bound for the performance of this algorithm, we need to compute \(\mu (\mathcal{N},V)\).

Lemma 7.1

Let \(\mathcal{X}=C(D)\), V be a subspace of C(D), and \(M(f)=(f(P_1),\ldots ,f(P_m))\), where \(P_j\in D\), \(j=1,\ldots ,m\) are m distinct points in \(D\subset \mathbb {R}^d\). Then, for \(\mathcal{N}\) the null space of M, we have
$$\begin{aligned} \frac{1}{2}\sup _{v\in V}\frac{\Vert v\Vert _{C(D)}}{\displaystyle {\max _{1\le j\le m}|v(P_j)}|}\le \mu (\mathcal{N}, V)\le 2 \sup _{v\in V}\frac{\Vert v\Vert _{C(D)}}{\displaystyle {\max _{1\le j\le m}|v(P_j)|}}. \end{aligned}$$


Proof

From Lemma 4.2 and Lemma 4.3, we have
$$\begin{aligned} \frac{1}{2}\Vert M_V^{-1}\Vert \le \mu (\mathcal{N}, V)\le 2\Vert M_V^{-1}\Vert . \end{aligned}$$
Since we know \(\Vert w\Vert _M=\max _{1\le j\le m}|w_j|\), we obtain that
$$\begin{aligned} \Vert M_V^{-1}\Vert = \sup _{v\in V}\frac{\Vert v\Vert _{C(D)}}{\displaystyle {\max _{1\le j\le m}|v(P_j)}|}, \end{aligned}$$
and the lemma follows. \(\square \)
From (5.5), we obtain the a priori performance bound
$$\begin{aligned} \sup _{w\in \mathbb {R}^m} \sup _{f\in \mathcal{K}_w} \Vert f-A(w)\Vert _{C(D)}\le 4\varepsilon \mu (\mathcal{N},V).\end{aligned}$$
Moreover, we know from Theorem 4.4 that (7.5) cannot be improved by any algorithm except for the possible removal of the factor 4, and hence the algorithm is globally near optimal.
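The ratio of continuous to discrete norms appearing in Lemma 7.1 can be estimated numerically in simple cases. The sketch below makes several assumptions of ours: \(D=[0,1]\), \(m=n\), and V the space of polynomials of degree less than n, in which case the supremum equals the maximum of the Lebesgue function \(\sum _j|\ell _j(t)|\) of the Lagrange basis. The maximum is taken over a finite grid, so in general it is only a lower estimate of the supremum. The function names are hypothetical.

```python
def lagrange_basis(points, j, t):
    """Value at t of the Lagrange polynomial that is 1 at points[j] and 0 at the others."""
    value = 1.0
    for i, p in enumerate(points):
        if i != j:
            value *= (t - p) / (points[j] - p)
    return value

def norm_ratio(points, grid):
    """Grid maximum of sum_j |l_j(t)|, estimating sup ||v||_C(D) / max_j |v(P_j)|."""
    return max(sum(abs(lagrange_basis(points, j, t)) for j in range(len(points)))
               for t in grid)

points = [0.25, 0.75]                       # hypothetical P_j, V = linear polynomials
grid = [i / 256 for i in range(257)]
ratio = norm_ratio(points, grid)
lower, upper = ratio / 2.0, 2.0 * ratio     # resulting bounds on mu(N, V) from the lemma
print(ratio, lower, upper)
```

For these two points the Lebesgue function is largest at the endpoints of D, which the grid contains, so the grid maximum coincides with the supremum in this instance.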

Remark 7.2

It is important to note that the algorithm \(A:\ w\rightarrow A(w)\) does not depend on \(\varepsilon \), and so one obtains for any f with the data \(w=(f(P_1),\ldots ,f(P_m))\) the performance bound
$$\begin{aligned} \Vert f-A(w)\Vert _{C(D)}\le 4\mu (\mathcal{N},V)\mathrm{dist}(f,V). \end{aligned}$$
Approximations of this form are said to be instance optimal with constant \(4\mu (\mathcal{N},V)\).
As an illustrative example, consider the space V of trigonometric polynomials of degree \(\le n\) on \(D:=[-\pi ,\pi ]\), which is a Chebyshev system of dimension \(2n+1\). We take \(\mathcal{X}\) to be the space of continuous functions on D which are periodic, i.e., \( f(-\pi )=f(\pi )\). If the data consists of the values of f at \(2n+1\) distinct points \(\{P_i\}\), then the min-max approximation is simply the interpolation projection \(\mathcal{P}_nf\) of f at these points and \(A(M(f))=\mathcal{P}_nf\). The error estimate for this case is
$$\begin{aligned} \Vert f- \mathcal{P}_nf\Vert _{C([-\pi ,\pi ])}\le (1+\Vert \mathcal{P}_n\Vert )\mathrm{dist}(f,V). \end{aligned}$$
It is well known (see [38], Chap. 1 of Vol. 2) that for \(P_j:= -\pi +j\frac{2\pi }{2n+1}\), \(j=1,\ldots , 2n+1\), \(\Vert \mathcal{P}_n\Vert \approx \log n\). However, if we double the number of points, and keep them equally spaced, then it is known that \(\Vert M_V^{-1}\Vert \le 2\) (see [38], Theorem 7.28). Therefore from (7.4), we obtain \(\mu (\mathcal{N},V)\le 4\), and we derive the bound
$$\begin{aligned} \Vert f-A(M(f))\Vert _{C([-\pi ,\pi ])}\le 16\mathrm{dist}(f,V). \end{aligned}$$

7.3 Example 2

This case is quite similar to Example 1. The main difference is that now
$$\begin{aligned} z^*(w):=\mathop {\mathrm{argmin}}\limits _{z\in Z} \Vert w-z\Vert _{\ell _p(m)},\end{aligned}$$
and hence when \(1<p<\infty \) it can be found by minimization in a uniformly convex norm. We can take the lifting \(\Delta \) to be \(\Delta (w)=\sum _{j=1}^m w_j\psi _j\), where now \(\psi _j\) has the same support as \(g_j\) and \(L_{p}(D)\) norm one, \(j=1,\ldots ,m\). The algorithm is again given by (7.3), and is near optimal with constant 2 on each class \(\mathcal{K}_w\), \(w\in \mathbb {R}^m\), that is
$$\begin{aligned} \Vert f-A(M(f))\Vert _{L_p(D)}\le 2\mathrm{rad}(\mathcal{K}_w) \le 4\mu (\mathcal{N},V)\varepsilon , \end{aligned}$$
where the last inequality follows from (5.5).
Similar to Lemma 7.1, we have the following bounds for \(\mu (\mathcal{N},V)\),
$$\begin{aligned} \frac{1}{2}\Vert M_V^{-1}\Vert \le \mu (\mathcal{N}, V)\le 2\Vert M_V^{-1}\Vert , \end{aligned}$$
where now the norm of \(M_V^{-1}\) is taken as the operator norm from \(\ell _p(m)\) to \(L_p(D)\), and hence is
$$\begin{aligned} \Vert M_V^{-1}\Vert = \sup _{v\in V}\frac{\Vert v\Vert _{L_p(D)}}{\Vert (l_1(v),\ldots ,l_m(v))\Vert _{\ell _p(m)}}. \end{aligned}$$
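The minimization defining \(z^*(w)\) in Example 2 can be sketched in the simplest situation, where \(Z=M(V)\) is spanned by a single vector \(z_0\). This is an assumption made only for illustration, as are the names `objective` and `best_multiple`. The map \(c\mapsto \Vert w-cz_0\Vert _{\ell _p(m)}\) is convex, so a ternary search on a bracketing interval locates a minimizer.

```python
def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def objective(c, w, z0, p):
    """||w - c*z0|| in the l_p(m) norm."""
    return lp_norm([a - c * b for a, b in zip(w, z0)], p)

def best_multiple(w, z0, p, iterations=200):
    """Approximate argmin over c of ||w - c*z0||_p by ternary search (z0 nonzero)."""
    # Any minimiser c satisfies ||c*z0|| <= 2||w||, which gives the bracket below.
    radius = 2.0 * lp_norm(w, p) / lp_norm(z0, p)
    lo, hi = -radius, radius
    for _ in range(iterations):
        c1 = lo + (hi - lo) / 3.0
        c2 = hi - (hi - lo) / 3.0
        if objective(c1, w, z0, p) <= objective(c2, w, z0, p):
            hi = c2
        else:
            lo = c1
    return (lo + hi) / 2.0

w = [3.0, 1.0, -2.0]                 # hypothetical data
z0 = [1.0, 1.0, 1.0]                 # hypothetical generator of Z = M(V)
c_star = best_multiple(w, z0, 1.5)
print(c_star, objective(c_star, w, z0, 1.5))
```

For \(p=2\) the result can be compared with the closed form \(c=\langle w,z_0\rangle /\langle z_0,z_0\rangle \). When Z has higher dimension, a general purpose convex solver would replace the one dimensional search.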

7.4 Example 3

As mentioned earlier, our interest in Example 3 is because it illustrates certain theoretical features. In this example, the norm \(\Vert \cdot \Vert _M\) is the \(\ell _\infty (m)\) norm, and approximation in this norm was already discussed in Example 1. The interesting aspect of this example centers around liftings. We know that any linear lifting must have norm \(\ge \sqrt{m/2}\). On the other hand, we have given in (6.8) an explicit formula for a (nonlinear) lifting with norm one. So, using this lifting, the algorithm A given in (5.4) will be near optimal with constant 2 for each of the classes \(\mathcal{K}_w\).

8 Relation to sampling theory

The results we have put forward, when restricted to problems of sampling, have some overlap with recent results, see, for example [1, 2, 3]. In this section, we point out these connections and what new light our general theory sheds on sampling problems. The main point to be made is that our results give a general framework for sampling in Banach spaces that includes many of the specific examples studied in the literature.

Suppose that \(\mathcal{X}\) is a Banach space and \(l_1,l_2,\ldots , \) is a possibly infinite sequence of linear functionals from \(\mathcal{X}^*\). The application of the \(l_j\) to an \(f\in \mathcal{X}\) gives a sequence of samples of f. For \(m=1,2,\ldots \) we define mappings \(M_m:\mathcal{X}\rightarrow \mathbb {R}^m\) as
$$\begin{aligned} M_m(f)=(l_1( f), l_2( f),\ldots ,l_m( f)). \end{aligned}$$
Two prominent examples of sampling are the following.

Example PS: Point samples of continuous functions Consider the space \(\mathcal{X}=C(D)\) for a domain \(D\subset \mathbb {R}^d\) and a sequence of points \(P_j\) from D. Then, the point evaluation functionals \(l_j(f)=f(P_j)\), \(j=1,2,\ldots \), are point samples of f. Given a compact subset \(K\subset \mathcal{X}\), we are interested in how well we can recover \(f\in K\) from the information \(l_j(f)\), \(j=1,2,\ldots \).

Example FS: Fourier samples Consider the space \(\mathcal{X}=L_2(\Omega )\), \(\Omega =[-\pi ,\pi ]\), and the linear functionals
$$\begin{aligned} l_j(f):= \frac{1}{2\pi }\intop \limits _\Omega f(t)e^{-ijt}\, dt,\quad j\in \mathbb {Z},\end{aligned}$$
which give the Fourier coefficients of f. Given a compact subset \(K\subset \mathcal{X}\), we are interested in how well we can recover \(f\in K\) in the norm of \(L_2(\Omega )\) from the information \(l_j(f)\), \(j=1,2,\ldots \).

The main problem in sampling is to build reconstruction operators \(A_m:\mathbb {R}^m\rightarrow \mathcal{X}\) such that the reconstruction mapping \(R_m( f):=A_m(M_m( f))\) provides a good approximation to f. Typical questions are:

(i) Do there exist such mappings such that \(R_m( f)\) converges to f as \(m\rightarrow \infty \), for each \( f\in \mathcal{X}\)?

(ii) What is the best performance in terms of rate of approximation on specific compact sets K?

(iii) Can we guarantee the stability of these maps in the sense of how they perform with respect to noisy observations?

The key in connecting such sampling problems with our theory is that the compact sets K typically considered are either directly defined by approximation or can be equivalently described by such approximation. That is, associated to K is a sequence of spaces \(V_n\), \(n\ge 0\), each of dimension n, and \(f\in K\) is equivalent to
$$\begin{aligned} \mathrm{dist}(f,V_n)_\mathcal{X}\le \varepsilon _n,\quad n\ge 0,\end{aligned}$$
where \((\varepsilon _n)\) is a known sequence of positive numbers which decrease to zero. Typically, the \(V_n\) are nested, i.e. \(V_n\subset V_{n+1}\), \(n\ge 0\). Such characterizations of sets K are often provided by the theory of approximation. For example, a periodic function \(f\in C[-\pi ,\pi ]\) is in Lip \(\alpha \), \(0<\alpha <1\) if and only if
$$\begin{aligned} \mathrm{dist}(f,\mathcal{T}_n)_{C[-\pi ,\pi ]}\le C(n+1)^{-\alpha }, \quad n\ge 0,\end{aligned}$$
with \(\mathcal{T}_n\) the space of trigonometric polynomials of degree at most n, and moreover, the Lip \(\alpha \) semi-norm is equivalent to the smallest constant C(f) for which (8.3) holds, see [13]. Similarly, a function f defined on \([-1,1]\) has an analytic extension to the region in the plane with boundary given by the Bernstein ellipse \(E_\rho \) with parameter \(\rho >1\) and belongs to the unit ball \(U(E_\rho )\) if and only if
$$\begin{aligned} \mathrm{dist}(f,\mathcal{P}_n)_{C[-1,1]}\le \rho ^{-n},\quad n\ge 0,\end{aligned}$$
where \(\mathcal{P}_n\) is the space of algebraic polynomials of degree at most n in one variable, see [29]. Recall that the Bernstein ellipse \(E_\rho \) is the open region in the complex plane bounded by the ellipse with foci \(\pm 1\) and semiminor and semimajor axis lengths summing to \(\rho \).
For the remainder of this section, we assume that the set K is given by
$$\begin{aligned} K:=\{ f\in \mathcal{X}: \mathrm{dist}( f,V_n)_\mathcal{X}\le \varepsilon _n,\ n\ge 0\}=\bigcap _{n\ge 0}\mathcal{K}(\varepsilon _n,V_n).\end{aligned}$$
Our results previous to this section assumed only the knowledge that \(f\in \mathcal{K}(\varepsilon _n,V_n)\) for one fixed value of \(n\le m\).

Remark 8.1

It is more convenient in this section to use the quantity \(\mu (V,\mathcal{N})\) rather than \(\mu (\mathcal{N},V)\). Recall that \(\frac{1}{2}\mu (V,\mathcal{N})\le \mu ( \mathcal{N},V)\le 2\mu (V,\mathcal{N})\) and therefore this switch only affects constants by a factor of at most 2.

Our general theory (see Theorem 5.1) says that given the first m samples \(M_m( f)\), \(f\in \mathcal{X}\), for any \(n\le m\), the mapping \(A_{n,m}:\mathbb {R}^m\rightarrow \mathcal{X}\), given by (5.4), provides an approximation \(A_{n,m}(M_m( f))\) to \( f\in K\) with the accuracy (see (5.5))
$$\begin{aligned} \Vert f-A_{n,m}(M_m( f))\Vert _\mathcal{X}\le 4\varepsilon _n\lambda \Vert \Delta \Vert \mu (\mathcal{N}_m,V_n)\le 8\varepsilon _n\lambda \Vert \Delta \Vert \mu (V_n,\mathcal{N}_m),\end{aligned}$$
where we have used Remark 8.1. Here \(\mathcal{N}_m\) is the null space of the mapping \(M_m\). We know that theoretically \(C:=8\lambda \Vert \Delta \Vert \) can be chosen as close to 8 as we wish but in numerical implementations, depending on the specific setting, C will generally be a known constant but larger than 8. In the two above examples, one can take \(C=8\) both theoretically and numerically.

Remark 8.2

Let \(A_{n,m}^*:\mathbb {R}^m\mapsto V_n\) be the mapping defined by (5.4) with the second term \(\Delta (w-M(w))\) on the right deleted. From (5.2), it follows that the term that is dropped has norm not exceeding \(\Vert \Delta \Vert \varepsilon _n\) whenever \( f\in K\), and since we can take \(\Vert \Delta \Vert \) as close to one as we wish, the resulting operators satisfy (8.6), with a new value of C, but now they map into \(V_n\).

Given a value of m and the information map \(M_m\), we are allowed to choose n, i.e., we can choose the space \(V_n\). Since \(\varepsilon _n\) is known, from the point of view of the error bound (8.6), given the m samples, the best choice of n is
$$\begin{aligned} n(m):=\mathop {\mathrm{argmin}}\limits _{0\le n\le m} \mu (V_n,\mathcal{N}_m)\varepsilon _n,\end{aligned}$$
and gives the bound
$$\begin{aligned} \Vert f-A_{n(m),m}(M_m( f))\Vert _\mathcal{X}\le C \min _{0\le n\le m}\mu (V_n,\mathcal{N}_m)\varepsilon _n,\quad f\in K.\end{aligned}$$
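The selection of \(n(m)\) is a one line computation once estimates for \(\mu (V_n,\mathcal{N}_m)\) are available. The sketch below uses made-up values for these quantities and for \(\varepsilon _n\), purely to illustrate the trade-off between a growing \(\mu \) and a decaying \(\varepsilon _n\). The function name `best_dimension` is hypothetical.

```python
def best_dimension(mu, eps):
    """Return (n, mu[n] * eps[n]) minimising the product over the available n."""
    products = [m_n * e_n for m_n, e_n in zip(mu, eps)]
    n = min(range(len(products)), key=products.__getitem__)
    return n, products[n]

mu = [1.0, 1.5, 4.0, 40.0]        # hypothetical values of mu(V_n, N_m), n = 0..3
eps = [1.0, 0.5, 0.25, 0.125]     # hypothetical accuracies eps_n = 2^-n
n, bound = best_dimension(mu, eps)
print(n, bound)
```

In this made-up instance the largest admissible n is not the best choice, because the growth of \(\mu (V_n,\mathcal{N}_m)\) outweighs the decay of \(\varepsilon _n\).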
This brings out the importance of giving good estimates for \(\mu (V_n,\mathcal{N}_m)\) in order to be able to select the best choice for n(m). Consider the case of point samples. Then, the results of Example 1 in the previous section show that in this case, we have
$$\begin{aligned} \mu (V_n,\mathcal{N}_m)= \sup _{v\in V_n}\frac{\Vert v\Vert _{C(D)}}{\displaystyle {\max _{1\le j\le m}|v(P_j)}|}. \end{aligned}$$
The importance of this ratio of the continuous and discrete norms, in the case \(V_n\) are spaces of algebraic polynomials, has been known for some time. It is equivalent to the Lebesgue constant when \(n=m\) and has been recognized as an important ingredient in sampling theory dating back at least to Schönhage [32].
A similar ratio of continuous to discrete norms determines \(\mu (V_n,\mathcal{N}_m)\) for other sampling settings. For example, in the case of the Fourier sampling (Example FS above), we have for any space \(V_n\) of dimension n
$$\begin{aligned} \mu (V_n,\mathcal{N}_{2m+1})= \sup _{v\in V_n}\{\Vert v\Vert _{L_2(\Omega )}:\Vert \mathcal{F}_m(v)\Vert _{L_2(\Omega )}=1\}, \end{aligned}$$
where \(\mathcal{F}_m(v)\) is the m-th partial sum of the Fourier series of v. The right side of (8.10), in the case \(V_n=\mathcal{P}_{n-1}\) is studied in [3] (where it is denoted by \(B_{n,m}\)). Giving bounds for quotients, analogous to those in (8.9), has been a central topic in sampling theory (see [1, 2, 3, 29]) and such bounds have been obtained in specific settings, such as the case of equally spaced point samples on \([-1,1]\) or Fourier samples. The present paper does not contribute to the problem of estimating such ratios of continuous to discrete norms.

The results of the present paper give a general framework for the analysis of sampling. Our construction of the operators \(A_{n,m}\) (or their modification \(A_{n,m}^*\) given by Remark 8.2) gives performance bounds that match those given in the literature in specific settings such as the two examples given at the beginning of this section. It is interesting to ask in what sense these bounds are optimal. Theorem 5.1 proves optimality of the bound (8.6) if the assumption that \(f\in K\) is replaced by the less demanding assumption that \(f\in \mathcal{K}(\varepsilon _n,V_n)\), for this one fixed value of n. The knowledge that \(K=\{ f\in \mathcal{X}: \mathrm{dist}( f,V_n)_\mathcal{X}\le \varepsilon _n,\ n\ge 0\}=\bigcap _{n\ge 0}\mathcal{K}(\varepsilon _n,V_n)\) could allow an improved performance, since it is a more demanding assumption. In the case of a Hilbert space, this was shown to be the case in [7] where, in some instances, much improved bounds were obtained from this additional knowledge. However, the examples in [7] are not for classical settings such as polynomial or trigonometric polynomial approximation. In these cases, there is no known improvement over the estimate (8.8).

Next, let us consider the question of whether, in the case of an infinite number of samples, the samples guarantee that every \(f\in K\) can be approximated to arbitrary accuracy from these samples. This is of course the case whenever
$$\begin{aligned} \mu (V_{n(m)},\mathcal{N}_m)\varepsilon _{n(m)}\rightarrow 0, \quad m\rightarrow \infty .\end{aligned}$$
In particular, this will be the case whenever the sampling satisfies that for each finite dimensional space \(V\subset \mathcal{X}\)
$$\begin{aligned} \lim _{m\rightarrow \infty }\mu (V,\mathcal{N}_m)\le C,\end{aligned}$$
for a fixed constant \(C>0\), independent of V. Notice that the spaces \(\mathcal{N}_m\) satisfy \(\mathcal{N}_{m+1}\subset \mathcal{N}_m\) and hence the sequence \((\mu (V,\mathcal{N}_m))\) is non-increasing.

In what follows we give some conditions on the sequence of functionals \((l_j)_{j\ge 1}\), that guarantee estimate (8.12). Then, we proceed with discussing what that would mean for our algorithm, defined in (5.4), and in particular, what that would mean for Example PS and Example FS.

Of course a necessary condition for (8.12) to hold is that the sequence of functionals \(l_j\), \(j\ge 1\), is total on \(\mathcal{X}\), i.e., for each \( f\in \mathcal{X}\), we have
$$\begin{aligned} l_j( f)=0, \ j\ge 1,\implies f=0. \end{aligned}$$
It is easy to check that when the \((l_j)_{j\ge 1}\) are total, then for each V, we have that (8.12) holds with a constant \(C_V\) depending on V. Indeed, if (8.12) fails, then \(\mu (V,\mathcal{N}_m)=+\infty \) for all m, so by our previous remark \(V\cap \mathcal{N}_m\ne \{0\}\). The linear spaces \( V\cap \mathcal{N}_m\), \(m\ge 1\), are nested and contained in V. If \(V\cap \mathcal{N}_m\ne \{0\}\) for all m, then there is a \(v\ne 0\) with \(v\in \bigcap _{m\ge 1} \mathcal{N}_m\). This contradicts (8.13).
However, our interest is to have a bound uniform in V. To derive such a uniform bound, we introduce the following notation. Given the sequence \((l_j)_{j\ge 1}\), we let \(L_m:=\mathrm{span}\ \{l_j\}_{j\le m} \) and \(L:=\ \mathrm{span}\{l_j\}_{j\ge 1}\) which are closed linear subspaces of \(\mathcal{X}^*\). We denote by \(U(L_m)\) and U(L) the unit ball of these spaces with the \(\mathcal{X}^*\) norm. For any \(0<\gamma \le 1\), we say that the sequence \((l_j)_{j\ge 1}\) is \(\gamma \)-norming, if we have
$$\begin{aligned} \sup _{l\in U(L)}|l( f)|\ge \gamma \Vert f\Vert _\mathcal{X},\quad f\in \mathcal{X}.\end{aligned}$$
Clearly, any \(\gamma \)-norming sequence is total. If \(\mathcal{X}\) is a reflexive Banach space, then the Hahn-Banach theorem gives that every total sequence \((l_j)\) is 1-norming.

Theorem 8.3

Let \(\mathcal{X}\) be any Banach space and suppose that the functionals \(l_j\), \(j=1,2,\ldots \), are \(\gamma \)-norming. Then, for any finite dimensional subspace \(V\subset \mathcal{X}\), we have
$$\begin{aligned} \lim _{m\rightarrow \infty } \mu (V,\mathcal{N}_m) \le \gamma ^{-1}. \end{aligned}$$
Moreover, the estimate (8.15) is optimal.


Proof. Since the unit ball U(V) of V in \(\mathcal{X}\) is compact and the spaces \(L_m\) increase to L, for any \(\delta >0\), there exists an m such that
$$\begin{aligned} \sup _{l\in U(L_m)} |l(v)|\ge \frac{\gamma }{1+\delta }\Vert v\Vert _\mathcal{X},\quad v\in V.\end{aligned}$$
If we fix this value of m, then for any \(v\in U(V)\) and any \(\eta \in \mathcal{N}_m\) we have
$$\begin{aligned}\Vert v-\eta \Vert _\mathcal{X}\ge \sup _{l\in U(L_m)} |l(v-\eta )|= \sup _{l\in U(L_m)} |l(v)| \ge \frac{\gamma }{1+\delta }. \end{aligned}$$
It follows from (4.1) that \(\mu (V,\mathcal{N}_m)\le (1+\delta )\gamma ^{-1}\) and since \(\delta \) was arbitrary this proves the first part of the theorem.
To show the second part, we take \(\mathcal{X}=\ell _1:=\ell _1(\mathbb {N})\), with its usual basis \((e_j)_{j=1}^\infty \) and the space V to be \(V:=\mathrm{span}\{e_1\}\). We denote by \((e_j^*)_{j=1}^\infty \) the coordinate functionals, fix any number \(0<\gamma \le 1\) and define the linear functionals
$$\begin{aligned} l_1:= \gamma e_1^*-\sum _{j=2}^\infty e_j^*, \ \ \ \text{ and } \ \ \ \ l_k:=e_k^* , \quad k\ge 2. \end{aligned}$$
Clearly, if \(\eta \in \mathcal{N}_m\), then \(\eta =(\eta _1,0,\ldots , 0,\eta _{m+1},\ldots )\), where \(\eta _1=\gamma ^{-1}\sum _{j>m}\eta _ j=:\gamma ^{-1}z\). Therefore, we have
$$\begin{aligned} \Vert e_1-\eta \Vert _{\ell _1}= |1-\gamma ^{-1}z|+\sum _{j>m}|\eta _j| \ge |1-\gamma ^{-1}z|+|z|\ge |1-\gamma ^{-1}|z||+|z|\ge \gamma . \end{aligned}$$
Moreover, for \(\eta _\gamma :=e_1+\gamma e_{m+1}\in \mathcal{N}_m\), the norm \(\Vert e_1-\eta _\gamma \Vert _{\ell _1}=\gamma \) , and therefore we have \(\displaystyle {\inf _{\eta \in \mathcal{N}_m}\Vert e_1-\eta \Vert _{\ell _1}= \gamma }\), which gives
$$\begin{aligned} \mu (V,\mathcal{N}_m)=\sup _{\eta \in \mathcal{N}_m}\frac{1}{\Vert e_1-\eta \Vert _{\ell _1}}= \left[ \inf _{\eta \in \mathcal{N}_m}\Vert e_1-\eta \Vert _{\ell _1}\right] ^{-1} = \gamma ^{-1}, \quad m\ge 1. \end{aligned}$$
To complete the proof of the second part of the theorem, we need only show that the system \((l_j)_{j=1}^\infty \) is \(\gamma \)-norming. For any \(x=(x_j)_{j\ge 1}\in \ell _1\), we define
$$\begin{aligned} {\varphi }_n:=(\gamma \mathrm{sgn}x_1,\mathrm{sgn}x_2,\ldots ,\mathrm{sgn}x_n,-\mathrm{sgn}x_1,-\mathrm{sgn}x_1,\ldots )\in \ell _\infty (\mathbb {N})=\ell _1^*. \end{aligned}$$
This is a norm one functional on \(\mathcal{X}\) which can be written as a finite sum
$$\begin{aligned} {\varphi }_n=\mathrm{sgn}x_1 l_1+\sum _{j=2}^n(\mathrm{sgn}x_j +\mathrm{sgn}x_1)l_j, \end{aligned}$$
and therefore \(\varphi _n\in U(L)\) for every \(n\ge 1\). Since
$$\begin{aligned} {\varphi }_n(x)=\gamma |x_1|+\sum _{j=2}^n|x_j|-\mathrm{sgn}x_1\sum _{j=n+1}^\infty x_j \rightarrow \gamma |x_1|+\sum _{j=2}^\infty |x_j|\ge \gamma \Vert x\Vert _{\ell _1}, \quad n\rightarrow \infty , \end{aligned}$$
we infer that \((l_j)_{j=1}^\infty \) is \(\gamma \)-norming. \(\square \)
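The computation in the second part of the proof can be checked numerically on a finite truncation. The following Python sketch is an illustration only: it replaces \(\ell _1(\mathbb {N})\) by a finite tail of length `tail_len` (an assumption made here so that the code terminates), and the names `distance_to_e1` and `check_example` are hypothetical. It samples elements \(\eta \in \mathcal{N}_m\), records the smallest value of \(\Vert e_1-\eta \Vert _{\ell _1}\) found, and evaluates the distance at \(\eta _\gamma =e_1+\gamma e_{m+1}\).

```python
import random


def distance_to_e1(tail, gamma):
    """Return ||e_1 - eta||_1 for eta in N_m with free entries `tail`.

    `tail` plays the role of (eta_{m+1}, eta_{m+2}, ...), truncated to
    finitely many entries. The entries eta_2, ..., eta_m vanish, and
    eta_1 is forced by the condition l_1(eta) = 0.
    """
    z = sum(tail)
    eta_1 = z / gamma
    return abs(1.0 - eta_1) + sum(abs(t) for t in tail)


def check_example(gamma, tail_len=5, trials=2000, seed=0):
    """Sample N_m at random and compare with the candidate minimizer."""
    rng = random.Random(seed)
    smallest = min(
        distance_to_e1([rng.uniform(-1.0, 1.0) for _ in range(tail_len)], gamma)
        for _ in range(trials)
    )
    # eta_gamma = e_1 + gamma * e_{m+1} corresponds to tail = (gamma, 0, ..., 0).
    attained = distance_to_e1([gamma] + [0.0] * (tail_len - 1), gamma)
    return smallest, attained
```

Random sampling only probes the lower bound; the inequality itself is established by the chain of estimates in the proof.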

Remark 8.4

One can build on the example from Theorem 8.3 to construct a sequence of functionals \((l_j)_{j\ge 1}\), \(l_j\in \ell _1^*=\ell _\infty (\mathbb {N})\) which are total but for each \(j\ge 1\) there is a space \(V_j\) of dimension one such that
$$\begin{aligned} \lim _{m\rightarrow \infty } \mu (V_j,\mathcal{N}_m)\ge j.\end{aligned}$$

Corollary 8.5

Let \(\mathcal{X}\) be any separable Banach space and let \((l_j)_{j\ge 1}\) be any sequence of functionals from \(\mathcal{X}^*\) which are \(\gamma \)-norming for some \(0<\gamma \le 1\). Then, we have the following results:

(i) If \((V_n)_{n\ge 0}\) is any sequence of nested finite dimensional spaces whose union is dense in \(\mathcal{X}\), then, for each sufficiently large m, there is a choice \(\tilde{n}(m)\) such that
$$\begin{aligned} \Vert f-A_{\tilde{n}(m),m}(M_m( f))\Vert _\mathcal{X}\le C\mu (V_{\tilde{n}(m)},\mathcal{N}_m)\mathrm{dist}( f,V_{ \tilde{n}(m)})_\mathcal{X}\le 2C \gamma ^{-1}\mathrm{dist}( f,V_{ \tilde{n}(m)})_\mathcal{X},\end{aligned}$$
with C an absolute constant, and the right side of (8.18) tends to zero as \(m\rightarrow \infty \).
(ii) There exist operators \(A_{\tilde{n}(m),m}\) mapping \(\mathbb {R}^m\) to \(\mathcal{X}\) such that
$$\begin{aligned} A_{\tilde{n}(m),m}(M_m( f)) \rightarrow f, \quad m\rightarrow \infty , \end{aligned}$$
for all \( f\in \mathcal{X}\).

In particular, both (i) and (ii) hold whenever \(\mathcal{X}\) is reflexive and \((l_j)_{j\ge 1}\) is total.


Proof. In view of the previous theorem, for each n, there is an N(n) such that for \(m>N(n)\), we have \(\mu (V_{n},\mathcal{N}_m)\le 2\gamma ^{-1}\). Without loss of generality, we can assume that \(N(1)<N(2)<\ldots \), in which case we have \(\mu (V_{k},\mathcal{N}_m)\le 2\gamma ^{-1}\) for all k, \(1\le k\le n\), provided \(m> N(n)\). We set \(\tilde{n}(m)=n\) for \(N(n)<m\le N(n+1)\), \(n=1,2,\ldots \). Note that \(\tilde{n}(m)\rightarrow \infty \) as \(m\rightarrow \infty \). Then, (i) follows from (8.6) since
$$\begin{aligned} \Vert f-A_{\tilde{n}(m),m}(M_m( f))\Vert _\mathcal{X}\le C\mu (V_{\tilde{n}(m)},\mathcal{N}_m)\mathrm{dist}(f,V_{\tilde{n}(m)})_\mathcal{X}\le 2C\gamma ^{-1}\mathrm{dist}(f,V_{\tilde{n}(m)})_\mathcal{X}. \end{aligned}$$
The right side tends to zero because \(\tilde{n}(m)\rightarrow \infty \) and the union of the spaces \(V_n\) is dense in \(\mathcal{X}\). The statement (ii) follows from (i) because, \(\mathcal{X}\) being separable, there is always a nested sequence \((V_n)\) of spaces of dimension n whose union is dense in \(\mathcal{X}\). \(\square \)
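The selection rule \(\tilde{n}(m)=n\) for \(N(n)<m\le N(n+1)\) used in the proof is a simple table lookup. A minimal Python sketch follows, assuming the thresholds \(N(1)<N(2)<\ldots \) have already been determined and are passed as a finite, strictly increasing list. The function name `n_tilde` and the sample thresholds in the usage note are hypothetical; the proof only asserts that such thresholds exist.

```python
from bisect import bisect_left


def n_tilde(m, thresholds):
    """Return n with N(n) < m <= N(n+1), where thresholds[k] = N(k+1).

    Returns None when m <= N(1), since no space V_n is admissible yet.
    When m exceeds every listed threshold, the largest listed n is returned.
    """
    # bisect_left counts the thresholds that are strictly smaller than m.
    n = bisect_left(thresholds, m)
    return n if n >= 1 else None
```

For instance, with the illustrative thresholds `[3, 7, 20]`, the call `n_tilde(8, [3, 7, 20])` selects the space \(V_2\).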

While the spaces C and \(L_1\) are not reflexive, our two examples are still covered by Theorem 8.3 and Corollary 8.5.

Recovery for Example PS If the points \(P_j\) are dense in D, then the sequence of functionals \(l_j(f)=f(P_j)\), \(j\ge 1\) is 1-norming and Theorem 8.3 and Corollary 8.5 hold for this sampling.

Recovery for Example FS   The sequence \((l_j)_{j\ge 1}\) of Fourier samples is 1-norming for each of the spaces \(L_p(\Omega )\), \(1\le p<\infty \), or \(C(\Omega )\) and hence Corollary 8.5 holds for this sampling.

We leave the simple details of these last two statements to the reader.

A major issue in generalized sampling is whether the reconstruction maps are numerically stable. To quantify stability, one introduces, for any potential reconstruction operators \(\phi _m:\mathbb {R}^m\rightarrow \mathcal{X}\), the reconstruction maps \(R_m(f):=\phi _m(M_m(f))\) and the condition numbers
$$\begin{aligned} \kappa _m:= \sup _{f\in \mathcal{X}}\lim _{\varepsilon \rightarrow 0} \sup _{\Vert g\Vert _\mathcal{X}=1}\frac{\Vert R_m(f+\varepsilon g)-R_m(f)\Vert _\mathcal{X}}{\varepsilon }, \end{aligned}$$
and asks whether there is a uniform bound on the \(\kappa _m\). In our case
$$\begin{aligned} \phi _m=A_{ \tilde{n}(m),m}^* = M_{V_{ \tilde{n}(m)}}^{-1}\circ \Lambda _m, \end{aligned}$$
where \(\Lambda _m\) is the approximation operator for approximating \(M_m(f)\) by the elements from \(M_m(V_{ \tilde{n}(m)})\), see Remark 8.2. Thus,
$$\begin{aligned} \kappa _m\le \Vert M_{V_{ \tilde{n}(m)}}^{-1}\Vert _{\mathrm{Lip}\ 1}\Vert \Lambda _m\Vert _{\mathrm{Lip} \ 1 }\le C\mu (V_{\tilde{n}(m)},\mathcal{N}_m)\Vert \Lambda _m\Vert _{\mathrm{Lip} \ 1 },\end{aligned}$$
where \(\Vert \cdot \Vert _{\mathrm{Lip}\ 1}\) is the Lipschitz norm of the operator. Here to bound the Lipschitz norm of the operator \(M_{V_{ \tilde{n}(m)}}^{-1}\) we used the fact that it is a linear operator and hence its Lipschitz norm is the same as its norm and in this case is given by (i) of Lemma 4.3. Under the assumptions of Theorem 8.3, we know that \(\mu (V_{ \tilde{n}(m)},\mathcal{N}_m)\le C\) for a fixed constant C if m is sufficiently large.
In the case \(\mathcal{X}\) is a Hilbert space, the approximation operator \(\Lambda _m\) can also be taken as a linear operator of norm one on a Hilbert space and hence we obtain a uniform bound on the condition numbers \(\kappa _m\). For more general Banach spaces \(\mathcal{X}\), bounding the Lipschitz norm of \(\Lambda _m\) depends very much on the specific spaces \(V_n\), \(\mathcal{X}\), and the choice of \(\Lambda _m\). However, from the Kadec-Snobar theorem, we can always take for \(\Lambda _m\) a linear projector whose norm is bounded by \(\sqrt{ \tilde{n}(m)}\) and therefore, under the assumptions of Theorem 8.3, we have for sufficiently large m
$$\begin{aligned} \kappa _m\le C \sqrt{ \tilde{n}(m)}. \end{aligned}$$
Next, we briefly mention how the framework, outlined above for general Banach spaces, relates to some recent results in sampling theory. We concentrate on [1], where \(\mathcal{X}\) is a Hilbert space, the measurements are \(l_j(f)=\langle f,\psi _j\rangle =:w_j\), \(j=1,2,\ldots \), with \((\psi _j)_{j\ge 1}\) being a Riesz basis for \(\mathcal{X}\), and \(V_n=\mathrm{span}\{\phi _1,\ldots ,\phi _n\}\) (this is the space \(T_n\) according to the notation in [1]) for some given linearly independent elements \(\phi _j\in \mathcal{X}\). The algorithm \(\tilde{A}_{n,m}:\mathbb {R}^m\rightarrow V_n\), considered in [1], is given by the formula
$$\begin{aligned} \tilde{A}_{n,m}(w)=M^{-1}_{V_n}(\tilde{\Lambda }(w)),\quad w\in \mathbb {R}^m, \end{aligned}$$
where \(\tilde{\Lambda }\) is the orthogonal projection of the measurement \(w=(\langle f,\psi _1\rangle ,\ldots ,\langle f,\psi _m\rangle )\) onto the space
$$\begin{aligned} Z=M_{V_n}(V_n)=\mathrm{span}\{(\langle \phi _j,\psi _1\rangle ,\ldots ,\langle \phi _j,\psi _m\rangle )\}\subset \mathbb {R}^m \end{aligned}$$
with respect to the standard Euclidean norm in \(\mathbb {R}^m\). This is the same as the operator \(A^*_{n,m}:\mathbb {R}^m\rightarrow V_n\) defined in Remark 8.2, i.e., \(A^*_{n,m}(w)=M^{-1}_{V_n}( \Lambda (w)) \), with the only difference that the mapping \(\Lambda \) is the best or a near best approximation to w from Z with respect to the \(\Vert \cdot \Vert _M\) norm. Note that both norms, the Euclidean norm in \(\mathbb {R}^m\) and the \(\Vert \cdot \Vert _M\) norm, are Hilbertian and equivalent, but they are different, so the recovery is also different.
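The reconstruction \(\tilde{A}_{n,m}\) described above amounts to a least squares fit of the data w by the vectors \(M(\phi _j)\). The following Python sketch illustrates this under simplifying assumptions that are not part of the source: the Hilbert space is replaced by \(\mathbb {R}^d\) with the Euclidean inner product, the sampling elements \(\psi _i\) and basis elements \(\phi _j\) are given as plain lists, and the projection onto Z is computed from the normal equations. The names `dot`, `solve`, and `reconstruct` are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def reconstruct(w, phis, psis):
    """Project w orthogonally onto Z = M(V_n) and map back to V_n."""
    # z_j = M(phi_j) = (<phi_j, psi_1>, ..., <phi_j, psi_m>) spans Z.
    Z = [[dot(phi, psi) for psi in psis] for phi in phis]
    gram = [[dot(zi, zj) for zj in Z] for zi in Z]
    rhs = [dot(zi, w) for zi in Z]
    coeffs = solve(gram, rhs)  # coordinates of the projection of w onto Z
    d = len(phis[0])
    return [sum(c * phi[k] for c, phi in zip(coeffs, phis)) for k in range(d)]
```

The normal equations are used only for brevity; they require the vectors \(M(\phi _j)\) to be linearly independent, which corresponds to \(V_n\cap \mathcal{N}_m=\{0\}\).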
One of the main results in [1], see Theorem 2.4, is that for \(f\in \mathcal{X}\) and every \(n\in \mathbb {N}\) there is \(m_0\) large enough, such that for \(m\ge m_0\), the approximant \(\tilde{A}_{n,m}(M_m(f))\in V_n\) exists, is unique, \(\Vert \tilde{A}_{n,m}(M_m(f))\Vert \le C\cdot C^{-1}_{n,m}\Vert f\Vert _\mathcal{X}\), and
$$\begin{aligned} \Vert f-\tilde{A}_{n,m}(M_m(f))\Vert _\mathcal{X}\le \left( 1+\frac{D^2_{n,m}}{C^2_{n,m}}\right) ^{1/2}\mathrm{dist}(f,V_n)_\mathcal{X}, \end{aligned}$$
where
$$\begin{aligned} D_{n,m}:=\sup _{g\in V_n^\perp ,\,\Vert g\Vert =1}\sup _{v\in \,V_n,\Vert v\Vert =1} |(M_m(g),M_m(v))|, \quad C_{n,m}:=\inf _{v\in V_n, \,\Vert v\Vert =1}\Vert M_m(v)\Vert ^2_{\ell _2(m)}. \end{aligned}$$
It is shown in [1] that \(D_{n,m}\le C\) and \(D_{n,m}\rightarrow 0\) as \(m\rightarrow \infty \) for each fixed n. Since \(C_{n,m}\) are converging to one when n is fixed and \(m\rightarrow \infty \), they end up with the estimate
$$\begin{aligned} \Vert f-\tilde{A}_{n,m}(M_m(f))\Vert _\mathcal{X}\le C \mathrm{dist}(f,V_n)_\mathcal{X},\end{aligned}$$
if \(m=m(n)\) is chosen large enough depending on n. One can make C arbitrarily close to one by choosing \(m=m(n)\) large enough.
Our results in [8] provide the estimate
$$\begin{aligned} \Vert f-A_{n,m}(M_m(f))\Vert _\mathcal{X}\le \mu ( \mathcal{N}_m,V_n)\mathrm{dist}(f,V_n)_\mathcal{X},\end{aligned}$$
provided \(0<n\le m\), and this estimate is shown to be optimal, i.e., the constant \(C=1\) cannot be reduced. In Corollary 8.5, we suggest that, rather than fixing n and choosing m large, one should for each m choose \(n=\tilde{n}(m)\) to establish the convergence \(A_{\tilde{n}(m),m}(M_m(f)) \rightarrow f\), \(m\rightarrow \infty \), for each f. Hence, in the case that \(\mathcal{X}\) is a Hilbert space, the results in [1] and those in the present paper are essentially the same, although stated differently. However, the main point of the present paper is to handle the case of a general Banach space rather than only Hilbert spaces, which is an issue not previously discussed in the sampling literature.

9 Choosing measurements

In some settings, one knows the space V, but is allowed to choose the measurement functionals \(l_j\), \(j=1,\ldots ,m\). In this section, we discuss how our results can be a guide in such a selection. The main issue is to keep \(\mu (\mathcal{N},V)\) as small as possible, and so we concentrate on this.

Let us recall that from Lemmas 4.2 and 4.3, we have
$$\begin{aligned} \frac{1}{2}\Vert M_V^{-1}\Vert \le \mu (\mathcal{N}, V)\le 2\Vert M_V^{-1}\Vert . \end{aligned}$$
Therefore, we want to choose M so as to keep
$$\begin{aligned} \Vert M_V^{-1}\Vert = \sup _{v\in V}\frac{\Vert v\Vert _\mathcal{X}}{\Vert M_V(v)\Vert _M} \end{aligned}$$
small, namely, we want to keep \(\Vert M_V(v)\Vert _M \) large whenever \(\Vert v\Vert _\mathcal{X}=1\).
Case 1 Let us first consider the case when \(m=n\). Given any linear functionals \(l_1,\ldots ,l_n\), which are linearly independent over V (our candidates for measurements), we can choose a basis for V which is dual to the \(l_j\)’s, that is, we can choose \(\psi _j\in V\), \(j=1,\ldots ,n\), such that
$$\begin{aligned} l_i(\psi _j)=\delta _{i,j},\quad 1\le i,j\le n. \end{aligned}$$
It follows that each \(v\in V\) can be represented as \(v=\sum _{j=1}^nl_j(v)\psi _j\). The operator \(P_V: \mathcal{X}\rightarrow V\), defined as
$$\begin{aligned} P_V(f)= \sum _{j=1}^n l_j(f)\psi _j,\quad f\in \mathcal{X},\end{aligned}$$
is a projector from \(\mathcal{X}\) onto V, and any projector onto V is of this form. If we take \(M(v)=(l_1(v),\ldots ,l_n(v))\), we have
$$\begin{aligned} \Vert M(v)\Vert _M= \inf _{M(f)=M(v)}\Vert f\Vert = \inf _{P_V(f)=v}\Vert f\Vert =\inf _{P_V(f)=v} \frac{\Vert f\Vert }{\Vert P_V(f)\Vert } \Vert v\Vert . \end{aligned}$$
If we now take the infimum over all \(v\in V\) with \(\Vert v\Vert =1\) in (9.2), we run through all \(f\in \mathcal{X}\), and hence
$$\begin{aligned} \inf _{\Vert v\Vert =1}\Vert M(v)\Vert _M= \inf _{f\in \mathcal{X}} \frac{\Vert f\Vert }{\Vert P_V(f)\Vert } =\Vert P_V\Vert ^{-1}. \end{aligned}$$
In other words,
$$\begin{aligned} \Vert M_V^{-1}\Vert =\Vert P_V\Vert . \end{aligned}$$
This means the best choice of measurement functionals is to take the linear projection onto V with smallest norm, then take any basis \(\psi _1,\ldots ,\psi _n\) for V and represent the projection in terms of this basis as in (9.1). The dual functionals in this representation are the measurement functionals.
Finding projections of minimal norm onto a given subspace V of a Banach space \(\mathcal{X}\) is a well-studied problem in functional analysis. A famous theorem of Kadec-Snobar [20] says that there always exists such a projection with
$$\begin{aligned} \Vert P_V\Vert \le \sqrt{n}. \end{aligned}$$
It is known that there exist Banach spaces \(\mathcal{X}\) and subspaces V of dimension n, where (9.3) cannot be improved in the sense that for any projection onto V we have \(\Vert P_V\Vert \ge c\sqrt{n}\) with an absolute constant \(c>0\). If we translate this result to our setting of recovery, we see that given V and \(\mathcal{X}\) we can always choose measurement functionals \(l_1,\ldots ,l_n\), such that \(\mu (\mathcal{N},V)\le 2\sqrt{n}\), and this is the best we can say in general.

For a general Banach space \(\mathcal{X}\) and a finite dimensional subspace \(V\subset \mathcal{X}\) of dimension n, finding a minimal norm projection or even a near minimal norm projection onto V is not constructive. There are related procedures such as Auerbach’s theorem [37, II.E.11], which give the poorer estimate Cn for \(\Vert P_V\Vert \). These constructions are easier to describe, but they are also not computationally feasible.

If \(\mathcal{X}\) is an \(L_p\) space, \(1<p<\infty \), then the best bound in (9.3) can be replaced by \(n^{|1/2-1/p|}\), and this is again known to be optimal, save for multiplicative constants. When \(p=1\) or \(p=\infty \) (corresponding to C(D)), we obtain the best bound \(\sqrt{n}\) and this cannot be improved for general V. Of course, for specific V the situation may be much better. Consider \(\mathcal{X}=L_p([-1,1])\), and \(V=\mathcal{P}_{n-1}\) the space of polynomials of degree at most \(n-1\). In this case, there are projections with norm \(C_p\), depending only on p. For example, the projection given by the Legendre polynomial expansion has this property. For \(\mathcal{X}=C([-1,1])\), the projection given by interpolation at the zeros of the Chebyshev polynomial of first kind has norm \(C\log n\), and this is again optimal save for the constant C.
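The norm of an interpolation projector on \(C([-1,1])\) equals the Lebesgue constant of its nodes, that is, the maximum over x of the sum of the absolute values of the Lagrange basis polynomials. The Python sketch below estimates this quantity on a uniform grid for the zeros of the Chebyshev polynomial of the first kind and, for comparison, for equispaced nodes. It is an illustration under stated assumptions: the grid maximum is only a lower estimate of the true norm, and the names `chebyshev_nodes`, `equispaced_nodes`, and `lebesgue_constant` as well as the grid size are choices made here, not taken from the source.

```python
import math


def chebyshev_nodes(n):
    """Zeros of the Chebyshev polynomial of the first kind of degree n."""
    return [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]


def equispaced_nodes(n):
    """n equally spaced points in [-1, 1], endpoints included (n >= 2)."""
    return [-1.0 + 2.0 * k / (n - 1) for k in range(n)]


def lebesgue_constant(nodes, grid_size=2001):
    """Grid estimate of max over x of sum_j |L_j(x)| on [-1, 1]."""
    best = 0.0
    for i in range(grid_size):
        x = -1.0 + 2.0 * i / (grid_size - 1)
        total = 0.0
        for j, xj in enumerate(nodes):
            # Lagrange basis polynomial L_j evaluated at x.
            term = 1.0
            for k, xk in enumerate(nodes):
                if k != j:
                    term *= (x - xk) / (xj - xk)
            total += abs(term)
        best = max(best, total)
    return best
```

Calling `lebesgue_constant(chebyshev_nodes(n))` for increasing n shows the slow logarithmic growth mentioned above, whereas the equispaced nodes give much larger values already for moderate n.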

Case 2 Now, consider the case when the number of measurement functionals \(m>n\). One may think that one can drastically improve on the results for \(m=n\). We have already remarked that this is possible in some settings by simply doubling the number of data functionals [see (7.6)]. While adding additional measurement functionals does decrease \(\mu \), generally speaking, we must have m exponential in n to guarantee that \(\mu \) is independent of n. To see this, let us discuss one special case of Example 1. We fix the domain D to be the unit sphere in \(\mathbb {R}^n\), namely \(D=\{x\in \mathbb {R}^{n}\, :\ \sum _{j=1}^{n} x_j^2 = 1\}\) and the subspace \(V\subset C(D)\) of all linear functions restricted to D, i.e., \(f\in V\) if and only if
$$\begin{aligned} f(x)=f_a(x):=\sum _{j=1}^n a_jx_j, \quad a:=(a_1,\ldots ,a_n)\in \mathbb {R}^n. \end{aligned}$$
It is obvious that V is an n dimensional subspace. Since for \(f\in V\), we have \(\Vert f\Vert _{C(D)}=\Vert a\Vert _{\ell _2(n)}\), the map \(a\rightarrow f_a(x)\) establishes a linear isometry between V with the supremum norm and \(\ell _2(n)\). Let M be the measurement map given by the linear functionals corresponding to point evaluation at any set \(\{P_j\}_{j=1}^m\) of m points from D. Then M maps C(D) into \(\ell _\infty (m)\) and \(\Vert M_V\Vert =1\). It follows from (7.4) that \(\mu (\mathcal{N},V)\approx \Vert M_V\Vert \cdot \Vert M_V^{-1}\Vert \). Since \(M_V\) is one particular isomorphism from V, which is isometric to \(\ell _2(n)\), onto M(V), this means that
$$\begin{aligned} \mu (\mathcal{N},V)\ge c\, d(\ell _2(n), M(V)), \quad M(V)\subset \ell _\infty (m), \end{aligned}$$
where \(d(\ell _2(n), M(V)):=\inf \{\Vert T\Vert \Vert T^{-1}\Vert , \, T:\ell _2(n)\rightarrow M(V), \, T\,\, \text{ isomorphism }\}\) is the Banach-Mazur distance between \(\ell _2(n)\) and the subspace \(M(V)\subset \ell _\infty (m)\). It is a well known, but nontrivial fact in the local theory of Banach spaces (see [15, Example 3.1] or [27, Section 5.7]) that to keep \(d(\ell _2(n), M(V))\le C\), one needs \(\ln m\ge c n\).
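To make the quantity \(\Vert M_V^{-1}\Vert \) in this example concrete, the following Python sketch treats only the smallest nontrivial case \(n=2\), where D is the unit circle. It is a hedged illustration: the sample points are placed at the angles \(\pi j/m\), \(j=0,\ldots ,m-1\) (a hypothetical choice, not prescribed by the source), the measurement side carries the \(\ell _\infty (m)\) norm as in the text, and the infimum over unit coefficient vectors a is approximated on a grid. The function name `inverse_norm_on_circle` is an assumption. The case \(n=2\) does not exhibit the exponential dependence on n; it only shows how the norm is computed.

```python
import math


def inverse_norm_on_circle(m, samples=20000):
    """Estimate ||M_V^{-1}|| for linear functions on the unit circle (n = 2).

    For a unit vector a, ||f_a||_{C(D)} = 1 and ||M(f_a)|| is the largest
    value |<a, P_j>| over the m sample points, so ||M_V^{-1}|| is the
    reciprocal of the smallest such maximum.
    """
    points = [
        (math.cos(math.pi * j / m), math.sin(math.pi * j / m)) for j in range(m)
    ]
    worst = float("inf")
    for i in range(samples):
        # Half a turn suffices because |<a, P>| = |<-a, P>|.
        t = math.pi * i / samples
        a = (math.cos(t), math.sin(t))
        sup_norm = max(abs(a[0] * p[0] + a[1] * p[1]) for p in points)
        worst = min(worst, sup_norm)
    return 1.0 / worst
```

For this particular point set the worst direction lies halfway between two neighbouring sample angles, so the estimate should be close to \(1/\cos (\pi /(2m))\).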
The scenario of the last paragraph is the worst that can happen. To see why, let us recall the following notion: a set A is a \(\delta \)-net for a set S (\(A\subset S\subset \mathcal{X}\) and \(\delta >0\)) if for every \( f\in S\) there exists \(a\in A\), such that \(\Vert f-a\Vert \le \delta \). For a given n-dimensional subspace \(V\subset \mathcal{X}\) and \(\delta >0\), let us fix a \(\delta \)-net \(\{v_j\}_{j=1}^{ m}\) for \( \{v\in V\,:\,\Vert v\Vert =1\}\) with \( m\le (1+2/\delta )^n\). It is well known that such a net exists (see [15, Lemma 2.4] or [27, Lemma 2.6]). Let \(l_j\in \mathcal{X}^*\) be norm one functionals, such that \(1=l_j(v_j)\), \(j=1,2,\ldots , m\). We define our measurement M as \(M:=(l_1,\ldots ,l_{ m})\), so \(\mathcal{N}=\bigcap _{j=1}^{ m} \ker l_j\). When \( \eta \in \mathcal{N}\), \(v\in V\) with \(\Vert v\Vert =1\), and \(v_j\) is such that \(\Vert v-v_j\Vert \le \delta \), we have
$$\begin{aligned} \Vert \eta -v\Vert \ge \Vert \eta -v_j\Vert -\delta \ge |l_j( \eta -v_j)|-\delta =1-\delta , \end{aligned}$$
and for this choice of M, we have
$$\begin{aligned} \mu (\mathcal{N},V)\le 2\mu (V,\mathcal{N})=2 \sup _{ \eta \in \mathcal{N},\, v\in V,\,\Vert v\Vert =1\,}\frac{1}{\Vert \eta -v\Vert }\le \frac{2}{1-\delta }. \end{aligned}$$
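The trade-off in this construction, between the number m of measurements and the resulting bound on \(\mu (\mathcal{N},V)\), can be tabulated directly from the two formulas above. The Python sketch below does only this arithmetic; the function names `net_size_bound`, `mu_upper_bound`, and `tradeoff_table` are hypothetical, and the values are upper bounds rather than the actual size of an optimal net.

```python
def net_size_bound(n, delta):
    """Upper bound (1 + 2/delta)^n on the cardinality of the delta-net."""
    return (1.0 + 2.0 / delta) ** n


def mu_upper_bound(delta):
    """Upper bound 2 / (1 - delta) on mu(N, V), valid for 0 < delta < 1."""
    return 2.0 / (1.0 - delta)


def tradeoff_table(dimensions, delta):
    """List of (n, bound on m, bound on mu) for the given dimensions."""
    return [(n, net_size_bound(n, delta), mu_upper_bound(delta)) for n in dimensions]
```

For a fixed \(\delta \), the bound on \(\mu \) does not depend on n, while the bound on m grows exponentially in n, in line with the requirement \(\ln m\ge cn\) from the sphere example.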
For specific Banach spaces \(\mathcal{X}\) and subspaces \(V\subset \mathcal{X}\), the situation is much better. We have already discussed such an example in the case of the space of trigonometric polynomials and \(\mathcal{X}\) the space of periodic functions in \(C([-\pi ,\pi ])\).


  1. Adcock, B., Hansen, A.C.: Stable reconstructions in Hilbert spaces and the resolution of the Gibbs phenomenon. Appl. Comput. Harm. Anal. 32, 357–388 (2012)
  2. Adcock, B., Hansen, A.C., Poon, C.: Beyond consistent reconstructions: optimality and sharp bounds for generalized sampling, and application to the uniform resampling problem. SIAM J. Math. Anal. 45, 3132–3167 (2013)
  3. Adcock, B., Hansen, A.C., Shadrin, A.: A stability barrier for reconstructions from Fourier samples. SIAM J. Numer. Anal. 52, 125–139 (2014)
  4. Agarwal, R.P., O’Regan, D., Sahu, D.R.: Fixed Point Theory for Lipschitzian-Type Mappings and Applications. Springer, Berlin (2009)
  5. Bartle, R., Graves, L.: Mappings between function spaces. Trans. Am. Math. Soc. 72, 400–413 (1952)
  6. Benyamini, Y., Lindenstrauss, J.: Geometric Nonlinear Functional Analysis, vol. 48. American Mathematical Society Colloquium Publications, Providence, RI (2000)
  7. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Convergence rates for Greedy Algorithms in reduced basis methods. SIAM J. Math. Anal. 43, 1457–1472 (2011)
  8. Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., Wojtaszczyk, P.: Data Assimilation in Reduced Modeling, SIAM UQ, to appear; arXiv: 1506.04770v1 (2015)
  9. , B.: Optimal Recovery of Functions and Integrals, First European Congress of Mathematics, Vol. I, pp. 371–390 (1992), Progr. Math., Birkhauser, Basel, 119 (1994)
  10. Brown, A.: A rotund space have a subspace of codimension 2 with discontinuous metric projection. Mich. Math. J. 21, 145–151 (1974)
  11. Cohen, A., DeVore, R.: Approximation of high dimensional parametric PDEs. Acta Numer. 24, 1–159 (2016)
  12. Demanet, L., Townsend, A.: Stable extrapolation of analytic functions, preprint
  13. DeVore, R., Lorentz, G.: Constructive Approximation, vol. 303. Springer, Grundlehren (1993)
  14. Figiel, T.: On the moduli of convexity and smoothness. Stud. Math. 56, 121–155 (1976)
  15. Figiel, T., Lindenstrauss, J., Milman, V.: The dimension of almost spherical sections of convex bodies. Acta Math. 139, 53–94 (1977)
  16. Fournier, J.: An interpolation problem for coefficients of \(H^\infty \) functions. Proc. Am. Math. Soc. 42, 402–407 (1974)
  17. Garkavi, A.: The Best Possible Net and the Best Possible Cross Section of a Set in a Normed Space, Vol. 39, pp. 111–132. Translations Series 2. American Mathematical Society, Providence, RI (1964)
  18. Hanner, O.: On the uniform convexity of \(L^p\) and \(\ell ^p\). Ark. Matematik 3, 239–244 (1956)
  19. Henrion, D., Tarbouriech, S., Arzelier, D.: LMI approximations for the radius of the intersection of ellipsoids: survey. J. Optim. Theory Appl. 108, 1–28 (2001)
  20. Kadec, M., Snobar, M.: Certain functionals on the Minkowski compactum. Mat. Zamet. 10, 453–457 (1971). (Russian)
  21. Konyagin, S.: A remark on renormings of nonreflexive spaces and the existence of a Chebyshev center. Mosc. Univ. Math. Bull. 43, 55–56 (1988)
  22. Lewis, J., Lakshmivarahan, S., Dhall, S.: Dynamic Data Assimilation: A Least Squares Approach, Encyclopedia of Mathematics and its Applications, vol. 104. Cambridge University Press, Cambridge (2006)
  23. Lindenstrauss, J., Tzafriri, L.: Classical Banach Spaces II. Springer, Berlin (1979)
  24. Maday, Y., Patera, A., Penn, J., Yano, M.: A parametrized-background data-weak approach to variational data assimilation: formulation, analysis, and application to acoustics. Int. J. Numer. Method Eng. 102, 933–965 (2015)
  25. Micchelli, C., Rivlin, T.: Lectures on optimal recovery, numerical analysis, Lancaster 1984 (Lancaster, 1984), 21–93. Lecture Notes in Math, vol. 1129. Springer, Berlin (1985)
  26. Micchelli, C., Rivlin, T., Winograd, S.: The optimal recovery of smooth functions. Numerische Mathematik 26, 191–200 (1976)
  27. Milman, V., Schechtman, G.: Asymptotic Theory of Finite Dimensional Normed Spaces. Lecture Notes in Mathematics, vol. 1200. Springer, Berlin (1986)
  28. Powell, M.: Approximation Theory and Methods. Cambridge University Press, Cambridge (1981)
  29. Platte, R., Trefethen, L., Kuijlaars, A.: Impossibility of fast stable approximation of analytic functions from equispaced samples. SIAM Rev. 53, 308–318 (2011)
  30. Repovš, D., Semenov, P.: Continuous Selections of Multivalued Mappings. Springer, Berlin (1998)
  31. Singer, I.: Best Approximation in Normed Linear Spaces by Elements of Linear Subspaces, Die Grundlehren der mathematischen Wissenschaften, vol. 171. Springer, Berlin (1970)
  32. Schönhage, A.: Fehlerfortpflanzung bei Interpolation. Numer. Math. 3, 62–71 (1961)
  33. Smith, P., Ward, J.: Restricted centers in \(C(\Omega )\). Proc. Am. Math. Soc. 48, 165–172 (1975)
  34. Szarek, S.: On the best constants in the Khinchin inequality. Stud. Math. 58, 197–208 (1976)
  35. Traub, J., Wozniakowski, H.: A General Theory of Optimal Algorithms. Academic Press, London (1980)
  36. Trefethen, L.N., Weideman, J.A.C.: Two results on polynomial interpolation in equally spaced points. J. Approx. Theory 65, 247–260 (1991)
  37. Wojtaszczyk, P.: Banach Spaces for Analysts. Cambridge University Press, Cambridge (1991)
  38. Zygmund, A.: Trigonometric Series. Cambridge University Press, Cambridge (2002)

Copyright information

© Springer-Verlag Italia 2017

Authors and Affiliations

  • Ronald DeVore (1)
  • Guergana Petrova (1)
  • Przemyslaw Wojtaszczyk (2)
  1. Department of Mathematics, Texas A&M University, College Station, USA
  2. Interdisciplinary Center for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland
