1 Introduction

Entropy numbers quantify the degree of compactness of a set, i.e., how well the set can be approximated by a finite set. Given a compact set K in a quasi-Banach space Y, the k-th entropy number \(e_k(K,Y)\) is defined to be the smallest radius \(\varepsilon > 0\) such that K can be covered with \(2^{k-1}\) copies of the ball \(\varepsilon B_Y\), i.e.,

$$\begin{aligned} e_k(K,Y):= \inf \Big \{\varepsilon >0~:~\exists y_1,\ldots ,y_{2^{k-1}}\text { such that }K \subset \bigcup \limits _{\ell =1}^{2^{k-1}} y_\ell + \varepsilon B_Y \Big \}\quad ,\quad k\in \mathbb {N}\,. \end{aligned}$$

The concept of entropy numbers can be easily extended to operators. Given a compact operator \(T: X \rightarrow Y\), where X and Y are quasi-Banach spaces, the k-th entropy number of the operator T is defined to be

$$\begin{aligned} e_k(T: X \rightarrow Y) := e_k(T(B_X), Y). \end{aligned}$$

If the spaces X, Y are clear from the context, we will abbreviate \(e_k(T:X \rightarrow Y)\) by \(e_k(T)\).

Entropy numbers (or the inverse concept of metric entropy) belong to the fundamental concepts of approximation theory. They appear in various approximation problems, e.g., in the estimation of the decay of operator eigenvalues [4, 11, 20]; in the estimation of learning rates for machine learning problems [39, 43]; or in bounding s-numbers such as approximation, Gelfand, or Kolmogorov numbers from below [4, 16]. We note that Gelfand numbers find application in the recent field of compressive sensing [6, 13, 16] and Information Based Complexity in general. Entropy numbers are also closely connected to small ball problems in probability theory [21, 24]. For further applications and basic properties, we refer to the monographs [5, 28], and the recent survey [8, Chapter 6].

The goal of this paper is to improve estimates for entropy numbers of embeddings between function spaces of dominating mixed smoothness

$$\begin{aligned} \mathrm {Id}:S^{r_0}_{p_0,q_0}A(\Omega ) \rightarrow S^{r_1}_{p_1,q_1}A^{\dag }(\Omega )\,, \quad A, A^{\dag } \in \{B, F\}\,, \end{aligned}$$
(1)

where \(\Omega \subset \mathbb {R}^n\) is a bounded domain, \(0<p_0, p_1, q_0, q_1 \le \infty \), and \(r_0-r_1>(1/p_0-1/p_1)_+\). The case \(A=B\) stands for the scale of Besov spaces of dominating mixed smoothness, while \(A=F\) refers to the scale of Triebel–Lizorkin spaces, which includes classical \(L_p\) and Sobolev spaces of mixed smoothness. That is why (1) also includes the classical embeddings

$$\begin{aligned} \mathrm {Id}:S^{r}_{p_0,q_0}A(\Omega ) \rightarrow L_{p_1}(\Omega )\,, \quad A \in \{B, F\}\,, \end{aligned}$$

if \(r>1/p_0-1/p_1\). Function space embeddings of this type play a crucial role in hyperbolic cross approximation [8]. Entropy numbers of such embeddings have been the subject of intense study, see [42], [8, Chapt. 6], and the recent papers by A.S. Romanyuk [33,34,35,36] and V.N. Temlyakov [40]. Note that there are a number of deep open problems connected to the case \(p_1 = \infty \), which reach out to probability and discrepancy theory [8, 2.6, 6.4].

Typically, one observes asymptotic decays of the form

$$\begin{aligned} e_m(\mathrm {Id}) \simeq _n m^{-(r_0-r_1)}(\log m)^{(n-1)\eta }, \end{aligned}$$

where \(\eta >0\). This behavior is also well-known for s-numbers of these embeddings such as approximation, Gelfand, or Kolmogorov numbers, see [8] and the references therein. Although the main rate is the same as in the univariate case, the dimension still appears in the logarithmic term. We show that the logarithmic term completely disappears in regimes of small smoothness

$$\begin{aligned} 1/p_0-1/p_1 < r_0-r_1 \le 1/q_0-1/q_1. \end{aligned}$$

That is, we establish sharp purely polynomial asymptotic bounds of the form

$$\begin{aligned} e_m(\text {Id}) \simeq _n m^{-(r_0-r_1)}\quad ,\quad m\in \mathbb {N}\,, \end{aligned}$$
(2)

which depend on the underlying dimension n only in the constant. This settles several open questions stated in the literature [8, 42], see Sect. 5, and makes the framework highly relevant for high-dimensional approximation.

A key ingredient in the proof of (2) is a counterpart of Schütt’s theorem for the entropy numbers of the embedding

$$\begin{aligned} \mathrm {id}:\ell _p^b(\ell _q^d) \rightarrow \ell _r^b(\ell _u^d), \end{aligned}$$

where \(0<p<r\le \infty \) and \(0<q\le u\le \infty \). We prove matching bounds for all parameter constellations. A particularly relevant case for the purpose of this paper is the situation where \(b\le d\) and

$$\begin{aligned} 1/p-1/r > 1/q-1/u \ge 0\,. \end{aligned}$$

Here, we have the surprising behavior

$$\begin{aligned} e_k(\mathrm {id}) \simeq {\left\{ \begin{array}{ll} 1&{}: 1\le k\le \log (bd),\\ \left( \frac{\log (ed/k)}{k}\right) ^{1/q-1/u} &{}: \log (bd) \le k \le d,\\ \left( \frac{d}{k}\right) ^{1/p - 1/r} d^{-(1/q - 1/u)} &{}: d \le k \le bd,\\ b^{-(1/p-1/r)}d^{-(1/q-1/u)}2^{-\frac{k-1}{bd}} &{}: k\ge bd. \end{array}\right. } \end{aligned}$$
(3)

Note that this relation is not a trivial extension of the classical Schütt result [38], which reads as

$$\begin{aligned} e_k(\mathrm {id}: \ell _p^b \rightarrow \ell _r^b) \simeq \left\{ \begin{array}{rcl}1&{}:&{}1\le k \le \log (b),\\ \Big (\frac{\log (eb/k)}{k}\Big )^{1/p-1/r}&{}:&{}\log (b)\le k\le b,\\ 2^{-\frac{k-1}{b}}b^{-(1/p-1/r)}&{}:&{}k\ge b\,, \end{array} \right. \end{aligned}$$

for the norm-1-embedding \(\mathrm {id}:\ell ^b_p \rightarrow \ell ^b_r\), where \(0<p\le r\le \infty \). In fact, using trivial embeddings would give an additional \(\log \)-term in the third case of (3). The absence of this \(\log \)-term makes (3) interesting and useful as we will see below.
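To get a feeling for the four regimes in (3), the right-hand side can be evaluated numerically. The following is only a sketch: the function name `mixed_rate` is our own, all constants hidden in \(\simeq \) are set to one, and the assumptions \(b\le d\) and \(1/p-1/r > 1/q-1/u \ge 0\) are not checked.

```python
import math

def mixed_rate(k, b, d, p, q, r, u):
    """Right-hand side of (3) with all hidden constants set to 1.

    Assumes b <= d and 1/p - 1/r > 1/q - 1/u >= 0 (not checked here).
    Use math.inf for infinite parameters.
    """
    outer = 1.0 / p - 1.0 / r   # exponent 1/p - 1/r of the outer spaces
    inner = 1.0 / q - 1.0 / u   # exponent 1/q - 1/u of the inner spaces
    if k <= math.log(b * d):
        return 1.0
    if k <= d:
        return (math.log(math.e * d / k) / k) ** inner
    if k <= b * d:
        return (d / k) ** outer * d ** (-inner)
    return b ** (-outer) * d ** (-inner) * 2.0 ** (-(k - 1) / (b * d))

# Example: b = 4, d = 64, p = q = 1, r = infinity, u = 2.
for k in (1, 16, 64, 128, 256, 512):
    print(k, mixed_rate(k, 4, 64, 1, 1, math.inf, 2))
```

In this example the second and third regime agree at \(k=d\), and no logarithmic factor enters for \(d \le k \le bd\).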

For \(1 \le k \le \log (bd)\) and \(k \ge bd\), trivial and standard volumetric arguments suffice to establish matching bounds for the entropy numbers \(e_k(\mathrm {id}: \ell _p^b(\ell _q^d) \rightarrow \ell _r^b(\ell _u^d))\). The middle range \(\log (bd)~\le ~k~\le ~bd\) is much more involved. In general, it is far from straightforward to generalize the proof ideas from \(d=1\) (Schütt) to \(d>1\). Fortunately, the crucial work has already been done in a recent paper by Edmunds and Netrusov [10]. They prove a general abstract version of Schütt’s theorem for operators between vector-valued sequence spaces. It remains for us to turn these general, abstract bounds into explicit estimates for the entropy numbers \(e_k(\mathrm {id}:\ell _p^b(\ell _q^d)\rightarrow \ell _r^b(\ell _u^d))\). Unfortunately, the paper [10] is written very concisely, which makes it difficult to follow the arguments at several points. Hence, we decided to provide some additional, explanatory material. We hope that Sect. 3 helps a broader readership to appreciate the powerful ideas in [10], in particular, a novel covering construction based on dyadic grids.

Outline The paper is organized as follows. In Sect. 2, we recapitulate basic definitions and results including entropy numbers and Schütt’s theorem. Afterwards, in Sect. 3, we discuss the generalization of Schütt’s theorem by [10]. In Sect. 4, we show consequences of this result, including matching bounds for the entropy numbers \(e_k(\mathrm {id}:\ell _p^b(\ell _q^d) \rightarrow \ell _r^b(\ell _u^d))\). Finally, we improve upper bounds for the entropy numbers of Besov and Triebel–Lizorkin embeddings in regimes of small smoothness in Sect. 5.

Notation As usual \(\mathbb {N}\) denotes the natural numbers, \(\mathbb {N}_0:=\mathbb {N}\cup \{0\}\), \(\mathbb {Z}\) denotes the integers, \(\mathbb {R}\) the real numbers, \(\mathbb {R}_+\) the positive real numbers, and \(\mathbb {C}\) the complex numbers. For \(a\in \mathbb {R}\) we denote \(a_+ := \max \{a,0\}\). We write \(\log \) for the natural logarithm. \(\mathbb {R}^{m\times n}\) denotes the set of all \(m\times n\)-matrices with real entries and \(\mathbb {R}^n\) denotes the Euclidean space. Vectors are usually denoted with \(x,y\in \mathbb {R}^n\). For \(0<p\le \infty \) and \(x\in \mathbb {R}^n\), we use the quasi-norm \(\Vert x\Vert _p := (\sum _{i=1}^n |x_i|^p)^{1/p}\) with the usual modification in the case \(p=\infty \). If X is a (quasi-)normed space, then \(B_X\) denotes its unit ball and the (quasi-)norm of an element x in X is denoted by \(\Vert x\Vert _X\). If X is a Banach space, then we denote its dual by \(X^*\). We will frequently use the quasi-norm constant, i.e., the smallest constant \(\alpha _X\) satisfying

$$\begin{aligned} \Vert x+y\Vert _X \le \alpha _X(\Vert x\Vert _X + \Vert y\Vert _X), \qquad \text {for all } x,y\in X. \end{aligned}$$

For a given \(0<p\le 1\) we say that \(\Vert \cdot \Vert _X\) is a p-norm if

$$\begin{aligned} \Vert x+y\Vert ^p_X \le \Vert x\Vert _X^p + \Vert y\Vert _X^p, \qquad \text {for all } x,y\in X. \end{aligned}$$
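Both inequalities can be checked numerically for the sequence spaces \(\ell _p^n\) with \(0<p\le 1\). The sketch below is an illustration only: the helper names are ours, and it uses the standard facts that \(\Vert \cdot \Vert _p\) is a p-norm and that \(2^{1/p-1}\) is an admissible quasi-norm constant for \(\ell _p^n\).

```python
import random

def lp_quasinorm(x, p):
    """l_p (quasi-)norm of a finite sequence; p = float('inf') gives the max norm."""
    a = [abs(t) for t in x]
    if p == float("inf"):
        return max(a)
    return sum(t ** p for t in a) ** (1.0 / p)

def check_inequalities(p, trials=500, n=8, seed=0):
    """Largest observed ratios in the p-norm and the quasi-triangle inequality."""
    rng = random.Random(seed)
    worst_p, worst_q = 0.0, 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = [a + b for a, b in zip(x, y)]
        nx, ny, ns = lp_quasinorm(x, p), lp_quasinorm(y, p), lp_quasinorm(s, p)
        worst_p = max(worst_p, ns ** p / (nx ** p + ny ** p))  # should stay <= 1
        worst_q = max(worst_q, ns / (nx + ny))                 # should stay <= 2^(1/p - 1)
    return worst_p, worst_q

print(check_inequalities(0.5))
```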

As is well known, any quasi-normed space can be equipped with an equivalent p-norm (for a certain \(0<p\le 1\), see [2, 32]). If \(T:X\rightarrow Y\) is a continuous operator we write \(T\in \mathcal {L}(X,Y)\) and \(\Vert T\Vert \) for its operator (quasi-)norm. The notation \(X \hookrightarrow Y\) indicates that the identity operator \(\mathrm {Id}:X \rightarrow Y\) is continuous. For two non-negative sequences \((a_n)_{n=1}^{\infty },(b_n)_{n=1}^{\infty }\subset \mathbb {R}\) we write \(a_n \lesssim b_n\) if there exists a constant \(c>0\) such that \(a_n \le c\,b_n\) for all n. We will write \(a_n \simeq b_n\) if \(a_n \lesssim b_n\) and \(b_n \lesssim a_n\). If \(\alpha \) is a set of parameters, then we write \(a_n \lesssim _{\alpha } b_n\) if there exists a constant \(c_{\alpha }>0\) depending only on \(\alpha \) such that \(a_n \le c_{\alpha }\,b_n\) for all n.

Let \(b,d \in \mathbb {N}\). For \(0<p,q \le \infty \), the bd-dimensional mixed space \(\ell _p^b(\ell _q^d)\) is defined as the space of all matrices \(x \in \mathbb {R}^{b\times d}\) equipped with the mixed (quasi-)norm

$$\begin{aligned} \Vert x\Vert _{p,q} := \left( \sum _{i=1}^b \Big (\sum _{j=1}^d |x_{ij}|^q \Big )^{p/q}\right) ^{1/p}, \qquad 0<p<\infty , \; 0<q<\infty \,, \end{aligned}$$

with the usual modification that the corresponding sum is replaced by a maximum in the case that either \(p=\infty \) or \(q=\infty \). We always refer to the \(\ell _p\)-space supported on \([b]:=\{1,\ldots ,b\}\) as the outer space and to the \(\ell _q\)-space supported on [d] as the inner space. For any \(S\subset [b]\times [d]\) and \(x\in \mathbb {R}^{b\times d}\) we define \(x_S\) as the matrix \((x_S)_{ij} = x_{ij}\) for \((i,j)\in S\) and \((x_S)_{ij} = 0\) for \((i,j)\in S^c\).
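As a small illustration of these definitions, the mixed (quasi-)norm and the restriction \(x_S\) might be computed as follows. This is a sketch with hypothetical helper names; matrices are stored as lists of rows and indices are 0-based in the code, whereas the text uses 1-based indices.

```python
def lp_quasinorm(x, p):
    """l_p (quasi-)norm of a finite sequence; p = float('inf') gives the max norm."""
    a = [abs(t) for t in x]
    if p == float("inf"):
        return max(a)
    return sum(t ** p for t in a) ** (1.0 / p)

def mixed_quasinorm(x, p, q):
    """||x||_{p,q}: inner l_q norm of every row, then outer l_p norm of the row norms."""
    row_norms = [lp_quasinorm(row, q) for row in x]
    return lp_quasinorm(row_norms, p)

def restrict(x, S):
    """x_S: keep the entries with (i, j) in S and set all other entries to zero."""
    return [[x[i][j] if (i, j) in S else 0.0 for j in range(len(x[i]))]
            for i in range(len(x))]

x = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(mixed_quasinorm(x, 1, 2))               # row norms 5, 0, 10
print(mixed_quasinorm(x, float("inf"), 1))    # row norms 7, 0, 14
print(restrict(x, {(0, 0), (2, 1)}))
```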

2 Entropy Numbers and Schütt’s Theorem

Let us recall basic notions and properties concerning entropy numbers. Let K be a subset of a quasi-Banach space Y. Given \(\varepsilon > 0\), an \(\varepsilon \)-covering is a set of points \(x_1,\dots ,x_n \in K\) such that

$$\begin{aligned} K \subset \bigcup _{i=1}^n \big ( x_i + \varepsilon B_Y \big )\,. \end{aligned}$$

An \(\varepsilon \)-packing is a set of points \(x_1,\dots , x_m \in K\) such that \(\Vert x_i - x_j\Vert _Y > \varepsilon \) for pairwise different i, j. The covering number \(N_\varepsilon (K,Y)\) is the smallest n such that there exists an \(\varepsilon \)-covering of K consisting of n points, while the packing number \(M_\varepsilon (K, Y)\) is the largest m such that there exists an \(\varepsilon \)-packing of K consisting of m points. It is easy to see that

$$\begin{aligned} M_{2\varepsilon }(K,Y) \le N_\varepsilon (K,Y) \le M_\varepsilon (K,Y). \end{aligned}$$
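The second inequality rests on the observation that a maximal \(\varepsilon \)-packing is automatically an \(\varepsilon \)-covering. A minimal sketch of this observation for a finite point set in the plane with the maximum norm is given below; the greedy routine and all names are our own choices for illustration.

```python
import random

def dist_inf(x, y):
    """Distance in the maximum norm."""
    return max(abs(a - b) for a, b in zip(x, y))

def greedy_packing(points, eps):
    """Pick points with mutual distance > eps until no further point can be added."""
    centers = []
    for x in points:
        if all(dist_inf(x, c) > eps for c in centers):
            centers.append(x)
    return centers

def is_covering(points, centers, eps):
    """Check that every point lies within distance eps of some center."""
    return all(any(dist_inf(x, c) <= eps for c in centers) for x in points)

rng = random.Random(1)
K = [(rng.random(), rng.random()) for _ in range(200)]
centers = greedy_packing(K, 0.2)
print(len(centers), is_covering(K, centers, 0.2))
```

Every point that the greedy routine rejects lies within distance \(\varepsilon \) of an already chosen center, which is why the resulting packing covers K.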

The metric entropy is defined to be

$$\begin{aligned} H_\varepsilon (K,Y) = \log _2 N_\varepsilon (K,Y)\,; \end{aligned}$$

see Remark 4 for the relation of metric entropy to other notions of entropy.

The k-th entropy number \(e_k(K,Y)\) can be redefined as

$$\begin{aligned} e_k(K, Y) := \inf \{ \varepsilon > 0: H_\varepsilon (K,Y) \le k-1 \}. \end{aligned}$$

It is easy to see that the sequence of entropy numbers is decaying, i.e., \(e_1 \ge e_2 \ge \dots \ge 0\). Moreover, the set K is compact in Y if and only if \(\lim _{k \rightarrow \infty } e_k(K,Y) = 0\).
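As a toy example of the reformulated definition, consider \(K=[-1,1]\) in \(Y=\mathbb {R}\). Here \(N_\varepsilon (K,Y) = \lceil 1/\varepsilon \rceil \), so that \(e_k(K,Y) = 2^{-(k-1)}\). The sketch below recovers this value by bisection on \(\varepsilon \); the function names and the tolerance are our own choices.

```python
import math

def covering_number_interval(eps):
    """N_eps([-1, 1], R): number of closed intervals of radius eps needed to cover [-1, 1]."""
    return math.ceil(1.0 / eps)

def entropy_number_interval(k, tol=1e-12):
    """e_k([-1, 1], R) = inf{eps > 0 : log2 N_eps <= k - 1}, found by bisection."""
    lo, hi = 0.0, 1.0   # eps = 1 always works: one interval of radius 1 covers [-1, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.log2(covering_number_interval(mid)) <= k - 1:
            hi = mid
        else:
            lo = mid
    return hi

for k in range(1, 6):
    print(k, entropy_number_interval(k))
```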

Let T denote an operator mapping between two quasi-Banach spaces X and Y. Recall from the introduction that the operator’s entropy numbers are given by

$$\begin{aligned} e_k(T: X \rightarrow Y) = e_k(T(B_X),Y), \quad k \in \mathbb {N}. \end{aligned}$$

Clearly, we have

$$\begin{aligned} \Vert T\Vert = e_1(T) \ge e_2(T) \ge e_3(T) \ge \cdots \ge e_k(T) \ge 0\,. \end{aligned}$$

If \(T_1,T_2\) are both operators from X to Y, and Y is a \(\vartheta \)-normed space, then the entropy numbers of the sum can be estimated as follows

$$\begin{aligned} e_{k_1 + k_2-1}(T_1 + T_2)^\vartheta \le e_{k_1}(T_1)^\vartheta + e_{k_2}(T_2)^\vartheta . \end{aligned}$$
(4)

Moreover, if \(S \in \mathcal {L}(X,Y)\) and \(R \in \mathcal {L}(Y,Z)\) then

$$\begin{aligned} e_{k_1 + k_2-1}(R\circ S) \le e_{k_1}(R)e_{k_2}(S)\,. \end{aligned}$$
(5)

In particular, this gives

$$\begin{aligned} e_{k}(R\circ S) \le e_{k}(R)\Vert S\Vert \,. \end{aligned}$$
(6)

For further general properties of entropy numbers and basic estimates, we refer the reader to the monographs [5, 25, 29]. For remarks on the history of entropy number research, see [5, 43].

In the concrete situation where \(X=\ell _p^b\) and \(Y=\ell _q^b\) for \(0<p \le q \le \infty \), the entropy numbers of the embedding \(\mathrm {id}: \ell _p^b \rightarrow \ell _q^b\) are completely understood in terms of their decay in k and b. This central result is often referred to as Schütt’s theorem. For its history and references, see Remark 3. We only state the interesting case \(0<p<q \le \infty \) here.

Theorem 1

(Schütt’s theorem) For \(0<p\le q \le \infty \) and \(k,b \in \mathbb {N}\), we have

$$\begin{aligned} e_k(\mathrm {id}: \ell _p^b \rightarrow \ell _q^b) \simeq \left\{ \begin{array}{rcl}1&{}:&{}1\le k \le \log (b),\\ \Big (\frac{\log (1+b/k)}{k}\Big )^{1/p-1/q}&{}:&{}\log (b)\le k\le b,\\ 2^{-k/b}b^{1/q-1/p}&{}:&{}k\ge b. \end{array} \right. \end{aligned}$$

The constants in the estimates depend neither on k nor on b.

Remark 2

Note that \(e_k(\mathrm {id}: \ell _\infty ^b \rightarrow \ell _\infty ^b) = 1\) as long as \(k \le b\) because \(\Vert x - y\Vert _\infty = 2\) for different \(x,y \in \{-1,1\}^b\).
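The packing behind this remark is easy to verify for small b. The following sketch (with our own helper names) enumerates all sign vectors for \(b=4\) and confirms that they are pairwise at distance exactly 2 in \(\ell _\infty ^b\). Consequently, a ball of radius smaller than 1 contains at most one of them, and \(2^{k-1} < 2^b\) such balls cannot cover the unit ball.

```python
from itertools import product

def sup_dist(x, y):
    """Distance in l_infinity."""
    return max(abs(a - b) for a, b in zip(x, y))

b = 4
signs = list(product((-1, 1), repeat=b))   # all 2^b sign vectors
distances = {sup_dist(x, y) for x in signs for y in signs if x != y}
print(len(signs), distances)
```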

Remark 3

In 1984, Schütt [38] gave a proof for the general case of symmetric Banach spaces, which implies Theorem 1 if \(1\le p \le q \le \infty \). In the range \(1 \le k \le b\), the upper bound was first proved for all \(0<p \le q \le \infty \) by Edmunds and Triebel [11] in 1996 by covering the unit ball using suitable sparse vectors. Edmunds and Netrusov [9, Thm. 2] generalized this covering construction in 1998 to arbitrary quasi-Banach spaces. In the same paper, Edmunds and Netrusov also proved matching lower bounds for general quasi-Banach spaces [9, Thm. 2]. Kühn [22] also proved the lower bound for \(e_k(\mathrm {id}: \ell _p^b \rightarrow \ell _q^b)\) with \(0<p\le q \le \infty \) in 2001. Both [9, Thm. 2] and [22] rely on the very same idea to pack the unit ball with sparse vectors and use the fundamental combinatorial fact discussed in Remark 12 (ii) below. In 2000, Guédon and Litvak [15, Thm. 6] provided an alternative proof of Theorem 1 that relies completely on interpolation arguments and improved the constants in the upper bound.

Remark 4

The concept of metric entropy for compact sets has been introduced independently by Kolmogorov [18] and Pontrjagin and Schnirelmann [31]. It should not be confused with the metric entropy of a dynamical system, which also has been introduced by Kolmogorov [19]. The latter entropy is also called Kolmogorov-Sinai entropy or measure-theoretic entropy. However, these two notions of metric entropy are related [1]. There is also a deep connection between Kolmogorov-Sinai entropy and the notions of information entropy and thermodynamic entropy [3].

3 Edmunds–Netrusov Revisited

In addition to Schütt’s theorem, the main tool that we employ in this work is a powerful result by Edmunds and Netrusov [10]. They prove a generalization of Schütt’s theorem for vector-valued sequence spaces. Let us restate the part of their result that is relevant for us.

Theorem 5

(Theorems 3.1 and 3.2 in [10]) Let \(b \in \mathbb {N}\) such that \(b \ge 2\), \(0<p\le r\le \infty \) and let X and Y be \(\gamma \)-normed quasi-Banach spaces. For \(k,m \in \mathbb {N}\) such that \(m \le k\), let

$$\begin{aligned} D(m,k) = \max _{\ell \in \mathbb {N}, m \le \ell \le k} (\ell /k)^{1/p-1/r} e_\ell (\mathrm {id}:X \rightarrow Y), \end{aligned}$$

and

$$\begin{aligned} A(k,b) = \max \left\{ \Vert \mathrm {id}: X \rightarrow Y\Vert \left( \frac{\log (eb/k)}{k}\right) ^{1/p-1/r}, D(1,k) \right\} . \end{aligned}$$

For \(k \ge \log _2(b)\), we have the following.

(i) If \(k \le b\), then

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X) \rightarrow \ell _r^b(Y)) \simeq A(k,b). \end{aligned}$$

(ii) If \(k \ge b\), then there are absolute constants \(c_1, c_2\) such that

$$\begin{aligned} D(c_1 k/b, k) \lesssim e_k(\mathrm {id}:\ell _p^b(X) \rightarrow \ell _r^b(Y)) \lesssim D(c_2 k/b,k). \end{aligned}$$
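The quantities \(D(m,k)\) and \(A(k,b)\) are straightforward to evaluate once the entropy numbers of the inner embedding are known. The sketch below assumes that these are supplied as a function `e` mapping \(\ell \) to \(e_\ell (\mathrm {id}:X \rightarrow Y)\); the model decay \(\ell ^{-1/2}\) used in the example is purely hypothetical, and the function names are ours.

```python
import math

def D(m, k, e, p, r):
    """D(m, k) = max over integers m <= l <= k of (l/k)^(1/p - 1/r) * e(l)."""
    gap = 1.0 / p - 1.0 / r
    return max((l / k) ** gap * e(l) for l in range(m, k + 1))

def A(k, b, e, norm_id, p, r):
    """A(k, b) = max{ ||id|| * (log(e*b/k)/k)^(1/p - 1/r), D(1, k) }."""
    gap = 1.0 / p - 1.0 / r
    return max(norm_id * (math.log(math.e * b / k) / k) ** gap, D(1, k, e, p, r))

# Hypothetical inner entropy numbers e_l = l^(-1/2) with ||id|| = 1.
e = lambda l: l ** -0.5
print(D(1, 16, e, 1, math.inf))          # maximum attained at l = k
print(A(16, 1024, e, 1.0, 1, math.inf))  # here the logarithmic term dominates
print(D(4, 16, e, 2, 2))                 # for p = r, D(m, k) reduces to e(m)
```

Which of the two terms in \(A(k,b)\) dominates depends on how fast the inner entropy numbers decay compared with the exponent \(1/p-1/r\).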

Theorem 5 gives abstract lower and upper bounds that are “matching” in the sense that both have the same functional form. At first glance, this functional form is neither obvious nor easy to interpret. In addition, we found it difficult to follow the arguments in [10] at several points due to its succinct style of presentation. We thus believe that it is of value to review their key arguments and to provide some additional material that makes Theorem 5 more comprehensible. This is the subject of the remainder of this section. The reader who is only interested in applications of Theorem 5 may proceed directly to Sect. 4.

Remark 6

Theorems 3.1 and 3.2 in [10] are only stated for \(0<p<r\le \infty \). However, these theorems also hold true for \(p=r\). First note that in the latter case, we have

$$\begin{aligned} D(m,k) = e_m(\mathrm {id}: X \rightarrow Y), \qquad A(k,b) = \Vert \mathrm {id}: X \rightarrow Y\Vert . \end{aligned}$$

Now for \(k \ge b\), Theorem 5 has been proved in [27, Thm. 4.3]. For \(k \le b\), the lower bound in Theorem 5 is a consequence of [27, Thm. 4.3] in combination with arguments analogous to Remark 12; the upper bound is trivial.

3.1 A Special Case to Begin with

If \(p=r=\infty \) it is clear that one simply has to take b-fold Cartesian products of the optimal covering and packing of \(B_X\) in Y to obtain the bounds

$$\begin{aligned} \frac{1}{2} \, e_{k+1}(\mathrm {id}:X \rightarrow Y)\le e_{kb}(\mathrm {id}:\ell _\infty ^b(X) \rightarrow \ell _\infty ^b(Y)) \le e_k(\mathrm {id}:X \rightarrow Y), \qquad k \in \mathbb {N}. \end{aligned}$$

In any other case, simple Cartesian products will not be good enough.

The special case of equal inner spaces \(X=Y\) also allows for a rather straightforward solution if the dimension of the inner space is finite. For an easier understanding of the contribution in [10], see Theorem 5 above, we find it instructive to give a direct proof of this special case and point out its limitations. Indeed, a straightforward generalization of the well-known Edmunds–Triebel covering construction [11] based on volume arguments will do the job to establish the optimal upper bound. Recall that the essence of this covering construction is a result from best s-term approximation, sometimes referred to as Stechkin’s inequality, see [8, Sect. 7.4], which yields a \(s^{-1/p+1/r}\)-covering of \(B_{\ell _p^b}\) in \(\ell _r^b\) using only s-sparse vectors. We simply have to extend this approach to row-sparse matrices. To improve readability, we will omit some technical details in the following proof.

Proposition 7

Let \(0<p \le r \le \infty \) and X be \(\mathbb {R}^d\) (quasi-)normed with \(\Vert \cdot \Vert _X\). Further let \(b,d \in \mathbb {N}\) and \(d>5\). Then, for \(1 \le k \le bd\),

$$\begin{aligned} e_k(\mathrm {id}: \ell _p^b(X) \rightarrow \ell _r^b(X)) \lesssim _{p,r} {\left\{ \begin{array}{ll} 1&{}: 1\le k\le \max \{\log (b),d\},\\ \left[ \frac{\log (eb/k)+d}{k}\right] ^{1/p - 1/r} &{}: \max \{\log (b),d\} \le k \le bd,\\ b^{-(1/p-1/r)}2^{-(k-1)/(bd)} &{}:k\ \ge bd. \end{array}\right. } \end{aligned}$$

Proof

The first case is trivial. The last case follows from volumetric arguments using the recent findings in [17, Sect. 3.2]. By these we know that

$$\begin{aligned} {{\,\mathrm{vol}\,}}(B_{\ell _p^b(X)})^{1/(bd)} = \frac{\Gamma (1+d/p)^{1/d}}{\Gamma (1+db/p)^{1/(db)}}\cdot {{\,\mathrm{vol}\,}}(B_X)^{1/d} \end{aligned}$$
(7)

and similarly for \({{\,\mathrm{vol}\,}}(B_{\ell _r^b(X)})^{1/(bd)}\). For \(k>bd\) we use the standard volume argument to obtain

$$\begin{aligned} e_k \lesssim \Big [\frac{{{\,\mathrm{vol}\,}}(B_{\ell _p^b(X)})}{{{\,\mathrm{vol}\,}}(B_{\ell _r^b(X)})}\Big ]^{1/(db)}2^{-(k-1)/(bd)} \simeq _{p,r} b^{-(1/p-1/r)}2^{-(k-1)/(bd)}\,. \end{aligned}$$
(8)
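The volume formula (7) can be sanity-checked in the special case \(X=\ell _p^d\), where \(\ell _p^b(X)\) is just \(\ell _p^{bd}\) and the classical formula \({{\,\mathrm{vol}\,}}(B_{\ell _p^n}) = (2\Gamma (1+1/p))^n/\Gamma (1+n/p)\) applies. The sketch below works with logarithms of volumes; the function names are ours. It also evaluates the normalized ratio of the volumes of \(B_{\ell _p^b(X)}\) and \(B_{\ell _r^b(X)}\), in which \({{\,\mathrm{vol}\,}}(B_X)\) cancels, and compares it with \(b^{-(1/p-1/r)}\).

```python
import math

def log_vol_lp_ball(n, p):
    """log vol(B_{l_p^n}) = n*log(2*Gamma(1 + 1/p)) - log Gamma(1 + n/p)."""
    return n * (math.log(2.0) + math.lgamma(1.0 + 1.0 / p)) - math.lgamma(1.0 + n / p)

def log_vol_mixed_ball(b, d, p, log_vol_BX):
    """log vol(B_{l_p^b(X)}) for X = R^d, obtained from (7) by raising to the power bd."""
    return b * math.lgamma(1.0 + d / p) - math.lgamma(1.0 + b * d / p) + b * log_vol_BX

def normalized_volume_ratio(b, d, p, r):
    """[vol(B_{l_p^b(X)}) / vol(B_{l_r^b(X)})]^(1/(bd)); vol(B_X) cancels."""
    lp = log_vol_mixed_ball(b, d, p, 0.0)
    lr = log_vol_mixed_ball(b, d, r, 0.0)
    return math.exp((lp - lr) / (b * d))

# Consistency with the classical formula for X = l_p^d.
print(log_vol_mixed_ball(3, 5, 1.5, log_vol_lp_ball(5, 1.5)), log_vol_lp_ball(15, 1.5))
# Comparison with b^{-(1/p - 1/r)} for p = 1, r = 2, d = 8.
for b in (2, 8, 64):
    print(b, normalized_volume_ratio(b, 8, 1, 2) / b ** (-0.5))
```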

For the second case let \(s \in [b]\). Clearly, we have that

$$\begin{aligned} B_{\ell _p^b(X)} \subseteq \bigcup _{I \subseteq [b]: \sharp I = s} B_I, \end{aligned}$$
(9)

where \( B_I := \{ x \in B_{\ell _p^b(X)} : \Vert x_{i\cdot }\Vert _X \ge \Vert x_{k\cdot }\Vert _X \text { for } i \in I, k\in [b]\setminus I\}\). When we replace the s rows with the largest \(\Vert \cdot \Vert _X\)-(quasi-)norm by 0 in \(x \in B_I\), then the resulting matrix has an \(\ell _r^b(X)\)-(quasi-)norm of at most \(s^{-(1/p-1/r)}\), which follows from a well-known relation for best s-term approximation in \(\ell _r\). Hence, if we wish to cover the set \(B_I\) by balls of radius \(\varepsilon \simeq s^{-(1/p-1/r)}\), it suffices to take care of the s largest rows of the matrices in \(B_I\). That is, we take a suitable covering of \(B_{\ell _p^s(X)}\) in \(\ell _r^s(X)\) and append \(b-s\) zero rows to every matrix of the covering. A similar volumetric argument as above in (7) and (8) tells us that

$$\begin{aligned} e_{1+c_{p,q}sd}(\mathrm {id}:\ell _p^s(X) \rightarrow \ell _r^s(X) ) \lesssim _{p,r} \Big [\frac{{{\,\mathrm{vol}\,}}(B_{\ell _p^s(X)})}{{{\,\mathrm{vol}\,}}(B_{\ell _r^s(X)})}\Big ]^{1/(ds)} \lesssim _{p,r} s^{-(1/p-1/r)}\,, \end{aligned}$$
(10)

so that we obtain a covering of \(B_I\) with cardinality \(2^{c_{p,q}sd}\).
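The best s-term approximation bound used in this argument, namely that removing the s largest entries of a vector \(z\) leaves a remainder of \(\ell _r\)-(quasi-)norm at most \(s^{-(1/p-1/r)}\Vert z\Vert _p\), can be tested numerically. In the proof it is applied to the vector of row norms \(z_i = \Vert x_{i\cdot }\Vert _X\). The sketch below uses our own helper names and random test vectors.

```python
import random

def lp_quasinorm(x, p):
    """l_p (quasi-)norm of a finite sequence; p = float('inf') gives the max norm."""
    a = [abs(t) for t in x]
    if p == float("inf"):
        return max(a)
    return sum(t ** p for t in a) ** (1.0 / p)

def tail_after_s_largest(z, s):
    """Replace the s entries of largest magnitude by zero."""
    order = sorted(range(len(z)), key=lambda i: abs(z[i]), reverse=True)
    removed = set(order[:s])
    return [0.0 if i in removed else z[i] for i in range(len(z))]

def stechkin_ratio(z, s, p, r):
    """||tail||_r divided by s^{-(1/p - 1/r)} * ||z||_p; expected to be at most 1."""
    bound = s ** (-(1.0 / p - 1.0 / r)) * lp_quasinorm(z, p)
    return lp_quasinorm(tail_after_s_largest(z, s), r) / bound

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(50)]
print([round(stechkin_ratio(z, s, 0.5, 2.0), 4) for s in (1, 5, 10, 25)])
```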

Combining the coverings for all possible index sets I yields an \(\varepsilon \)-covering U of \(B_{\ell _p^b(X)}\) in \(\ell _r^b(X)\), where \(\varepsilon \simeq s^{-1/p+1/r}\), with cardinality

$$\begin{aligned} \sharp U = {b \atopwithdelims ()s} 2^{c_{p,q}sd}. \end{aligned}$$

Now, given \(k \in [bd]\), we choose

$$\begin{aligned} s \simeq \frac{k}{\log (eb/k)+d} \end{aligned}$$

such that

$$\begin{aligned} \log (\sharp U) \lesssim s (\log (eb/s)+d) \le k-1 \end{aligned}$$

is assured. Consequently, we obtain the upper bound

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X) \rightarrow \ell _r^b(X)) \lesssim s^{-1/p+1/r} \lesssim \left( \frac{\log (eb/k)+d}{k}\right) ^{1/p-1/r}. \end{aligned}$$

\(\square \)

Remark 8

One way to obtain the matching lower bound in the case \(X = Y\) is to generalize the proof idea underlying Schütt’s theorem (Theorem 1) in the case that \(\log (b) \le k \le b\). However, the standard combinatorial lemma is not sufficient here. A suitable packing to do this generalization has already been considered in [6, Prop. 5.3]. See also Remark 12 below.

3.2 The Covering Construction by Edmunds and Netrusov

The generalized Edmunds–Triebel covering is optimal for finite dimensional \(X=Y\), see Proposition 7 in the previous section. In the general situation, where X is compactly embedded into Y, it seems that the volumetric arguments underlying (10) are too coarse to obtain sharp estimates (at least in the finite dimensional situation). The main contribution of [10] is a covering construction which resolves this shortcoming by not using volumetric arguments at all. In particular, X and Y do not have to be finite dimensional. We give a detailed recapitulation of their idea in this section. For some comments concerning the lower bound in Theorem 5, see Remark 12 at the end of this section.

The covering in [10] works in the very general situation where we are given quasi-Banach spaces \(X_1,\dots , X_b\) and \(Y_1,\dots , Y_b\), see Proposition 10 below. The basic idea is to cover the unit ball \(B_{\ell _p(\{X_i\}_{i=1}^b)}\) by N cuboids

$$\begin{aligned} U(v^i) = v_1^i B_{X_1} \times \cdots \times v_b^i B_{X_b}, \end{aligned}$$
(11)

where \(v^1,\dots ,v^N \in \mathbb {R}_+^b\) and N is exponential in b (think of each cuboid as an anisotropically rescaled version of \(B_{\ell _\infty (\{X_i\}_{i=1}^b)}\)). The crux is to find suitable vectors \(v^i\) such that an optimal covering can be reached by covering the cuboid \(U(v^i)\) using a product of optimal coverings of \(B_{X_1}\),...,\(B_{X_b}\). Edmunds and Netrusov [10] had the idea to consider vectors that form a dyadic grid derived from the simplex

$$\begin{aligned} S(b) = \Big \lbrace x \in [0,1]^b: \sum _{i=1}^b x_i \le 1 \Big \rbrace , \qquad b \in \mathbb {N}. \end{aligned}$$

The dyadic grid is constructed with the help of the following mapping. Let

$$\begin{aligned} \upsilon _0: \mathbb {R}_+ \rightarrow \{2^k: k \in \mathbb {N}_0\}, \; x \mapsto 2^{\max \{0,\lceil \log _2(x) \rceil \}}, \end{aligned}$$

and for \(x \in [0,1]^b\), put

$$\begin{aligned} \upsilon (x) := b^{-1} (\upsilon _0(bx_1),\dots , \upsilon _0(bx_b)). \end{aligned}$$

This mapping \(\upsilon \) leads to a finite grid with the following properties.

Lemma 9

(Simplification of Lemma 2.2 in [10]) For \(b \in \mathbb {N}\), let \(\Gamma (b) = \upsilon (S(b))\). The set \(\Gamma (b)\) has the following properties.

(i) For all \(u \in S(b)\), there is \(v \in \Gamma (b)\) such that \(u_i \le v_i\) for all \(i \in [b]\).

(ii) For all \(v \in \Gamma (b)\), we have \(\Vert v\Vert _1 \le 2\).

(iii) For all \(v \in \Gamma (b)\), we have \(bv_i \in \mathbb {N}\) for each \(i \in [b]\).

(iv) We have \(\sharp \Gamma (b) \le 2^{3b}\).

Proof

Given \(x \in S(b)\), let \(v = \upsilon (x)\). We clearly have \(\sum _{i=1}^b v_i \le 2\) and \(b v_i \in \mathbb {N}\) for all indices \(i=1,\dots ,b\). Further

$$\begin{aligned} \sharp \{ i \in [b] : v_i \ge t \} \le 2/t, \end{aligned}$$

which is a crucial property to estimate the cardinality of the set \(\Gamma (b)\). Let

$$\begin{aligned} B(v,k)&:= \{ i \in [b]: v_i = 2^k/b \}, \quad k \in \mathbb {N}_0,\\ C(v,k)&:= \{ i \in [b]: v_i \ge 2^k/b \}. \end{aligned}$$

Clearly, \(\sharp B(v,k) \le \sharp C(v,k) \le \min \{b, b2^{1-k}\}\). Varying over all elements in the simplex, B(v, 0) can be any of the \(2^b\) subsets of [b]. Fixing B(v, 0), there are at most \(2^b\) possibilities for B(v, 1). Fixing B(v, 0) up to \(B(v,k-1)\), there are at most \(2^{b2^{1-k}}\) possibilities for B(v, k). Hence, in total the set \(\Gamma (b)\) may contain at most

$$\begin{aligned} 2^b \cdot 2^b \cdot \prod _{k=2}^\infty 2^{b2^{1-k}} = 2^{3b} \end{aligned}$$

many elements.\(\square \)
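The mapping \(\upsilon \) is simple to implement, which allows one to experiment with the dyadic grid for small b. The sketch below is an illustration under our own choices: the sampling of simplex points and the function names are hypothetical, and only properties (i) and (iii) as well as the cardinality of the sampled part of the grid are inspected.

```python
import random

def upsilon0(t):
    """Smallest power of two 2^j with j >= 0 and 2^j >= t, i.e. 2^max(0, ceil(log2 t))."""
    j = 0
    while 2 ** j < t:
        j += 1
    return 2 ** j

def upsilon(x):
    """upsilon(x) = b^{-1} (upsilon0(b x_1), ..., upsilon0(b x_b))."""
    b = len(x)
    return tuple(upsilon0(b * xi) / b for xi in x)

def random_simplex_point(b, rng):
    """A random point of S(b): nonnegative entries with sum at most 1."""
    w = [rng.random() for _ in range(b + 1)]
    total = sum(w)
    return [wi / total for wi in w[:b]]

rng = random.Random(0)
b = 4
grid = set()
for _ in range(2000):
    x = random_simplex_point(b, rng)
    v = upsilon(x)
    assert all(xi <= vi for xi, vi in zip(x, v))   # property (i)
    grid.add(v)
print(len(grid), 2 ** (3 * b))
```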

The dyadic grid according to Lemma 9 allows us to establish the following upper bound on entropy numbers.

Proposition 10

(Reformulation of Lemma 2.3 in [10]) Let \(X_1,\dots ,X_b\) and \(Y_1,\dots ,Y_b\) be quasi-Banach spaces, let \(0<p\le r\le \infty \), and let \(k\in \mathbb {N}\) such that \(k \ge 8b\). Then, we have

$$\begin{aligned} \begin{aligned}&e_{k+1}(\mathrm {id}:\ell _p(\{X_i\}_{i=1}^b) \rightarrow \ell _r(\{Y_i\}_{i=1}^b)) \\&~~~~~~~~\le 2^{1/r} \, 8^{1/p-1/r} \max _{i \in [b]} \max _{\lfloor \frac{3k}{8b}\rfloor \le m \le k} (m/k)^{1/p-1/r} e_{m}(\mathrm {id}:X_i \rightarrow Y_i). \end{aligned} \end{aligned}$$

Proof

Consider the transformed grid

$$\begin{aligned} \Gamma (b,p) = \big \{ (v_1^{1/p}, \dots , v_b^{1/p}): v \in \Gamma (b) \big \}. \end{aligned}$$

By Lemma 9 (i), we have

$$\begin{aligned} B_{\ell _p(\{X_i\}_{i=1}^b)} \subset \bigcup _{v \in \Gamma (b,p)} U(v), \end{aligned}$$

where U(v) is the cuboid defined in (11).

Let \(v \in \Gamma (b,p)\) be given by \(v = (v_1^{1/p}, \dots , v_b^{1/p})\). For each

$$\begin{aligned} m_i = \lfloor 1/2(k/b-2) \rfloor b v_i, \quad i \in [b], \end{aligned}$$

let \(\mathcal C_i\) be a \(e_{m_i}(v_i^{1/p}B_{X_i},Y_i)\)-covering. Then, for every \(x \in U(v)\), there is \(y \in \ell _r^b(Y)\) such that \(y_{i\cdot } \in \mathcal C_i\) and

$$\begin{aligned} \Vert x - y\Vert _{\ell _r^b(Y)}&\le \left( \sum _{i=1}^b v_i^{r/p} e_{m_i}(B_{X_i},Y_i)^r \right) ^{1/r}\\&\le \left( \sum _{i=1}^b v_i \right) ^{1/r} \max _{i=1,\dots ,b} \max _{j=1,\dots ,b} v_j^{1/p-1/r} e_{m_j}(B_{X_i},Y_i). \end{aligned}$$

By construction of the set \(\Gamma (b)\), we have \(\left( \sum _{i=1}^b v_i \right) ^{1/r} \le 2^{1/r}\) and

$$\begin{aligned} \max _{j=1,\dots ,b} v_j^{1/p-1/r} e_{m_j}(B_{X_i},Y_i)&\le 8^{1/p-1/r} \max _{m=\lfloor \frac{3k}{8b} \rfloor ,\dots ,k} (m/k)^{1/p-1/r} e_{m}(B_{X_i},Y_i). \end{aligned}$$

Finally, note that the product \(\mathcal C_1 \times \cdots \times \mathcal C_b\) has cardinality

$$\begin{aligned} \prod _{i=1}^{b} 2^{m_i-1} \le 2^{k-3b}, \end{aligned}$$

which, in combination with \(\sharp \Gamma (b,p) \le 2^{3b}\), implies the desired result.\(\square \)
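The bookkeeping in this proof is elementary integer arithmetic and can be double-checked as follows. The sketch assumes integer weights \(w_i = bv_i \ge 1\) with \(\sum _i w_i \le 2b\), as provided by Lemma 9 (ii) and (iii); the function name and the concrete weights are hypothetical.

```python
def covering_exponents(k, b, weights):
    """m_i = floor((k/b - 2)/2) * w_i for integer weights w_i = b * v_i >= 1."""
    base = (k - 2 * b) // (2 * b)   # floor((k/b - 2)/2) in exact integer arithmetic
    return [base * w for w in weights]

b = 4
weights = [1, 1, 2, 4]              # sum equals 2b, the extreme case allowed by Lemma 9 (ii)
for k in range(8 * b, 8 * b + 40):
    m = covering_exponents(k, b, weights)
    # every index m_i lies in the range over which the maximum in Proposition 10 is taken
    assert all((3 * k) // (8 * b) <= mi <= k for mi in m)
    # the product of the coverings has at most 2^(k - 3b) elements
    assert sum(mi - 1 for mi in m) <= k - 3 * b
print(covering_exponents(40, b, weights))
```

The condition \(k \ge 8b\) is exactly what guarantees \(\lfloor \frac{1}{2}(k/b-2) \rfloor \ge \lfloor \frac{3k}{8b} \rfloor \).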

Proposition 10 is not the complete final answer. For \(k \le b\), we have to modify the proof of Proposition 7. We sketch the proof and refer to the proof of [10, Thm 3.1] for technical details.

Proposition 11

Let \(\log _2(b) \le k \le b\). Then, we have

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X) \rightarrow \ell _r^b(Y)) \lesssim A(k,b), \end{aligned}$$

where A(k, b) is defined in Theorem 5.

Proof

(Proof sketch) Let \(s \in [k]\). It is clear that, analogously to (9), we have

$$\begin{aligned} B_{\ell _p^b(X)} \subseteq \bigcup _{I \subset [b]:\, \sharp I = s} B_I\,. \end{aligned}$$

Similar to Proposition 7, we can use a covering for \(B_{\ell _p^s(X)}\) to construct a covering for \(B_I\). Consider now \(\varepsilon = e_k(B_{\ell _p^s(X)},\ell _r^s(Y))\) and let \(\Gamma _0\) be a minimal \(\varepsilon \)-covering of \(B_{\ell _p^s(X)}\) in \(\ell _r^s(Y)\). Let \(\Gamma _I = \Gamma _0 \times \{0\}^{[b]\setminus I}\). Then, for every \(x \in B_I\), there is \(y \in \Gamma _I\) such that

$$\begin{aligned} \Vert x - y\Vert _{\ell _r^b(Y)} \lesssim _{r,p} \varepsilon + s^{1/r-1/p}\Vert \mathrm {id}: X \rightarrow Y\Vert , \end{aligned}$$

where the second term on the right-hand side follows from the best s-term approximation result already used in Proposition 7. Consequently, we have

$$\begin{aligned} \Vert x - y\Vert _{\ell _r^b(Y)} \lesssim _{r,p} \max \left\{ e_k(B_{\ell _p^s(X)},\ell _r^s(Y)),\, s^{1/r-1/p}\Vert \mathrm {id}: X \rightarrow Y\Vert \right\} . \end{aligned}$$
(12)

In contrast to Proposition 7, volumetric arguments would now give a suboptimal estimate for the entropy numbers \(e_k(B_{\ell _p^s(X)},\ell _r^s(Y))\). In this general situation, it requires Proposition 10 with \(X_1=\dots =X_b=X\) and \(Y_1=\dots =Y_b=Y\) to get the proper estimate. Concretely, since \(s \le k\), we have

$$\begin{aligned} e_k(B_{\ell _p^s(X)}, \ell _r^s(Y)) \le e_k(B_{\ell _p^k(X)}, \ell _r^k(Y))\,, \end{aligned}$$

which leads in combination with Proposition 10 and (12) to an upper bound of the form

$$\begin{aligned} e_k(B_{\ell _p^b(X)}, \ell _r^b(Y)) \lesssim \max \left\{ s^{1/r-1/p}\Vert \mathrm {id}: X \rightarrow Y\Vert , \, \max _{m \in [k]} \, (m/k)^{1/p-1/r} e_m(B_X,Y) \right\} . \end{aligned}$$

The usual arguments show that it is optimal to choose s of the order \(k/\log (eb/k)\).\(\square \)

Remark 12

We close this section with some remarks concerning the lower bound in Theorem 5. Its proof relies on two surprisingly simple observations, see [10] for details.

(i) Let M be a maximal \(\varepsilon \)-packing of \(B_X\) in Y. Using the Gilbert–Varshamov bound, which is well-known in coding theory [14, 41], we know that \((2s)^{-1/p}M^{2s} \subset B_{\ell _p^b(X)}\) contains N elements of mutual distance \(s^{1/r-1/p}\varepsilon \), where \(N \simeq \mathrm {card}(M)^s\). This leads to the lower bound

$$\begin{aligned} e_{ms}(B_{\ell _p^s(X)}, \ell _r^s(Y)) \gtrsim s^{1/r - 1/p} e_{4m+6}(B_X, Y), \end{aligned}$$

see [27, p. 68] and [10, Lem.2.6] for a more general formulation. Given \(k \in \mathbb {N}\), we have to make a good choice for the dimension s to maximize the lower bound. Choose \(s = k/m\) for some \(m \in [k]\) to obtain

$$\begin{aligned} e_k(B_{\ell _p^{s}(X)}, \ell _r^{s}(Y)) \gtrsim (m/k)^{1/p-1/r} e_{4m+6}(B_X,Y). \end{aligned}$$

If \(k \le b\), we conclude

$$\begin{aligned} e_k(B_{\ell _p^{b}(X)}, \ell _r^{b}(Y)) \gtrsim \max _{m \in [k]} \; (m/k)^{1/p-1/r} e_{4m+6}(B_X,Y). \end{aligned}$$

If \(k \ge b\), then \(m \ge k/b\) guarantees \(s=k/m \le b\) and thus

$$\begin{aligned} e_k(B_{\ell _p^{b}(X)}, \ell _r^{b}(Y)) \gtrsim \max _{k/b \le m \le k} \; (m/k)^{1/p-1/r} e_{4m+6}(B_X,Y). \end{aligned}$$

(ii) Choose a vector \(x \in B_X\) such that

$$\begin{aligned}\Vert x\Vert _Y \ge \frac{1}{2}\Vert \mathrm {id}:X \rightarrow Y\Vert .\end{aligned}$$

We construct a packing by building row-sparse matrices, where the nonzero rows contain copies of x and the row support sets are chosen according to the following combinatorial fact that is well-known in various disciplines of mathematics, see e.g., [13, Lemma 10.12], [22], [12], or [30, Prop. 2.21, p. 219]. Given \(s,n \in \mathbb {N}\) such that \(0< s < n/2\), there exist subsets \(I_1, \ldots , I_N\) of [n], where

$$\begin{aligned} N \ge \Big (\frac{n}{8s}\Big )^{s}, \end{aligned}$$

such that each subset \(I_i\) has cardinality 2s and

$$\begin{aligned} \sharp (I_i \cap I_j) < s \ \ \text { whenever } i \ne j. \end{aligned}$$

This leads to the lower bound

$$\begin{aligned} e_k(B_{\ell _p^{b}(X)}, \ell _r^{b}(Y)) \gtrsim \Vert \mathrm {id}: X \rightarrow Y\Vert \left( \frac{\log (eb/k)}{k}\right) ^{1/p-1/r}. \end{aligned}$$

In view of the packing construction that we have mentioned in Remark 8, it is somewhat surprising that it is not necessary to combine the combinatorics of the two observations in order to obtain the optimal abstract bound in Theorem 5. An explanation is given in [27, Rem. 4.13, p. 69].
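The separation behind observation (ii) can be sketched as follows. This is only an illustration: the normalisation \((2s)^{-1/p}\) and the notation \(A_i\) are our assumptions and are not taken from [10]. Take \(n=b\) and let \(A_i\) be the matrix whose rows with index in \(I_i\) equal \((2s)^{-1/p}x\) and whose remaining rows vanish. Then \(\Vert A_i\Vert _{\ell _p^b(X)} = \Vert x\Vert _X \le 1\), and since \(\sharp (I_i \triangle I_j) = 4s - 2\,\sharp (I_i \cap I_j) > 2s\) for \(i \ne j\), we get for \(r<\infty \)

$$\begin{aligned} \Vert A_i - A_j\Vert _{\ell _r^b(Y)} = \big (\sharp (I_i \triangle I_j)\big )^{1/r}(2s)^{-1/p}\Vert x\Vert _Y \ge \frac{1}{2}(2s)^{1/r-1/p}\Vert \mathrm {id}: X \rightarrow Y\Vert \,. \end{aligned}$$

Choosing s of the order \(k/\log (eb/k)\) then suggests the rate in the lower bound displayed in (ii). As a numerical instance of the combinatorial fact, \(n=64\) and \(s=2\) give at least \((64/16)^2 = 16\) subsets of cardinality 4 with pairwise intersections of cardinality at most 1.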

4 Consequences of the Edmunds–Netrusov Result

We discuss some consequences of Theorem 5. Let us begin with considering the entropy numbers

$$\begin{aligned} e_k := e_k(\mathrm {id}:\ell _p^b(\ell _q^d) \rightarrow \ell _r^b(\ell _u^d)), \quad 0<p\le r \le \infty , \, 0<q\le u \le \infty . \end{aligned}$$

We have the following matching bounds.

Theorem 13

Let \(0<p \le r\le \infty \) and \(0< q \le u \le \infty \). Then, we have

$$\begin{aligned} e_k \simeq {\left\{ \begin{array}{ll} 1&{}: 1\le k\le \log (bd),\\ b^{-(1/p-1/r)}d^{-(1/q-1/u)}2^{-\frac{k-1}{bd}} &{}: k\ge bd. \end{array}\right. } \end{aligned}$$

For \(\log (bd) \le k \le bd\), we have the following case distinctions.

(i) Let \(1/p-1/r > 1/q-1/u \ge 0\).

(i.a) In the special case \(q=u\), we have

$$\begin{aligned} e_k \simeq {\left\{ \begin{array}{ll} 1 &{}: \log (bd) \le k \le d,\\ \left\{ \frac{\log (eb/k)+d}{k} \right\} ^{1/p-1/r} &{}: d \le k \le bd. \end{array}\right. } \end{aligned}$$

(i.b) If \(q<u\) and \(b \le d\), then

$$\begin{aligned} e_k \simeq {\left\{ \begin{array}{ll} \left( \frac{\log (ed/k)}{k}\right) ^{1/q-1/u} &{}: \log (bd) \le k \le d,\\ \left( \frac{d}{k}\right) ^{1/p - 1/r} d^{1/u - 1/q} &{}: d \le k \le bd. \end{array}\right. } \end{aligned}$$

(i.c) If \(q < u\) and \(d \le b\), then

$$\begin{aligned} e_k \simeq {\left\{ \begin{array}{ll} \max \left\{ \left( \frac{\log (eb/k)}{k}\right) ^{1/p-1/r}, \left( \frac{\log (ed/k)}{k}\right) ^{1/q-1/u} \right\} &{}: \log (bd) \le k \le d,\\ \max \left\{ \Big (\frac{\log (eb/k)}{k}\Big )^{1/p-1/r}, \Big (\frac{d}{k}\Big )^{1/p-1/r}d^{1/u-1/q} \right\} &{}: d \le k \le b,\\ \left( \frac{d}{k}\right) ^{1/p-1/r} d^{1/u-1/q} &{}: b \le k \le bd. \end{array}\right. } \end{aligned}$$

(ii) Let \(1/q-1/u \ge 1/p - 1/r \ge 0\). Then, we have

$$\begin{aligned} e_k \simeq {\left\{ \begin{array}{ll} \left( \frac{\log (ebd/k)}{k}\right) ^{1/p-1/r} &{}: \log (bd) \le k \le b\log (d),\\ b^{1/r-1/p} \left( \frac{b\log (ebd/k)}{k}\right) ^{1/q-1/u} &{}: b\log (d) \le k \le bd. \end{array}\right. } \end{aligned}$$

Proof

For \(1 \le k \le \log (bd)\) and \(k \ge bd\), only standard volumetric arguments are required, see [27, Appendix A] for details. Let us also refer to [7, Lemma 3], where this case has already been considered. Let \(D(m,k)\) and \(A(k,b)\) be as defined in Theorem 5. Moreover, throughout the proof, we write for \(k,l \in \mathbb {N}\),

$$\begin{aligned} s_{k,l} := (l/k)^{1/p-1/r} e_l(\mathrm {id}: \ell _q^d \rightarrow \ell _u^d). \end{aligned}$$

Ad (i.a). Since \(q=u\), it follows from Theorem 1 that \(e_l(\mathrm {id}: \ell _q^d \rightarrow \ell _u^d) \simeq 1\) for \(1 \le l \le d\) and consequently that \(D(1,k) = D(k/b,k) \simeq 1\) and \(A(k,b) \simeq 1\) for all \(k \le d\). Now, for \(k \ge d\), we have that \(s_{k,l} \simeq (l/k)^{1/p-1/r}\) for \(1 \le l \le d\), so the sequence is bounded from above by a monotonically increasing sequence. For \(d \le l \le k\), we have

$$\begin{aligned} s_{k,l} \simeq (l/k)^{1/p - 1/r} 2^{-l/d} =: t_{k,l}. \end{aligned}$$

Since \(2^{-l/d}\) decays faster in l than \((l/k)^{1/p - 1/r}\) increases, we conclude that for \(d \le l \le k\), the sequence \(s_{k,l}\) is “essentially monotonically decreasing”. To be more precise, \(t_{k,l}\) attains its maximum at \(l=\beta _{p,r}d\), where the factor \(\beta _{p,r}\) depends only on p and r. Hence, the maximum of \(s_{k,l}\) can be bounded from above by a constant times the maximum of \(t_{k,l}\) and therefore by \(c_{p,r}(d/k)^{1/p-1/r}\). Using analogous arguments for \(D(k/b,k)\), we conclude that \(\widetilde{D}(1,k) = D(k/b,k) \simeq (d/k)^{1/p-1/r}\) and

$$\begin{aligned} A(k, b) \simeq \max \left\{ \left( \frac{\log (eb/k)}{k}\right) ^{1/p-1/r}, \Big (\frac{d}{k}\Big )^{1/p - 1/r} \right\} \simeq \max \left\{ \frac{\log (eb/k)}{k}, \frac{d}{k}\right\} ^{1/p-1/r} \end{aligned}$$

for \(d \le k \le b\).
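For orientation, the maximiser of \(t_{k,l}\) used in the argument for (i.a) can be computed explicitly if l is treated as a continuous variable. This relaxation and the resulting explicit value of \(\beta _{p,r}\) are our own illustration, under the assumption \(\theta := 1/p-1/r > 0\) and with \(\log \) denoting the natural logarithm. Setting the derivative of \(\log t_{k,l}\) to zero gives

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}l}\Big (\theta \log (l/k) - \frac{l}{d}\log 2\Big ) = \frac{\theta }{l} - \frac{\log 2}{d} = 0 \quad \Longleftrightarrow \quad l = \frac{\theta }{\log 2}\, d\,, \end{aligned}$$

so that one may take \(\beta _{p,r} = \theta /\log 2\), and the maximal value is \((\beta _{p,r} d/k)^{\theta } 2^{-\beta _{p,r}} = c_{p,r}(d/k)^{\theta }\). If \(\beta _{p,r} < 1\), the maximiser lies to the left of d and the maximum over \(d \le l \le k\) is attained at \(l=d\), which is again of the order \((d/k)^{\theta }\).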

Ad (i.b). Consider now \(0<q<u\) and \(b \le d\). For \(\log (bd) \le k \le b\), we have, in consequence of Theorem 1, that \(s_{k,l} \simeq (l/k)^{1/p - 1/r}\) for \(1 \le l \le \log (d)\) and, for \(\log (d) \le l \le k\),

$$\begin{aligned} s_{k,l} \simeq (l/k)^{1/p-1/r} \left( \frac{\log (ed/l)}{l}\right) ^{1/q-1/u}. \end{aligned}$$

Since \(1/p-1/r > 1/q-1/u\), the sequence \(s_{k,l}\) is bounded from above and below, up to a constant, by a monotonically increasing sequence. Consequently, the maximum is attained at \(l=k\), so that \(D(1,k) \simeq (\log (ed/k)/k)^{1/q-1/u}\). Since \(b \le d\), we further have

$$\begin{aligned} D(1,k) \le A(k,b) \lesssim D(1,k). \end{aligned}$$

For \(b \le k \le d\) we find as before that \(D(k/b, k) \simeq (\log (ed/k)/k)^{1/q-1/u}\) and for \(d < k \le bd\), we have the estimate 

$$\begin{aligned} D(k/b,k) \simeq \Big (\frac{d}{k}\Big )^{1/p-1/r} d^{1/u-1/q}\,. \end{aligned}$$

Ad (i.c). Consider now \(d \le b\). For \(\log (bd) \le k \le d\), we find \(D(1,k) \simeq (\log (ed/k)/k)^{1/q-1/u}\) since the sequence \(s_{k,l}\) is bounded from below and above by a sequence that increases monotonically in l. If \(d \le k \le b\), then

$$\begin{aligned} D(1,k) \simeq \Big (\frac{d}{k}\Big )^{1/p-1/r} d^{1/u-1/q} \end{aligned}$$

and

$$\begin{aligned} A(k,b)&\simeq \max \left\{ \Big (\frac{\log (eb/k)}{k}\Big )^{1/p-1/r}, \Big (\frac{d}{k}\Big )^{1/p-1/r} d^{1/u-1/q} \right\} . \end{aligned}$$

Finally, if \(b \le k \le bd\), then \(D(k/b,k) \simeq (d/k)^{1/p-1/r} d^{1/u-1/q}\).

Ad (ii). For \(\log (bd) \le k \le b\), we observe that

$$\begin{aligned} D(1,k) \simeq \left( \frac{\log (d)}{k}\right) ^{1/p-1/r} \end{aligned}$$

since the term \(e_\ell (B_{\ell _q^d}, \ell _u^d)\) is decaying in \(\ell \) at least as fast as \((\ell /k)^{1/p-1/r}\) is growing. Hence,

$$\begin{aligned} A(k,b)&\simeq \max \left\{ \Big (\frac{\log (eb/k)}{k}\Big )^{1/p-1/r}, \Big (\frac{\log (d)}{k}\Big )^{1/p-1/r}\right\} \\&\simeq \left( \frac{\log (ebd/k)}{k}\right) ^{1/p-1/r}. \end{aligned}$$
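The last equivalence is elementary. As a brief check, note that both logarithms are non-negative for \(k \le b\), so that

$$\begin{aligned} \max \{\log (eb/k), \log (d)\} \le \log (eb/k) + \log (d) = \log (ebd/k) \le 2\max \{\log (eb/k), \log (d)\}\,, \end{aligned}$$

and raising this chain to the power \(1/p-1/r \ge 0\) changes the constants by a factor of at most \(2^{1/p-1/r}\).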

Next, we consider \(b \le k \le b \log (d)\). Since \(k/b \le \log (d)\), we find

$$\begin{aligned} D(k/b, k) \simeq (\log (d)/k)^{1/p-1/r} \ge (\log (bd/k)/k)^{1/p-1/r}\,, \end{aligned}$$

where we have used \(b/k \le 1\) in the last estimate. At the same time, since \(k/b \le \log (d)\), we also have \(\log (bd/k) \gtrsim \log (d)\) and thus

$$\begin{aligned} D(k/b, k) \simeq \left( \frac{\log (ebd/k)}{k}\right) ^{1/p-1/r}. \end{aligned}$$

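Finally, for \(b \log (d) \le k \le bd\), the sequence \(s_{k,l}\) is essentially decreasing on \(k/b \le l \le k\), so that the maximum is attained at \(l = k/b\). This gives

$$\begin{aligned} D(k/b, k) \simeq b^{1/r-1/p} \left( \frac{b\log (ebd/k)}{k}\right) ^{1/q-1/u}. \end{aligned}$$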

\(\square \)

Remark 14

The upper bound for \(k \ge bd\) in Theorem 13 also follows from [7, Lem. 3]. The upper bound in Theorem 13 (ii) has also been proved in [42, Lem. 3.16] for the range \(b \max \{\log (d),\log (b)\} \le k \le bd\). The proof there uses the following covering construction, which, as far as we know, first appeared in [23, Proof of Prop. 4]. Let \(X_1,\dots ,X_b\) and \(Y_1,\dots , Y_b\) be (quasi-)Banach spaces and \(0<p,r\le \infty \). The covering rests on the idea to split the ball \(B_{\ell _p^b({X_1,\dots ,X_b})}\) into subsets of matrices with non-increasing row norms,

$$\begin{aligned} B_{\ell _p^b({X_1,\dots ,X_b})} \subseteq \bigcup _\pi \{ x \in B_{\ell _p^b({X_1,\dots ,X_b})}: \Vert x_{\pi (1)\cdot }\Vert _{X_1} \ge \dots \ge \Vert x_{\pi (b)\cdot }\Vert _{X_b} \}, \end{aligned}$$

where the union is taken over all permutations of [b]. This leads to the upper bound

$$\begin{aligned} e_{\sum _{j=1}^b n_j + b \log _2(b)}\big (B_{\ell _p(\{X_j\}_{j=1}^b)}, \ell _r(\{Y_j\}_{j=1}^b)\big ) \le \big ( \sum _{j=1}^b j^{-r/p} e_{n_j}(B_{X_j}, Y_j)^r\big )^{1/r} \end{aligned}$$
(13)

for \(n_1,\dots ,n_b \in \mathbb {N}\). If

$$\begin{aligned} X=X_1=\dots =X_b=\ell _q^d \quad \text { and } \quad Y=Y_1=\dots =Y_b=\ell _u^d \end{aligned}$$

with \(0<q\le u\), and we choose \(n_j \simeq j^{-\alpha }\) for some \(0<\alpha <1\) such that

$$\begin{aligned} \alpha (1/q-1/u) > 1/p - 1/r, \end{aligned}$$

then (13) is strong enough to obtain the upper bound in Theorem 13 (ii), provided

$$\begin{aligned} b\max \{\log (d),\log (b)\} \le k \le bd. \end{aligned}$$

Now we increase the level of abstraction and consider mixed norms of higher order. Let, for \(\mu =1,\ldots ,b\), the weighted spaces \(X_{\mu }\) and \(Y_{\mu }\) be given by

$$\begin{aligned} X_{\mu }&= \ell _p^{b_\mu }(d_{\mu }^{\alpha }\ell _q^{d_\mu })\,,\\ Y_{\mu }&= \ell _r^{b_\mu }(d_{\mu }^{\beta }\ell _u^{d_\mu })\,, \end{aligned}$$
(14)

with \(0<p\le r\le \infty \), \(0<q\le u\le \infty \), and \(\alpha , \beta \in \mathbb {R}\). The dimensions \((d_\mu )_\mu \) and \((b_\mu )_{\mu }\) are non-decreasing sequences of natural numbers satisfying \(d_{\mu } \gtrsim b_{\mu }\). These spaces are used as “inner spaces” in the way that

$$\begin{aligned} X = \ell _p^b((X_\mu )_{\mu = 1,\ldots ,b}) \text{ and } Y = \ell _r^b((Y_\mu )_{\mu = 1,\ldots ,b})\,. \end{aligned}$$

Note that for \(x=(x_{\mu ,i,j})_{\mu ,i,j} \in X\) with \(\mu =1,\dots ,b\), \(i=1,\dots ,b_{\mu }\), \(j=1,\dots ,d_\mu \), the norm is given by

$$\begin{aligned} \Vert x\Vert _X = \left\{ \sum _{\mu =1}^{b} \sum _{i=1}^{b_\mu } d_\mu ^{\alpha p} \left( \sum _{j=1}^{d_\mu } |x_{\mu ,i,j}|^q \right) ^{p/q} \right\} ^{1/p}. \end{aligned}$$

We are interested in the behavior of the entropy numbers

$$\begin{aligned} e_k(\mathrm {id}:X\rightarrow Y) = e_k(\mathrm {id}:\ell _p^b(X_\mu )\rightarrow \ell _r^b(Y_\mu )) \end{aligned}$$

in the special situation \(1/q-1/u < 1/p-1/r\).

Proposition 15

Let \(0 \le 1/q-1/u < 1/p-1/r\) and \(\alpha -\beta \le 1/p-1/r-(1/q-1/u)\). Let further X, Y and \(X_{\mu }\), \(Y_{\mu }\) be as above. Then we have for all \(k\ge 8b\) and \(k \ge \max \limits _{\mu = 1,\ldots ,b} d_{\mu }\)

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X_\mu ) \rightarrow \ell _r(Y_\mu )) \lesssim \Big (\frac{1}{k}\Big )^{\alpha -\beta +1/q-1/u}\,. \end{aligned}$$

Proof

We use Theorem 5, in particular the upper bound in Proposition 10. Since \(k\ge 8b\) we obtain

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X_\mu ) \rightarrow \ell _r(Y_\mu ))&\lesssim \max \limits _{\mu = 1,\ldots ,b}\max \limits _{1\le \ell \le k} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}e_{\ell }(\mathrm {id}:X_{\mu } \rightarrow Y_{\mu })\\&= \max \limits _{\mu = 1,\ldots ,b}\max \limits _{1\le \ell \le k} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}d_\mu ^{-(\alpha -\beta )}e_{\ell }(\mathrm {id}:\ell _p^{b_\mu }(\ell _q^{d_{\mu }}) \rightarrow \ell _r^{b_\mu }(\ell _u^{d_{\mu }}))\\&\lesssim \max \limits _{\mu = 1,\ldots ,b}\Big \{\max \limits _{1\le \ell \le d_{\mu }}\Big [\cdots \Big ], \max \limits _{d_{\mu }\le \ell \le k}\Big [\cdots \Big ]\Big \}\,. \end{aligned}$$

Let us evaluate the first \(\max [\cdots ]\). With Theorem 13, (i.b), (i.c) we have

$$\begin{aligned} \begin{aligned}&\max \limits _{1\le \ell \le d_{\mu }} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}d_\mu ^{-(\alpha -\beta )}e_{\ell }(\mathrm {id}:\ell _p^{b_\mu }(\ell _q^{d_{\mu }}) \rightarrow \ell _r^{b_\mu }(\ell _u^{d_{\mu }}))\\&\lesssim \max \limits _{1\le \ell \le d_{\mu }} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}d_\mu ^{-(\alpha -\beta )}\Big (\frac{\log (ed_{\mu }/\ell )}{\ell }\Big )^{1/q-1/u}\,. \end{aligned} \end{aligned}$$
(15)

Because \(1/p-1/r > 1/q-1/u\), the maximum is attained for \(\ell = d_{\mu }\), which leads to

$$\begin{aligned} \max \limits _{1\le \ell \le d_{\mu }}\Big [\cdots \Big ] \lesssim d_{\mu }^{-(\alpha -\beta )}d_{\mu }^{1/p-1/r}d_{\mu }^{-(1/q-1/u)}k^{-(1/p-1/r)}\,. \end{aligned}$$

Let us discuss the second \(\max [\cdots ]\). Using again Proposition 10 we obtain

$$\begin{aligned} \max \limits _{d_{\mu }\le \ell \le k}\Big [\cdots \Big ] \lesssim k^{-(1/p-1/r)}d_{\mu }^{-(\alpha -\beta )}d_{\mu }^{1/p-1/r}d_{\mu }^{-(1/q-1/u)}\,. \end{aligned}$$

Due to our assumption \(\alpha -\beta \le 1/p-1/r-(1/q-1/u)\), the exponent of \(d_{\mu }\) is non-negative in both cases. Since \(k \ge d_\mu \), we may replace \(d_\mu \) by k to increase the right-hand side. This leads to

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X_\mu ) \rightarrow \ell _r(Y_\mu )) \lesssim \Big (\frac{1}{k}\Big )^{\alpha -\beta +1/q-1/u}\,. \end{aligned}$$
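The last step is pure exponent bookkeeping, which we spell out as a sketch. Writing \(\gamma := 1/p-1/r-(1/q-1/u)-(\alpha -\beta ) \ge 0\) (a shorthand introduced only here), both bounds above read \(k^{-(1/p-1/r)}d_\mu ^{\gamma }\), and \(d_\mu \le k\) gives

$$\begin{aligned} k^{-(1/p-1/r)}d_\mu ^{\gamma } \le k^{-(1/p-1/r)}k^{\gamma } = k^{-(\alpha -\beta )-(1/q-1/u)}\,. \end{aligned}$$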

\(\square \)

We are now aiming for a similar relation for small k.

Proposition 16

Let \(\alpha -\beta >0\) and \(1/p-1/r>1/q-1/u \ge 0\). Then we have for \(8b \le k \le \min \limits _{\mu =1,\ldots ,b} d_{\mu }\) the estimate

$$\begin{aligned} e_k(\mathrm {id}:\ell _p^b(X_\mu )\rightarrow \ell _r(Y_\mu )) \lesssim \Big (\frac{1}{k}\Big )^{\alpha -\beta +1/q-1/u}\,. \end{aligned}$$

Proof

Again we use Theorem 5, in particular the upper bound in Proposition 10. This gives

$$\begin{aligned} \begin{aligned} e_k&\lesssim \max \limits _{\mu } \max \limits _{1\le \ell \le k} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}e_{\ell }(\mathrm {id}:X_\mu \rightarrow Y_{\mu })\\&=\max \limits _{\mu } \max \limits _{1\le \ell \le k} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}d_\mu ^{-(\alpha -\beta )} e_{\ell }(\mathrm {id}:\ell _p^{b_\mu }(\ell _q^{d_{\mu }})\rightarrow \ell _r^{b_\mu }(\ell _u^{d_{\mu }}))\\&\lesssim \max \limits _{\mu } \max \limits _{1\le \ell \le k} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}d_\mu ^{-(\alpha -\beta )} \Big (\frac{\log (ed_\mu /\ell )}{\ell }\Big )^{1/q-1/u}\,, \end{aligned} \end{aligned}$$

where we used once again Theorem 13, (i.b). Clearly, we get

$$\begin{aligned} \begin{aligned} e_k&\lesssim \max \limits _{\mu } \max \limits _{1\le \ell \le k} \Big (\frac{\ell }{k}\Big )^{1/p-1/r}k^{-(\alpha -\beta )}\Big (\frac{k}{d_\mu }\Big )^{\alpha -\beta }\Big (\frac{\log (ed_\mu /\ell )}{\ell }\Big )^{1/q-1/u}\\&\lesssim k^{-(\alpha -\beta )}k^{-(1/q-1/u)} \Big (\frac{k}{d_\mu }\Big )^{\alpha -\beta }\big (\log (ed_\mu /k)\big )^{1/q-1/u}\,. \end{aligned} \end{aligned}$$

Since the function \(x \mapsto x^{-(\alpha -\beta )}[\log (ex)]^{(1/q-1/u)}\) is bounded on \([1,\infty )\) we conclude with

$$\begin{aligned} e_k \lesssim k^{-(\alpha -\beta +1/q-1/u)}\,. \end{aligned}$$
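To see how the boundedness is used, put \(x := d_\mu /k \ge 1\), so that the \(\mu \)-dependent factor in the previous estimate equals \(x^{-(\alpha -\beta )}[\log (ex)]^{1/q-1/u}\). As an illustration, assume that \(\log \) denotes the natural logarithm (for another base only the constant changes) and use the shorthand \(a := \alpha -\beta > 0\) and \(c := 1/q-1/u \ge 0\), both introduced only here. Elementary calculus then gives

$$\begin{aligned} \sup _{x \ge 1}\, x^{-a}[\log (ex)]^{c} = {\left\{ \begin{array}{ll} 1 &{}: c \le a,\\ e^{a-c}(c/a)^{c} &{}: c > a, \end{array}\right. } \end{aligned}$$

which is a constant depending only on a and c, and not on k or \(\mu \).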

\(\square \)

5 Polynomial Decay of Entropy Numbers for Multivariate Function Space Embeddings

We come to the main subject of this paper, improved upper bounds for entropy numbers of function space embeddings (1) in regimes of small mixed smoothness.

5.1 Function Spaces of Dominating Mixed Smoothness

Besov and Triebel–Lizorkin spaces of mixed smoothness are typically defined via a dyadic decomposition on the Fourier side. Let \(\{\varphi _{j}\}_{j\in \mathbb {N}_0^n}\) be the standard tensorized dyadic decomposition of unity, see [37] and [42]. We further denote by \(S'(\mathbb {R}^n)\) the space of tempered distributions and by \(D'(\Omega )\) the space of distributions (dual space of \(D(\Omega )\), which represents the space of test functions on the bounded domain \(\Omega \subset \mathbb {R}^n\)). The Besov space of dominating mixed smoothness \(S^r_{p,q}B(\mathbb {R}^n)\) with smoothness parameter \(r > 0\) and integrability parameters \(0<p,q\le \infty \) is given by

$$\begin{aligned} S^r_{p,q}B(\mathbb {R}^n) := \Big \{f\in S'(\mathbb {R}^n)~:~\Vert f\Vert _{S^r_{p,q}B}:=\Big (\sum \limits _{j \in \mathbb {N}_0^n} 2^{rq\Vert j\Vert _1}\Vert \mathcal F^{-1}[\varphi _{j}\mathcal F f]\Vert _p^q\Big )^{1/q} < \infty \Big \}\,, \end{aligned}$$

with the usual modification in the case \(q=\infty \). The Triebel–Lizorkin space of dominating mixed smoothness \(S^r_{p,q}F(\mathbb {R}^n)\) is given, for \(p<\infty \), by

$$\begin{aligned} S^r_{p,q}F(\mathbb {R}^n)&:= \Big \{f\in S'(\mathbb {R}^n)~:~\Vert f\Vert _{S^r_{p,q}F}\\&:= \Big \Vert \Big (\sum \limits _{j \in \mathbb {N}_0^n} 2^{rq\Vert j\Vert _1}|\mathcal F^{-1}[\varphi _{j}\mathcal F f](\cdot )|^q\Big )^{1/q}\Big \Vert _p < \infty \Big \}\,. \end{aligned}$$

The latter scale of spaces contains the classical \(L_p\) spaces and Sobolev spaces with dominating mixed smoothness if \(1<p<\infty \) and \(q=2\), namely we have \(S^0_{p,2}F(\mathbb {R}^n) = L_p(\mathbb {R}^n)\) and \(S^k_{p,2}F(\mathbb {R}^n) = S^k_pW(\mathbb {R}^n)\) for \(k\in \mathbb {N}\). Note that we also have \(S^r_{p,p}B(\mathbb {R}^n) = S^r_{p,p}F(\mathbb {R}^n)\) for all \(0<p<\infty \) and \(r\in \mathbb {R}\). Though we have the embedding

$$\begin{aligned} \begin{aligned} S^{r_0}_{p_0,q_0}A(\mathbb {R}^n)&\hookrightarrow S^{r_1}_{p_1,q_1}A^{\dag }(\mathbb {R}^n), \quad A,A^{\dag } \in \{B,F\}, \end{aligned} \end{aligned}$$
(16)

for \(p_0 \le p_1\) and \(r_0-r_1>1/p_0-1/p_1\), see [37, Chapt. 2], the embedding (16) is never compact. Hence, the entropy numbers of embeddings between function spaces defined on the whole \(\mathbb {R}^n\) do not converge to zero. We restrict our considerations to spaces on bounded domains \(\Omega \). Let \(\Omega \) be an arbitrary bounded domain in \(\mathbb {R}^n\). Then, we define \(S^r_{p,q}A(\Omega )\) for \(A\in \{B,F\}\) as

$$\begin{aligned} S^r_{p,q}A(\Omega ) := \{f\in D'(\Omega )~:~\exists g\in S^r_{p,q}A(\mathbb {R}^n) \text { such that } g|_{\Omega } = f\}\, \end{aligned}$$

and its (quasi-)norm is given by \(\Vert f\Vert _{S^r_{p,q}A(\Omega )}:=\inf _{g|_{\Omega } = f} \Vert g\Vert _{S^r_{p,q}A}\). The embedding (16) transfers to the bounded domain \(\Omega \) and is compact there, so that the entropy numbers decay and converge to zero.

5.2 Sequence Spaces

The key to establishing the decay rate of entropy numbers for the embedding (1) is a discretization technique which has been developed over the years by several authors beginning with Maiorov [26]. Later, after wavelet isomorphisms had been established, this technique was refined by Lemarié, Meyer, Triebel, and many others. In [42, Thm. 2.10] Vybíral gave the necessary modifications to deal with the above-defined \(S^r_{p,q}A(\Omega )\) spaces in detail. The main advantage of this approach is to transfer questions for function space embeddings to certain sequence spaces.

Using sufficiently smooth wavelets with sufficiently many vanishing moments (and the notation from [42]) the mapping

$$\begin{aligned} f \mapsto \lambda _{j,m}(f):=\langle f, \psi _{j,m} \rangle \quad ,\quad j\in \mathbb {N}_0^n, m \in \mathbb {Z}^n\,, \end{aligned}$$

represents a sequence space isomorphism between \(S^r_{p,q}B(\mathbb {R}^n), S^r_{p,q}F(\mathbb {R}^n)\) and

$$\begin{aligned} \begin{aligned} s^r_{p,q}b&:= \Big \{\lambda =(\lambda _{j,m})_{j, m}~:~ \Vert \lambda \Vert _{s^r_{p,q}b}\\&:=\Big [\sum \limits _{j\in \mathbb {N}_0^n} 2^{(r-1/p)q\Vert j\Vert _1} \Big (\sum \limits _{m\in \mathbb {Z}^n}|\lambda _{j,m}|^p\Big )^{q/p}\Big ]^{1/q}<\infty \Big \},\\ s^r_{p,q}f&:= \Big \{\lambda =(\lambda _{j,m})_{j, m}~:~ \Vert \lambda \Vert _{s^r_{p,q}f} \\&:= \Big \Vert \Big (\sum _{j \in \mathbb {N}_0^n} 2^{\Vert j\Vert _1rq}\Big |\sum _{m \in \mathbb {Z}^n}\lambda _{j, m}\chi _{j,m}(\cdot )\Big |^q\Big )^{1/q}\Big \Vert _p<\infty \Big \}\,, \end{aligned} \end{aligned}$$

respectively, with the usual modification in the case \(\max \{p,q\} = \infty \). Here we denote for \(j \in \mathbb {N}_0^n\) and \(m \in \mathbb {Z}^n\)

$$\begin{aligned} Q_{j, m} := \prod \limits _{i=1}^n 2^{-j_i} [m_i-1,m_i+1] \end{aligned}$$

and

$$\begin{aligned} A_{j}^\Omega = \{m \in \mathbb {Z}^n: Q_{j, m} \cap \Omega \ne \emptyset \}. \end{aligned}$$

Further, \(\chi _{j,m}\) denotes the characteristic function of \(Q_{j,m}\). Consider the sequence spaces

$$\begin{aligned} \begin{aligned} s_{p,q}^{r}b(\Omega )&:= \{ \lambda = (\lambda _{j, m})_{j \in \mathbb {N}_0^n, m \in A_{j}^\Omega } : \Vert \lambda \Vert _{s_{p,q}^{r} b(\Omega )}< \infty \}\,,\\ s_{p,q}^rf(\Omega )&:= \{ \lambda = (\lambda _{j, m})_{j \in \mathbb {N}_0^n, m \in A_{j}^\Omega } : \Vert \lambda \Vert _{s_{p,q}^{r} f(\Omega )} < \infty \} \end{aligned} \end{aligned}$$

with (quasi-)norms given by

$$\begin{aligned} \begin{aligned} \Vert \lambda \Vert _{s_{p,q}^{r}b(\Omega )}&:= \Big [\sum _{j \in \mathbb {N}_0^n} 2^{\Vert j\Vert _1(r-1/p)q} \Big ( \sum _{m \in A_{j}^\Omega } |\lambda _{j, m}|^p \Big )^{q/p} \Big ]^{1/q},\\ \Vert \lambda \Vert _{s_{p,q}^{r}f(\Omega )}&:= \Big \Vert \Big (\sum _{j \in \mathbb {N}_0^n} 2^{\Vert j\Vert _1rq} \Big |\sum _{m \in A_{j}^\Omega }\lambda _{j, m}\chi _{j,m}(\cdot )\Big |^q\Big )^{1/q}\Big \Vert _p\,. \end{aligned} \end{aligned}$$

Let us also define the following building blocks for \(\mu \in \mathbb {N}_0\) fixed

$$\begin{aligned} \begin{aligned} s_{p,q}^{r}b(\Omega )_{\mu }&:= \Big \{ \lambda : \Vert \lambda \Vert _{s_{p,q}^{r}b(\Omega )_\mu }\\&:= \Big [\sum _{\Vert j\Vert _1 = \mu } 2^{\Vert j\Vert _1(r-1/p)q} \Big ( \sum _{m \in A_{j}^\Omega } |\lambda _{j, m}|^p \Big )^{q/p} \Big ]^{1/q}<\infty \Big \}\,,\\ s_{p,q}^{r}f(\Omega )_{\mu }&:= \Big \{ \lambda : \Vert \lambda \Vert _{s_{p,q}^{r}f(\Omega )_\mu }\\&:= \Big \Vert \Big (\sum _{\Vert j\Vert _1 = \mu } 2^{\Vert j\Vert _1rq} \Big |\sum _{m \in A_{j}^\Omega }\lambda _{j, m}\chi _{j,m}(\cdot )\Big |^q\Big )^{1/q}\Big \Vert _p < \infty \Big \}\,. \end{aligned} \end{aligned}$$
(17)

Clearly, for \(\mu \in \mathbb {N}_0\) we have

$$\begin{aligned} \sharp \{j \in \mathbb {N}_0^n: \Vert j\Vert _1 = \mu \} = \binom{\mu + n -1}{\mu } \simeq _n (\mu +1)^{n-1} \end{aligned}$$

and \(\sharp A_{j}^\Omega \simeq 2^{\Vert j\Vert _1} = 2^{\mu }\). Consider

$$\begin{aligned} \mathrm {id}:s^{r_0}_{p_0,q_0}a(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}a^{\dag }(\Omega ) \end{aligned}$$

for \(r_0-r_1 > 1/p_0-1/p_1\) such that the embedding is compact. Defining the building blocks

$$\begin{aligned} (\mathrm {id}_\mu \lambda )_{j, m} := {\left\{ \begin{array}{ll} \lambda _{j, m} &{}: \Vert j\Vert _1 = \mu , m\in A_j^{\Omega }\\ 0 &{}: \text { otherwise, } \end{array}\right. } \end{aligned}$$

we have \(\mathrm {id}= \sum _{\mu =0}^{\infty } \mathrm {id}_{\mu }\). Of course, the identity

$$\begin{aligned} e_k(\mathrm {id}_\mu :s^{r_0}_{p_0,q_0}a(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}a^{\dag }(\Omega )) = e_k(\mathrm {id}'_\mu :s^{r_0}_{p_0,q_0}a(\Omega )_\mu \rightarrow s^{r_1}_{p_1,q_1}a^{\dag }(\Omega )_{\mu }) \end{aligned}$$

holds true, where \(\mathrm {id}_{\mu }'\) denotes the corresponding embedding operator on the respective block (17). Although these operators have the same mapping properties we use different notations to formally distinguish between them. If \(a = a^{\dag } = b\) we also have, for a finite index set I, that

$$\begin{aligned} \begin{aligned}&e_k\Big (\sum \limits _{\mu \in I}\mathrm {id}_{\mu } : s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}b(\Omega )\Big )\\&~~~~= e_k\Big ( \mathrm {id}':\ell _{q_0}\Big (\big (s^{r_0}_{p_0,q_0}b(\Omega )_\mu \big )_{\mu \in I}\Big ) \rightarrow \ell _{q_1}\Big (\big (s^{r_1}_{p_1,q_1}b(\Omega )_{\mu }\big )_{\mu \in I}\Big )\Big )\\&~~~~\simeq e_k\Big ( \mathrm {id}'':\ell _{q_0}\Big ((X_\mu )_{\mu \in I}\Big ) \rightarrow \ell _{q_1}\Big ((Y_{\mu })_{\mu \in I}\Big )\Big )\,, \end{aligned} \end{aligned}$$
(18)

where \(X_{\mu } = 2^{\mu (r_0-1/p_0)}\ell _{q_0}^{(\mu +1)^{n-1}}(\ell _{p_0}^{2^\mu })\) and \(Y_{\mu } = 2^{\mu (r_1-1/p_1)}\ell _{q_1}^{(\mu +1)^{n-1}}(\ell _{p_1}^{2^\mu })\), which means \(d_{\mu } = 2^{\mu }\) and \(b_\mu = (\mu +1)^{n-1}\) in the notation of (14). In particular, we have \(b_{\mu } \lesssim d_{\mu }\).
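As a quick sanity check of the counting behind \(b_\mu \), which we add only for illustration, one can use \(\mu +1 \le \mu +i \le i(\mu +1)\) for \(i \ge 1\) to obtain

$$\begin{aligned} \frac{(\mu +1)^{n-1}}{(n-1)!} \le \binom{\mu +n-1}{\mu } = \prod _{i=1}^{n-1}\frac{\mu +i}{i} \le (\mu +1)^{n-1}\,. \end{aligned}$$

For instance, \(n=2\) gives exactly \(\mu +1\) indices j with \(\Vert j\Vert _1 = \mu \), and \(n=3\) gives \((\mu +1)(\mu +2)/2\). Since \((\mu +1)^{n-1}\) grows polynomially in \(\mu \) while \(2^{\mu }\) grows exponentially, the relation \(b_\mu \lesssim d_\mu \) holds with a constant depending only on n.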

5.3 Entropy Numbers

As a consequence of the boundedness of certain restriction and extension operators, see [42, 4.5], the investigation of entropy numbers of Besov space embeddings can be shifted to the sequence space side. We formulate our first result in the framework of sequence spaces, which improves the upper bound. More specifically, we prove that the lower bound in (23) is sharp in the case that \(0 \le 1/p_0-1/p_1<r_0-r_1 \le 1/q_0-1/q_1\), which also includes the limiting case \(r_0-r_1 = 1/q_0 - 1/q_1\). What is known in this direction is summarized in Remark 20 below.

Proposition 17

Let \(\Omega \) be a bounded domain and \(0<q_0 < q_1 \le \infty \), \(0<p_0\le p_1\le \infty \) such that

$$\begin{aligned} 1/p_0-1/p_1<r_0-r_1 \le 1/q_0-1/q_1\,. \end{aligned}$$

Then we have

$$\begin{aligned} e_m(\mathrm {id}:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}b(\Omega )) \simeq m^{-(r_0-r_1)}\quad ,\quad m\in \mathbb {N}\,. \end{aligned}$$

Proof

The lower bound follows by [42, Thm. 3.18]. The upper bound is the actual contribution. We argue as follows.

Step 1. Put \(\varrho :=\min \{1,p_1,q_1\}\) and fix \(m\ge m_0\), where \(m_0\) is large enough (depending on \(p_0,p_1,q_0,q_1,r_0,r_1\)). We decompose the identity operator \(\mathrm {id}\) as follows

$$\begin{aligned} \mathrm {id}= \Big (\sum \limits _{\mu = 0}^{L_m} \mathrm {id}_{\mu }\Big ) + \Big (\sum \limits _{\mu = L_m+1}^{M_m+L_m} \mathrm {id}_{\mu }\Big ) + \Big (\sum \limits _{\mu =M_m+L_m+1}^{\infty } \mathrm {id}_{\mu }\Big )\,, \end{aligned}$$
(19)

where \(L_m := \lfloor \log _2(m) \rfloor \) and \(M_m := \lfloor m/8 \rfloor \). With an eye on Proposition 10, this means, in particular, that \(m \ge 8L_m\) and \(m\ge 8M_m\) (for m large enough). Using (4) we obtain

$$\begin{aligned} e_{2m}(\mathrm {id})^{\varrho } \lesssim e_m\Big (\sum \limits _{\mu = 0}^{L_m} \mathrm {id}_{\mu }\Big )^{\varrho } + e_m\Big (\sum \limits _{\mu = L_m+1}^{M_m+L_m}\mathrm {id}_{\mu }\Big )^{\varrho } + \sum \limits _{\mu = L_m+M_m+1}^{\infty } \Vert \mathrm {id}_\mu \Vert ^{\varrho }\,. \end{aligned}$$
(20)

Step 2. We estimate the first summand. By (18) this breaks down to the entropy numbers

$$\begin{aligned} e_m\Big ( \mathrm {id}:\ell _{q_0}\Big ((X_\mu )_{\mu \in I}\Big ) \rightarrow \ell _{q_1}\Big ((Y_{\mu })_{\mu \in I}\Big )\Big ) \end{aligned}$$
(21)

with \(X_\mu , Y_\mu \) chosen as after (18), where I denotes the range for \(\mu \). Putting

$$\begin{aligned} p:=q_0, r:=q_1, q := p_0, u:=p_1, d_\mu := 2^\mu , b_{\mu } := (\mu +1)^{n-1}, \alpha := r_0-1/p_0 \end{aligned}$$

and \(\beta :=r_1-1/p_1\) in (14), and using \(m \ge \max \{8L_m, \max \limits _\mu d_\mu \}\), we may apply Proposition 15 to obtain

$$\begin{aligned} e_m \lesssim \Big (\frac{1}{m}\Big )^{\alpha -\beta +1/q-1/u} = \Big (\frac{1}{m}\Big )^{r_0-r_1}\,. \end{aligned}$$
(22)

Note that, due to Proposition 15, we only used that \(r_0-r_1 \le 1/q_0-1/q_1\). To estimate the first summand in (20) it is not needed that \(r_0-r_1>1/p_0-1/p_1\).
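The translation of the parameters can be verified directly. The following check is added for the reader's convenience and uses only the substitutions made above:

$$\begin{aligned} \alpha -\beta +1/q-1/u&= (r_0-1/p_0)-(r_1-1/p_1)+1/p_0-1/p_1 = r_0-r_1\,,\\ \alpha -\beta \le 1/p-1/r-(1/q-1/u) \quad&\Longleftrightarrow \quad r_0-r_1 \le 1/q_0-1/q_1\,,\\ \alpha -\beta >0 \quad&\Longleftrightarrow \quad r_0-r_1 > 1/p_0-1/p_1\,. \end{aligned}$$

The second line is the hypothesis of Proposition 15 and the third line is the hypothesis of Proposition 16 in the present notation.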

Step 3. Let us address the second summand in (20). Clearly, it can be reduced to (21) with spaces \(X_\mu , Y_\mu \) defined analogously, but with \(\mu \) running this time in the range

$$\begin{aligned} I = \{L_m+1,\ldots ,L_m+M_m\}. \end{aligned}$$

Hence, we have \(b := \sharp I = M_m \le \min \limits _\mu d_\mu \). We apply Proposition 16 to end up with (22). Note that we have used here only that \(\alpha -\beta >0\), or, equivalently, \(r_0-r_1>1/p_0-1/p_1\).

Step 4. Finally, we deal with the third summand in (20). Clearly, we have

$$\begin{aligned} \Vert \mathrm {id}_{\mu }\Vert \lesssim 2^{-\mu (r_0-r_1-1/p_0 + 1/p_1)}. \end{aligned}$$

This gives

$$\begin{aligned} \begin{aligned} \sum \limits _{\mu = M_m+L_m+1}^{\infty }\Vert \mathrm {id}_\mu \Vert ^{\varrho }&\lesssim \sum \limits _{\mu = M_m+L_m+1}^{\infty } 2^{-\varrho \mu (r_0-r_1-1/p_0 + 1/p_1)}\\&\simeq 2^{-\varrho (m/8+L_m)(r_0-r_1-1/p_0 + 1/p_1)}\\&\lesssim m^{-\varrho (r_0-r_1)}\,. \end{aligned} \end{aligned}$$
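The middle step is a geometric series. As a sketch, with the shorthand \(\delta := r_0-r_1-1/p_0+1/p_1 > 0\) and \(N := M_m+L_m\) (both introduced here only for illustration), and using \(N+1 \ge m/8+\log _2(m)-1\), we have

$$\begin{aligned} \sum \limits _{\mu = N+1}^{\infty } 2^{-\varrho \delta \mu } = \frac{2^{-\varrho \delta (N+1)}}{1-2^{-\varrho \delta }} \le \frac{2^{\varrho \delta }}{1-2^{-\varrho \delta }}\, m^{-\varrho \delta }\, 2^{-\varrho \delta m/8}\,. \end{aligned}$$

Since the factor \(2^{-\varrho \delta m/8}\) decays faster than any power of m, the right-hand side is bounded by a constant times \(m^{-\varrho (r_0-r_1)}\).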

This concludes the proof.\(\square \)

In the next proposition we consider the situation where a Besov type sequence space compactly embeds into a Triebel–Lizorkin type sequence space. This setting is particularly important, since it leads to results with target space \(L_p\).

Proposition 18

Let \(\Omega \) be a bounded domain and \(0<q_0 < q_1 \le \infty \), \(0<p_0\le p_1 < \infty \), \(q_0 <p_0\), \(r_0>r_1\) such that

$$\begin{aligned} 1/p_0-1/p_1<r_0-r_1 \le 1/q_0-1/q_1\,. \end{aligned}$$

Then we have

$$\begin{aligned} e_m(\mathrm {id}:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )) \simeq m^{-(r_0-r_1)}\,. \end{aligned}$$

Proof

Again, the lower bounds follow from [42, Thm. 3.18].

Step 1. In the case \(p_1 \ge q_1\) we use the commutative diagram in Fig. 1.

Fig. 1 Decomposition of \(\mathrm {id}\) in the case \(p_1 \ge q_1\)

Then we have by (5) and (6)

$$\begin{aligned} \begin{aligned}&e_m(\mathrm {id}:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega ))\\&~~~~\lesssim e_m(\mathrm {id}:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}b(\Omega )) \cdot \Vert \mathrm {id}:s^{r_1}_{p_1,q_1}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )\Vert \,. \end{aligned} \end{aligned}$$

Hence, we may use Proposition 17 and obtain

$$\begin{aligned} e_m(\mathrm {id}:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )) \lesssim m^{-(r_0-r_1)}\,. \end{aligned}$$

Step 2. Now we consider \(p_1 < q_1\). After decomposing the identity operator in an analogous way as in (19) and (20) we use the commutative diagrams in Fig. 2 for the first and second summand, respectively. In fact, for the first summand in (20) we obtain by (6)

$$\begin{aligned} \begin{aligned}&e_m\Big (\sum \limits _{\mu = 0}^{L_m} \mathrm {id}_{\mu }:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )\Big )\\&~~~~\lesssim e_m\Big (\sum \limits _{\mu = 0}^{L_m} \mathrm {id}_{\mu }:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{q_1,q_1}b(\Omega )\Big )\cdot \Vert \mathrm {id}:s^{r_1}_{q_1,q_1}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )\Vert \,. \end{aligned} \end{aligned}$$

Note that the identity operator is bounded since \(\Omega \) is a bounded domain.

Fig. 2 Decomposition of the operator \(\mathrm {id}_I = \sum _{\mu \in I} \mathrm {id}_\mu \) in the case \(p_1 < q_1\)

Furthermore, the entropy numbers

$$\begin{aligned} e_m\Big (\sum \limits _{\mu = 0}^{L_m} \mathrm {id}_{\mu }:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{q_1,q_1}b(\Omega )\Big ) \lesssim m^{-(r_0-r_1)} \end{aligned}$$

can be estimated by the same reasoning as in Step 2 of the proof of Proposition 17. Note that \(r_0-r_1\) may be smaller than \(1/p_0 - 1/q_1\). However, this is not important for the argument (based on Proposition 15). It remains to consider the second summand in (20). Here we use the right diagram in Fig. 2 and obtain

$$\begin{aligned} \begin{aligned}&e_m\Big (\sum \limits _{\mu = L_m+1}^{M_m+L_m} \mathrm {id}_{\mu }:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )\Big )\\&~~~~\lesssim e_m\Big (\sum \limits _{\mu = L_m+1}^{M_m+L_m} \mathrm {id}_{\mu }:s^{r_0}_{p_0,q_0}b(\Omega ) \rightarrow s^{r_1}_{p_1,p_1}b(\Omega )\Big )\cdot \Vert \mathrm {id}:s^{r_1}_{p_1,p_1}b(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}f(\Omega )\Vert \,. \end{aligned} \end{aligned}$$

We continue to estimate the appearing entropy numbers as in Step 3 of the proof of Proposition 17. Note that \(r_0-r_1\) might be larger than \(1/q_0-1/p_1\). However, for the argument, we only need \(r_0-r_1>1/p_0-1/p_1\). This concludes the proof.\(\square \)
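The two remarks on the parameter ranges in Step 2 can be checked directly. The following comparison is only a sketch; it assumes that Proposition 18 imposes the same smoothness condition \(1/p_0-1/p_1<r_0-r_1\le 1/q_0-1/q_1\) as Proposition 19 and Theorem 21. Since \(p_1<q_1\) in Step 2, we have

$$\begin{aligned} 1/p_0-1/p_1< 1/p_0-1/q_1 \quad \text {and}\quad 1/q_0-1/p_1 < 1/q_0-1/q_1\,. \end{aligned}$$

Hence the assumed condition guarantees neither \(r_0-r_1>1/p_0-1/q_1\) (relevant for the target space \(s^{r_1}_{q_1,q_1}b(\Omega )\)) nor \(r_0-r_1\le 1/q_0-1/p_1\) (relevant for the target space \(s^{r_1}_{p_1,p_1}b(\Omega )\)), which is consistent with the two remarks in Step 2.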

Let us finally consider the situation where a Triebel–Lizorkin type sequence space compactly embeds into a Besov type sequence space.

Proposition 19

Let \(0<q_0 < q_1 \le \infty \), \(0<p_0\le p_1 < \infty \), \(q_1 > p_1\), \(r_0>r_1\) such that

$$\begin{aligned} 1/p_0-1/p_1<r_0-r_1 \le 1/q_0-1/q_1\,, \end{aligned}$$

and let \(\Omega \) be a bounded domain. Then we have

$$\begin{aligned} e_m(\mathrm {id}:s^{r_0}_{p_0,q_0}f(\Omega ) \rightarrow s^{r_1}_{p_1,q_1}b(\Omega )) \simeq m^{-(r_0-r_1)}\,. \end{aligned}$$

Proof

The lower bound follows from [42, Thm. 3.18].

Step 1. For the upper bound in the case \(p_0<q_0\) we may use the commutative diagram in Fig. 3 below to decompose the identity operator. Afterwards, we use (5) to reduce everything to the situation in Proposition 17.

Fig. 3. Decomposition of \(\mathrm {id}\) in the case \(p_0 < q_0\)

Step 2. In the case \(p_0>q_0\) we argue analogously to Step 2 of the proof of Proposition 18. This time we use the decompositions in Fig. 4 for the first and second summand in (20), respectively.\(\square \)

Unfortunately, we were not able to find a corresponding result for the \(f\)-\(f\) situation, so this remains an open problem.

Remark 20

To clarify the contribution of this paper, let us briefly recapitulate the known results and open questions which motivated this work. For several results and historical remarks on the subject we refer to [8] and the references therein. In particular, writing \(r:=r_0-r_1\), Vybíral [42, Thm. 4.9] proved for \(0<p_0\le p_1 \le \infty \) and \(0<q_0\le q_1 \le \infty \) in the case of small smoothness

$$\begin{aligned}1/p_0-1/p_1<r \le 1/q_0-1/q_1,\end{aligned}$$

that there is a constant \(c>0\) and, for any \(\varepsilon >0\), a number \(C_{\varepsilon }>0\) such that

$$\begin{aligned} c m^{-r} \le e_m(\mathrm {id}: s_{p_0,q_0}^{r_0}b(\Omega ) \rightarrow s_{p_1,q_1}^{r_1}b(\Omega )) \le C_\varepsilon m^{-r} (\log m)^\varepsilon , \quad m \ge 2. \end{aligned}$$
(23)

This is a direct consequence of the bound in the case of “large smoothness” \(r>\max \{1/p_0-1/p_1,1/q_0-1/q_1\}\), which states that

$$\begin{aligned} e_m(\mathrm {id}: s_{p_0,q_0}^{r_0}b(\Omega ) \rightarrow s_{p_1,q_1}^{r_1}b(\Omega )) \simeq m^{-r}(\log m)^{(n-1)(r_0-r_1-1/q_0+1/q_1)_+}\,. \end{aligned}$$
(24)

In fact, the entropy numbers in (23) can be bounded from above by

$$\begin{aligned} e_m(\mathrm {id}: s_{p_0,q^{*}}^{r_0}b(\Omega ) \rightarrow s_{p_1,q_1}^{r_1}b(\Omega )) \end{aligned}$$

if \(q^{*}\ge q_0\). Now choose \(q_1>q^{*}>q_0\) such that \(1/q^{*}-1/q_1+\varepsilon /(n-1)=r_0-r_1>1/q^{*}-1/q_1\). Together with (24), applied with \(q_0\) replaced by \(q^{*}\), this implies (23).
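To make the choice of \(q^{*}\) explicit, here is a short worked computation. It is a sketch under the assumptions \(n\ge 2\) and \(0<\varepsilon <(n-1)(r_0-r_1)\); larger values of \(\varepsilon \) can then be handled by adjusting the constant \(C_\varepsilon \). Define \(q^{*}\) by

$$\begin{aligned} 1/q^{*} := r_0-r_1-\varepsilon /(n-1)+1/q_1\,. \end{aligned}$$

Then \(1/q_1< 1/q^{*} < r_0-r_1+1/q_1 \le 1/q_0\), that is, \(q_0<q^{*}<q_1\). Moreover, \(r_0-r_1>\max \{1/p_0-1/p_1,1/q^{*}-1/q_1\}\), so (24) is applicable with \(q_0\) replaced by \(q^{*}\), and the exponent of the logarithm becomes

$$\begin{aligned} (n-1)(r_0-r_1-1/q^{*}+1/q_1)_+ = (n-1)\cdot \frac{\varepsilon }{n-1} = \varepsilon \,. \end{aligned}$$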

The propositions proved above allow us to improve a number of existing results for the entropy numbers of the embedding

$$\begin{aligned} \mathrm {Id}:S^{r_0}_{p_0,q_0}A(\Omega ) \rightarrow S^{r_1}_{p_1,q_1}A^{\dag }(\Omega )\,. \end{aligned}$$

Theorem 21

Let \(\Omega \) be a bounded domain and \(A, A^{\dag } \in \{B,F\}\) but \((A,A^{\dag }) \ne (F,F)\). Let \(0<q_0 < q_1 \le \infty \), \(0<p_0\le p_1 < \infty \) and \(r_0>r_1\) such that

$$\begin{aligned} 1/p_0-1/p_1<r_0-r_1 \le 1/q_0-1/q_1\,. \end{aligned}$$

In addition, we assume \(q_0<p_0\) if \((A, A^{\dag }) = (B,F)\) and \(q_1>p_1\) if \((A, A^{\dag }) = (F,B)\), respectively. Then the following holds

$$\begin{aligned} e_m(\mathrm {Id}:S^{r_0}_{p_0,q_0}A(\Omega ) \rightarrow S^{r_1}_{p_1,q_1}A^{\dag }(\Omega )) \simeq m^{-(r_0-r_1)}\quad ,\quad m\in \mathbb {N}\,. \end{aligned}$$

Proof

The result is a direct consequence of Propositions 17, 18, 19 and the machinery described in the proof of [42, Thm. 4.11].\(\square \)

As a corollary of Theorem 21, we obtain the following result, which settles Open Problem 6.4 in [8].

Corollary 22

Let \(\Omega \) be as above. Let further \(0<q<p_0\le p_1\), \(1<p_1<\infty \) and \(1/p_0-1/p_1<r\le 1/q-1/2\). Then we have

$$\begin{aligned} e_m(\mathrm {Id}:S^r_{p_0,q}B(\Omega )\rightarrow L_{p_1}(\Omega )) \simeq m^{-r}\,. \end{aligned}$$
(25)

Proof

Identifying \(S^0_{p_1,2}F(\Omega ) = L_{p_1}(\Omega )\) in the case \(1<p_1<\infty \), the result is a direct consequence of Theorem 21.\(\square \)
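The parameter matching behind this proof can be spelled out as follows; the assignment below is an assumption about how Theorem 21 is meant to be applied here. We take \((A,A^{\dag })=(B,F)\), keep \(p_0\) and \(p_1\), and set

$$\begin{aligned} r_0 = r\,,\quad r_1 = 0\,,\quad q_0 = q\,,\quad q_1 = 2\,. \end{aligned}$$

The smoothness condition of Theorem 21 then reads \(1/p_0-1/p_1<r\le 1/q-1/2\), and the additional assumption \(q_0<p_0\) is \(q<p_0\). The requirement \(q_0<q_1\) holds automatically: since \(r>1/p_0-1/p_1\ge 0\), the upper bound \(r\le 1/q-1/2\) forces \(1/q>1/2\), that is, \(q<2\).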

The final corollary below, which also follows from Theorem 21, closes some further gaps in [42, Thm. 4.18 (ii), (iii)].

Corollary 23

Let \(\Omega \) be as above. We have the following sharp bounds for entropy numbers.

(i) Let \(1 < p \le \infty \) and \(1/p<r\le 1\). Then, we have

$$\begin{aligned} e_m(\mathrm {Id}:S^r_{p,1}B(\Omega ) \rightarrow S^0_{\infty ,\infty }B(\Omega )) \simeq m^{-r}\,. \end{aligned}$$

(ii) Let \(1< p< q <\infty \) and \(1/p-1/q<r\le 1/2\). Then, we have

$$\begin{aligned} e_m(\mathrm {Id}:S^r_pW(\Omega ) \rightarrow S^0_{q,\infty }B(\Omega )) \simeq m^{-r}\,. \end{aligned}$$

(iii) Let \(0< q<p \le \infty \), \(q<1\) and \(1/p < r \le 1/q - 1\). Then, we have

$$\begin{aligned} e_m(\mathrm {Id}:S^r_{p,q}B(\Omega ) \rightarrow L_\infty (\Omega )) \simeq m^{-r}. \end{aligned}$$
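As an illustration of how these bounds relate to Theorem 21, consider part (ii). The following sketch assumes the standard identification \(S^r_pW(\Omega ) = S^r_{p,2}F(\Omega )\) for \(1<p<\infty \), which is not stated explicitly above. Under this assumption we may apply Theorem 21 with \((A,A^{\dag })=(F,B)\) and

$$\begin{aligned} r_0 = r\,,\quad r_1 = 0\,,\quad p_0 = p\,,\quad q_0 = 2\,,\quad p_1 = q\,,\quad q_1 = \infty \,. \end{aligned}$$

The smoothness condition becomes \(1/p-1/q<r\le 1/2\), and the additional assumption \(q_1>p_1\) holds because \(q<\infty \). Note that in part (ii) the letter \(q\) denotes the integrability index of the target space, not a fine index.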

Remark 24

Entropy numbers of mixed smoothness Sobolev-Besov embeddings into \(L_p\), where \(1\le p\le \infty \), have recently attracted significant interest, see [40] and [33,34,35,36]. There are some fundamental open problems connected with \(p=\infty \), see [8, 2.6, 6.4, 6.5]. Interestingly, if the third index q is chosen small enough in Corollaries 22 and 23, the logarithmic factor disappears.

Fig. 4. Decomposition of the operator \(\mathrm {id}_I = \sum _{\mu \in I} \mathrm {id}_\mu \) in the case \(p_0 > q_0\)