Entropy Numbers of Finite Dimensional Mixed-Norm Balls and Function Space Embeddings with Small Mixed Smoothness

We study the embedding $\mathrm{id}: \ell_p^b(\ell_q^d) \rightarrow \ell_r^b(\ell_u^d)$ and prove matching bounds for the entropy numbers $e_k(\mathrm{id})$ provided that $0<p<r\le \infty$ and $0<q\le u\le \infty$. Based on this finding, we establish optimal dimension-free asymptotic rates for the entropy numbers of embeddings of Besov and Triebel–Lizorkin spaces of small dominating mixed smoothness, which gives a complete answer to an open problem mentioned in the recent monograph by Dũng, Temlyakov, and Ullrich. Both results rely on a novel covering construction recently found by Edmunds and Netrusov.


Introduction
Entropy numbers quantify the degree of compactness of a set, i.e., how well the set can be approximated by a finite set. Given a compact set $K$ in a quasi-Banach space $Y$, the $k$-th entropy number $e_k(K, Y)$ is defined to be the smallest radius $\varepsilon > 0$ such that $K$ can be covered with $2^{k-1}$ copies of the ball $\varepsilon B_Y$, i.e.,
$$e_k(K, Y) := \inf\Big\{\varepsilon > 0 \,:\, \exists\, y_1, \dots, y_{2^{k-1}} \in Y \text{ such that } K \subset \bigcup_{i=1}^{2^{k-1}} \big(y_i + \varepsilon B_Y\big)\Big\}.$$
The concept of entropy numbers can be easily extended to operators. Given a compact operator $T: X \rightarrow Y$, where $X$ and $Y$ are quasi-Banach spaces, the $k$-th entropy number of the operator $T$ is defined to be $e_k(T: X \rightarrow Y) := e_k(T(B_X), Y)$.
If the spaces X , Y are clear from the context, we will abbreviate e k (T : X → Y ) by e k (T ).
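To make the definition concrete, consider the simplest compact set $K = [-1, 1]$ in $(\mathbb{R}, |\cdot|)$, where the optimal covering by $2^{k-1}$ balls is an equispaced partition. The following toy sketch (illustrative only, not from the paper) computes the resulting entropy numbers:

```python
def min_covering_radius(a, b, n):
    """Smallest common radius eps such that n intervals [y_i-eps, y_i+eps]
    cover [a, b]: split [a, b] into n equal pieces and take their centers."""
    return (b - a) / (2 * n)

def entropy_number_interval(k, a=-1.0, b=1.0):
    """k-th entropy number of K = [a, b] inside (R, |.|): the smallest
    radius at which 2**(k-1) balls (intervals) cover K."""
    return min_covering_radius(a, b, 2 ** (k - 1))

# e_k([-1, 1]) = 2**(1 - k): geometric decay in k, as expected for a
# compact one-dimensional set.
print([entropy_number_interval(k) for k in range(1, 6)])
# -> [1.0, 0.5, 0.25, 0.125, 0.0625]
```

The geometric decay $e_k = 2^{1-k}$ reflects the one-dimensionality of $K$; in dimension $n$, the analogous rate would be $2^{-k/n}$.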
Entropy numbers (or the inverse concept of metric entropy) belong to the fundamental concepts of approximation theory. They appear in various approximation problems, e.g., in the estimation of the decay of operator eigenvalues [4,11,20]; in the estimation of learning rates for machine learning problems [39,43]; or in bounding s-numbers such as approximation, Gelfand, or Kolmogorov numbers from below [4,16]. We note that Gelfand numbers find application in the recent field of compressive sensing [6,13,16] and in Information-Based Complexity in general. Entropy numbers are also closely connected to small ball problems in probability theory [21,24]. For further applications and basic properties, we refer to the monographs [5,28] and the recent survey [8, Chapter 6].
The goal of this paper is to improve estimates for entropy numbers of embeddings between function spaces of dominating mixed smoothness,
$$\mathrm{Id}: S^{r_0}_{p_0,q_0}A(\Omega) \rightarrow S^{r_1}_{p_1,q_1}A(\Omega), \qquad (1)$$
where $\Omega \subset \mathbb{R}^n$ is a bounded domain, $0 < p_0, p_1, q_0, q_1 \le \infty$, and $r_0 - r_1 > (1/p_0 - 1/p_1)_+$. The case $A = B$ stands for the scale of Besov spaces of dominating mixed smoothness, while $A = F$ refers to the scale of Triebel–Lizorkin spaces, which includes classical $L_p$ and Sobolev spaces of mixed smoothness. That is why (1) also includes the classical embeddings if $r_0 - r_1 > 1/p_0 - 1/p_1$. Function space embeddings of this type play a crucial role in hyperbolic cross approximation [8]. Entropy numbers of such embeddings have been the subject of intense study, see [42], [8, Chapt. 6], and the recent papers by A.S. Romanyuk [33–36] and V.N. Temlyakov [40]. Note that there are a number of deep open problems connected to the case $p_1 = \infty$, which reach out to probability and discrepancy theory [8, 2.6, 6.4]. Typically, one observes asymptotic decays of the form
$$e_m(\mathrm{Id}) \asymp_n m^{-(r_0 - r_1)} (\log m)^{(n-1)\eta},$$
where $\eta > 0$. This behavior is also well known for s-numbers of these embeddings such as approximation, Gelfand, or Kolmogorov numbers, see [8] and the references therein. Although the main rate is the same as in the univariate case, the dimension still appears in the logarithmic term. We show that the logarithmic term completely disappears in regimes of small smoothness $1/p_0 - 1/p_1 < r_0 - r_1 \le 1/q_0 - 1/q_1$.
That is, we establish sharp, purely polynomial asymptotic bounds of the form
$$e_m(\mathrm{Id}) \asymp m^{-(r_0 - r_1)}, \qquad (2)$$
which depend on the underlying dimension $n$ only in the implicit constants. This settles several open questions stated in the literature [8,42], see Sect. 5, and makes the framework highly relevant for high-dimensional approximation.
A key ingredient in the proof of (2) is a counterpart of Schütt's theorem for the entropy numbers of the embedding
$$\mathrm{id}: \ell_p^b(\ell_q^d) \rightarrow \ell_r^b(\ell_u^d),$$
where $0 < p < r \le \infty$ and $0 < q \le u \le \infty$. We prove matching bounds for all parameter constellations. A particularly relevant case for the purpose of this paper is the situation where $b \le d$; here, we encounter the surprising behavior stated in (3) below. Note that this relation is not a trivial extension of the classical Schütt result [38], which, for $0 < p < q \le \infty$, reads as
$$e_k\big(\mathrm{id}: \ell_p^b \rightarrow \ell_q^b\big) \asymp \begin{cases} 1, & 1 \le k \le \log_2(b),\\[2pt] \Big(\dfrac{\log(1 + b/k)}{k}\Big)^{1/p - 1/q}, & \log_2(b) \le k \le b,\\[2pt] 2^{-k/b}\, b^{-(1/p - 1/q)}, & k \ge b. \end{cases}$$
In fact, using trivial embeddings would give an additional log-term in the third case of (3). The absence of this log-term makes (3) interesting and useful, as we will see below.
For $1 \le k \le \log(bd)$ and $k \ge bd$, only trivial and standard volumetric arguments are required to establish matching bounds for the entropy numbers $e_k(\mathrm{id})$. The middle range $\log(bd) \le k \le bd$ is much more involved. In general, it is far from straightforward to generalize the proof ideas from $d = 1$ (Schütt) to $d > 1$. Fortunately, the crucial work has already been done in a recent paper by Edmunds and Netrusov [10]. They prove a general abstract version of Schütt's theorem for operators between vector-valued sequence spaces. It remains for us to turn these general, abstract bounds into explicit estimates for the entropy numbers $e_k(\mathrm{id})$. Unfortunately, the paper [10] is written very concisely, which makes it difficult to follow the arguments at several points. Hence, we decided to provide some additional, explanatory material. We hope that Sect. 3 helps a broader readership to appreciate the powerful ideas in [10], in particular, a novel covering construction based on dyadic grids.
Outline The paper is organized as follows. In Sect. 2, we recapitulate basic definitions and results including entropy numbers and Schütt's theorem. Afterwards, in Sect. 3, we discuss the generalization of Schütt's theorem by [10]. In Sect. 4, we show consequences of this result, including matching bounds for the entropy numbers e k (id ). Finally, we improve upper bounds for the entropy numbers of Besov and Triebel-Lizorkin embeddings in regimes of small smoothness in Sect. 5.
Notation As usual, $\mathbb{N}$ denotes the natural numbers, $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$, $\mathbb{Z}$ denotes the integers, $\mathbb{R}$ the real numbers, $\mathbb{R}_+$ the positive real numbers, and $\mathbb{C}$ the complex numbers. For $a \in \mathbb{R}$ we denote $a_+ := \max\{a, 0\}$. We write $\log$ for the natural logarithm. $\mathbb{R}^{m \times n}$ denotes the set of all $m \times n$-matrices with real entries and $\mathbb{R}^n$ denotes the Euclidean space. Vectors are usually denoted by $x, y \in \mathbb{R}^n$. For $0 < p \le \infty$ and $x \in \mathbb{R}^n$, we use the quasi-norm $\|x\|_p := (\sum_{i=1}^n |x_i|^p)^{1/p}$ with the usual modification in the case $p = \infty$. If $X$ is a (quasi-)normed space, then $B_X$ denotes its unit ball and the (quasi-)norm of an element $x$ in $X$ is denoted by $\|x\|_X$. If $X$ is a Banach space, then we denote its dual by $X^*$. We will frequently use the quasi-norm constant, i.e., the smallest constant $\alpha_X$ satisfying
$$\|x + y\|_X \le \alpha_X \big(\|x\|_X + \|y\|_X\big) \quad \text{for all } x, y \in X.$$
For a given $0 < p \le 1$ we say that $\|\cdot\|_X$ is a $p$-norm if
$$\|x + y\|_X^p \le \|x\|_X^p + \|y\|_X^p \quad \text{for all } x, y \in X.$$
As is well known, any quasi-normed space can be equipped with an equivalent $p$-norm for a certain $0 < p \le 1$, see [2,32]. If $T: X \rightarrow Y$ is a continuous operator, we write $T \in \mathcal{L}(X, Y)$ and $\|T\|$ for its operator (quasi-)norm. The notation $X \hookrightarrow Y$ indicates that the identity operator $\mathrm{Id}: X \rightarrow Y$ is continuous. For two non-negative sequences $(a_n)_{n=1}^\infty, (b_n)_{n=1}^\infty \subset \mathbb{R}$ we write $a_n \lesssim b_n$ if there exists a constant $c > 0$ such that $a_n \le c\, b_n$ for all $n$. We will write $a_n \asymp b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$. If $\alpha$ is a set of parameters, then we write $a_n \lesssim_\alpha b_n$ if there exists a constant $c_\alpha > 0$ depending only on $\alpha$ such that $a_n \le c_\alpha b_n$ for all $n$.
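As a quick sanity check of the $p$-norm property (a toy numerical illustration, not part of the paper), one can verify that $\|\cdot\|_p$ itself is a $p$-norm for $p \le 1$, since $|a + b|^p \le |a|^p + |b|^p$ holds pointwise:

```python
import random

def p_quasinorm(x, p):
    """ell_p (quasi-)norm of a vector x for 0 < p < infinity."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

# For 0 < p <= 1 the p-triangle inequality
#   ||x + y||_p^p <= ||x||_p^p + ||y||_p^p
# holds; we check it on random vectors for p = 1/2.
random.seed(0)
p = 0.5
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(8)]
    y = [random.uniform(-1, 1) for _ in range(8)]
    s = [a + b for a, b in zip(x, y)]
    assert p_quasinorm(s, p) ** p <= p_quasinorm(x, p) ** p + p_quasinorm(y, p) ** p + 1e-12
print("p-triangle inequality verified for p = 1/2")
```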
The mixed-norm space $\ell_p^b(\ell_q^d)$ is defined as the space of all matrices $x \in \mathbb{R}^{b \times d}$ equipped with the mixed (quasi-)norm
$$\|x\|_{\ell_p^b(\ell_q^d)} := \Big( \sum_{i=1}^b \Big( \sum_{j=1}^d |x_{i,j}|^q \Big)^{p/q} \Big)^{1/p},$$
with the usual modification that the corresponding sum is replaced by a maximum in the case that either $p = \infty$ or $q = \infty$.
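The mixed (quasi-)norm is easy to implement directly. The following sketch (illustrative only, with the $\infty$-modifications built in) follows the definition above, taking the $\ell_q$-norm of every row and then the $\ell_p$-norm of the resulting vector:

```python
import math

def mixed_norm(x, p, q):
    """Mixed (quasi-)norm of a b x d matrix x (given as a list of rows):
    apply ell_q to each row, then ell_p to the vector of row norms.
    A sum is replaced by a maximum when the exponent is infinity."""
    def lp(v, e):
        if e == math.inf:
            return max(abs(t) for t in v)
        return sum(abs(t) ** e for t in v) ** (1.0 / e)
    return lp([lp(row, q) for row in x], p)

# Sanity check: for the all-ones b x d matrix the norm factorizes
# as b^(1/p) * d^(1/q).
x = [[1.0] * 3 for _ in range(2)]          # b = 2, d = 3
assert abs(mixed_norm(x, 2, 1) - 2 ** 0.5 * 3) < 1e-12
assert mixed_norm(x, math.inf, math.inf) == 1.0
```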

Entropy Numbers and Schütt's Theorem
Let us recall basic notions and properties concerning entropy numbers. Let $K$ be a subset of a quasi-Banach space $Y$. Given $\varepsilon > 0$, an $\varepsilon$-covering is a set of points $x_1, \dots, x_n \in Y$ such that every $x \in K$ satisfies $\|x - x_i\|_Y \le \varepsilon$ for some $i$. An $\varepsilon$-packing is a set of points $x_1, \dots, x_m \in K$ such that $\|x_i - x_j\|_Y > \varepsilon$ for pairwise different $i, j$. The covering number $N_\varepsilon(K, Y)$ is the smallest $n$ such that there exists an $\varepsilon$-covering of $K$, while the packing number $M_\varepsilon(K, Y)$ is the largest $m$ such that there exists an $\varepsilon$-packing of $K$. It is easy to see that
$$M_{2\varepsilon}(K, Y) \le N_\varepsilon(K, Y) \le M_\varepsilon(K, Y).$$
The metric entropy is defined to be $\log_2 N_\varepsilon(K, Y)$; see Remark 4 for the relation of metric entropy to other notions of entropy.
The $k$-th entropy number $e_k(K, Y)$ can be redefined as
$$e_k(K, Y) = \inf\big\{\varepsilon > 0 : N_\varepsilon(K, Y) \le 2^{k-1}\big\}.$$
It is easy to see that the sequence of entropy numbers is decaying, i.e., $e_1 \ge e_2 \ge \cdots \ge 0$. Moreover, the set $K$ is compact in $Y$ if and only if $\lim_{k \to \infty} e_k(K, Y) = 0$. Let $T$ denote an operator mapping between two quasi-Banach spaces $X$ and $Y$. Recall from the introduction that the operator's entropy numbers are given by $e_k(T) = e_k(T(B_X), Y)$. Clearly, we have $e_k(T) \le e_1(T) \le \|T\|$. If $T_1, T_2$ are both operators from $X$ to $Y$, and $Y$ is a $\vartheta$-normed space, then the entropy numbers of the sum can be estimated as follows:
$$e_{k+m-1}(T_1 + T_2)^\vartheta \le e_k(T_1)^\vartheta + e_m(T_2)^\vartheta, \qquad k, m \in \mathbb{N}. \qquad (4)$$
In particular, this gives $e_{2k-1}(T_1 + T_2)^\vartheta \le e_k(T_1)^\vartheta + e_k(T_2)^\vartheta$. For further general properties of entropy numbers and basic estimates, we refer the reader to the monographs [5,25,29]. For remarks on the history of entropy number research, see [5,43].
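The inequality $N_\varepsilon \le M_\varepsilon$ has a one-line constructive proof: a maximal $\varepsilon$-packing is automatically an $\varepsilon$-covering, since any uncovered point could be added to the packing. The following toy experiment (not from the paper) checks this on a random finite set in the plane:

```python
import random

def greedy_maximal_packing(points, eps):
    """Greedily select a maximal eps-packing (pairwise distances > eps)
    from a finite set of 2D points."""
    centers = []
    for p in points:
        if all((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 > eps ** 2 for c in centers):
            centers.append(p)
    return centers

random.seed(1)
K = [(random.random(), random.random()) for _ in range(500)]
eps = 0.15
centers = greedy_maximal_packing(K, eps)

# Maximality of the packing means every point of K lies within eps of
# some center: the packing is simultaneously an eps-covering, which
# proves N_eps(K) <= M_eps(K).
assert all(min((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers) <= eps ** 2
           for p in K)
```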
In the concrete situation where $X = \ell_p^b$ and $Y = \ell_q^b$ for $0 < p \le q \le \infty$, the entropy numbers of the embedding $\mathrm{id}: \ell_p^b \rightarrow \ell_q^b$ are completely understood in terms of their decay in $k$ and $b$. This central result is often referred to as Schütt's theorem. For its history and references, see Remark 3. We only state the interesting case $0 < p < q \le \infty$ here.
Theorem 1 Let $0 < p < q \le \infty$. Then
$$e_k\big(\mathrm{id}: \ell_p^b \rightarrow \ell_q^b\big) \asymp \begin{cases} 1, & 1 \le k \le \log_2(b),\\[2pt] \Big(\dfrac{\log(1 + b/k)}{k}\Big)^{1/p - 1/q}, & \log_2(b) \le k \le b,\\[2pt] 2^{-k/b}\, b^{-(1/p - 1/q)}, & k \ge b. \end{cases}$$
The constants in the estimates depend neither on $k$ nor on $b$.
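The three regimes of Schütt's theorem can be explored numerically. The sketch below (illustrative only, constants ignored) encodes the rate function and checks that the middle and tail expressions glue together at $k = b$ up to constants, and that the rate is non-increasing in $k$:

```python
import math

def schuett_rate(k, b, p, q):
    """Two-sided rate for e_k(id: ell_p^b -> ell_q^b), 0 < p < q <= inf,
    up to constants independent of k and b."""
    a = 1.0 / p - 1.0 / q          # 1/q = 0.0 when q = math.inf
    if k <= math.log2(b):
        return 1.0
    if k <= b:
        return (math.log(1 + b / k) / k) ** a
    return 2 ** (-k / b) * b ** (-a)

b = 1000
# At the breakpoint k = b (here p = 1, q = inf, so a = 1) the middle
# expression log(2)/b and the tail expression (1/2)/b agree up to a
# factor less than 2.
mid, tail = math.log(2) / b, 0.5 / b
assert 0.5 < mid / tail < 2.0

# The rate is non-increasing along a sample of k-values through all
# three regimes.
vals = [schuett_rate(k, b, 1, math.inf) for k in range(1, 5001, 50)]
assert all(x >= y - 1e-15 for x, y in zip(vals, vals[1:]))
```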

Remark 3
In 1984, Schütt [38] gave a proof for the general case of symmetric Banach spaces, which implies Theorem 1 if $1 \le p \le q \le \infty$. In the range $1 \le k \le b$, the upper bound was first proved for all $0 < p \le q \le \infty$ by Edmunds and Triebel [11] in 1996 by covering the unit ball using suitable sparse vectors. Edmunds and Netrusov [9, Thm. 2] generalized this covering construction in 1998 to arbitrary quasi-Banach spaces. In the same paper, Edmunds and Netrusov also proved matching lower bounds for general quasi-Banach spaces [9, Thm. 2]. Kühn [22] also proved the lower bound for $e_k(\mathrm{id}: \ell_p^b \rightarrow \ell_q^b)$. Both [9, Thm. 2] and [22] rely on the very same idea to pack the unit ball with sparse vectors and use the fundamental combinatorial fact discussed in Remark 12 (ii) below. In 2000, Guédon and Litvak [15, Thm. 6] provided an alternative proof of Theorem 1 that relies completely on interpolation arguments and improves the constants in the upper bound.

Remark 4
The concept of metric entropy for compact sets has been introduced independently by Kolmogorov [18] and Pontrjagin and Schnirelmann [31]. It should not be confused with the metric entropy of a dynamical system, which also has been introduced by Kolmogorov [19]. The latter entropy is also called Kolmogorov-Sinai entropy or measure-theoretic entropy. However, these two notions of metric entropy are related [1]. There is also a deep connection between Kolmogorov-Sinai entropy and the notions of information entropy and thermodynamic entropy [3].

Edmunds-Netrusov Revisited
In addition to Schütt's theorem, the main tool that we employ in this work is a powerful result by Edmunds and Netrusov [10]. They prove a generalization of Schütt's theorem for vector-valued sequence spaces. Let us restate the part of their result that is relevant for us.

Theorem 5 (Theorems 3.1 and 3.2 in [10])
For k ≥ log 2 (b), we have the following.
Theorem 5 gives abstract lower and upper bounds that are "matching" in the sense that both have the same functional form. At first glance, this functional form is neither obvious nor easy to interpret. In addition, we found it difficult to follow the arguments in [10] at several points due to their succinct style of presentation. We thus believe that it is of value to review their key arguments and to provide some additional material that makes Theorem 5 more comprehensible. This is the subject of the remainder of this section. The reader who is only interested in applications of Theorem 5 may proceed directly to Sect. 4.

Remark 6
Theorems 3.1 and 3.2 in [10] are only stated for $0 < p < r \le \infty$. However, these theorems also hold true for $p = r$: for $k \ge b$, Theorem 5 has been proved in [27, Thm. 4.3]; for $k \le b$, the lower bound in Theorem 5 is a consequence of [27, Thm. 4.3] in combination with arguments analogous to Remark 12, while the upper bound is trivial.

A Special Case to Begin with
If $p = r = \infty$, it is clear that one simply has to take $b$-fold Cartesian products of the optimal covering and packing of $B_X$ in $Y$ to obtain the bounds. In any other case, simple Cartesian products will not be good enough.
The special case of equal inner spaces $X = Y$ also allows for a rather straightforward solution if the dimension of the inner space is finite. For an easier understanding of the contribution in [10] (see Theorem 5 above), we find it instructive to give a direct proof of this special case and point out its limitations. Indeed, a straightforward generalization of the well-known Edmunds–Triebel covering construction [11] based on volume arguments will do the job to establish the optimal upper bound. Recall that the essence of this covering construction is a result from best $s$-term approximation, sometimes referred to as Stechkin's inequality, see [8, Sect. 7.4], which yields an $s^{-(1/p - 1/r)}$-covering of $B_{\ell_p^b}$ in $\ell_r^b$ using only $s$-sparse vectors. We simply have to extend this approach to row-sparse matrices. To improve readability, we will omit some technical details in the following proof.
Proof The first case is trivial. The last case follows from volumetric arguments using the recent findings in [17, Sect. 3.2], by which we know the order of $\mathrm{vol}(B_{\ell_p^b(X)})^{1/(bd)}$ and, similarly, of $\mathrm{vol}(B_{\ell_r^b(X)})^{1/(bd)}$. For $k > bd$ we then use the standard volume argument. For the second case, let $s \in [b]$ and decompose $B_{\ell_p^b(X)}$ into sets $B_I$ according to the index set $I$ of the $s$ rows with the largest $\|\cdot\|_X$-(quasi-)norm. When we replace these $s$ rows by $0$ in $x \in B_I$, the resulting matrix has an $\ell_r^b(X)$-(quasi-)norm of at most $s^{-(1/p - 1/r)}$, which follows from a well-known relation for best $s$-term approximation in $\ell_r$. Hence, if we wish to cover the set $B_I$ by balls of radius $\varepsilon \asymp s^{-(1/p - 1/r)}$, it suffices to take care of the $s$ largest rows of the elements of $B_I$. That is, we take a suitable covering of $B_{\ell_p^s(X)}$ in $\ell_r^s(X)$ and append $b - s$ zero rows to every matrix of the covering. A similar volumetric argument as above in (7) and (8) tells us that such a covering with cardinality $2^{c_{p,q} s d}$ exists. Combining the coverings for all possible index sets $I$ yields an $\varepsilon$-covering of $B_{\ell_p^b(X)}$ of the desired cardinality. Consequently, we obtain the upper bound.
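The best $s$-term approximation step used above can be tested numerically. The following sketch (illustrative only) verifies the Stechkin-type bound $\sigma_s(x)_r \le s^{-(1/p - 1/r)}$ for vectors $x$ on the $\ell_p$ unit sphere with $p < r$:

```python
import random

def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def best_s_term_error(x, s, r):
    """ell_r-norm of x after removing its s largest entries in modulus,
    i.e. the best s-term approximation error of x in ell_r."""
    tail = sorted((abs(t) for t in x), reverse=True)[s:]
    return lp_norm(tail, r) if tail else 0.0

# Stechkin-type inequality: if ||x||_p <= 1 and p < r, then
# sigma_s(x)_r <= s^(-(1/p - 1/r)).  We check random vectors
# normalized to the ell_p unit sphere.
random.seed(2)
p, r, n = 0.5, 2.0, 200
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(n)]
    c = lp_norm(x, p)
    x = [t / c for t in x]
    for s in (1, 5, 20):
        assert best_s_term_error(x, s, r) <= s ** (-(1.0 / p - 1.0 / r)) + 1e-12
```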

Remark 8
One way to obtain the matching lower bound in the case X = Y is to generalize the proof idea underlying Schütt's theorem (Theorem 1) in the case that log(b) ≤ k ≤ b. However, the standard combinatorial lemma is not sufficient here.
A suitable packing to do this generalization has already been considered in [6,Prop. 5.3]. See also Remark 12 below.

The Covering Construction by Edmunds and Netrusov
The generalized Edmunds–Triebel covering is optimal for finite dimensional $X = Y$, see Proposition 7 in the previous section. In the general situation, where $X$ is compactly embedded into $Y$, it seems that the volumetric arguments underlying (10) are too coarse to obtain sharp estimates (at least in the finite dimensional situation). The main contribution of [10] is a covering construction which resolves this shortcoming by not using volumetric arguments at all. In particular, $X$ and $Y$ do not have to be finite dimensional. We give a detailed recapitulation of their idea in this section. For some comments concerning the lower bound in Theorem 5, see Remark 12 at the end of this section. The covering in [10] works in the very general situation where we are given quasi-Banach spaces $X_1, \dots, X_b$ and $Y_1, \dots, Y_b$, see Proposition 10 below. The basic idea is to cover the unit ball $B_{\ell_p^b(X_1, \dots, X_b)}$ with cuboids
$$U(v) := v_1 B_{X_1} \times \cdots \times v_b B_{X_b}, \qquad (11)$$
where $v \in \{v_1, \dots, v_N\} \subset \mathbb{R}_+^b$ and $N$ is exponential in $b$ (think of each cuboid as an anisotropically rescaled version of $B_{X_1} \times \cdots \times B_{X_b}$). The crux is to find suitable vectors $v_i$ such that an optimal covering can be reached by covering each cuboid $U(v_i)$ using a product of optimal coverings of $B_{X_1}, \dots, B_{X_b}$. Edmunds and Netrusov [10] had the idea to consider vectors that form a dyadic grid derived from the simplex
$$S(b) := \Big\{ v \in \mathbb{R}_+^b : \sum_{i=1}^b v_i \le 1 \Big\}.$$
The dyadic grid is constructed with the help of a mapping $\upsilon$ that rounds coordinates to dyadic values. This mapping $\upsilon$ leads to the finite grid $\upsilon(S(b))$, which has the following properties.
which is a crucial property for estimating the cardinality of the grid. The dyadic grid according to Lemma 9 allows us to establish the following upper bound on entropy numbers. Proposition 10 (Reformulation of Lemma 2.3 in [10]) Let $X_1, \dots, X_b$ and $Y_1, \dots, Y_b$ be quasi-Banach spaces, let $0 < p \le r \le \infty$, and let $k \in \mathbb{N}$ be such that $k \ge 8b$. Then, we have Proof Consider the transformed grid. By Lemma 9 (i), the cuboids $U(v)$, with $v$ ranging over this grid, cover the unit ball $B_{\ell_p^b(X_1, \dots, X_b)}$, where $U(v)$ is the cuboid defined in (11).
Finally, note that the product $C_1 \times \cdots \times C_s$ has a cardinality which, in combination with the grid cardinality bound $2^{3b}$, implies the desired result.
Proposition 10 is not yet the complete answer. For $k \le b$, we have to modify the proof of Proposition 7. We sketch the proof and refer to the proof of [10, Thm. 3.1] for technical details.
where the second term on the right-hand side follows from the best $s$-term approximation result already used in Proposition 7. Consequently, we have In contrast to Proposition 7, volumetric arguments would now give a suboptimal estimate for the entropy numbers $e_k(B_{\ell_p^s(X)}, \ell_r^s(Y))$. In this general situation, it requires Proposition 10 with $X_1 = \cdots = X_b = X$ and $Y_1 = \cdots = Y_b = Y$ to get the proper estimate. Concretely, since $s \le k$, we have which leads, in combination with Proposition 10 and (12), to an upper bound of the form The usual arguments show that it is optimal to choose $s$ of the order $k/\log(eb/k)$.

Remark 12
We close this section with some remarks concerning the lower bound in Theorem 5. Its proof relies on two surprisingly simple observations, see [10] for details. (i) Let $M$ be a maximal $\varepsilon$-packing of $B_X$ in $Y$. Using the Gilbert–Varshamov bound, which is well known in coding theory [14,41], we know that there are exponentially many words over the alphabet $M$ that differ in a fixed proportion of their coordinates. (ii) Choose a vector $x \in B_X$ with (nearly) maximal $\|x\|_Y$. We construct a packing by building row-sparse matrices, where the nonzero rows contain copies of $x$ and the row support sets are chosen according to a combinatorial fact that is well known in various disciplines of mathematics. This leads to the lower bound. In view of the packing construction that we have mentioned in Remark 8, it is somewhat surprising that it is not necessary to combine the combinatorics of the two observations in order to obtain the optimal abstract bound in Theorem 5. An explanation is given in [27, Rem. 4.13, p. 69].
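Observation (i) rests on the classical greedy argument behind the Gilbert–Varshamov bound: a maximal code forces the Hamming balls of radius $d - 1$ around its codewords to cover the whole cube. A small sketch (illustrative, binary alphabet) makes this concrete:

```python
from itertools import product
from math import comb

def greedy_code(n, d):
    """Lexicographic greedy binary code of length n and minimum
    Hamming distance d (a maximal d-separated subset of {0,1}^n)."""
    code = []
    for w in product((0, 1), repeat=n):
        if all(sum(a != b for a, b in zip(w, c)) >= d for c in code):
            code.append(w)
    return code

def ball_volume(n, radius):
    """Number of binary words within Hamming distance `radius` of a word."""
    return sum(comb(n, i) for i in range(radius + 1))

# Maximality means every word of {0,1}^n lies within distance d-1 of
# some codeword, so the Hamming balls of radius d-1 cover the cube:
# this is exactly the Gilbert-Varshamov bound |C| >= 2^n / V(n, d-1).
n, d = 10, 3
code = greedy_code(n, d)
assert len(code) * ball_volume(n, d - 1) >= 2 ** n
```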

Consequences of the Edmunds-Netrusov Result
We discuss some consequences of Theorem 5. Let us begin by considering the entropy numbers $e_k(\mathrm{id}: \ell_p^b(\ell_q^d) \rightarrow \ell_r^b(\ell_u^d))$. We have the following matching bounds.
(i.a) In the special case $q = u$, we have the stated bound.

Proof For $1 \le k \le \log(bd)$ and $k \ge bd$, only standard volumetric arguments are required, see [27, Appendix A] for details. Let us also refer to [7, Lemma 3], where this case has already been considered. Let $D(m, k)$ and $A(k, b)$ be as defined in Theorem 5; moreover, throughout the proof, we write $s_{k,l}$, $k, l \in \mathbb{N}$, for the sequence appearing in the maxima defining $D$ and $A$.

Ad (i.a). Since $q = u$, it follows from Theorem 1 that $e_l(\mathrm{id}: \ell_q^d \rightarrow \ell_u^d) \asymp 1$ for $1 \le l \le d$, and consequently that $D(1, k) = D(k/b, k) \asymp 1$ and $A(k, b) \asymp 1$ for all $k \le d$. Now, for $k \ge d$, we have that $s_{k,l} \asymp (l/k)^{1/p - 1/q}$ for $1 \le l \le d$, so the sequence is bounded from above by a monotonically increasing sequence. For $d \le l \le k$, we have
$$s_{k,l} \asymp (l/k)^{1/p - 1/r}\, 2^{-l/d} =: t_{k,l}.$$
Since $2^{-l/d}$ decays faster in $l$ than $(l/k)^{1/p - 1/r}$ grows, we conclude that for $d \le l \le k$ the sequence $s_{k,l}$ is "essentially monotonically decreasing". To be more precise, $t_{k,l}$ attains its maximum at $l = \beta_{p,r}\, d$, where the factor $\beta_{p,r}$ depends only on $p$ and $r$. Hence, the maximum of $s_{k,l}$ can be bounded from above by a constant times the maximum of $t_{k,l}$, and therefore by $c_{p,r} (d/k)^{1/p - 1/r}$. Using analogous arguments for $D(k/b, k)$, we conclude the same bound there.

Ad (i.b). For $D(1, k)$ we have, in consequence of Theorem 1, that $s_{k,l} \asymp (l/k)^{1/p - 1/r}$ for $1 \le l \le \log(d)$. Since $1/p - 1/r > 1/q - 1/u$, the sequence $s_{k,l}$ is bounded from above and below, up to constants, by a monotonically increasing sequence; consequently, the maximum is attained at $l = k$, so that $D(1, k) \asymp (\log(ed/k)/k)^{1/q - 1/u}$. Since $b \le d$, we further have: for $b \le k \le d$ we find, as before, that $D(k/b, k) \asymp (\log(ed/k)/k)^{1/q - 1/u}$, and for $d < k \le bd$ we have the corresponding estimate. For $\log(bd) \le k \le d$, we find $D(1, k) \asymp (\log(ed/k)/k)^{1/q - 1/u}$, since the sequence $s_{k,l}$ is bounded from below and above by a sequence that increases monotonically in $l$.
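The claim that $t_{k,l} = (l/k)^{1/p - 1/r}\, 2^{-l/d}$ is maximized at $l \asymp d$ follows from elementary calculus: setting the derivative of $\log t_{k,l} = a \log(l/k) - (l/d)\log 2$ (with $a = 1/p - 1/r$) to zero gives $l^* = a\,d/\log 2$, so $\beta_{p,r} = (1/p - 1/r)/\log 2$. A small sketch (illustrative parameter values) compares the discrete argmax with this analytic maximizer:

```python
import math

def t(l, k, d, a):
    """The auxiliary sequence t_{k,l} = (l/k)^a * 2^(-l/d)."""
    return (l / k) ** a * 2 ** (-l / d)

# Illustrative values: a = 1/p - 1/r = 1.5, e.g. p = 2/3 and r = inf.
k, d, a = 10 ** 6, 500, 1.5
l_star = a * d / math.log(2)                 # analytic maximizer
l_best = max(range(1, 20 * d), key=lambda l: t(l, k, d, a))
# log t is concave in l, so the discrete argmax sits next to l_star.
assert abs(l_best - l_star) <= 1.0
```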
The relevant sequence is decaying in $l$ at least as fast as $(l/k)^{1/p - 1/r}$ is growing. Hence, we can bound the maximum accordingly, where we use $b/k \le 1$ in the last estimate. At the same time, since $k/b \le \log(d)$, we also have $\log(bd/k) \asymp \log(d)$, and thus the claim follows.

Remark 14
The upper bound for $k \ge bd$ in Theorem 13 also follows from [7, Lem. 3]. The upper bound in Theorem 13 (ii) has also been proved in [42, Lem. 3.16] for the range $b \max\{\log(d), \log(b)\} \le k \le bd$. The proof there uses the following covering construction which, as far as we know, first appeared in [23, Proof of Prop. 4]. Let $X_1, \dots, X_b$ and $Y_1, \dots, Y_b$ be (quasi-)Banach spaces and $0 < p, r \le \infty$. The covering rests on the idea of splitting the ball $B_{\ell_p^b(X_1, \dots, X_b)}$ into subsets of matrices with non-increasing rows, where the union is taken over all permutations of $[b]$. This leads to the upper bound (13) for $n_1, \dots, n_b \in \mathbb{N}$. If $X = X_1 = \cdots = X_b = \ell_q^d$ and $Y = Y_1 = \cdots = Y_b = \ell_u^d$ with $0 < q \le u$, and we choose $n_j \asymp j^{-\alpha}$ for some $0 < \alpha < 1$, then (13) is strong enough to obtain the upper bound in Theorem 13 (ii), provided $b \max\{\log(d), \log(b)\} \le k \le bd$. Now we increase the level of abstraction and consider mixed norms of higher order. Let, for $\mu = 1, \dots, b$, the weighted spaces $X_\mu$ and $Y_\mu$ be given with $0 < p \le r \le \infty$, $0 < q \le u \le \infty$, and $\alpha, \beta \in \mathbb{R}$. The dimensions $(d_\mu)_\mu$ and $(b_\mu)_\mu$ are non-decreasing natural numbers satisfying $d_\mu \gtrsim b_\mu$. These spaces are used as "inner spaces" of vector-valued sequence spaces of the form $\ell_{q_0}((X_\mu)_{\mu = 1, \dots, b})$.
The norm is given accordingly. We are interested in the behavior of the entropy numbers in the special situation $1/q - 1/u < 1/p - 1/r$.
Let further X , Y and X μ , Y μ be as above. Then we have for all k ≥ 8b and k ≥ max

Proof
We use Theorem 5, in particular the upper bound in Proposition 10. Since $k \ge 8b$ we obtain the corresponding estimate. Let us evaluate the first $\max[\cdots]$. With Theorem 13, (i.b), (i.c), we can bound $\max_{1 \le \ell \le d_\mu k}$. Let us discuss the second $\max[\cdots]$. Using again Proposition 10, we obtain the analogous estimate. Due to our assumption, the exponent of $d_\mu$ is positive in both cases. Since $k \ge d_\mu$, we may replace $d_\mu$ by $k$ to increase the right-hand side. This leads to the stated bound. We are now aiming for a similar relation for small $k$.
where we used once again Theorem 13, (i.b). Clearly, we get

Polynomial Decay of Entropy Numbers for Multivariate Function Space Embeddings
We come to the main subject of this paper, improved upper bounds for entropy numbers of function space embeddings (1) in regimes of small mixed smoothness.

Function Spaces of Dominating Mixed Smoothness
Besov and Triebel–Lizorkin spaces of mixed smoothness are typically defined via a dyadic decomposition on the Fourier side. Let $\{\varphi_j\}_{j \in \mathbb{N}_0^n}$ be the standard tensorized dyadic decomposition of unity, see [37] and [42]. We further denote by $S'(\mathbb{R}^n)$ the space of tempered distributions and by $D'(\Omega)$ the space of distributions on the bounded domain $\Omega \subset \mathbb{R}^n$ (the dual of the space $D(\Omega)$ of test functions). The Besov space of dominating mixed smoothness $S^r_{p,q}B(\mathbb{R}^n)$ with smoothness parameter $r > 0$ and integrability parameters $0 < p, q \le \infty$ is given by the quasi-norm
$$\|f\|_{S^r_{p,q}B(\mathbb{R}^n)} := \Big( \sum_{j \in \mathbb{N}_0^n} 2^{r |j|_1 q}\, \big\|\mathcal{F}^{-1}[\varphi_j \mathcal{F} f]\big\|_{L_p(\mathbb{R}^n)}^q \Big)^{1/q},$$
with the usual modification in the case $q = \infty$. The Triebel–Lizorkin space of dominating mixed smoothness $S^r_{p,q}F(\mathbb{R}^n)$ is given, for $p < \infty$, by
$$\|f\|_{S^r_{p,q}F(\mathbb{R}^n)} := \Big\| \Big( \sum_{j \in \mathbb{N}_0^n} 2^{r |j|_1 q}\, \big|\mathcal{F}^{-1}[\varphi_j \mathcal{F} f](\cdot)\big|^q \Big)^{1/q} \Big\|_{L_p(\mathbb{R}^n)}.$$
The latter scale of spaces contains the classical $L_p$ spaces and Sobolev spaces with dominating mixed smoothness if $1 < p < \infty$ and $q = 2$; namely, we have $S^0_{p,2}F(\mathbb{R}^n) = L_p(\mathbb{R}^n)$ and $S^k_{p,2}F(\mathbb{R}^n) = S^k_p W(\mathbb{R}^n)$ for $k \in \mathbb{N}$. Note that we also have $S^r_{p,p}B(\mathbb{R}^n) = S^r_{p,p}F(\mathbb{R}^n)$ for all $0 < p < \infty$ and $r \in \mathbb{R}$. Though we have the embedding
$$S^{r_0}_{p_0,q_0}A(\mathbb{R}^n) \hookrightarrow S^{r_1}_{p_1,q_1}A(\mathbb{R}^n) \qquad (16)$$
for $p_0 \le p_1$ and $r_0 - r_1 > 1/p_0 - 1/p_1$, see [37, Chapt. 2], the embedding (16) is never compact. Hence, the entropy numbers of embeddings between function spaces defined on the whole $\mathbb{R}^n$ do not converge to zero. We restrict our considerations to spaces on bounded domains. Let $\Omega$ be an arbitrary bounded domain in $\mathbb{R}^n$. Then, we define $S^r_{p,q}A(\Omega)$ for $A \in \{B, F\}$ as
$$S^r_{p,q}A(\Omega) := \big\{ f \in D'(\Omega) : \exists\, g \in S^r_{p,q}A(\mathbb{R}^n) \text{ such that } g|_\Omega = f \big\},$$
and its (quasi-)norm is given by
$$\|f\|_{S^r_{p,q}A(\Omega)} := \inf_{g|_\Omega = f} \|g\|_{S^r_{p,q}A(\mathbb{R}^n)}.$$
The embedding (16) transfers to the bounded domain $\Omega$ and is compact, so that the entropy numbers decay and converge to zero.

Sequence Spaces
The key to establishing the decay rate of entropy numbers for the embedding (1) is a discretization technique which has been developed over the years by several authors, beginning with Maiorov [26]. Later, after wavelet isomorphisms had been established, this technique was refined by Lemarié, Meyer, Triebel, and many others. In [42, Thm. 2.10], Vybíral gave the necessary modifications to deal with the above defined $S^r_{p,q}A(\Omega)$ spaces in detail. The main advantage of this approach is to transfer questions for function space embeddings to certain sequence spaces.
Using sufficiently smooth wavelets with sufficiently many vanishing moments (and the notation from [42]), the mapping represents a sequence space isomorphism between $S^r_{p,q}B(\mathbb{R}^n)$, $S^r_{p,q}F(\mathbb{R}^n)$ and corresponding sequence spaces of the type (14), which means $d_\mu = 2^\mu$ and $b_\mu = (\mu + 1)^{n-1}$ in the notation of (14). In particular, we have $b_\mu \lesssim d_\mu$.
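The outer dimensions $b_\mu = (\mu+1)^{n-1}$ arise from counting the dyadic levels $j \in \mathbb{N}_0^n$ with $|j|_1 = \mu$ (each contributing a block of dimension $2^\mu$). A short check (illustrative only) against the stars-and-bars formula $\binom{\mu + n - 1}{n - 1} \asymp (\mu+1)^{n-1}$:

```python
from math import comb
from itertools import product

def count_levels(mu, n):
    """Number of dyadic levels j in N_0^n with |j|_1 = mu: this is the
    number b_mu of blocks on smoothness level mu."""
    return sum(1 for j in product(range(mu + 1), repeat=n) if sum(j) == mu)

n = 3
for mu in range(8):
    # Compositions of mu into n non-negative parts (stars and bars);
    # this count grows like (mu + 1)^(n - 1), matching b_mu.
    assert count_levels(mu, n) == comb(mu + n - 1, n - 1)
```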

Entropy Numbers
As a consequence of the boundedness of certain restriction and extension operators, see [42, 4.5], the investigation of entropy numbers of Besov space embeddings can be shifted to the sequence space side. We formulate our first result in the framework of sequence spaces, which improves the known upper bound. More specifically, we prove that the lower bound in (23) is sharp in the case that $0 \le 1/p_0 - 1/p_1 < r_0 - r_1 \le 1/q_0 - 1/q_1$, which also includes the limiting case $r_0 - r_1 = 1/q_0 - 1/q_1$. What is known in this direction is summarized in Remark 20 below.

Proposition 17 Let $\Omega$ be a bounded domain and
Then we have

Proof The lower bound follows from [42, Thm. 3.18]. The upper bound is the actual contribution. We argue as follows.
Step 1. Put $\vartheta := \min\{1, p_1, q_1\}$ and fix $m \ge m_0$, where $m_0$ is large enough (depending on $p_0, p_1, q_0, q_1, r_0, r_1$). We decompose the identity operator $\mathrm{id}$ into two summands, where $L_m := \lceil \log_2(m) \rceil$ and $M_m := \lfloor m/8 \rfloor$. With an eye on Proposition 10, this means, in particular, that $m \ge 8 L_m$ and $m \ge 8 M_m$ (for $m$ large enough). Using (4) we obtain the corresponding splitting of the entropy numbers. Step 2. We estimate the first summand. By (18), this breaks down to the entropy numbers
$$e_m\big(\mathrm{id}: \ell_{q_0}((X_\mu)_{\mu \in I}) \rightarrow \ell_{q_1}((Y_\mu)_{\mu \in I})\big) \qquad (21)$$
with $X_\mu$, $Y_\mu$ chosen as after (18), where $I$ denotes the range of $\mu$. Note that, due to Proposition 15, we only used that $r_0 - r_1 \le 1/q_0 - 1/q_1$. To estimate the first summand in (20), it is not needed that $r_0 - r_1 > 1/p_0 - 1/p_1$.
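The requirement "$m_0$ large enough" in Step 1 is mild: the condition $m \ge 8M_m$ holds trivially, and $m \ge 8L_m$ holds from a modest threshold on. The following sketch (with the hypothetical rounding $L_m = \lceil \log_2 m \rceil$, $M_m = \lfloor m/8 \rfloor$, which the source leaves implicit) locates that threshold:

```python
import math

def conditions_hold(m):
    """Check m >= 8*L_m and m >= 8*M_m for the splitting in Step 1,
    assuming L_m = ceil(log2(m)) and M_m = floor(m/8)."""
    L_m = math.ceil(math.log2(m))
    M_m = m // 8
    return m >= 8 * L_m and m >= 8 * M_m

# m >= 8*floor(m/8) always holds, so only m >= 8*ceil(log2(m)) matters;
# find the smallest m from which both conditions hold permanently
# (within the tested range).
m0 = min(m for m in range(2, 1000)
         if all(conditions_hold(t) for t in range(m, 1000)))
print(m0)
```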
Step 3. Let us address the second summand in (20). Clearly, it can be reduced to (21) with spaces $X_\mu$, $Y_\mu$ defined analogously, but with $\mu$ this time running in the range $I = \{L_m + 1, \dots, L_m + M_m\}$.

This gives
In the next theorem, we consider the situation where a Besov-type sequence space compactly embeds into a Triebel–Lizorkin-type sequence space. This setting is particularly important, since it leads to results with target space $L_p$.
Step 1. In the case p 1 > q 1 we use the commutative diagram in Fig. 1.
Proof Identifying $S^0_{p_1,2}F(\Omega) = L_{p_1}(\Omega)$ in the case $1 < p_1 < \infty$, the result is a direct consequence of Theorem 21.
There are some fundamental open problems connected with $p = \infty$, see [8, 2.6, 6.4, 6.5]. Interestingly, when choosing the third index $q$ small enough in Corollaries 22 and 23, we get rid of the logarithm.