Banach-valued modulation invariant Carleson embeddings and outer-$L^p$ spaces: the Walsh case

We prove modulation invariant embedding bounds from Bochner spaces $L^p(\mathbb{W};X)$ on the Walsh group to outer-$L^p$ spaces on the Walsh extended phase plane. The Banach space $X$ is assumed to be UMD and sufficiently close to a Hilbert space in an interpolative sense. Our embedding bounds imply $L^p$ bounds and sparse domination for the Banach-valued tritile operator, a discrete model of the Banach-valued bilinear Hilbert transform.


Introduction
The bilinear Hilbert transform (BHT) of two complex-valued Schwartz functions $f_0, f_1 \in \mathscr{S}(\mathbb{R};\mathbb{C})$ is given by
$$\mathrm{BHT}(f_0,f_1)(x) := \mathrm{p.v.}\int_{\mathbb{R}} f_0(x-t)\, f_1(x+t)\, \frac{dt}{t}, \qquad x \in \mathbb{R}.$$
The $L^p$ bounds
$$\|\mathrm{BHT}(f_0,f_1)\|_{L^p(\mathbb{R})} \lesssim \|f_0\|_{L^{p_0}(\mathbb{R})} \|f_1\|_{L^{p_1}(\mathbb{R})} \tag{1.1}$$
with $p_0, p_1 \in (1,\infty]$ and $p \in (2/3,\infty)$ such that $p^{-1} = p_0^{-1} + p_1^{-1}$, were first proven by Lacey and Thiele [23,24]. Their proof extended techniques developed by Carleson and Fefferman in their proofs of Carleson's theorem on the almost-everywhere convergence of Fourier series [7,13]. These techniques are now referred to as 'time-frequency' or 'wave packet' analysis. In order to streamline and modularise these techniques, Do and Thiele developed a theory of 'outer-$L^p$' spaces, yielding proofs of $L^p$ bounds for the BHT in which the key difficulties are cleanly compartmentalised [12].
The outer-$L^p$ technique is not applied directly to the BHT, but rather to its associated trilinear form BHF, given by dualising with a third function $f_2 \in \mathscr{S}(\mathbb{R};\mathbb{C})$:
$$\mathrm{BHF}(f_0,f_1,f_2) := \int_{\mathbb{R}} \mathrm{BHT}(f_0,f_1)(x)\, f_2(x)\, dx.$$
For $p_0, p_1, p \in [1,\infty]$, the estimate (1.1) is equivalent to the bound
$$|\mathrm{BHF}(f_0,f_1,f_2)| \lesssim \prod_{u=0}^{2} \|f_u\|_{L^{p_u}(\mathbb{R})}, \qquad p_2 := p'.$$
The trilinear form BHF is a nontrivial linear combination of the Hölder form (which satisfies the desired $L^p$ bounds by Hölder's inequality) and another trilinear form:
Here $\mathbb{R}^3_+ = \mathbb{R} \times \mathbb{R} \times (0,\infty)$ is the extended phase plane, which parametrises the underlying translation, modulation, and dilation symmetries of BHF. The functions $E[f_u]$ are representations of the functions $f_u \in \mathscr{S}(\mathbb{R};\mathbb{C})$ as functions $\mathbb{R}^3_+ \to \mathbb{C}$. We refer to $E$ as an embedding map; the modified embedding maps $E_u$ differ from $E$ by a change of variables.$^1$
The outer-$L^p$ technique factorises $L^p$ bounds for the trilinear form in (1.4) into a chain of inequalities:
The first inequality is a Radon-Nikodym-style domination: the classical integral over $\mathbb{R}^3_+$ is controlled by an iterated outer-$L^1$ quasinorm. This quasinorm is defined with respect to certain outer measures $\mu$ and $\nu$ on $\mathbb{R}^3_+$, along with a size $S^1$ on functions on $\mathbb{R}^3_+$, which measures functions on distinguished subsets of $\mathbb{R}^3_+$. The second inequality is a Hölder inequality for the iterated outer-$L^p$ quasinorms. This involves further sizes $S_u$, $u \in \{0,1,2\}$, which are connected to the size $S^1$ by a 'size-Hölder' inequality. The first two inequalities follow from general properties of outer-$L^p$ spaces. The third inequality follows from the bounds
which carry most of the difficulty of the problem. These are modulation invariant Carleson embedding bounds, so named because the operators $E_u$ are modulation invariant in the sense that
$$E_u[e^{2\pi i z \cdot \xi} f(z)](x,\eta,s) = E_u[f](x,\eta+\xi,s),$$
and the outer-$L^p$ quasinorms $\|\cdot\|_{L^{p_u}_{\nu}\text{-}L^{q_u}_{\mu} S_u}$ are invariant with respect to translation in the second variable.$^2$
These bounds do not follow from general properties of outer-$L^p$ spaces, making them an interesting object of study in their own right. The abstract outer-$L^p$ theory offers one useful reduction in this direction: to prove the bounds (1.5), it suffices to prove weak endpoint bounds and argue by an outer-$L^p$ version of the Marcinkiewicz interpolation theorem.
In this paper we consider functions $f \colon \mathbb{R} \to X$ valued in a complex Banach space $X$. Banach-valued analysis has a rich history which we do not attempt to summarise here; we simply point the reader to the recent volumes [15,16]. Embedding bounds from the Bochner space $L^p(\mathbb{R};X)$ into outer-$L^p$ spaces on the upper half-space $\mathbb{R} \times (0,\infty)$ have been proven by Di Plinio and Ou [11], with applications to Banach-valued multilinear singular integrals; the upper half-space parametrises translation and dilation symmetries, but not modulation symmetries. We would like to prove such embedding bounds into outer-$L^p$ spaces on $\mathbb{R} \times \mathbb{R} \times (0,\infty)$, in order to incorporate modulation invariance. As a first step we prove these for a discrete model of the real line, the 3-Walsh model, in which many technical difficulties in time-frequency analysis are removed, while the core features of the analysis remain.
1 The precise definitions of $E$ and $E_u$ are not important for this introduction. [12, (6
The 3-Walsh group is
$$\mathbb{W} := \{x \in (\mathbb{Z}/3\mathbb{Z})^{\mathbb{Z}} : [x]_n = 0 \text{ for all sufficiently large } n\},$$
where $[x]_n$ denotes the $n$-th component of $x$, and with group operation inherited from the infinite product. Up to measure zero, the Haar measure on $\mathbb{W}$ can be identified with the Lebesgue measure on $[0,\infty)$, and the group operation on $\mathbb{W}$ corresponds to ternary digitwise addition modulo 3 (i.e. ternary digitwise addition without carry) on $[0,\infty)$. The dual group of $\mathbb{W}$ can be identified with $\mathbb{W}$ itself, and the Walsh-Fourier transform of the characteristic function of a triadic interval is again such a characteristic function (suitably renormalised). Thus one can construct 'Walsh wave packets', supported in a given triadic interval of $[0,\infty)$, with frequency support in another given triadic interval. In this way, $\mathbb{W}$ supports an idealised time-frequency analysis that is not possible on $\mathbb{R}$, as a nonzero compactly supported function on $\mathbb{R}$ cannot have compactly supported Fourier transform. We work with the 3-Walsh group, although the 2-Walsh group (defined by replacing 3 by 2 in the definition) is more commonly used in the literature.
We have made this choice because the 3-Walsh group leads to a more natural discrete model of BHF than the 2-Walsh group. Our arguments work equally well for any choice of integer parameter greater than or equal to 2.
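The carry-free digitwise addition described above can be sketched on non-negative integers as follows. This is only an illustration: the name `walsh_add` is ours, and the fractional digits of elements of the Walsh group are ignored.

```python
def walsh_add(x: int, y: int, p: int = 3) -> int:
    """Add two non-negative integers digit by digit in base p, without carry."""
    result, place = 0, 1
    while x > 0 or y > 0:
        digit = (x % p + y % p) % p   # addition in Z/pZ on the current digit
        result += digit * place
        x //= p
        y //= p
        place *= p
    return result


# 5 = (12)_3 and 7 = (21)_3 are inverse to each other in the 3-Walsh model.
five_plus_seven = walsh_add(5, 7)
# For p = 2 the operation is the bitwise exclusive or.
two_walsh = walsh_add(5, 3, p=2)
```

Every element has order dividing $p$ under this operation, which is one way to see that it differs from ordinary addition on $[0,\infty)$.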
In our application of the 3-Walsh model, the role of the extended phase plane $\mathbb{R}^3_+$ is taken by the set $3\mathbb{P}$ of all tritiles, i.e. the set of all rectangles $\mathbf{P} = I_{\mathbf{P}} \times \omega_{\mathbf{P}}$ of area 3 in $[0,\infty) \times [0,\infty)$ (identified with $\mathbb{W} \times \mathbb{W}$) whose sides are triadic intervals. The tritile $\mathbf{P}$ roughly corresponds to the point $(x_{\mathbf{P}}, \xi_{\mathbf{P}}, |I_{\mathbf{P}}|)$, where $x_{\mathbf{P}}$ and $\xi_{\mathbf{P}}$ are the centres of $I_{\mathbf{P}}$ and $\omega_{\mathbf{P}}$ respectively. Each tritile is split into three tiles $P_u = I_{\mathbf{P}} \times \omega_{P_u}$, $u \in \{0,1,2\}$ (i.e. rectangles of area 1 with triadic sides), all with the same time interval $I_{\mathbf{P}}$, and to each of these tiles is associated a Walsh wave packet $w_{P_u} \colon \mathbb{W} \to \mathbb{C}$, supported in $I_{\mathbf{P}}$ with frequency support in $\omega_{P_u}$. The embedding $E[f] \colon 3\mathbb{P} \to X^3$ of a function $f \colon \mathbb{W} \to X$ is given by integrating $f$ against the three wave packets corresponding to a given tritile $\mathbf{P}$, and collecting the results in a triple
$$E[f](\mathbf{P}) := \big(\langle f; w_{P_u} \rangle\big)_{u \in \{0,1,2\}}.$$
There are two equivalent ways of looking at this embedding: either as an $X^3$-valued function on tritiles, or as an $X$-valued function on tiles, where we write
$$E[f](P) := E[f](\mathbf{P})_u = \langle f; w_P \rangle,$$
where $\mathbf{P}$ is the unique tritile containing the tile $P$, and $u$ is the index such that $P = P_u$. Both viewpoints are handy, and we switch between them freely.
The main results of this paper are the following embedding bounds. The Banach space assumptions (UMD, $r$-Hilbertian) are explained in Section 3, and the relevant outer structures on $3\mathbb{P}$ in Section 4. The sizes $S$ are also defined in Section 4; they depend on the Banach space $X$ appearing in the statement of the theorem, although this is not apparent from the notation.
Theorem 1.1. Let $X$ be a Banach space which is UMD and $r$-Hilbertian for some $r \in [2,\infty)$. Then for all convex sets $A \subset 3\mathbb{P}$ of tritiles, the following embedding bounds hold.
The implicit constants in the above bounds do not depend on A.
The set of exponents for which the embedding bounds (1.9) hold is sketched in Figure 1; in the dotted region, the iterated embedding bounds essentially correspond to the non-iterated bounds. For $p \le r$ we only have embeddings into iterated outer-$L^p$ spaces; such behaviour 'outside local $L^r$' necessitated the introduction of iterated outer-$L^p$ spaces by the second author [32].
Let us now discuss Banach-valued versions of the trilinear form BHF. Consider a triple of Banach spaces $(X_0, X_1, X_2)$ and a bounded trilinear form
$$\Pi \colon X_0 \times X_1 \times X_2 \to \mathbb{C}.$$
With respect to this data we define
$$\mathrm{BHF}_\Pi(f_0,f_1,f_2) := \int_{\mathbb{R}} \mathrm{p.v.}\int_{\mathbb{R}} \Pi\big(f_0(x-t), f_1(x+t), f_2(x)\big)\, \frac{dt}{t}\, dx$$
for $f_u \in \mathscr{S}(\mathbb{R};X_u)$, $u \in \{0,1,2\}$. The first $L^p$-bounds for $\mathrm{BHF}_\Pi$ were proven by Silva, in the case $X_0 = \ell^R$, $X_1 = \ell^\infty$, $X_2 = \ell^{R'}$, for $R \in (4/3,4)$, with $\Pi$ the natural product-sum map [30, Theorem 1.7]. The set of allowed Banach spaces was extended by Benea and Muscalu using a new 'helicoidal method' [1,2], and by Lorist and Nieraeth by Rubio de Francia-type extrapolation methods [26,27]. One limitation of these results is that they only hold when the spaces $X_0, X_1, X_2$ are Banach lattices, excluding interesting examples such as the Schatten classes $\mathcal{C}^p$ and more general non-commutative $L^p$ spaces.
It remains an open question whether there are any $L^p$-bounds for $\mathrm{BHF}_\Pi$ without this limitation. As a corollary of Theorem 1.1, we prove $L^p$-bounds for the 3-Walsh model of $\mathrm{BHF}_\Pi$ without assuming any lattice structure. This model is the tritile form $\Lambda_\Pi$, defined by$^3$
Theorem 1.2. Let $(X_u)_{u \in \{0,1,2\}}$ be UMD Banach spaces, such that each $X_u$ is $r_u$-Hilbertian for some $r_u \in [2,\infty)$, and let $\Pi \colon X_0 \times X_1 \times X_2 \to \mathbb{C}$ be a bounded trilinear form. Given any Hölder triple of exponents $(p_u)_{u \in \{0,1,2\}} \in (1,\infty)^3$ satisfying
we have the bound
$$|\Lambda_\Pi(f_0,f_1,f_2)| \lesssim \prod_{u=0}^{2} \|f_u\|_{L^{p_u}(\mathbb{W};X_u)}.$$
3 It is not obvious that the sum on the right hand side converges absolutely; see [19, Lemma 5.1] for a proof of this convergence for the quartile form, which will be discussed later in the introduction. Of course, the absolute convergence follows from our theorem.
The region of exponents $(p_u)_{u=0}^{2}$ for which this theorem holds (more precisely, the region of their reciprocals) is characterised as the interior of a polygon in Section 6.1. This region is only nonempty when the Hilbertian exponents $(r_u)_{u=0}^{2}$ are jointly sufficiently close to 2, in the sense that
$$\sum_{u=0}^{2} r_u^{-1} > 1.$$
$L^p$ bounds for the Banach-valued quartile form (the 2-Walsh analogue of $\Lambda_\Pi$) were first established by Hytönen, Lacey, and Parissis [19]. Their assumptions on the Banach spaces $X_u$ are very similar to ours (possibly equivalent, although this is not known), and the resulting range of exponents in their $L^p$ bounds is the same as ours when restricted to the reflexive range (see Section 6.1). Banach-valued time-frequency analysis was initiated by Hytönen and Lacey in their work on the Carleson operator, and continued with their work with Parissis on the Walsh model of the variational Carleson operator [17,18,20]. We have taken substantial inspiration from these papers.
The iterated embeddings of Theorem 1.1 imply not only $L^p$ bounds for the tritile form, but also sparse domination. The connection between sparse domination and Carleson embeddings into iterated outer-$L^p$ spaces was first shown by Di Plinio, Do, and the second author [9]. A collection of intervals $\mathcal{G}$ in $\mathbb{W}$ is sparse if
where the supremum is taken over all intervals $I \subset \mathbb{W}$ (see [25, §6] for a proof that this is equivalent to the more familiar definition of a sparse collection).
Theorem 1.3. Let $(X_u)_{u \in \{0,1,2\}}$, $(r_u)_{u \in \{0,1,2\}}$, and $\Pi$ be as in Theorem 1.2. Let $(p_u)_{u \in \{0,1,2\}}$ be any triple of exponents satisfying (1.12). Then
where the supremum is taken over all sparse collections of intervals $\mathcal{G}$.
The term appearing on the right of the bound of Theorem 1.3 is referred to as a sparse form. It is straightforward to show that sparse forms satisfy the bounds
for any Hölder triple of exponents $(\tilde{p}_u)_{u \in \{0,1,2\}}$ with $\tilde{p}_u > p_u$. Furthermore, sparse forms satisfy various weighted bounds, which we do not pursue here; for more information see for example [25].
Let us return to the assumptions of Theorem 1.2: we consider three UMD Banach spaces $(X_u)_{u=0,1,2}$, each of which is $r_u$-Hilbertian, linked with a bounded trilinear form $\Pi \colon X_0 \times X_1 \times X_2 \to \mathbb{C}$. There are a few natural examples that one should keep in mind:
• Let $X$ be a UMD Banach space which is $r$-Hilbertian. Then the dual space $X^*$ is also UMD and $r$-Hilbertian, and we can consider the 'duality trilinear form'
$$\Pi \colon X \times X^* \times \mathbb{C} \to \mathbb{C}, \qquad \Pi(x, x^*, \lambda) := \lambda \langle x; x^* \rangle.$$
Since $\mathbb{C}$ is UMD and 2-Hilbertian (i.e. Hilbert), the corresponding region of exponents in Theorem 1.2 is nonempty provided
$$r^{-1} + r^{-1} + 2^{-1} > 1,$$
i.e. when $r < 4$.
• Consider a Hölder triple of exponents $r_0, r_1, r_2 \in (1,\infty)$, so that the Lebesgue spaces $L^{r_u}(\mathbb{R})$ are UMD and $\max(r_u, r_u')$-Hilbertian, and there is no exponent $r < \max(r_u, r_u')$ such that $L^{r_u}(\mathbb{R})$ is $r$-Hilbertian. Consider the 'integration trilinear form'
$$\Pi(f_0,f_1,f_2) := \int_{\mathbb{R}} f_0(x) f_1(x) f_2(x)\, dx.$$
Then Theorem 1.2 would yield a nontrivial region of exponents provided that $\sum_u \max(r_u, r_u')^{-1} > 1$. But since $\sum_u r_u^{-1} = 1$, this occurs only if $\max(r_u, r_u') < r_u$ for some $u$, which is impossible. Thus this trilinear form never fits into our framework. This is in stark contrast with the results of Benea and Muscalu, who obtain bounds for $\mathrm{BHF}_\Pi$ for this trilinear form for any Hölder triple $r_0, r_1, r_2$ with $r_0, r_1 \in (1,\infty]$ and $r_2 \in [1,\infty)$ [1]. The reason for this discrepancy is our reliance on UMD methods.
• On the other hand, replacing $\mathbb{R}$ with $\mathbb{N}$ in the preceding example, one can define the integration trilinear form on $\ell^{r_0} \times \ell^{r_1} \times \ell^{r_2}$ provided that $\sum_u r_u^{-1} \ge 1$. Thus this trilinear form fits into our framework provided that $\sum_u r_u^{-1} > 1$ and $r_u \ge 2$ for each $u$. The same holds when each $\ell^{r_u}$ is replaced by the Schatten class $\mathcal{C}^{r_u}$ and $\Pi$ is replaced by the 'composition trilinear form'.
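The arithmetic behind these three examples can be checked mechanically. The sketch below assumes, as the examples suggest, that the region of exponents is nonempty exactly when the reciprocals of the Hilbertian exponents sum to more than 1; the function names are ours.

```python
from fractions import Fraction


def conjugate(r):
    """Hölder conjugate r' = r / (r - 1) of an exponent r > 1."""
    r = Fraction(r)
    return r / (r - 1)


def region_nonempty(hilbertian_exponents):
    """True when the reciprocals of the Hilbertian exponents sum to more than 1."""
    return sum(1 / Fraction(r) for r in hilbertian_exponents) > 1


def lebesgue_hilbertian_exponent(r):
    """The best Hilbertian exponent of L^r, namely max(r, r')."""
    return max(Fraction(r), conjugate(r))


# Duality form on X x X* x C with X r-Hilbertian: exponents (r, r, 2).
duality_r3 = region_nonempty([3, 3, 2])   # 1/3 + 1/3 + 1/2 > 1
duality_r4 = region_nonempty([4, 4, 2])   # 1/4 + 1/4 + 1/2 = 1, not > 1

# Integration form on Lebesgue spaces for the Hölder triple (2, 4, 4).
lebesgue = region_nonempty([lebesgue_hilbertian_exponent(r) for r in (2, 4, 4)])

# Sequence spaces with exponents (2, 3, 3), all at least 2.
sequence = region_nonempty([2, 3, 3])     # 1/2 + 1/3 + 1/3 > 1
```

Exact rational arithmetic is used so that the borderline case $r = 4$ is decided without rounding.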
Here is a brief overview of the paper. In Section 2 we introduce the basics of the Walsh group $\mathbb{W}$ and the associated time-frequency analysis. In Section 3 we discuss various Banach space properties and their analytic consequences. In Section 4 we set up the framework of outer structures and outer-$L^p$ spaces. Of particular importance are the size-Hölder inequality (Proposition 4.12) for the 'randomised' sizes, and the size domination theorem (Theorem 4.15), which lets us control the randomised sizes by a simpler 'deterministic' size. Section 5 is devoted to proving Theorem 1.1. Crucial to these arguments is a basic tile selection algorithm given in Proposition 5.1. This is a simpler version of a more familiar 'tree selection algorithm' often used in time-frequency analysis; the simplification is thanks to the aforementioned size domination theorem. Finally, in Section 6, we deduce $L^p$ bounds and sparse domination for the tritile form. Section 7.1 is an appendix, in which we sketch an alternative method using R-bounds and the RMF property; this requires additional Banach space assumptions, but the proof is a bit more direct.
1.1. Notation. The letter $\mathbb{W}$ will always stand for the 3-Walsh group $\mathbb{W}_3$; we always write $\mathbb{W}_p$ when we want to use a different parameter $p$ (see Section 2). For a Banach space $X$ and $p \in [1,\infty]$, $L^p(\mathbb{W};X)$ denotes the Bochner space of strongly measurable functions $f \colon \mathbb{W} \to X$ such that the function $x \mapsto \|f(x)\|_X$ is in the usual Lebesgue space $L^p(\mathbb{W})$. For technical details on Bochner spaces see [15, Chapter 1]. When $I \subset \mathbb{W}$ is an interval and $f \in L^p_{\mathrm{loc}}(\mathbb{W};X)$, we let
denote the triadic $p$-maximal function; the supremum is taken over all intervals $I \subset \mathbb{W}$ containing $x$. For $f \in L^1_{\mathrm{loc}}(\mathbb{W};X)$ we let
denote the average of $f$ on $I$. For $f \in \mathscr{S}(\mathbb{W};X)$ and $g \in \mathscr{S}(\mathbb{W};\mathbb{C})$ let
We say that a triple of exponents $(p_u)_{u \in \{0,1,2\}}$ with $p_u \in [1,\infty]$ is a Hölder triple if $\sum_{u=0}^{2} p_u^{-1} = 1$. Throughout the paper, we use $(\varepsilon_n)_{n \in A}$ to denote a sequence of independent Rademacher variables (i.e. random variables that take the values $\pm 1$ with equal probability), indexed over some countable indexing set $A$. It never matters precisely which probability space these Rademacher variables live on. We denote the expectation over this probability space by $\mathbb{E}$.
1.2. Acknowledgements. Part of this research was completed while the first author was a postdoctoral researcher at the TU Delft and while the second author was a doctoral student at Bonn International Graduate School.
The first author was supported by the VIDI subsidy 639.032.427 of the Netherlands Organisation for Scientific Research (NWO) and a Fellowship for Postdoctoral Researchers from the Alexander von Humboldt Foundation. We thank Mark Veraar and Christoph Thiele for their encouragement and suggestions,

Walsh time-frequency analysis
In this section we introduce the Walsh group $\mathbb{W}$ and the extended Walsh phase plane. In particular we introduce tiles, wave packets, tritiles, trees, and strips. None of this material is new; we include it here for the convenience of the reader, and to fix notation. In Subsection 2.3 we introduce the defect operator, which is an important technical tool in our analysis.
2.1. The Walsh group. Fix an integer $p \ge 2$. The Walsh group $\mathbb{W}_p$ is
$$\mathbb{W}_p := \{x \in (\mathbb{Z}/p\mathbb{Z})^{\mathbb{Z}} : |x| < \infty\},$$
where $|x| = \max\{p^n : [x]_n \ne 0\}$ and $[x]_n$ is the $n$-th component ('digit') of $x$. The group operation $+$ is the digit-wise addition in $\mathbb{Z}/p\mathbb{Z}$, and the map $(x,y) \mapsto |x - y|$ is a translation invariant metric on $\mathbb{W}_p$, giving $\mathbb{W}_p$ the structure of a locally compact abelian group, and thus guaranteeing the existence of a Haar measure on $\mathbb{W}_p$. We normalise this measure so that $|B_1(0)| = 1$, and it follows that
As explained in the introduction, there is a correspondence between the Walsh group and the non-negative reals $[0,\infty)$ given by the surjective map
$$\mathbb{W}_p \to [0,\infty), \qquad x \mapsto \sum_{n \in \mathbb{Z}} [x]_n\, p^n,$$
which is injective up to a set of measure zero. The pullback of the Lebesgue measure by this map is the Haar measure on $\mathbb{W}_p$, and intervals in $[0,\infty)$ correspond to balls in $\mathbb{W}_p$. Thus we often refer to Walsh balls as intervals.
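The following sketch models elements of $\mathbb{W}_p$ with finitely many nonzero digits and computes the quantity $|x| = \max\{p^n : [x]_n \ne 0\}$, the induced distance, and the image in $[0,\infty)$. The dictionary representation and all function names are our own choices for illustration, and we set $|0| = 0$.

```python
from fractions import Fraction

P = 3  # the Walsh parameter p


def sub(x, y, p=P):
    """Digit-wise subtraction in Z/pZ; x and y map digit positions n to digits."""
    out = {}
    for n in set(x) | set(y):
        d = (x.get(n, 0) - y.get(n, 0)) % p
        if d:
            out[n] = d
    return out


def norm(x, p=P):
    """|x| = max{p**n : [x]_n != 0}, with |0| = 0."""
    support = [n for n, d in x.items() if d % p]
    return Fraction(p) ** max(support) if support else Fraction(0)


def dist(x, y, p=P):
    """The translation invariant distance |x - y|."""
    return norm(sub(x, y, p), p)


def to_real(x, p=P):
    """Image of x under the map x -> sum_n [x]_n p**n."""
    return sum(Fraction(p) ** n * (d % p) for n, d in x.items())


x = {1: 2, -1: 1}   # digits [x]_1 = 2 and [x]_{-1} = 1
y = {1: 2, 0: 1}
z = {-2: 1}
# The distance satisfies the strong triangle (ultrametric) inequality.
ultrametric = dist(x, z) <= max(dist(x, y), dist(y, z))
```

The ultrametric inequality is the reason that two Walsh balls are always either nested or disjoint, just like triadic intervals.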
Let $X$ be a Banach space. We say that a function $f \colon \mathbb{W}_p \to X$ is Schwartz, denoted $f \in \mathscr{S}(\mathbb{W}_p;X)$, if there exists $N > 0$ such that $f$ is supported on $B_{p^N}(0)$ and constant on any interval $I$ with $|I| < p^{-N}$. For every exponent $q \in [1,\infty)$, the Schwartz functions are dense in $L^q(\mathbb{W}_p;X)$.
The dual group of $\mathbb{W}_p$ can be identified with $\mathbb{W}_p$ itself, and the characters of $\mathbb{W}_p$ are the Walsh exponentials
The Walsh-Fourier transform of a function $f \in L^1(\mathbb{W}_p;\mathbb{C})$ is thus
and we have the Plancherel identity
Consider the modulation, translation, and dilation operators on functions $f \colon \mathbb{W}_p \to \mathbb{C}$, given by
where $[p^{-n}x]_j := [x]_{j+n}$ for all $n, j \in \mathbb{Z}$. It follows from the definition of the Walsh-Fourier transform that (2.6)
Given two intervals $I, I' \subset \mathbb{W}$, it holds that
In the remainder of the paper we will work with the case $p = 3$, and we will write $\mathbb{W} := \mathbb{W}_3$.

2.2. The extended Walsh phase plane. Strictly speaking, the extended Walsh phase plane is $\{(x,\xi,3^n) \in \mathbb{W} \times \mathbb{W} \times \mathbb{R}_+ : n \in \mathbb{Z}\}$, where each point $(x,\xi,3^n) \in \mathbb{W} \times \mathbb{W} \times \mathbb{R}_+$ represents the time $x$, the frequency $\xi$, and the scale $3^n$. We can identify each point $(x,\xi,3^n)$ with the rectangle $B_{3^n}(x) \times B_{3^{-n}}(\xi) \subset \mathbb{W} \times \mathbb{W}$; this provides a more graphically intuitive way of thinking about time-frequency localisation. This identification is not injective, but it turns out that this failure of injectivity correctly encodes the 'uncertainty principle', i.e. the impossibility of determining both position (in time) and frequency to an arbitrary scale.
We thus introduce the notion of a tile.
Definition 2.1 (Tiles). A tile is a rectangle $P = I_P \times \omega_P$ in $\mathbb{W} \times \mathbb{W}$ of area 1, such that the sides $I_P$ and $\omega_P$ are intervals. We call $I_P$ the time interval and $\omega_P$ the frequency interval of $P$. For each tile $P$ there exist unique $x_P, \xi_P \in \mathbb{W}$ and $n \in \mathbb{Z}$ such that (2.8)
We call $x_P$ the centre of the tile, and $\xi_P$ the frequency of the tile. We denote the set of all tiles by $\mathbb{P}$.
To each tile $P$ we associate a wave packet $w_P$, which is a $\mathbb{C}$-valued function supported in $I_P$ with frequency support $\omega_P$. In the time-frequency sense, the wave packet $w_P$ is localised to $P$.
Definition 2.2 (Wave packets). Given a tile $P \in \mathbb{P}$, the wave packet associated with $P$ is the function
This is the unique function, up to multiplication by a unimodular constant, such that
$$\operatorname{spt} w_P = I_P, \qquad \operatorname{spt} \widehat{w_P} = \omega_P, \qquad \text{and} \qquad \|w_P\|_{L^1(\mathbb{W})} = 1. \tag{2.10}$$
Remark 2.3. It is convenient to identify wave packets with tiles, and thus to consider the translation, dilation, and modulation operators (2.5) as acting directly on tiles, so that for example $\mathrm{Mod}_\xi P = P'$ if and only if $\mathrm{Mod}_\xi w_P = c\, w_{P'}$ for some $|c| = 1$.
We could equivalently define our wave packets with an arbitrary choice of unimodular constant in front; all the statements we make about wave packets will be invariant under this transformation. In essence, what is most important is not the wave packet itself, but the subspace of $L^2(\mathbb{W};\mathbb{C})$ that it spans.
Simple support (and Walsh-Fourier support) considerations show that two tiles are disjoint if and only if their associated wave packets are orthogonal. More refined statements can be made about the connection between tiles and wave packets. For example, a union of disjoint tiles $\bigcup_i P_i$ corresponds to the subspace of $L^2(\mathbb{W};\mathbb{C})$ spanned by the pairwise orthogonal wave packets $(w_{P_i})_i$, and this subspace does not depend on the specific representation of $\bigcup_i P_i$ as a disjoint union of tiles. In particular, if a tile $P$ is contained in such a union, then the wave packet $w_P$ can be written as a linear combination of the wave packets $w_{P_i}$. This is made precise in the following lemma.
Lemma 2.4 (Basis expansion of wave packets). Let $(P_i)_{i \in \{1,\ldots,N\}}$ be a finite collection of pairwise disjoint tiles. Then for any $P \subset \bigcup_{i=1}^{N} P_i$ it holds that
Proof. We may assume that $P \cap P_i \ne \emptyset$ for all $i \in \{1,\ldots,N\}$, for otherwise we would have $\langle w_P; w_{P_i} \rangle = 0$ and $P_i$ would not contribute to the right hand side of (2.11).
If $I_{P_i} \subset I_P \subset I_{P_j}$ for $i \ne j$ then $P_i \cap P_j \ne \emptyset$, contradicting the assumption that the tiles are pairwise disjoint, so either $I_P \supset I_{P_i}$ for all $i \in \{1,\ldots,N\}$ or $I_P \subset I_{P_i}$ for all $i \in \{1,\ldots,N\}$. We consider only the first case, as the proof of the second is similar. Write
The third identity comes from the fact that $\omega_P \subset \omega_{P_i}$ and thus $|\xi_{P_i} - \xi_P| < |I_P|^{-1}$, so that by (2.3) it holds that
completing the proof when $I_{P_i} \subset I_P$ for all $i$.
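The equivalence between disjointness of tiles and orthogonality of wave packets can be tested numerically on a small grid. The sketch below is an illustration under our own conventions: it samples $[0,1)$ at $3^N$ points, uses the standard generalised Walsh (Chrestenson) characters as Walsh exponentials, and leaves the packets unnormalised since only orthogonality is tested. The indexing of tiles by `(k, j, m)` is hypothetical.

```python
import cmath
from itertools import product

N = 2                                   # [0, 1) is sampled at 3**N points
ROOT = cmath.exp(2j * cmath.pi / 3)     # primitive third root of unity


def digits(value, length):
    """Ternary digits of value, least significant first, padded to length."""
    return [(value // 3**a) % 3 for a in range(length)]


def wave_packet(k, j, m):
    """Samples of an unnormalised packet adapted to the tile with time
    interval [j 3^-k, (j+1) 3^-k) and frequency interval [m 3^k, (m+1) 3^k)."""
    M = N - k
    m_digits = digits(m, M)
    samples = [0j] * 3**N
    for t in range(3**M):
        t_digits = digits(t, M)
        phase = sum(m_digits[a] * t_digits[M - 1 - a] for a in range(M))
        samples[j * 3**M + t] = ROOT ** (phase % 3)
    return samples


def tile_rectangle(k, j, m):
    """Time interval in grid cells and frequency interval, both half-open."""
    return (j * 3**(N - k), (j + 1) * 3**(N - k)), (m * 3**k, (m + 1) * 3**k)


def intervals_meet(a, b):
    return a[0] < b[1] and b[0] < a[1]


def inner(f, g):
    return sum(x * y.conjugate() for x, y in zip(f, g))


TILES = [(k, j, m) for k in range(N + 1)
         for j in range(3**k) for m in range(3**(N - k))]


def disjoint_iff_orthogonal():
    for p, q in product(TILES, repeat=2):
        (tp, fp), (tq, fq) = tile_rectangle(*p), tile_rectangle(*q)
        disjoint = not (intervals_meet(tp, tq) and intervals_meet(fp, fq))
        orthogonal = abs(inner(wave_packet(*p), wave_packet(*q))) < 1e-9
        if disjoint != orthogonal:
            return False
    return True
```

Calling `disjoint_iff_orthogonal()` runs over all pairs of tiles in this finite model and compares the geometric and analytic conditions.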
The expression (1.11) of the tritile form involves multiplication of 'nearby' wave packet coefficients of three separate functions. This 'nearness' of tiles is encoded by grouping triples of frequency-adjacent tiles into tritiles.
Definition 2.5 (Tritiles). A tritile is a rectangle $\mathbf{P} = I_{\mathbf{P}} \times \omega_{\mathbf{P}}$ of area 3, such that the sides $I_{\mathbf{P}}$ and $\omega_{\mathbf{P}}$ are intervals. As with tiles, for every tritile $\mathbf{P}$ there are unique $x_{\mathbf{P}}, \xi_{\mathbf{P}} \in \mathbb{W}$ and $n \in \mathbb{Z}$ such that (2.12)
We denote the set of all tritiles by $3\mathbb{P}$. Every tritile $\mathbf{P}$ can be written in a unique way as a disjoint union of 3 tiles with time interval $I_{\mathbf{P}}$; these tiles are given by
Conversely, for every tile $P$, there is a unique tritile $\mathbf{P}$ such that $P = P_v$ for some $v \in \{0,1,2\}$. This splitting of $\mathbf{P}$ into tiles is the horizontal splitting; there is also a vertical splitting (2.14)
that we will use less often.
The horizontal and vertical splittings are sketched in Figure 2.
Figure 2. A tritile, the horizontal splitting, and the vertical splitting.
Remark 2.6. It is occasionally useful to identify the tritile $\mathbf{P}$ with the set of corresponding tiles $\{P_0, P_1, P_2\}$, and to consider these tiles as 'subtiles' of $\mathbf{P}$. Furthermore, given a Banach space $X$ and a triple-valued function on tritiles $F \colon 3\mathbb{P} \to X^3$, we can identify $F$ with an $X$-valued function on tiles $\tilde{F} \colon \mathbb{P} \to X$ defined by
$$\tilde{F}(P) := F(\mathbf{P})_u,$$
where $\mathbf{P} \in 3\mathbb{P}$ and $u \in \{0,1,2\}$ are uniquely determined such that $P_u = P$. We will abuse notation and write $F = \tilde{F}$.
We consider $3\mathbb{P}$ as being the 'correct' representation of the extended Walsh phase plane, and for us it plays the role that $\mathbb{R}^3_+$ plays for time-frequency analysis on the real line, as explained in the introduction.
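The two splittings of a tritile can be made concrete with exact arithmetic. In the sketch below an interval is a pair `(start, length)` and a rectangle is a pair `(time, frequency)` of intervals; this encoding and the function names are ours, and we read the horizontal splitting as cutting the frequency interval into thirds and the vertical splitting as cutting the time interval into thirds.

```python
from fractions import Fraction


def children(interval):
    """The three triadic children of a half-open interval (start, length)."""
    start, length = interval
    third = length / 3
    return [(start + v * third, third) for v in range(3)]


def horizontal_splitting(tritile):
    """Three tiles sharing the time interval of the tritile."""
    time, freq = tritile
    return [(time, child) for child in children(freq)]


def vertical_splitting(tritile):
    """Three tiles sharing the frequency interval of the tritile."""
    time, freq = tritile
    return [(child, freq) for child in children(time)]


def area(rectangle):
    (_, time_length), (_, freq_length) = rectangle
    return time_length * freq_length


# A tritile with time interval [0, 3) and frequency interval [0, 1).
tritile = ((Fraction(0), Fraction(3)), (Fraction(0), Fraction(1)))
horizontal = horizontal_splitting(tritile)
vertical = vertical_splitting(tritile)
```

Both splittings produce rectangles of area 1, i.e. tiles, which is why a function on tritiles with values in $X^3$ carries the same information as an $X$-valued function on tiles.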
One of Fefferman's (many) innovations in his proof of Carleson's theorem was the introduction of a partial order on tiles. Using this order one can define trees, which represent sets of tiles that are frequency-localised at a certain 'top frequency', with time restricted to a given interval. On these subsets, time-frequency analysis is essentially reduced to Calderón-Zygmund theory.$^4$
Definition 2.7 (Order and trees). Given two tritiles $\mathbf{P}$ and $\mathbf{P}'$, we say that
The tree with top $\mathbf{P}$ is the collection of tritiles
Given a tree $T$ we denote by $\mathbf{P}_T$ the unique tritile such that $T = T(\mathbf{P}_T)$. We write $I_T := I_{\mathbf{P}_T}$, $\omega_T := \omega_{\mathbf{P}_T}$, $x_T := x_{\mathbf{P}_T}$, and $\xi_T := \xi_{\mathbf{P}_T}$. The collection of all trees is denoted by $\mathbb{T}$. For each $u \in \{0,1,2\}$ the $u$-component of $T$ is given by
Remark 2.8. Given a tile $P$ and a tree $T$, it will be useful to write $P \in T$ to mean that $\mathbf{P} \in T$, where $\mathbf{P}$ is the unique tritile containing $P$ as a subtile (in the horizontal decomposition).
Another important class of subsets are the strips, which consist of tiles with time restricted to a given interval, with no restriction on frequency. These play an important role in the construction of iterated outer-$L^p$ quasinorms. Finally, we define the notion of convexity for sets of tritiles.
Note that trees, strips, and their complements are convex, and that the intersection of two convex sets is convex.
2.3. The embedding and the defect. Consider a Banach space $X$ and a function $f \colon \mathbb{W} \to X$. Recall from the introduction the embedding $E[f] \colon \mathbb{P} \to X$, defined by
$$E[f](P) := \langle f; w_P \rangle.$$
A general function $F \colon \mathbb{P} \to X$ cannot be realised as an 'embedded function' $F = E[f]$, as the wave packet coefficients $\langle f; w_P \rangle$ are not independent. This lack of independence is codified by the relations in Lemma 2.4. We use these relations to construct a 'defect operator', which measures how far a function $F \colon \mathbb{P} \to X$ is from being an embedded function.
Definition 2.11 (Defect operator). Given a Banach space $X$ and a function $F \colon \mathbb{P} \to X$, the defect $dF \colon \mathbb{P} \to X$ is given by
where $\mathbf{P} \in 3\mathbb{P}$ is the unique tritile containing $P$, and $\mathbf{P}^{\uparrow}$ is the vertical splitting of $\mathbf{P}$ defined in (2.14).
The defect operator satisfies In the following proposition, we show how a function F : P → X can be decomposed as the sum of an embedded function and its defect.
Proposition 2.12 (Function reconstruction). Let $T$ be a tree, and let $P$ be a tile with $P \in T$ (recall from Remark 2.8 that this means $\mathbf{P} \in T$, where $\mathbf{P}$ is the unique tritile with $P \in \mathbf{P}$). Then for all $N \in \mathbb{N}$ it holds that
Proof. We induct on $N \in \mathbb{N}$. If $N = 0$ then the result follows immediately by definition of $dF$, as the first sum is empty and the condition in the second sum is a rewriting of the condition $Q \in \mathbf{P}^{\uparrow}$. Let us show that if (2.19) holds for $N$ then it also holds for $N + 1$. Apply the result with $N = 0$ to each tile in the second sum to obtain
where the last identity holds since the tiles $Q \in T$ with $|I_Q| = 3^{-(N+1)} |I_P|$ are disjoint and cover $P$. Plugging this into (2.19) for $N$ gives the statement for $N + 1$ as required.
A similar result is true if $F$ is not precisely an embedded function, but rather a 'cut-off' embedded function; for this result we need to think in terms of tritiles rather than tiles. If $F = E[f]$ and $A \subset 3\mathbb{P}$, then $d(\mathbb{1}_A F)(\mathbf{P}) \ne 0$ only if $\mathbf{P}$ happens to be on the 'boundary' of the set $A$; that is, if $\mathbf{P} \in A$ and there exists $\mathbf{Q} \le \mathbf{P}$ with $|I_{\mathbf{Q}}| = |I_{\mathbf{P}}|/3$ such that $\mathbf{Q} \notin A$, or if $\mathbf{P} \notin A$ and there exists $\mathbf{Q} \le \mathbf{P}$ with $|I_{\mathbf{Q}}| = |I_{\mathbf{P}}|/3$ such that $\mathbf{Q} \in A$. A crucial observation is that if $A$ is convex, then for any fixed $x \in \mathbb{W}$ there exist at most two tritiles $\mathbf{P}$ on the boundary of $A$ with $x \in I_{\mathbf{P}}$.

Analysis in Banach spaces
The harmonic analysis of functions $f \colon \mathbb{W} \to X$ valued in a Banach space $X$ exhibits phenomena that are not present in the scalar case $X = \mathbb{C}$. Generally, techniques that work for scalar-valued functions require geometric assumptions on $X$ in order to have $X$-valued extensions. The most famous of these geometric assumptions is the UMD (Unconditional Martingale Differences) property, which we discuss in Section 3.2. We will also require the $r$-Hilbertian property (also referred to as the $\theta$-Hilbertian property in the literature). Before discussing these geometric assumptions we give a short introduction to Rademacher sums, a crucial tool in Banach-valued analysis without which not much can be said.
A relatively complete introduction to Banach-valued analysis is the incomplete series [15,16]. The reader will benefit from having a copy of these references at hand while reading this paper.
3.1. Rademacher sums. A great deal of scalar-valued harmonic analysis is connected with square functions; that is, functions of the form
$$\Big( \sum_{n=1}^{N} |f_n|^2 \Big)^{1/2},$$
where $(f_n)_{n \in \{1,\ldots,N\}}$ is a sequence of $\mathbb{C}$-valued functions on $\mathbb{R}$ (for example). If $X$ is a Banach lattice (or in particular, a function space), then for all finite sequences $(x_n)_{n=1}^{N}$ in $X$ one can make sense of the quantity $\big( \sum_{n=1}^{N} |x_n|^2 \big)^{1/2}$ as an element of $X$. However, for general Banach spaces $X$, this is not possible.
The correct $X$-valued analogue of a square function is a Rademacher sum, which is a quantity of the form
$$\mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n x_n \Big\|_X,$$
where $(x_n)_{n=1}^{N}$ is a finite sequence in $X$, and where $(\varepsilon_n)_{n=1}^{N}$ is a sequence of independent Rademacher variables on some probability space $\Omega$, i.e. random variables taking the values $\pm 1$ with probability $1/2$. When $X$ is a Banach lattice with finite cotype (for example, if $X = L^p(\Xi)$ for some $\sigma$-finite measure space $\Xi$, with $p \in [1,\infty)$), then Rademacher sums are equivalent to norms of square functions; that is,
$$\mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n x_n \Big\|_X \simeq \Big\| \Big( \sum_{n=1}^{N} |x_n|^2 \Big)^{1/2} \Big\|_X.$$
Here we mention two particularly important results that allow us to manipulate Rademacher sums. The first lets us replace the expectation in a Rademacher sum with an $L^p$-expectation for any $p \in (0,\infty)$; the second lets us pull out bounded scalar coefficients in a Rademacher sum. We will use these results throughout the paper, often without mention. For proofs see [16, Theorems 6.2.4 and 6.1.13].
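Theorem 3.1 (Kahane-Khintchine). Let $X$ be a Banach space. For all finite sequences $(x_n)_{n=1}^{N}$ in $X$ and all $p \in (0,\infty)$, we have the equivalence
$$\Big( \mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n x_n \Big\|_X^p \Big)^{1/p} \simeq_p \mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n x_n \Big\|_X$$
with implicit constant independent of $N$.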
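For finite sequences in a finite-dimensional space, the moments of a Rademacher sum can be computed exactly by averaging over all sign patterns. The sketch below is a toy illustration with our own function names; it checks the Hilbert space identity $\mathbb{E}\|\sum_n \varepsilon_n x_n\|^2 = \sum_n \|x_n\|^2$ and the monotonicity of the moments in $p$, which is the easy direction of the Kahane-Khintchine equivalence.

```python
from itertools import product


def lp_norm(q):
    """The l^q norm on finite sequences of real numbers."""
    return lambda v: sum(abs(c) ** q for c in v) ** (1 / q)


def rademacher_moment(vectors, norm, p):
    """(E || sum_n eps_n x_n ||^p)^(1/p), averaged over all 2^N sign patterns."""
    count = len(vectors)
    dim = len(vectors[0])
    total = 0.0
    for signs in product((-1, 1), repeat=count):
        s = [sum(e * x[i] for e, x in zip(signs, vectors)) for i in range(dim)]
        total += norm(s) ** p
    return (total / 2**count) ** (1 / p)


vectors = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# In a Hilbert space the second moment is the l^2 sum of the norms: sqrt(1 + 1 + 2).
hilbert_second_moment = rademacher_moment(vectors, lp_norm(2), 2)

# Moments with respect to the l^1 norm for three different exponents.
moments = [rademacher_moment(vectors, lp_norm(1), p) for p in (1, 2, 4)]
```

The content of the theorem is the reverse comparison, namely that the higher moments are controlled by the first one with a constant that does not depend on the number of vectors.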
3.2. The UMD property. As already mentioned, the most important of our geometric assumptions is the UMD property. It is natural to assume this property when doing Banach-valued harmonic analysis, as a Banach space $X$ is UMD if and only if the Hilbert transform extends to a bounded operator on $L^p(\mathbb{R};X)$ for all $p \in (1,\infty)$ [4,5]. The classical reflexive function spaces, for example $L^p$-spaces, Sobolev spaces, and Triebel-Lizorkin and Besov spaces, are all UMD. However, there are also important UMD spaces that are not function spaces (or not even Banach lattices); in particular, non-commutative $L^p$-spaces, including the Schatten classes $\mathcal{C}^p$ (see [28, Chapter 14] and [15, Appendix D]). For more exposition on UMD spaces see for example [6,28,15].
We recall one possible definition of the UMD property in terms of Haar decompositions. For every dyadic interval $J = [m2^n, (m+1)2^n) \subset \mathbb{R}$, $n, m \in \mathbb{Z}$, define the $L^1$-normalised Haar function
$$h_J := |J|^{-1} (\mathbb{1}_{J_0} - \mathbb{1}_{J_1}),$$
where $J_0$ and $J_1$ are the left and right halves of $J = J_0 \cup J_1$, i.e.
$$J_0 = [m2^n, (m + \tfrac{1}{2})2^n), \qquad J_1 = [(m + \tfrac{1}{2})2^n, (m+1)2^n).$$
It is straightforward to see that $\langle h_J; h_{J'} \rangle = 0$ unless $J = J'$, and thus for any $f \in L^2([0,1))$ and any finitely-supported sequence of signs $a_J \in \{-1,1\}$,
$$\Big\| \sum_{J} a_J\, |J|\, \langle f; h_J \rangle h_J \Big\|_{L^2([0,1))} \le \|f\|_{L^2([0,1))}, \tag{3.6}$$
where the sum is over all dyadic intervals $J \subset [0,1)$. When $L^2$ is replaced with $L^p$ for some $p \in (1,\infty)$, the estimate (3.6) still holds, with a constant depending on $p$ (although naturally the proof above, being reliant on orthogonality, does not extend to $p \ne 2$). This motivates the following definition.
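Definition 3.3. A Banach space $X$ is UMD if for some $p \in (1,\infty)$ there exists a constant $C < \infty$ such that for any $f \in L^p([0,1);X)$ and any finitely-supported sequence $(a_J)_{J \subset [0,1)}$ of signs, it holds that
$$\Big\| \sum_{J} a_J\, |J|\, \langle f; h_J \rangle h_J \Big\|_{L^p([0,1);X)} \le C\, \|f\|_{L^p([0,1);X)}, \tag{3.7}$$
where the sum is over all dyadic intervals $J \subset [0,1)$.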
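In the scalar $L^2$ case the sign-invariance of Haar expansions can be verified directly on a small grid. The sketch below is an illustration under our own conventions: $[0,1)$ is sampled on $2^3$ cells, the Haar functions are $L^1$-normalised as $|J|^{-1}(\mathbb{1}_{J_0} - \mathbb{1}_{J_1})$, and a mean-zero function is expanded as $\sum_J |J| \langle f; h_J \rangle h_J$.

```python
import math

LEVELS = 3
SIZE = 2**LEVELS     # [0, 1) is sampled on 2**LEVELS cells of equal length


def haar(n, m):
    """L^1-normalised Haar function of J = [m 2^-n, (m+1) 2^-n), one value per cell."""
    width = SIZE >> n                 # number of cells in J
    h = [0.0] * SIZE
    for c in range(width):
        sign = 1.0 if c < width // 2 else -1.0
        h[m * width + c] = sign * 2**n        # 1/|J| = 2^n
    return h


def pairing(f, g):
    """Integral of f * g over [0, 1) for cell-wise constant real functions."""
    return sum(a * b for a, b in zip(f, g)) / SIZE


def l2_norm(f):
    return math.sqrt(pairing(f, f))


INTERVALS = [(n, m) for n in range(LEVELS) for m in range(2**n)]


def sign_transform(f, signs):
    """The function sum_J a_J |J| <f; h_J> h_J for a choice of signs a_J."""
    out = [0.0] * SIZE
    for (n, m), a in zip(INTERVALS, signs):
        h = haar(n, m)
        coeff = a * pairing(f, h) * 2.0**(-n)
        out = [o + coeff * v for o, v in zip(out, h)]
    return out


f = [3.0, -1.0, 4.0, -1.0, -5.0, 9.0, -2.0, -7.0]   # mean zero on [0, 1)
signs = [1, -1, 1, -1, -1, 1, 1]
transformed = sign_transform(f, signs)
```

Here `l2_norm(transformed)` agrees with `l2_norm(f)`, reflecting orthogonality; the UMD property asks for the corresponding bound in $L^p([0,1);X)$, where no such orthogonality is available.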
If (3.7) holds for one $p \in (1,\infty)$, then it holds for all $p \in (1,\infty)$ (with a different constant) and with $[0,1)$ replaced by any dyadic interval (see [
The Haar functions are in fact 2-Walsh wave packets associated to $T^1(B_1(0)^2)$, so the bound (3.7) can be interpreted as unconditionality of a tree projection operator. In the 3-Walsh case, we use the following randomised version of (3.7); the proof is a bit harder than the 2-Walsh case because the tree projections cannot be directly related to martingale transforms. The idea is to reduce to the tree $T(B_1(0)^2)$ by modulation, translation, and dilation, and then to reduce matters to a result of Clément et al. [8] which has already done the hard work of relating 3-Walsh-Fourier projections to martingale transforms.
Proposition 3.4. Let $p \in (1,\infty)$ and $X$ be a UMD Banach space. Then for all trees $T$ and all $f \in L^p(I_T;X)$ we have
Proof. First we reduce to consideration of the tree $T_1 := T(B_1(0)^2)$. Fix an arbitrary tree $T$. Define the 'lacunary tiles' associated with $T$ to be the set of tiles
The lacunary tiles associated with $T$ can be related to those associated with $T_1$ by the relation
with dilation, translation, and modulation operators acting on tiles as in Remark 2.3. Applying these operators to the wave packets appearing in (3.8) we obtain
for some unimodular constants $a_P$. Supposing that (3.8) holds for $T_1$, the contraction principle (3.3) yields that
Thus it suffices to show (3.8) for the tree $T = T_1$. For this tree, only the 0-part is nontrivial; i.e. $T_1 = T_1^0$. Let us show the bound restricted to the summand corresponding to $v = 1$; the bound for the $v = 2$ summand is shown in the same way, and one combines these summands using the triangle inequality. Using the Kahane-Khintchine inequality (Theorem 3.1) and Fubini one has
In reindexing the Rademacher variables we used that for each $x \in B_1(0)$ the tiles $P \in T^0$ for which $w_{P_1}(x) \neq 0$ are in bijective correspondence with the scales $\{|I_P| : P \in T^0\}$, and thus the two sets of Rademacher variables $\{\varepsilon_{P_1} : P \in T^0, w_{P_1}(x) \neq 0\}$ and $\{\varepsilon_{|I_P|} : P \in T^0\}$ are identically distributed.
For each n ∈ N, let S n denote the Walsh-Fourier projection onto the interval .
To bound this quantity, we use a result of Clément et al. [8, Corollary 4.4]; since X is UMD, this result implies and completes the proof.⁵
Remark 3.7. In [28], r-Hilbertian spaces are referred to as θ-Hilbertian. In our computations the parameter r plays a more important role, so we prefer to use our terminology.
For an introduction to interpolation spaces, see for example [3] or [15, Appendix C]. Note that if X is r-Hilbertian, then X is s-Hilbertian for all s > r. Every $L^p$-space with p ∈ [2, ∞), either classical or non-commutative, is p-Hilbertian: to see this, note that $L^p = [L^2, L^\infty]_\theta$ with $[2, \infty]_\theta = p$ (that is, 1/p = (1 − θ)/2). By the same argument, replacing $L^\infty$ with $L^1$, $L^p$ is p′-Hilbertian when p ∈ (1, 2]. r-Hilbertian spaces enjoy the following 'r-orthogonality' of wave packet coefficients, which should be compared to the notions of tile-type and quartile-type in [17,18,19,20]. It should be noted that this is the only consequence of the r-Hilbertian property that we actually use. Thus one could isolate this estimate as a geometric assumption, perhaps called 'Walsh tile-type r' (although that name is already taken). However, we do not know how to establish the property without assuming the r-Hilbertian property, so we choose not to make this definition.
⁵ We briefly show how to deduce (3.9) from [8, Corollary 4.4], assuming familiarity with the notation of [8, Section 4]. The interval $B_{3^{-n}}(3^{-n})$ can be identified with the set $\{n \in W : d_{(n,1)} \le n < d_{(n,2)}\}$, and thus the Walsh-Fourier projection $S_n$ can be identified with the projection $\Delta_{(n,1)}$. Since X is UMD, [8, Corollary 4.4] says that the set $\{\Delta_{(n,v)} : (n, v) \in \mathbb{N} \times \{1, 2\}\}$ is an unconditional Schauder decomposition of $L^p(W; X)$, and this implies (3.9) by the contraction principle.
Proposition 3.8 (Walsh tile-type). Let X be r-Hilbertian. Then for any finite collection A ⊂ P of pairwise disjoint tiles, with implicit constant independent of A.
while Plancherel's theorem yields
Remark 3.9. It is natural to suspect that if a Banach space X is r-Hilbertian for some r < ∞, then it must be UMD. This is false; a counterexample is given by Qiu's construction (see [15, §4.3.c] and [29]).⁶ For all r ∈ [2, ∞] and k ∈ N, inductively define spaces is equipped with counting measure). Then set $X_r := \oplus^r_{n \in \mathbb{N}} X^r_k$. For all r ≠ 2, $X_r$ is not UMD, while $X_r = [X_2, X_\infty]_\theta$ is r-Hilbertian.

Outer-L p spaces
In this section we introduce outer structures and their associated outer-$L^p$ quasinorms. Roughly speaking, an outer structure on a topological space consists of an outer measure on the space, a Banach space X, and a size on X-valued functions on the topological space. Currently the standard references on this topic are the initial work by Do and Thiele [12], and the first Banach-valued implementation by Di Plinio and Ou [11]. However, the outer-$L^p$ concept is still quite new, and the terminology and definitions are not fixed. Our interpretation of the theory differs slightly (but not fundamentally) from what appears in the literature. In Sections 4.2 and 4.3 we analyse particular outer structures that are relevant to our problem.
4.1. Initial definitions. For a topological space 𝕏 we let B(𝕏) denote the σ-algebra of Borel sets in 𝕏, and for a Banach space X we let B(𝕏; X) denote the set of strongly Borel measurable functions 𝕏 → X. Recall that a Polish space is a topological space that is homeomorphic to a complete separable metric space. This is a technical assumption that will ultimately play no role in this paper, as we only really care about the countable space 3P with the discrete topology.
Definition 4.1 (Outer structure). Let 𝕏 be a Polish space. An outer structure on 𝕏, or simply an outer structure, consists of the following data:
• a collection ℰ ⊂ B(𝕏) of generating sets,
• a function σ : ℰ → [0, ∞), called the premeasure,
• a Banach space X,
• an X-size (or simply a size) S on (𝕏, ℰ); that is, a family of maps indexed by E ∈ ℰ such that there exists a constant C ≥ 1 satisfying the following properties for all E ∈ ℰ and F, G ∈ B(𝕏; X):
homogeneity: $\|\lambda F\|_{S(E)} = |\lambda| \|F\|_{S(E)}$ for all λ ∈ C;
quasi-triangle inequality: $\|F + G\|_{S(E)} \le C(\|F\|_{S(E)} + \|G\|_{S(E)})$;
nondegeneracy: $\|F\|_{S(E)} = 0$ for all E ∈ ℰ if and only if F = 0.
That is, the maps $\|\cdot\|_{S(E)}$ are (possibly infinite) quasinorms on E, with quasinorm constant uniformly bounded in E ∈ ℰ, and with an additional unconditionality property.⁷
Given an outer structure on 𝕏 as above, we define the induced outer measure σ : P(𝕏) → [0, ∞] (which we denote by the same letter as the premeasure) by $\sigma(A) := \inf_{\mathcal{E}'} \sum_{E \in \mathcal{E}'} \sigma(E)$, where the infimum is taken over all countable covers ℰ′ ⊂ ℰ of A by generating sets. For all f ∈ B(𝕏; X) we define $\|f\|_S := \sup_{E \in \mathcal{E}} \|f\|_{S(E)}$, and for all λ > 0 we define the outer superlevel measure Different choices of sizes lead to fundamentally different outer structures, even when the outer measure and the Banach space remain fixed. Thus we consider the size (and the underlying Banach space) as a component of the outer structure.
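To make the passage from premeasure to outer measure concrete, here is a minimal brute-force sketch on a finite set, where the infimum over countable covers reduces to a minimum over finite subfamilies of generating sets. The function name `outer_measure` and the toy generating sets are assumptions made for illustration only; they are unrelated to the trees and strips used later.

```python
from itertools import combinations

def outer_measure(target, generators):
    """Cheapest cover of `target` by generating sets.

    `generators` maps each generating set (a frozenset) to its premeasure.
    Returns float('inf') when `target` cannot be covered at all.
    """
    target = frozenset(target)
    sets = list(generators.items())
    best = float("inf")
    for r in range(len(sets) + 1):
        for family in combinations(sets, r):
            covered = frozenset().union(*(E for E, _ in family))
            if target <= covered:
                best = min(best, sum(cost for _, cost in family))
    return best

# Toy premeasure on subsets of {0, 1, 2, 3}.
gens = {
    frozenset({0, 1}): 1.0,
    frozenset({2, 3}): 1.0,
    frozenset({0, 1, 2, 3}): 3.0,
    frozenset({1, 2}): 0.5,
}
print(outer_measure({0, 1, 2, 3}, gens))  # two cheap sets beat the expensive one
print(outer_measure({1, 2, 3}, gens))
```

Note that the outer measure of a generating set can be strictly smaller than its premeasure, as happens for the full set in this toy example.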
To each outer structure is associated a family of quasinorms, defined in a way that mimics the so-called layer cake representation of the $L^p$ norm.
Definition 4.2 (Outer-$L^p$ quasinorms). Let 𝕏 be a Polish space, and let (ℰ, σ, X, S) be an outer structure on 𝕏. For all p ∈ (0, ∞) we define the outer-$L^p$ quasinorms and weak outer-$L^p$ quasinorms of a function f ∈ B(𝕏; X) by setting It is straightforward to check that these are indeed quasinorms.
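For orientation, the classical layer cake formula that these definitions imitate can be verified directly for a finite sequence with counting measure. The sketch below is an illustration of the classical identity only (it does not involve sizes or outer measures), and the function names are hypothetical.

```python
def layer_cake_norm(values, p):
    """(int_0^inf p * lam**(p-1) * #{i : |f_i| > lam} dlam) ** (1/p), computed exactly."""
    levels = sorted(abs(v) for v in values)
    total, previous = 0.0, 0.0
    for k, level in enumerate(levels):
        # For lam in (previous, level) exactly len(levels) - k entries exceed lam,
        # and the integral of p * lam**(p-1) over that range is level**p - previous**p.
        count = len(levels) - k
        total += count * (level ** p - previous ** p)
        previous = level
    return total ** (1.0 / p)

def weak_norm(values, p):
    """sup over lam > 0 of lam * #{i : |f_i| > lam} ** (1/p)."""
    levels = sorted(abs(v) for v in values)
    return max(level * (len(levels) - k) ** (1.0 / p) for k, level in enumerate(levels))

vals = [3.0, -1.0, 2.0]
direct = sum(abs(v) ** 2 for v in vals) ** 0.5
print(layer_cake_norm(vals, 2), direct)   # the two values agree up to rounding
print(weak_norm(vals, 2))                 # never exceeds the strong norm
```

In the outer setting the counting term is replaced by the outer superlevel measure, which is why only quasinorms, rather than norms, are obtained in general.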
A Hölder-type inequality holds for outer-L p spaces defined with respect to different sizes provided that it holds in a certain sense for the sizes themselves. The proof below is a straightforward extension of that of [12,Proposition 3.4].

Proposition 4.3 (Outer Hölder inequality). Let 𝕏 be a Polish space. For each u ∈ {0, 1, 2} let $(\mathcal{E}, \sigma, X_u, S_u)$ be an outer structure on 𝕏, and let (ℰ, σ, X, S) be another outer structure on 𝕏. Note that all these outer structures have the same generating sets and premeasure. Let $\Pi : X_0 \times X_1 \times X_2 \to X$ be a bounded trilinear map, and suppose that the size-Hölder inequality holds. Then for all $p_u \in [1, \infty]$ we have the outer Hölder inequality
Proof. Assume that the factors on the right hand side of (4.2) are finite and nonzero, for otherwise there is nothing to prove. By homogeneity we may assume that $\|F_u\|^{p_u}_{L^{p_u}_\sigma S_u} = 1$ for each u. For each u ∈ {0, 1, 2} and n ∈ Z let $A^u_n \subset \mathbb{X}$ be such that $\sum_{n \in \mathbb{Z}} 2^n \sigma(A^u_n) \lesssim 1$ and $\|1_{\mathbb{X} \setminus A^u_n} F_u\|_{S_u} \lesssim 2^{n/p_u}$.
We may assume that $A^u_n \subset A^u_{n-1}$ by considering $\tilde{A}^u_n = \bigcup_{k \ge n} A^u_k$ and noticing that $\tilde{A}^u_n$ satisfies the conditions above. Let $A_n = \bigcup_{u=0}^{2} A^u_n$. Then it holds that
It is possible to control classical $L^1$ norms by outer-$L^1$ quasinorms, by the following Radon-Nikodym-type domination principle. For the proof in the case X = C, which extends to general Banach spaces, see [32, Lemma 2.2] and [12, Proposition 3.6].
Proposition 4.4 (Radon-Nikodym-type domination). Let 𝕏 be a Polish space, and let (ℰ, σ, X, S) be an outer structure on 𝕏 such that $\mathbb{X} = \bigcup_{i \in \mathbb{N}} E_i$ for some countable sequence of generating sets $E_i \in \mathcal{E}$. If m is a positive Borel measure on 𝕏 such that
The outer-$L^p$ spaces support a useful Marcinkiewicz-type interpolation theorem, proven in [12, Proposition 3.5] (see also [11, Proposition 7.4]). In applications we only prove bounds for outer-$L^p$ quasinorms by establishing endpoint weak outer-$L^p$ bounds.
Then for all $p \in (p_1, p_2)$.
Two families of sizes on (3P, T), called 'deterministic' and 'randomised', will be needed. The deterministic sizes are C-sizes, while the randomised sizes are $X^3$-sizes, where X is a given Banach space.
Definition 4.6 (Deterministic sizes). The C-sizes S 1 and S ∞ on (3P, T) are given by for all F ∈ B(3P; C).⁸ We also define the mixed deterministic C-size S (∞,1) by
Definition 4.7 (Randomised sizes). Let X be a Banach space. The $X^3$-size S is given for all $F \in B(3P; X^3)$ by
Remark 4.8. We do not mention the Banach space X in the notation for the randomised size S; this should always be clear from context. Often we will refer to three functions $F_u \in B(3P; X_u)$ (u ∈ {0, 1, 2}) valued in different Banach spaces, and discuss the three sizes $\|F_u\|_S$; here we have three different $X_u^3$-sizes S, but we gain no clarity from denoting these sizes differently.
It is almost clear that S satisfies all the conditions of a size; the only subtlety is in showing that the component measuring dF satisfies the unconditionality property. Proposition 4.9. Let X be a Banach space, F ∈ B(3P; X), and suppose that A ∈ T ∪ is a countable union of trees. Then It follows that S satisfies the unconditionality property.
Proof. Notice that for all tritiles P, by (2.18) and since $d(1_A F)(P) = dF(P)$ unless P ∉ A and Q ∈ A, where Q is a tritile with Q ≤ P and $|I_Q| = |I_P|/3$. Since $A \in T^{\cup}$, for any $x \in I_T$ it holds that there is at most one P ∈ 3P such that writing out the definition of S (∞,1), one sees that this gives the required estimate.
We make use of the two premeasures µ and ν on 3P by iterating this construction to obtain 'iterated' outer structures.
Definition 4.10 (Iterated outer structures). Let X be a Banach space. Given an X-size S on (3P, T), for all q ∈ (0, ∞) we define an X-size - It is straightforward to verify that this is indeed an X-size on (3P, D), and thus (D, ν, X, -$L^q_\mu$ S) is an iterated outer structure on 3P, inducing iterated outer-$L^p$ quasinorms $\|\cdot\|_{L^p_\nu \text{-} L^q_\mu S}$ for all p ∈ (0, ∞].
The following iterated outer Hölder inequality is a straightforward consequence of the 'non-iterated' outer Hölder inequality of Proposition 4.3.
Corollary 4.11 (Hölder inequality for iterated outer-$L^p$ spaces). Let $X_0, X_1, X_2, X$ be Banach spaces, and let $\Pi : X_0 \times X_1 \times X_2 \to X$ be a bounded trilinear form.⁹ Let S be an X-size on (3P, T), and for each u ∈ {0, 1, 2}, let $S_u$ be an $X_u$-size on (3P, T) such that the size-Hölder inequality holds. Then for all $p_u, q_u \in [1, \infty]$, where $p^{-1} = \sum_{u=0}^{2} p_u^{-1}$ and $q^{-1} = \sum_{u=0}^{2} q_u^{-1}$.
We return to consideration of our trilinear form $\Pi : X_0 \times X_1 \times X_2 \to \mathbb{C}$. Define an 'extended' trilinear form $\Pi^* : X_0^3 \times X_1^3 \times X_2^3 \to \mathbb{C}$ by $\Pi^*\big((x_{0,0}, x_{0,1}, x_{0,2}), (x_{1,0}, x_{1,1}, x_{1,2}), (x_{2,0}, x_{2,1}, x_{2,2})\big) := \Pi(x_{0,0}, x_{1,1}, x_{2,2})$.
⁹ Here we write Π to emphasise that this is not the trilinear form that we consider in the introduction, and throughout most of the paper; in practice Π will be an extension of the aforementioned Π.
The most important result of this section is the following size-Hölder inequality for the randomised sizes and the deterministic size S 1 .
Proposition 4.12 (Size-Hölder). Let X 0 , X 1 , X 2 , Π, and Π * be as above. Then Proof. First note that so it suffices to fix u ∈ {0, 1, 2} and deal with the summands in the last entry individually. We concentrate on the case u = 0; the other cases are analogous. We restrict the sum over P to 3P N = P ∈ 3P : |I P | > 2 −N and we look for a bound independent of N , allowing us to conclude by standard limiting arguments. For ease of notation we set F N 0 (P) := 1 3P N (P)F 0 (P).
Fix a normalised sequence $a \in \ell^\infty(T^u; \mathbb{C})$ and estimate by duality We bound the first summand as follows: where the coefficients $b_{P,Q} := \langle w_{Q_v}; w_{P_0} \rangle |I_P|$ satisfy $|b_{P,Q}| < 1$. Letting $\varepsilon_P$ be independent Rademacher variables, we have Finally, applying Cauchy-Schwarz to the last entry we obtain
This Hölder inequality, combined with Radon-Nikodym domination, leads to the following result.
Corollary 4.13. Let X 0 , X 1 , X 2 , Π, and Π * be as above. Let (p 0 , p 1 , p 2 ) and (q 0 , q 1 , q 2 ) be Hölder triples of exponents. Then Proof. The first estimate follows from combining the outer Hölder inequality (Proposition 4.3) with Radon-Nikodym domination (Proposition 4.4), using Proposition 4.12. For the second, we have as a consequence of the outer Hölder inequality. By multiplying by characteristic functions of strips this implies For each F : 3P → C and each strip D ∈ D, Radon-Nikodym domination yields Applying Radon-Nikodym domination and the iterated outer Hölder inequality (Corollary 4.11) completes the proof.
Remark 4.14. The tritile form associated to Π : X 0 × X 1 × X 2 → C can be written as so by Corollary 4.13 we have Thus given a Hölder triple (p u ) 2 u=0 , in order to prove the L p -bounds it suffices to find a Hölder triple (q 0 , q 1 , q 2 ) such that for all u ∈ {0, 1, 2}.
with implicit constant independent of A.
Convexity of a set of tritiles is defined in Definition 2.10. Any of the standard norms on $X^3$ will do the job here, but we use the $\ell^\infty$-norm $\max_u \|x_u\|_X$.
We will prove Theorem 4.15 later in the section. First we show how it implies outer-L p quasinorm bounds. The argument is standard, but we include it to show the role played by convexity.
and (4.10)
Proof. Let us show that (4.9) holds. Assume that the right hand side of the inequality is finite. Then for each n ∈ Z there exists a countable union of trees $E_n = \bigcup_{i \in \mathbb{N}} T_{n,i}$ such that $\sum_{n \in \mathbb{Z}} \mu(E_n) 2^{pn}$ For each n and i the set $3P \setminus T_{n,i}$ is convex, and thus so is $3P \setminus E_n = \bigcap_{i \in \mathbb{N}} 3P \setminus T_{n,i}$. Theorem 4.15 implies that so by the definition of the outer-$L^p$ quasinorms it holds that as required. Similar reasoning yields the iterated bounds (4.10); it suffices to recall that strips and their complements are convex.
The proof of Theorem 4.15 relies on the following lemma.
Lemma 4.17. Let X be a Banach space and f ∈ S (W; X). Let T be a tree and A a finite convex set. Then there exists a function g ∈ S (W; X) supported on I T such that Proof. The set A∩T can be assumed to be non-empty, otherwise we can take g = 0.
We first reason under the assumption that P T ∈ A. Let where we recall that ch(J) denotes the set of triadic children of the interval J. By convexity of A, the set J satisfies Let J be the partition of I T generated by J , i.e. the elements of J are the maximal triadic subintervals of I T , ordered by inclusion, that do not contain any interval of J as a proper subset. The set J can also be characterised as the set of minimal elements of J with respect to inclusion. It follows that for any J ∈ J there exists a unique P(J) such that J ∈ ch(I P(J) ). Furthermore, for any P ∈ A, the elements of J cannot contain I P , and thus {J ∈ J : J ⊂ I P } partitions I P .
For every J ∈ 𝒥 let $Q_J$ be the unique tile such that $\xi_T \in \omega_{Q_J}$ and $I_{Q_J} = J$, and set g := Let us show that (4.11) holds. Given any P ∈ A the intervals $\{J \in \mathcal{J} : J \subset I_P\}$ partition $I_P$, and since any such J does not properly contain any of the triadic children of $I_P$, it holds that $|J| \le |I_P|/3$ and thus $|\omega_P| = 3/|I_P| \le |\omega_{Q_J}|$. Since $\xi_T \in \omega_{Q_J} \cap \omega_P$ this implies that By Lemma 2.4, for any u ∈ {0, 1, 2} it holds that It follows that where the last equality holds by maximality of 𝒥: if J ∈ 𝒥 and $J \not\subset I_P$ then $J \cap I_P = \emptyset$.
Now we prove the bound (4.12). The wave packets $w_{Q_J}$ for J ∈ 𝒥 have disjoint time support so it suffices to show that for all such J. Notice that $Q_J \subset \bigcup_{u=0}^{2} P(J)_u$ with P(J) as above, so using Lemma 2.4 we obtain that
Finally, suppose that $P_T \notin A$. Let $(O_i)_i$ be the maximal elements of T ∩ A with respect to the order ≤. The intervals $I_{O_i}$ are pairwise disjoint, and T ∩ A can be written as a union of disjoint sets $\bigcup_i T(O_i) \cap A$. Applying the above reasoning to each $T(O_i)$ we obtain a set of disjointly supported functions $g_i$ satisfying Setting $g = \sum_i g_i$ completes the proof.
Recall that the randomised size S is the sum of three types of terms, The first summand need not be estimated; we handle the remaining summands separately.
Proposition 4.18 (Defect size domination). Let X be a Banach space and A ⊂ 3P a convex set. Then for all trees T and all f ∈ S (W; X), Proof. Using the estimate (4.5) and the fact that dE[f ] = 0, for all tritiles P we have Since A is convex, for each x ∈ I T there are at most two tritiles P such that x ∈ I P and Q≤P as required.
Proof. Let $3P_N = \{P \in 3P : 3^{-N} < |I_P| < 3^N\}$. We show that f ; w Pv X for fixed N; the theorem follows by passing to the limit N → ∞. Since $A \cap T \cap 3P_N$ is finite and convex, by Lemma 4.17 there exists a function g ∈ S(W; X) supported on $I_T$ such that Since X is UMD we have by Proposition 3.4 Summing this over v ≠ u and using the $L^\infty$-bound on g yields (4.15).

Proofs of the embedding bounds
In this section we prove Theorem 1.1: modulation invariant Carleson embedding bounds into iterated and non-iterated outer-L p spaces. Before getting to the proofs themselves, we isolate a tile selection algorithm that appears multiple times in the proofs. Thanks to the size domination theorem (Theorem 4.15), we only need this simple tile selection procedure, rather than a more complicated tree selection procedure (as used for example in [19]).
Proposition 5.1 (Tile selection). Let F ∈ B(3P; C). For any λ > 0 there exists a (possibly empty) set $B_\lambda$ of pairwise disjoint tritiles such that, if we set $E_\lambda$ := Otherwise let $B_\lambda \subset M_\lambda$ be the subset of tritiles in $M_\lambda$ that are maximal with respect to ≤. Then $B_\lambda$ satisfies the first required condition, and to see the second, one simply notes that $M_\lambda \subset E_\lambda$. To see that $B_\lambda$ consists of pairwise disjoint tritiles, suppose that $P, Q \in B_\lambda$ with P ∩ Q ≠ ∅. Then either P ≤ Q or Q ≤ P, and by maximality of P and Q in $M_\lambda$ we must have that P = Q.
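The selection step in this proof is elementary, and a simplified model may help fix ideas. The sketch below replaces tritiles by triadic intervals and the order ≤ by inclusion; this simplification, together with the names `contains` and `select_maximal` and the sample values of F, is an assumption made for illustration and is not the construction of the paper.

```python
def contains(big, small):
    """True if the triadic interval `small` lies inside `big`.

    A pair (k, j) stands for the interval [j * 3**-k, (j + 1) * 3**-k).
    """
    (kb, jb), (ks, js) = big, small
    return ks >= kb and js // 3 ** (ks - kb) == jb

def select_maximal(F, lam):
    """Maximal elements, with respect to inclusion, of M = {I : F[I] > lam}."""
    M = [I for I, value in F.items() if value > lam]
    return [I for I in M if not any(J != I and contains(J, I) for J in M)]

# Hypothetical values of F on a few triadic intervals.
F = {(0, 0): 0.5, (1, 0): 2.0, (2, 1): 3.0, (1, 2): 1.5, (2, 8): 0.2, (2, 3): 5.0}
selected = select_maximal(F, 1.0)
print(sorted(selected))
```

As in the proof, two selected intervals that intersect must be nested, so maximality forces them to coincide; and every interval on which F exceeds the level sits inside a selected one.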
We are ready to prove our modulation invariant Carleson embedding bounds. We prove these with respect to the deterministic size S ∞ , under an r-Hilbertian assumption; we will obtain Theorem 1.1 as a corollary of the size domination theorem. First we consider embeddings into non-iterated outer-L p spaces. These are easier to prove, but they only hold for p > r.
Theorem 5.2. Let X be a Banach space which is r-Hilbertian for some r ∈ [2, ∞). 10 Then the bounds hold for all f ∈ S (W; X).
Proof. By interpolation (i.e. by Proposition 4.5) it suffices to establish weak endpoint bounds for p = ∞ and p = r. The p = ∞ endpoint follows immediately from the definition of S ∞ : For the weak outer-L r endpoint, we need to show that for every λ > 0 there exists a set E λ ⊂ 3P such that Apply the tile selection (Proposition 5.1) at level λ to the function F (P) = E[f ](P) X 3 to get a disjoint collection of tritiles B λ such that with E λ := B∈B T (B). It remains to show the bound on µ(E λ ).
For each $B \in B_\lambda$ there exists a tile $P_B \in B$ of the tritile B such that $\|\langle f; w_{P_B} \rangle\|_X > \lambda$. The tritiles B are pairwise disjoint and thus so are the tiles $P_B$; therefore we have where the last estimate follows from Proposition 3.8 applied to all finite subsets of $B_\lambda$.
Now we prove the embeddings into iterated outer-$L^p$ spaces, which hold for all p > 1, but which are much harder to prove.
Theorem 5.3. Let X be a Banach space which is r-Hilbertian for some r ∈ [2, ∞).¹¹ Then for all p ∈ (1, ∞) and q ∈ (min(p, r)′(r − 1), ∞] the bound holds for all f ∈ S(W; X).
Proof. Fix p ∈ (1, ∞). We will establish various endpoints depending on the position of p relative to r; interpolation will then yield the estimates that we claim. In all cases, we will first fix λ > 0 and utilise the set K λ ⊂ 3P defined (dependent on p) as follows: write as a disjoint union of (maximal) triadic intervals, and then define where D(I n,λ ) is the strip generated by I n,λ . Since the min(p, r)-maximal function M min(p,r) is of weak type (p, p), we then have In each case it remains to show for an appropriate exponent q that Here we need to show that This follows from the definition of K λ : Endpoint 2: p ≥ r, q = r. We must show that for every strip D ∈ D and every τ > 0 there exists E τ ⊂ 3P such that It suffices to assume that τ < λ, for otherwise we can take E τ = ∅ and the result follows from Endpoint 1.
Fix a strip D. We may assume $I_D \not\subset I_{n,\lambda}$ for all n ∈ N, since otherwise $D \setminus K_\lambda = \emptyset$ and there is nothing to prove. It thus holds that (5.4) f -L r (I D ;X) ≤ f - λ.
¹¹ Again, one could alternatively suppose that X satisfies the bounds (3.10).
For P ∈ D we have that $\langle f; w_{P_v} \rangle = \langle f 1_{I_D}; w_{P_v} \rangle$ for all v ∈ {0, 1, 2}. The non-iterated version of the embedding, i.e. Theorem 5.2, then guarantees that and we are done.
'Endpoint' 3: p < r and q > p′(r − 1). We will show that for every strip D ∈ D and every τ > 0 there exists $E_\tau \subset 3P$ such that (5.5) $\mu(E_\tau) \lesssim (\lambda/\tau)^q |I_D|$ and for any q > p′(r − 1). The result of Endpoint 1 allows us to consider only q close to p′(r − 1) and extend the result to all q by interpolation. Furthermore it suffices to assume that τ < λ, for otherwise we can take $E_\tau = \emptyset$ and the result follows from the s = ∞ bound.
Fix a strip D. As before we may assume $I_D \not\subset I_{n,\lambda}$ for all n ∈ N, so if $I_{n,\lambda}$ intersects $I_D$, we must have $I_{n,\lambda} \subsetneq I_D$. Henceforth we consider only those indices n ∈ N for which $I_{n,\lambda} \subsetneq I_D$, and we drop λ from the notation. For each k ∈ N let $(J_{n,k,m})_{m \in \mathbb{N}}$ denote the maximal subintervals of $I_n$ on which $M_p(\|f\|_X) > 2^k \lambda$.
Let us decompose f by setting We have bounds Now fix ε > 0, and for each k ≥ −1 apply the tile selection of Proposition 5.1 to $F_k$ at level $2^{-\varepsilon k} \tau$, yielding sets $B_k$ and $E_k := \bigcup_{B \in B_k} T(B)$ of tritiles such that On the other hand for any $P \in D \setminus K_\lambda$ one has that For k = −1 this is a trivial consequence of (5.6), while for k ∈ N notice that $I_n \cap I_P \neq \emptyset$ only if $I_n \subset I_P$ so It follows that $E_k$ is empty when 2 k(p−1− ) λ τ , i.e. when $k \ge k_{\lambda/\tau}$ with 2 k λ/τ (λ/τ ) We conclude by setting $E_\tau := \bigcup_{k=-1}^{k_{\lambda/\tau}} E_k$. Since r(1 + ε) − p > 0, estimate (5.9) gives that where the last inequality holds since ε > 0 is arbitrary and τ ≲ λ. On the other hand and this concludes the proof.
Proof of Theorem 1.1. The argument is identical for the iterated and non-iterated embeddings, so we only show the iterated case. By Corollary 4.16, using that X is UMD, for any convex A ⊂ 3P it holds that and by the iterated embeddings for S ∞ (Theorem 5.3), using that X is r-Hilbertian, The first inequality above follows by the unconditionality property of sizes and thus of outer-L p quasi-norms. This completes the proof.

Applications to the tritile form
Again we consider three Banach spaces X 0 , X 1 , X 2 and a bounded trilinear form Π : X 0 × X 1 × X 2 → C. Each X u is assumed to be UMD and r u -Hilbertian for some r u ∈ [2, ∞). Recall that the tritile form is the trilinear form Using the embedding theorems from the previous section, we will establish L pbounds and sparse domination for Λ Π . 6.1. L p bounds.
Proof of Theorem 1.2. The condition (1.12) guarantees the existence of a Hölder triple $(q_0, q_1, q_2)$ such that $q_u > \min(p_u, r_u)'(r_u - 1)$ for all u ∈ {0, 1, 2}, and then by Theorem 1.1 we have for all u. By Remark 4.14 this suffices to prove the theorem.
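The existence of such a Hölder triple is a purely arithmetic matter: writing $Q_u := \min(p_u, r_u)'(r_u - 1)$, a triple with $\sum_u 1/q_u = 1$ and $q_u > Q_u$ exists exactly when $\sum_u 1/Q_u > 1$. The following sketch checks this and produces one admissible triple by rescaling. It assumes that the prime denotes the Hölder conjugate exponent, and the function names are hypothetical; the rescaling is one convenient choice, not necessarily the one intended in the text.

```python
from fractions import Fraction

def conjugate(s):
    """Hölder conjugate s' = s / (s - 1) of an exponent s > 1."""
    return s / (s - 1)

def threshold(p, r):
    """Lower bound min(p, r)' * (r - 1) for the exponent q."""
    return conjugate(min(p, r)) * (r - 1)

def find_holder_triple(ps, rs):
    """Return (q_0, q_1, q_2) with sum of 1/q_u equal to 1 and q_u > threshold, or None."""
    recips = [1 / threshold(Fraction(p), Fraction(r)) for p, r in zip(ps, rs)]
    total = sum(recips)
    if total <= 1:
        return None          # the strict inequalities 1/q_u < 1/Q_u cannot sum to 1
    # Rescale so that the reciprocals sum to exactly 1; each q_u grows by the factor total > 1.
    return tuple(total / x for x in recips)

print(find_holder_triple([3, 3, 3], [2, 2, 2]))   # Hilbert-space case
print(find_holder_triple([3, 3, 3], [4, 4, 4]))   # no admissible triple here
```

Exact rational arithmetic is used so that the borderline case, where the reciprocals sum to exactly 1, is not misclassified by rounding.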
The set of exponents (p u ) u∈{0,1,2} to which Theorem 1.2 applies (more precisely, the set of reciprocals (1/p u ) u∈{0,1,2} ) can be characterised as the interior of a polygon. Let β u = 1/p u and γ u := 1/r u . Say that (p 0 , p 1 , p 2 ) is admissible if We rewrite the left hand side of this condition as It follows that an admissible exponent (p 0 , p 1 , p 2 ) exists only if and we assume this condition in what follows. Consider the set of exponents This set is the interior of a polygon; the vertices of this polygon may be found by choosing w ∈ {0, 1, 2} \ {u} arbitrarily, setting β u = γ u , and making β w > γ w as large as possible. Let v be the single element of {0, 1, 2} \ {u, w}, so that 1 − β v = β u + β w = γ u + β w . Then the second condition in the definition of S, for β w > 1 − γ v − γ u , becomes Rearranging this gives so the vertices of ∂S are given by the 6 points β in the Hölder triangle determined by their (u, w)-components The region of exponents (β u ) = (p −1 u ) to which Theorem 1.2 applies is thus the interior of the convex hull of the 6 points in (6.3), intersected with the cube (0, 1) 3 (noting that S generally contains some exponents with nonpositive entries).
Thus, comparing our result with that of Hytönen, Lacey, and Parissis [19], we see that we obtain the same L p bounds for the tritile operator as they do for the quartile operator, but restricted to the reflexive range p u ∈ (1, ∞). It remains an open question whether one can use outer-L p techniques to obtain estimates outside this range.
We will show the following abstract sparse domination result: for any Hölder triple $(q_u)_{u \in \{0,1,2\}}$ and any triple of exponents $p_u \in [1, \infty)$, we have the bound for any $F_u \in B(3P; X_u^3)$. This result suffices to prove the theorem; to see this, choose a Hölder triple $(q_u)_{u \in \{0,1,2\}}$ such that $q_u > \min(p_u, r_u)'(r_u - 1)$ for each u (such a choice is possible by condition (1.12)); then the bound holds. Since $|\Lambda_\Pi(f_0, f_1, f_2)| \le \sum_{P \in 3P} |\Pi^*(F_0(P), F_1(P), F_2(P))| \, |I_P|$ this implies the conclusion of the theorem.
It remains to show that (6.4) holds. The definition of the iterated outer-$L^p$ quasinorms implies that for every strip D, there exists a subset $K_D \subset D$ such that where 𝒥(I) is the set of intervals defined by (6.5) and (6.6) (with D = D(I)). The bound (6.6) guarantees, by induction, that $\max_{I \in G_n} |I| \le 2^{-n} |I_0|$ and thus $\bigcap_{n \in \mathbb{N}} \bigcup_{I \in G_n} I = \emptyset$. Let $(q_u)_{u \in \{0,1,2\}}$ be any Hölder triple; using the Hölder inequality for $L^{q_u}_\mu S$ gives us (6.7) Recall that for any I ∈ G we have set $K_{D(I)} = \bigcup_{I' \in \mathcal{J}(I)} D(I')$ so that (6.5) holds and guarantees the last bound.
We now show that $G = \bigcup_{n \in \mathbb{N}} G_n$ is sparse with $\|G\|_{sp} \le 1$. The intervals of G are nested in the sense that if $J \in G_{n+1}$ then there exists $J' \supset J$ with $J' \in G_n$. First suppose $I \in G_{n_0}$ for some $n_0 \in \mathbb{N}$; it follows by induction from (6.6) that Estimating the sum over $D_0$ via (6.7) and using that $\|G\|_{sp} \le 1$ for the collections G that we constructed shows that (6.4) holds, and completes the proof.

Appendix: Arguing via R-bounds and RMF
In this appendix we present an alternative approach to our main results. In this approach, the randomised sizes are simplified; the defect operator does not appear, and the sizes resemble more closely the sizes used in [19] and [11] or in earlier scalar-valued proofs. For this simplicity we pay the price of having to manipulate R-bounds, which leads to the assumption of the RMF property on the trilinear form Π. The same technical difficulty occurred in [11]; a new method of obtaining these results without RMF was recently given in [10].
7.1. R-bounds and the RMF property. First we recall the definition of K-convexity of a Banach space. All the spaces that we consider, being UMD spaces, are K-convex [15, Proposition 4.3.10].
where the supremum is taken over all sequences (x * n ) N n=1 in X * such that with implicit constant independent of N .
As a technical tool, we use a strong notion of boundedness of a set of operators known as R-boundedness. For a short history of this concept see [16, §8.7].
Definition 7.2. Let X and Y be Banach spaces, and let T ⊂ L(X, Y ) be a set of bounded operators from X to Y . We say that the set T is R-bounded if there exists a constant C > 0 such that for all finite sequences (x n ) N n=1 in X and (T n ) N n=1 in T , the estimate ε n x n X holds. The smallest allowable C in this estimate is called the R-bound of T , and denoted by R(T ).
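A small computation may help to digest this definition. For finitely many vectors, the Rademacher average on either side of the estimate is a finite mean over sign patterns and can be evaluated exactly. The sketch below does this in $\mathbb{R}^2$ with the $\ell^1$ norm and checks the simplest instance of R-boundedness, namely that real scalar multiples of the identity with $|a_n| \le 1$ have R-bound at most 1 (the contraction principle). The use of first moments, the choice of norm, and all names are assumptions made for illustration.

```python
from itertools import product

def rademacher_average(vectors, norm):
    """E || sum_n eps_n x_n ||, averaged over all 2**N choices of signs."""
    dim = len(vectors[0])
    total = 0.0
    for signs in product((-1, 1), repeat=len(vectors)):
        s = [sum(e * x[i] for e, x in zip(signs, vectors)) for i in range(dim)]
        total += norm(s)
    return total / 2 ** len(vectors)

def l1_norm(v):
    return sum(abs(c) for c in v)

xs = [(1.0, 0.0), (0.0, 2.0), (1.0, 1.0)]
scalars = [0.5, -1.0, 0.25]                      # operators T_n = a_n * Id with |a_n| <= 1
txs = [tuple(a * c for c in x) for a, x in zip(scalars, xs)]

lhs = rademacher_average(txs, l1_norm)           # E || sum eps_n T_n x_n ||
rhs = rademacher_average(xs, l1_norm)            # E || sum eps_n x_n ||
print(lhs, rhs)                                  # lhs does not exceed rhs
```

For genuinely operator-valued families no such inequality is automatic, which is why R-boundedness is a strictly stronger requirement than uniform boundedness outside Hilbert spaces.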
If T is R-bounded, then T is uniformly bounded in norm (consider N = 1 in for all sequences (x n ) in X, (y * n ) in Y * , and (T n ) in T , and the smallest admissible constant C here is equivalent to R(T ). This is analogous to the Hölder inequality for sequences f, g, h : N → C, keeping in mind the analogy between Rademacher sums and square functions.
The concept of R-boundedness applies to sets of operators, but we can apply it to sets of vectors by viewing them as operators. This relies on the additional information of an identification of vectors with operators. This definition leads to an analogue of the Doob (or dyadic Hardy-Littlewood) maximal function, with R-bounds replacing norm bounds. Definition 7.4. Let X be a Banach space, and consider an embedding ι : X → L(Y, Z) for some Banach spaces Y and Z. Let (F n ) n∈N be a filtration on a σ-finite measure space (Ξ, F, µ). The Rademacher maximal operator M ι with respect to ι is defined on F-measurable functions f : Ξ → X by where E[f |F n ] is the conditional expectation of f on F n .
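For comparison, the scalar object being generalised here is easy to compute. The following sketch evaluates the dyadic conditional expectations and the Doob maximal function of a real-valued function sampled on a grid of $2^n$ cells of [0, 1). It illustrates only the classical analogue, with absolute values in place of R-bounds, and the function names are hypothetical.

```python
def dyadic_averages(f, n):
    """Conditional expectation of the grid function f on dyadic intervals of length 2**-n."""
    block = len(f) >> n                 # number of grid cells per dyadic interval
    out = []
    for start in range(0, len(f), block):
        avg = sum(f[start:start + block]) / block
        out.extend([avg] * block)
    return out

def doob_maximal(f):
    """Pointwise sup over n of |E[f | F_n]| for the dyadic filtration on the grid."""
    levels = len(f).bit_length() - 1    # assumes len(f) == 2**levels
    result = [0.0] * len(f)
    for n in range(levels + 1):
        result = [max(r, abs(a)) for r, a in zip(result, dyadic_averages(f, n))]
    return result

f = [4.0, 0.0, -2.0, 2.0]
print(dyadic_averages(f, 1))   # averages over the two halves of [0, 1)
print(doob_maximal(f))         # dominates |f| pointwise
```

The Rademacher maximal operator replaces the supremum of absolute values by the R-bound of the set of conditional expectations at a point, so it dominates this function whenever R-bounds dominate norm bounds.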
The L p -boundedness of this maximal function is a geometric property of the embedding ι (and thus of the Banach spaces X, Y, Z), which may or may not hold. Thus it is given a name.
Definition 7.5. Let X be a Banach space, and ι : X → L(Y, Z) an embedding of X into the bounded linear operators between some Banach spaces Y and Z. We say that ι has the RMF property if for all filtrations and measure spaces as above, M ι is bounded from L p (Ξ; X) to L p (Ξ) for all p ∈ (1, ∞). More concisely, we say that ι is an RMF embedding of X, without making explicit reference to the spaces Y and Z.
We say that a Banach space X has the RMF property (without reference to an embedding) if the natural embedding $\iota : X \to L(X^*, \mathbb{C})$ has the RMF property. In this case, for a function f : R → X on the line equipped with the dyadic filtration, we have where the sum is over all dyadic intervals I containing x, and where the supremum is taken over all normalised sequences $\lambda \in \ell^2(D)$ on the set D of dyadic intervals in R. This form of the RMF property was first introduced by Hytönen, McIntosh, and Portal [14], who proved that the following classes of Banach spaces are RMF:
• all spaces of type 2,
• all UMD Banach lattices (including $L^p$-spaces with p ∈ (1, ∞)),
• all noncommutative $L^p$ spaces with p ∈ (1, ∞), and in particular the Schatten classes $C^p$ with p ∈ (1, ∞).
They also proved that $\ell^1$ does not have the RMF property. It is not known whether the UMD property implies the RMF property. The converse fails: there exists a space of type 2 (and therefore with the RMF property) which is not reflexive, and therefore not UMD [21].
Now consider a triple of Banach spaces $X_0, X_1, X_2$ and a bounded trilinear form $\Pi : X_0 \times X_1 \times X_2 \to \mathbb{C}$. Indexing {0, 1, 2} = {u, v, w} arbitrarily, the trilinear form induces natural embeddings $\iota^u_\Pi : X_u \to L(X_v, X_w^*)$. Thus for a set of vectors $V \subset X_u$ we have an R-bound $R_\Pi(V) := R_{\iota^u_\Pi}(V)$ and a Rademacher maximal operator $M_\Pi := M_{\iota^u_\Pi}$. We generally omit ι and u in our notation, and observe that the indexing {0, 1, 2} \ {u} does not affect these quantities, as R-bounds are preserved under taking adjoints. We say that Π has the RMF property if each embedding $\iota^u_\Pi$ is RMF. If each $X_u$ is K-convex (as is the case when each $X_u$ is UMD) then for all subsets $V \subset X_u$, the R-bound $R_\Pi(V)$ is equivalent to the smallest constant C such that (7.2) $\big|\sum_{n=1}^{N} \Pi(x_{0,n}, x_{1,n}, x_{2,n})\big| \le C \prod_{v \neq u} \mathbb{E} \big\|\sum_{n=1}^{N} \varepsilon_n x_{v,n}\big\|_{X_v}$ for all finite sequences $(x_{u,n})_{n=1}^{N} \subset V$ and $(x_{v,n})_{n=1}^{N} \subset X_v$, v ∈ {0, 1, 2} \ {u}.
Remark 7.6. Although all known UMD spaces have the RMF property, this does not tell us that a trilinear form $\Pi : X_0 \times X_1 \times X_2 \to \mathbb{C}$ automatically has the RMF property when each $X_u$ is UMD. If each $X_u$ is a UMD Banach function space and Π is the pointwise product, then by the Khintchine-Maurey theorem (3.1) one can reduce matters to boundedness of the lattice maximal function, which follows from UMD, and so Π has the RMF property. However, a natural trilinear form is given by composing the composition map on Schatten classes $C^p \times C^q \times C^r \to C^1$ (when $p^{-1} + q^{-1} + r^{-1} = 1$) with the trace map $C^1 \to \mathbb{C}$; here each of the Banach spaces is UMD (and each has RMF, as an individual Banach space) but it is not known whether the constructed trilinear form has the RMF property.

7.2. Randomised sizes with R-bounds. We can use the notion of R-boundedness in the previous section to define new randomised sizes. The key differences between these sizes and those defined in Section 4 are that the defect operator is not used, and that R-bounds appear in the overlapping sizes.
Proposition 7.8. For all F u ∈ B(3P; X u ), A corresponding Hölder inequality for outer-Lebesgue spaces follows as in Section 4.
The key ingredient that we need is a size domination result for these sizes. This uses the deterministic size $S^\infty$ from Section 4.

Theorem 7.9. For all $u \in \{0, 1, 2\}$, all $f \in \mathscr{S}(\mathbb{W}; X_u)$, and all convex $A \subset 3\mathbb{P}$,

Corresponding outer-Lebesgue estimates as in Corollary 4.16 follow immediately.
Proof of Theorem 7.9. As in the proof of Proposition 4.19, it suffices to replace $3\mathbb{P}$ with $3\mathbb{P}_N = \{P \in 3\mathbb{P} : 3^{-N} < |I_P| < 3^N\}$. Fix a tree $T$. Since $A \cap T \cap 3\mathbb{P}_N$ is finite and convex, by Lemma 4.17 there exists a function $g \in \mathscr{S}(\mathbb{W}; X_u)$ supported on $I_T$ such that It suffices to bound each of the summands in by $\|g\|_{L^\infty(I_T; X_u)}$. The lacunary parts $S_{u,v}$ ($v \neq u$) are already handled in the proof of Proposition 4.19, using the UMD property (one need only replace the exponent 2 by the exponent 3). It remains to treat the overlapping part, $S_{u,u}$. Since $\xi_T \in \omega_{P_u}$, we have $\exp_{\xi_{P_u} - \xi_T}(x - x_P) = 1$ for all $x \in I_P$, and so $w_{P_u} = \exp_{\xi_{P_u} - \xi_T}(x_P) \exp_{\xi_T}(x) |I_P|^{-1} \mathbb{1}_{I_P}$.
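The factorisation of the wave packet in the last step can be unpacked as follows. This is a sketch under two assumptions that are not stated in this section: that the wave packet has the form $w_{P_u}(x) = \exp_{\xi_{P_u}}(x) |I_P|^{-1} \mathbb{1}_{I_P}(x)$, and that the characters are multiplicative in both variables, so that $\exp_{\xi + \eta}(x) = \exp_\xi(x) \exp_\eta(x)$ and $\exp_\xi(x + y) = \exp_\xi(x) \exp_\xi(y)$.

\[
\begin{aligned}
w_{P_u}(x)
&= \exp_{\xi_{P_u} - \xi_T}(x) \, \exp_{\xi_T}(x) \, |I_P|^{-1} \mathbb{1}_{I_P}(x) \\
&= \exp_{\xi_{P_u} - \xi_T}(x - x_P) \, \exp_{\xi_{P_u} - \xi_T}(x_P) \, \exp_{\xi_T}(x) \, |I_P|^{-1} \mathbb{1}_{I_P}(x) \\
&= \exp_{\xi_{P_u} - \xi_T}(x_P) \, \exp_{\xi_T}(x) \, |I_P|^{-1} \mathbb{1}_{I_P}(x),
\end{aligned}
\]

where the last equality uses that $\exp_{\xi_{P_u} - \xi_T}(x - x_P) = 1$ on $I_P$. The remaining factor $\exp_{\xi_{P_u} - \xi_T}(x_P)$ is a unimodular constant depending only on $P$, which is what allows it to be removed later by the contraction principle.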
Writing out the size and using the RMF property of $\Pi$ gives $\big( |I_T|^{-1} \| \exp_{-\xi_T} g \|_{L^3(I_T; X_u)}^3 \big)^{1/3} \leq \|g\|_{L^\infty(I_T; X_u)}$, where $M_\Pi$ is the Rademacher maximal operator with respect to $\Pi$, and we used the contraction principle to remove the unimodular coefficients $\exp_{\xi_P - \xi_T}(x_P)$.
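The final comparison between the normalised $L^3$ norm and the $L^\infty$ norm can be checked directly. The following sketch assumes only that the characters are unimodular, $|\exp_{-\xi_T}(x)| = 1$, and that the norms are the usual Bochner norms on $I_T$:

\[
|I_T|^{-1} \| \exp_{-\xi_T} g \|_{L^3(I_T; X_u)}^3
= |I_T|^{-1} \int_{I_T} \| g(x) \|_{X_u}^3 \, \mathrm{d}x
\leq \| g \|_{L^\infty(I_T; X_u)}^3 .
\]

Taking cube roots gives a bound by $\|g\|_{L^\infty(I_T; X_u)}$, uniformly in the tree $T$.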
Having proven this size domination result, we are reduced to the situation of Section 5, where embedding bounds are proven with respect to the deterministic size $S^\infty$, which is the same in this formulation. Thus we obtain an alternative version of our main theorem, with different sizes and an additional RMF assumption.
Theorem 7.10. Let $X_0$, $X_1$, and $X_2$ be UMD Banach spaces, such that each $X_u$ is $r_u$-Hilbertian for some $r_u \in [2, \infty)$. Let $\Pi : X_0 \times X_1 \times X_2 \to \mathbb{C}$ be a bounded trilinear form with the RMF property. Then for all $u \in \{0, 1, 2\}$ and all convex sets $A \subset 3\mathbb{P}$ of tritiles, the following embedding bounds hold.
• For all $p \in (1, \infty)$ and all $q \in (\min(p, r_u)'(r_u - 1), \infty)$,

The implicit constants in the above bounds do not depend on $A$.
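As a worked illustration of the range of $q$ in Theorem 7.10, the following computation evaluates the lower endpoint in a few cases. It assumes that the endpoint is read as $\min(p, r_u)'(r_u - 1)$, where $s' = s/(s-1)$ denotes the Hölder conjugate exponent; this reading is an assumption, and the specific values of $p$ and $r_u$ are chosen only for illustration.

\[
\begin{aligned}
r_u = 2: &\quad \min(p, 2)'(2 - 1) =
\begin{cases}
p' & \text{if } p \in (1, 2), \\
2 & \text{if } p \in [2, \infty),
\end{cases} \\
r_u = 3,\ p = 2: &\quad \min(2, 3)'(3 - 1) = 2' \cdot 2 = 4, \\
r_u = 3,\ p \geq 3: &\quad \min(p, 3)'(3 - 1) = \tfrac{3}{2} \cdot 2 = 3 .
\end{aligned}
\]

Under this reading, in the case $r_u = 2$ with $p \geq 2$ the condition reduces to $q > 2$, while for $r_u = 3$ the condition is $q > 4$ at $p = 2$ and relaxes to $q > 3$ once $p \geq 3$.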
As the sizes $S_u$ satisfy a Hölder inequality (Proposition 7.8), we obtain an alternative proof of our results on the tritile operator (Theorems 1.2 and 1.3) under the additional assumption that $\Pi$ has the RMF property.