The bilinear Hilbert transform in UMD spaces

We prove $L^p$-bounds for the bilinear Hilbert transform acting on functions valued in intermediate UMD spaces. Such bounds were previously unknown for UMD spaces that are not Banach lattices. Our proof relies on bounds on embeddings from Bochner spaces $L^p(\mathbb{R};X)$ into outer Lebesgue spaces on the time-frequency-scale space $\mathbb{R}^3_+$.


Introduction
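Consider three complex Banach spaces $X_1, X_2, X_3$ and a bounded trilinear form $\Pi : X_1 \times X_2 \times X_3 \to \mathbb{C}$. With respect to this data one can define the trilinear form
$$\mathrm{BHF}_\Pi(f_1, f_2, f_3) := \int_{\mathbb{R}} \mathrm{p.v.}\!\int_{\mathbb{R}} \Pi\big(f_1(x - y), f_2(x + y), f_3(x)\big) \, \frac{dy}{y} \, dx$$
on Schwartz functions $f_i \in \mathscr{S}(\mathbb{R}; X_i)$. It is natural to try to prove $L^p$-bounds
$$|\mathrm{BHF}_\Pi(f_1, f_2, f_3)| \lesssim \prod_{i=1}^{3} \|f_i\|_{L^{p_i}(\mathbb{R}; X_i)} \qquad (1.1)$$
for a range of exponents $p_i$ quantified in terms of the geometry of the Banach spaces $X_i$. In this article we prove the following result for intermediate UMD spaces.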
We discuss particular examples of Banach spaces X i and trilinear forms Π to which Theorem 1.1 applies in Section 6.2.
The UMD property of a Banach space $X$ has many equivalent characterisations. The most natural from the viewpoint of harmonic analysis is given in terms of the (linear) Hilbert transform on $X$-valued functions, which for $f \in \mathscr{S}(\mathbb{R}; X)$ is defined by
$$Hf(x) := \mathrm{p.v.}\!\int_{\mathbb{R}} f(x - y) \, \frac{dy}{y};$$
the Banach space $X$ is UMD if $H$ is bounded on $L^p(\mathbb{R}; X)$ for all $p \in (1, \infty)$. UMD stands for 'Unconditionality of Martingale Differences', a probabilistic concept whose equivalence with boundedness of the Hilbert transform is due to Burkholder and Bourgain [6,7]. Examples of UMD spaces include separable Hilbert spaces, most classical reflexive function spaces (including Lebesgue, Sobolev, Besov, and Triebel-Lizorkin spaces), and non-commutative $L^p$-spaces with $p \in (1, \infty)$. For more information see for example [8,21,33]. All known UMD spaces actually satisfy the seemingly stronger property of being intermediate UMD: there exists a Hilbert space $H$ and a UMD space $Y$, forming a compatible couple, such that $X$ is isomorphic to the complex interpolation space $[H, Y]_\theta$ for some $\theta \in (0, 1)$. For a discussion of complex interpolation see [5] or [21,Appendix C]. It is conjectured that every UMD space is in fact intermediate UMD; this problem has been open since it was first posed by Rubio de Francia in [36].

Theorem 1.1 is stated for the trilinear form $\mathrm{BHF}_\Pi$, which is dual to the bilinear Hilbert transform $\mathrm{BHT}_\Pi$: for $(f_1, f_2) \in \mathscr{S}(\mathbb{R}; X_1) \times \mathscr{S}(\mathbb{R}; X_2)$, $\mathrm{BHT}_\Pi(f_1, f_2)$ is an $X_3^*$-valued function on $\mathbb{R}$ defined by
$$\mathrm{BHT}_\Pi(f_1, f_2)(x) := \mathrm{p.v.}\!\int_{\mathbb{R}} \Pi\big(f_1(x - y), f_2(x + y), \cdot\big) \, \frac{dy}{y} \qquad \forall x \in \mathbb{R},$$
where for vectors $x_1 \in X_1$, $x_2 \in X_2$, the functional $\Pi(x_1, x_2, \cdot) \in X_3^*$ is defined by
$$\langle \Pi(x_1, x_2, \cdot); x_3 \rangle := \Pi(x_1, x_2, x_3) \qquad \forall x_3 \in X_3.$$
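The duality between the form and the operator can be made explicit. The following is a sketch only, assuming that $\mathrm{BHF}_\Pi$ is the form obtained by integrating $\Pi(f_1(x-y), f_2(x+y), f_3(x))$ against $dy/y$ and $dx$, and that $p_3 \in (1, \infty)$ with conjugate exponent $p_3'$; the constant $C$ is a placeholder.

```latex
% Sketch: assumes BHF_Pi is the dual pairing of BHT_Pi(f_1, f_2) with f_3.
\begin{align*}
\mathrm{BHF}_\Pi(f_1, f_2, f_3)
  &= \int_{\mathbb{R}} \big\langle \mathrm{BHT}_\Pi(f_1, f_2)(x) ; f_3(x) \big\rangle \, dx, \\
|\mathrm{BHF}_\Pi(f_1, f_2, f_3)|
  &\leq \| \mathrm{BHT}_\Pi(f_1, f_2) \|_{L^{p_3'}(\mathbb{R}; X_3^*)}
        \, \| f_3 \|_{L^{p_3}(\mathbb{R}; X_3)}.
\end{align*}
% A bound  ||BHT_Pi(f_1, f_2)||_{L^{p_3'}(X_3^*)} <= C ||f_1||_{L^{p_1}(X_1)} ||f_2||_{L^{p_2}(X_2)}
% therefore gives the bound for the form with the same constant C.
% The converse uses that L^{p_3}(R; X_3) is norming for L^{p_3'}(R; X_3^*)
% and that Schwartz functions are dense in L^{p_3}(R; X_3).
```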
An $L^p$-bound of the form (1.1) for $\mathrm{BHF}_\Pi$ is then equivalent to the corresponding bound for the bilinear Hilbert transform. Scalar-valued estimates for $\mathrm{BHF}_\Pi$ (in which each $X_i$ is $\mathbb{C}$, and $\Pi$ is the ordinary product) date back to Lacey and Thiele [26,27], who developed a new form of time-frequency analysis, extending techniques introduced by Carleson and Fefferman [9,20]. The same technique, with minor modifications, also handles the case in which $X_1 = X_2 = H$ is a separable Hilbert space with inner product $\langle \cdot; \cdot \rangle$, $X_3 = \mathbb{C}$, and $\Pi(x, y, \lambda) := \lambda \langle x; y \rangle$; orthogonality is the essential feature of the argument rather than finite-dimensionality. Many developments in scalar-valued time-frequency analysis followed, but since we are primarily interested in the vector-valued theory we simply direct the reader to [38] and the references therein as a starting point.
The first vector-valued estimates for $\mathrm{BHF}_\Pi$ beyond Hilbert spaces are due to Silva [37,Theorem 1.7], who studied the case $X_1 = \ell^R$, $X_2 = \ell^\infty$, $X_3 = \ell^{R'}$ with $R \in (4/3, 4)$, where $\Pi$ is the natural product-sum trilinear form. This result shows that the Banach spaces $X_i$ need not all be UMD (as $\ell^\infty$ is not). Benea and Muscalu [3,4] proved estimates for $\mathrm{BHF}_\Pi$ for mixed-norm spaces (including $L^\infty$, and also including quasi-Banach spaces), by a new 'helicoidal' method.$^1$ Lorist and Nieraeth [29] proved multilinear vector-valued Rubio de Francia-type extrapolation theorems, which allowed them to deduce bounds for $\mathrm{BHF}_\Pi$ for various Banach function spaces (including non-UMD spaces, thanks to Nieraeth's extension [32]) from the weighted scalar-valued bounds by Culiuc, Di Plinio, and Ou [11] and Cruz-Uribe and Martell [10].
In all of the work mentioned above, the Banach spaces $X_i$ are Banach lattices, i.e. Banach spaces equipped with a partial order compatible with the norm (see for example [28]). However, many important Banach spaces, including Sobolev spaces and non-commutative $L^p$-spaces, are not Banach lattices. The main interest of Theorem 1.1 is that it makes no use of lattice structure. The only previously known results in the non-lattice setting are for discrete models of $\mathrm{BHT}_\Pi$, namely the quartile and tritile operators; these were established by Hytönen, Lacey, and Parissis [25] (recently we proved the same bounds using the outer Lebesgue space framework [2]). Analysis of such discrete models is an important step on the way to the 'continuous' operator; there are no soft methods to 'go from discrete to continuous', but many features of the continuous case are present, and more easily understood, in the discrete setting.
The function $\Lambda_{(\eta,y,t)}\varphi$ is called a wave packet at the point $(\eta, y, t) \in \mathbb{R}^3_+$. For $f \in \mathscr{S}(\mathbb{R}; X)$ (where $X$ is any Banach space) we define the wave packet embedding of $f$ with respect to $\varphi$ at $(\eta, y, t) \in \mathbb{R}^3_+$ by
$$E[f][\varphi](\eta, y, t) := \langle f; \Lambda_{(\eta,y,t)}\varphi \rangle, \qquad (1.4)$$
so that $E[f][\varphi] : \mathbb{R}^3_+ \to X$. To allow for different choices of $\varphi$ we consider each $E[f](\eta, y, t)$ as a linear operator from $\mathscr{S}(\mathbb{R})$ to $X$, i.e. as an $X$-valued tempered distribution. For technical reasons we replace $\mathscr{S}(\mathbb{R})$ with a finite-dimensional space $\Phi$ of Schwartz functions, and we view $E[f]$ as an $L(\Phi; X)$-valued function.

$^1$ The papers of Silva and Benea-Muscalu treat more general operators than just $\mathrm{BHF}_\Pi$, but we will not go into detail here as our focus is on $\mathrm{BHF}_\Pi$.
We work with a modification $\widetilde{\mathrm{BHF}}_\Pi$ of $\mathrm{BHF}_\Pi$, with a simpler wave packet representation, such that $L^p$-bounds for $\widetilde{\mathrm{BHF}}_\Pi$ are equivalent to those for $\mathrm{BHF}_\Pi$. For $f_i \in \mathscr{S}(\mathbb{R}; X_i)$ we define

Since $\widetilde{\mathrm{BHF}}_\Pi$ is a nontrivial linear combination of $\mathrm{BHF}_\Pi$ and the 'Hölder form', $L^p$-bounds for $\widetilde{\mathrm{BHF}}_\Pi$ are equivalent to those for $\mathrm{BHF}_\Pi$. The wave packet representation of $\widetilde{\mathrm{BHF}}_\Pi$ is as follows: there exists a Schwartz function $\varphi_0 \in \mathscr{S}(\mathbb{R})$ with Fourier transform supported in

This integral converges absolutely as long as $f_i \in \mathscr{S}(\mathbb{R}; X_i)$. This motivates the definition of the following BHF-type 'wave packet forms' on $L(\Phi; X_i)$-valued functions:

Then the preceding discussion says that

where the limit is taken over any increasing sequence of compact sets covering $\mathbb{R}^3_+$. Naturally, the finite-dimensional space $\Phi$ is constructed so as to include $\varphi_0$.
We proceed using the framework of outer Lebesgue spaces, as introduced by Do and Thiele [19] and successfully utilised in a number of papers (see for example [11,12,13,16,17,18,30,39,41,42,45]; this list may well be incomplete). The idea of this method is to construct quasinorms $\|F_i\|_{L^{p_i}_{\mu_i} S_i}$ on functions $F_i : \mathbb{R}^3_+ \to L(\Phi; X_i)$ satisfying two key properties: first, a Hölder-type inequality

for arbitrarily large compact sets $K \subset \mathbb{R}^3_+$ of appropriate shape, and second, bounds for the embedding map of the form

If such quasinorms can be constructed, then the chain of inequalities

gives $L^p$-bounds for $\widetilde{\mathrm{BHF}}_\Pi$, and hence also for $\mathrm{BHF}_\Pi$. Without giving too much detail, the quasinorms $L^p_\mu S$ are outer Lebesgue quasinorms, defined in terms of the following data:
• a collection $\mathbb{B}$ of subsets of $\mathbb{R}^3_+$, called generating sets,
• a premeasure $\mu$ on $\mathbb{B}$,
• a local size $S$ on $\mathbb{B}$, which gives a way of measuring the 'size' of a function on each generating set $B \in \mathbb{B}$.
The collected data $(\mathbb{R}^3_+, \mathbb{B}, \mu, S)$ is called an outer space, and should be thought of as axiomatising a generalised measure-theoretic geometric structure. The goal is to construct outer spaces satisfying the estimates (1.6) and (1.7). In this article we do this, yielding outer Lebesgue quasinorms $L^p_{\mu_i} S_i$ and their iterated variants $L^p_\nu \text{\L}^q_{\mu_i} S_i$. These satisfy Hölder-type inequalities of the form (1.6) (stated more precisely in Corollary 3.15), and we have the following bounds for the embedding map (restated later as Theorems 4.1 and 5.1), from which Theorem 1.1 follows by the argument sketched above.
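The argument sketched above can be summarised in a single chain. This is a schematic only: the tilde on the modified form, the symbol $\mathrm{BHWF}^K_\Pi$ for the wave packet form restricted to a compact set $K$, and the suppression of all implicit constants are notational assumptions made here, while (1.6) and (1.7) refer to the Hölder-type inequality and the embedding bounds respectively.

```latex
% Schematic of the outer Lebesgue space method; notation is illustrative.
\begin{align*}
|\widetilde{\mathrm{BHF}}_\Pi(f_1, f_2, f_3)|
  &= \lim_{K \uparrow \mathbb{R}^3_+}
     \big| \mathrm{BHWF}^K_\Pi(E[f_1], E[f_2], E[f_3]) \big| \\
  &\lesssim \prod_{i=1}^{3} \| E[f_i] \|_{L^{p_i}_{\mu_i} S_i}
     && \text{(H\"older-type inequality (1.6), uniformly in $K$)} \\
  &\lesssim \prod_{i=1}^{3} \| f_i \|_{L^{p_i}(\mathbb{R}; X_i)}
     && \text{(embedding bounds (1.7))}.
\end{align*}
```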
Then the following embedding bounds hold:
• For all $p \in (r, \infty)$,
• For all $p \in (1, \infty)$ and all $q \in (\min(p, r)'(r - 1), \infty)$,

Proving these results requires some new ideas in time-frequency analysis. Here we highlight just two. First, the local sizes $S$ that we use are defined in terms of γ-norms (defined in Section 2), which play the role of continuous square functions for functions valued in Banach spaces. Since we work on the continuous time-frequency-scale space $\mathbb{R}^3_+$ rather than a discretised version, estimating γ-norms involves controlling not only the pointwise values of functions but also their derivatives. Second, to prove the Hölder-type inequality (1.6) without having to deal with R-bounds of embedded functions (see the argument presented in [2, §7.2]), we consider 'defect operators' on functions $F : \mathbb{R}^3_+ \to L(\Phi; X)$. These measure how far $F$ is from being an embedded function $E[f]$. Outer space theory on $\mathbb{R}^3_+$ is well-adapted to keeping track of these objects; a discrete argument is possible but technically less convenient. The definition of the defect operators requires that we consider functions valued in $L(\Phi; X)$ rather than $X$ itself, so that we can exploit the relation between different choices of $\varphi \in \Phi$ (which turns out to be useful in other arguments too).
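To get a feel for the range of $q$ in the second embedding bound, here is a short worked computation. It reads the lower endpoint as $\min(p, r)'(r - 1)$, where the prime denotes the conjugate exponent $s' = s/(s-1)$; this reading, and the sample values of $p$ and $r$, are assumptions made for illustration.

```latex
% Worked arithmetic for the threshold q_0(p, r) := min(p, r)' (r - 1).
\begin{align*}
p \geq r: &\quad q_0 = r'(r - 1) = \frac{r}{r - 1}\,(r - 1) = r, \\
p < r:    &\quad q_0 = p'(r - 1) = \frac{p\,(r - 1)}{p - 1}, \\[4pt]
r = 2,\ p = \tfrac{3}{2}: &\quad q_0 = 3 \cdot 1 = 3, \\
r = 4,\ p = 3:            &\quad q_0 = \tfrac{3}{2} \cdot 3 = \tfrac{9}{2}, \\
r = 4,\ p = 2:            &\quad q_0 = 2 \cdot 3 = 6.
\end{align*}
```

In particular, under this reading the condition reduces to $q > r$ once $p \geq r$, and the admissible range of $q$ shrinks as $p$ decreases towards $1$.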
Our analysis roughly follows the path laid out in our previous work [2], in which we considered a discrete 'Walsh model' of the bilinear Hilbert transform, namely the tritile operator. We recommend that readers new to time-frequency analysis start by reading that article, as it contains many of the core ideas of our arguments without most of the annoying technicalities. More experienced readers would probably prefer to read both articles in parallel.
As we were completing this work we learned of the article [14] being prepared independently by Di Plinio et al., in which Theorem 1.1 is proven as a consequence of a multilinear multiplier theorem rather than embeddings into outer Lebesgue spaces. Their techniques are different to ours; in short, they reduce matters to the multilinear UMD-valued Calderón-Zygmund theory they developed in [15], while our approach can be viewed as a direct (if long) reduction to linear Calderón-Zygmund theory facilitated by the outer Lebesgue framework. We thank the authors of [14] for making their preprint available to us.
1.1. Acknowledgements. The first author was supported by a Fellowship for Postdoctoral Researchers from the Alexander von Humboldt Foundation. We thank Mark Veraar, Christoph Thiele, and Francesco Di Plinio for their encouragement and suggestions.
1.2. Notation. For Banach spaces $X$ and $Y$ we let $L(X, Y)$ denote the Banach space of bounded linear operators from $X$ to $Y$, and we let $L(X) := L(X, X)$. For $p \in [1, \infty]$, we let $L^p(\mathbb{R}; X)$ denote the Bochner space of strongly measurable functions $f : \mathbb{R} \to X$ such that the function $x \mapsto \|f(x)\|_X$ is in $L^p(\mathbb{R})$; for technical details see [21,Chapter 1]. When $B \subset \mathbb{R}$ is a ball and $f \in L^p_{\mathrm{loc}}(\mathbb{R}; X)$ we let

denote the Hardy-Littlewood $p$-maximal operator; of course we write $M := M_1$. We use the notation $\langle \cdot; \cdot \rangle$ to denote the duality pairing between a Banach space $X$ and its dual $X^*$, as well as to denote the integral pairing between $f \in \mathscr{S}(\mathbb{R}; X)$ and $g \in \mathscr{S}(\mathbb{R}; \mathbb{C})$ or $g \in \mathscr{S}(\mathbb{R}; X)$:

The correct interpretation will always be unambiguous. For $p \in [1, \infty]$ we let $p'$ denote the conjugate exponent $p' := p/(p - 1)$. We say that a triple of exponents

We use the Japanese bracket notation
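For orientation, the standard forms of three of the conventions mentioned above are recorded below. These are the usual textbook definitions and are given here as assumptions about what is intended; the normalisations used in this article may differ.

```latex
% Standard conventions, stated as assumptions about the intended definitions.
\begin{align*}
M_p f(x) &:= \sup_{B \ni x}
   \Big( \frac{1}{|B|} \int_B \| f(y) \|_X^p \, dy \Big)^{1/p}
   && \text{(supremum over balls $B \subset \mathbb{R}$ containing $x$)}, \\
(p_1, p_2, p_3) \text{ is a H\"older triple}
   &\iff \frac{1}{p_1} + \frac{1}{p_2} + \frac{1}{p_3} = 1, \\
\langle x \rangle &:= (1 + |x|^2)^{1/2}.
\end{align*}
```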

Preliminaries
2.1. γ-radonifying operators. Fix a Banach space $X$. In Banach-valued harmonic analysis, one often has to deal with Rademacher sums, i.e. quantities of the form

where $(x_n)_{n=1}^N$ is a finite sequence of vectors in $X$ and $(\varepsilon_n)_{n=1}^N$ is a sequence of independent Rademacher variables (random variables taking only the values $\pm 1$ with equal probability) on a probability space $\Omega$. As explained in [2, §3.1], these play the role of discrete square functions for $X$-valued functions. In this article we use a continuous analogue of this concept, that of the γ-norm; for a longer introduction to the γ-norm see [22, Chapter 9].

Definition 2.1. Let $H$ be a Hilbert space and $X$ a Banach space. A linear operator

where the supremum is taken over all finite orthonormal systems $(h_j)_{j=1}^k$ in $H$, and $(\gamma_j)_{j=1}^k$ is a sequence of independent standard Gaussian random variables. If $T$ is γ-summing we define

, yielding a Banach space $\gamma_\infty(H, X)$. All finite rank operators $H \to X$ are γ-summing, and we define $\gamma(H, X)$ to be the closure of the finite rank operators in $\gamma_\infty(H, X)$. The operators in $\gamma(H, X)$ are called γ-radonifying.
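For reference, the two quantities discussed above usually take the following form. This is a sketch of the standard definitions as found in the literature on γ-radonifying operators; the choice of the exponent $2$ is an assumption (by the Kahane-Khintchine inequalities, other exponents give equivalent quantities).

```latex
% Standard forms; the L^2(Omega) normalisation is assumed.
\begin{align*}
\text{Rademacher sum:} \quad
  & \Big( \mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n x_n \Big\|_X^2 \Big)^{1/2}, \\
\text{$\gamma$-summing norm:} \quad
  & \| T \|_{\gamma_\infty(H, X)}
    := \sup \Big( \mathbb{E} \Big\| \sum_{j=1}^{k} \gamma_j T h_j \Big\|_X^2 \Big)^{1/2},
\end{align*}
% where the supremum runs over finite orthonormal systems (h_j) in H,
% and T : H -> X is called gamma-summing when this supremum is finite.
```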
If X does not contain a closed subspace isomorphic to c 0 , and in particular if X has finite cotype (see Section 7.1), then γ(H, X) = γ ∞ (H, X). We write γ µ (S; X) := γ(S; X) when the measure needs to be emphasised. This will often be done when considering subsets of R + , where both the Lebesgue measure dt and the Haar measure dt/t are relevant.
Elements of $\gamma(S; X)$ are by definition operators from $L^2(S)$ to $X$, but we can also interpret functions $f : S \to X$ as members of $\gamma(S; X)$. Recall that a function $f : S \to X$ is weakly $L^2$ if for all $x^* \in X^*$, the function $\langle f; x^* \rangle : S \to \mathbb{C}$ belongs to $L^2(S)$. If $f$ is furthermore strongly $\mu$-measurable, then for all $g \in L^2(S)$ the product $gf$ is Pettis integrable, and we can define a bounded operator $I_f : L^2(S) \to X$, the Pettis integral operator with kernel $f$, by
$$I_f g := \int_S g f \, d\mu.$$
For a function $f : S \to X$, we write $f \in \gamma(S; X)$ to mean that the operator $I_f : L^2(S) \to X$ is in $\gamma(S; X) = \gamma(L^2(S), X)$, and we write $\|f\|_{\gamma(S;X)} := \|I_f\|_{\gamma(S;X)}$.
Using this identification we can think of γ(S; X) as a space of X-valued generalised functions on S, which behaves similarly to the Bochner space L 2 (S; X). When X is a Hilbert space this analogy is perfect; see [22, Proposition 9.2.9] for the proof. In general there is no comparison between γ(S; X) and L 2 (S; X) unless X has type 2 or cotype 2 (see Section 7.1).
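The Hilbert space case can be checked by hand. The following sketch assumes that $X$ is a separable Hilbert space with orthonormal basis $(e_k)$, that $f \in L^2(S; X)$, that $(h_j)$ is an orthonormal basis of $L^2(S)$, and that the γ-norm is normalised in $L^2(\Omega)$ with real Gaussians; these are assumptions made for the illustration.

```latex
% gamma(S; X) coincides with L^2(S; X) when X is a Hilbert space.
\begin{align*}
\mathbb{E} \Big\| \sum_{j} \gamma_j I_f h_j \Big\|_X^2
  &= \sum_{i,j} \mathbb{E}(\gamma_i \gamma_j) \, \langle I_f h_i, I_f h_j \rangle_X
   = \sum_{j} \| I_f h_j \|_X^2 \\
  &= \sum_{j} \sum_{k} \Big| \int_S h_j \, \langle f, e_k \rangle_X \, d\mu \Big|^2
   = \sum_{k} \big\| \langle f, e_k \rangle_X \big\|_{L^2(S)}^2 \\
  &= \int_S \sum_{k} | \langle f, e_k \rangle_X |^2 \, d\mu
   = \| f \|_{L^2(S; X)}^2 .
\end{align*}
% Orthogonality (E gamma_i gamma_j = delta_ij) is what removes the cross terms;
% in a general Banach space no such cancellation is available.
```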
Remark 2.5. The $\gamma(S; X)$ norm can be considered as the continuous analogue of a Rademacher sum, with $S$ in place of a discrete indexing set. In fact $\|f\|_{\gamma(S;X)}$ can be seen as the expectation of the norm of a stochastic integral of $f$ (see Section 7.2).
Pushing the function space analogy further, there is a Hölder-type inequality for γ-norms [22, Theorem 9.2.14(1)]:

Proposition 2.6. Let $(S, \mathcal{A}, \mu)$ be a measure space and $X$ a Banach space. Suppose $f : S \to X$ and $g : S \to X^*$ are in $\gamma(S; X)$ and $\gamma(S; X^*)$ respectively. Then
$$\int_S |\langle f; g \rangle| \, d\mu \leq \|f\|_{\gamma(S;X)} \|g\|_{\gamma(S;X^*)}.$$

We will use the following form of the dominated convergence theorem for γ-norms, proved in [22, Corollary 9.4.3].
Proposition 2.7. Let $(S, \mathcal{A}, \mu)$ be a measure space and $X$ a Banach space. Consider a sequence of functions $f_n : S \to X$, $n \in \mathbb{N}$, and a function $f : S \to X$ such that $\lim_{n \to \infty} \langle f_n; x^* \rangle = \langle f; x^* \rangle$ in $L^2(S)$ for all $x^* \in X^*$. Suppose furthermore that there exists a function $F \in \gamma(S; X)$ such that

for all $n \geq 1$ and $x^* \in X^*$. Then each $f_n$ is in $\gamma(S; X)$, and $f_n \to f$ in $\gamma(S; X)$.
It is usually difficult to estimate γ-norms directly, but various embeddings of Sobolev and Hölder spaces into γ(R; X) can be used (see [22,Section 9.7]). The only embedding we need requires no assumptions on X and has a relatively simple proof, which we include for completeness (see [22, Proposition 9.7.1]).
Proof. First note that for all t ∈ (0, 1), For all t ∈ (0, 1) we have by the fundamental theorem of calculus substituting this into the estimate above and integrating over s ∈ (0, 1) completes the proof.
The result above is adapted to the Lebesgue measure; for the Haar measure $dt/t$ we have the following analogue.

Corollary 2.9. Let $0 \leq a < b \leq \infty$ and $f \in C^1((a, b); X)$. If $f$ and $t f'(t)$ are in $L^1_{dt/t}((a, b); X)$, then $f \in \gamma_{dt/t}((a, b); X)$ and

Proof. By changing variables it suffices to prove the result for $(a, b) = (0, 1)$. Changing variables again, using the triangle inequality, and then using Proposition 2.8,

This completes the proof.
Proof. Consider the function F : (0, 1) → X given by Then by Corollary 2.9, We have using that |ζ| < 1 and |z| > 2. Next, we have Thus we can estimate dz again using that t < 1 and z > 2 in the last line. Putting this together, as required.

2.2. R-bounds.
Definition 2.11. Let X and Y be Banach spaces, and let T ⊂ L(X, Y ) be a set of operators. We say that T is R-bounded if there exists a constant C < ∞ such that for all finite sequences (T n ) N n=1 in T and (x n ) N n=1 in X, The infimum of all possible C in this estimate is called the R-bound of T , and denoted by R(T ).
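The estimate in the definition of R-boundedness usually takes the following form. This is the standard formulation, stated here as an assumption about the intended display; the exponent $2$ can be replaced by any other finite exponent at the cost of changing the constant.

```latex
% Standard R-boundedness estimate for (T_n) in \mathcal{T} and (x_n) in X.
\[
\Big( \mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n T_n x_n \Big\|_Y^2 \Big)^{1/2}
\leq C \,
\Big( \mathbb{E} \Big\| \sum_{n=1}^{N} \varepsilon_n x_n \Big\|_X^2 \Big)^{1/2},
\]
% where (\varepsilon_n) is a sequence of independent Rademacher variables.
% Taking N = 1 shows that an R-bounded set is uniformly bounded,
% with \sup_{T \in \mathcal{T}} \|T\| \leq R(\mathcal{T}).
```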
R-boundedness arises as a sufficient (and often necessary) condition in various operator-valued multiplier theorems. See [22, Theorem 9.5.1 and Remark 9.5.8] for the following theorem.
Theorem 2.12 (γ-multiplier theorem). Let $X$ and $Y$ be Banach spaces with finite cotype, and $(S, \mathcal{A}, \mu)$ a measure space. Let $A : S \to L(X, Y)$ be such that for all $x \in X$ the $Y$-valued function $s \mapsto A(s)(x)$ is strongly $\mu$-measurable, and that the range $A(S)$ is R-bounded. Then for every function $f : S \to X$ in $\gamma(S; X)$, the function $Af : S \to Y$ is in $\gamma(S; Y)$, and
$$\|Af\|_{\gamma(S;Y)} \lesssim R(A(S)) \, \|f\|_{\gamma(S;X)}.$$
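One useful consequence is a contraction principle for γ-norms: since the family of scalar operators $\mathcal{S} := \{cI : c \in \mathbb{C}, |c| \leq 1\} \subset L(X)$ is R-bounded with $R(\mathcal{S}) \lesssim 1$ by Kahane's contraction principle, we have
$$\|af\|_{\gamma(S;X)} \lesssim \|a\|_{L^\infty(S)} \|f\|_{\gamma(S;X)}$$
for all $a \in L^\infty(S)$. This is particularly useful applied to characteristic functions, as it yields the quasi-monotonicity property
$$\|\mathbf{1}_{S'} f\|_{\gamma(S;X)} \lesssim \|f\|_{\gamma(S;X)}$$
whenever $S' \subset S$. Another corollary is a trilinear Hölder-type inequality for γ-norms:

Corollary 2.13. Let $X_1$, $X_2$, and $X_3$ be Banach spaces with finite cotype. Let $\Pi : X_1 \times X_2 \times X_3 \to \mathbb{C}$ be a bounded trilinear form, and let $(S, \mathcal{A}, \mu)$ be a measure space. Suppose that $f_i : S \to X_i$ for each $i \in \{1, 2, 3\}$. Then for each $i$,
$$\Big| \int_S \Pi(f_1, f_2, f_3) \, d\mu \Big| \lesssim R_\Pi(f_i(S)) \prod_{j \neq i} \|f_j\|_{\gamma(S;X_j)},$$
where $R_\Pi(f_i(S)) := R(\iota^i_\Pi(f_i(S)))$, with $\iota^i_\Pi : X_i \to L(X_j, X_k^*)$ the natural map induced by $\Pi$, with $\{i, j, k\} = \{1, 2, 3\}$.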

Proof. Write

using the γ-Hölder inequality (Proposition 2.6) in the second line, and the γ-multiplier theorem (Theorem 2.12) in the last line.
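The structure of this proof can be sketched as follows for the case $i = 1$, $j = 2$, $k = 3$. This is a reconstruction under the stated hypotheses rather than the authors' exact display: it treats $\iota^1_\Pi(f_1(s))$ as an operator-valued multiplier acting on $f_2(s)$ and pairs the result with $f_3(s)$.

```latex
% Sketch of the three-line argument: identity, gamma-Hoelder, gamma-multiplier.
\begin{align*}
\Big| \int_S \Pi(f_1, f_2, f_3) \, d\mu \Big|
  &= \Big| \int_S \big\langle f_3(s) ; \iota^1_\Pi(f_1(s)) f_2(s) \big\rangle \, d\mu(s) \Big| \\
  &\leq \| f_3 \|_{\gamma(S; X_3)} \,
        \big\| \iota^1_\Pi(f_1) f_2 \big\|_{\gamma(S; X_3^*)} \\
  &\lesssim R_\Pi(f_1(S)) \, \| f_2 \|_{\gamma(S; X_2)} \, \| f_3 \|_{\gamma(S; X_3)} .
\end{align*}
% Second line: Proposition 2.6 applied to f_3 and the X_3^*-valued function
% s -> \iota^1_\Pi(f_1(s)) f_2(s).  Third line: Theorem 2.12 with A = \iota^1_\Pi \circ f_1.
```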
Given a uniformly bounded set of operators, one has to exploit additional structure of the set to establish R-boundedness. For example, the R-bound of the range of an operator-valued function with integrable derivative can be estimated analogously to Proposition 2.8 for γ-norms. The following proposition is a special case of [22, Proposition 8.5.7].
Proposition 2.14. Let X and Y be Banach spaces, and let −∞ < a < b < ∞.
where the limit is in the strong operator topology.
As a consequence we obtain R-boundedness for the set of running averages of a bounded operator-valued function.
Corollary 2.15. Let the Banach spaces X, Y and the interval (a, b) be as in Proposition 2.14. Let F ∈ L ∞ ((a, b); L(X, Y )), and define We will need one more technical result, which gives an R-bound for the set of maps X → γ(H * ; X) given by tensoring with elements of H [22, Theorem 9.6.13].
Theorem 2.16. Let X be a Banach space with finite cotype and H a Hilbert space. For each h ∈ H define the operator T h : X → γ(H * , X) by Then the set of operators We apply this to prove a continuous Littlewood-Paley estimate for UMD spaces, highlighting the use of both γ-norms and R-bounds. 2 First we recall one form of the operator-valued Mihlin multiplier theorem, proven in [22, Corollary 8.3.11].
Theorem 2.17. Let $X$ and $Y$ be UMD spaces, and $p \in (1, \infty)$. Consider a symbol

Then the Fourier multiplier $T_m$ with symbol $m$ is bounded from $L^p(\mathbb{R}; X)$ to $L^p(\mathbb{R}; Y)$ with norm controlled by $R_m$.

Theorem 2.18 (Continuous Littlewood-Paley square function). Let $X$ be a UMD space and $p \in (1, \infty)$. Fix a Schwartz function $\psi \in \mathscr{S}(\mathbb{R})$ with mean zero. Then for all $f \in L^p(\mathbb{R}; X)$,

Proof. Let $A_\psi$ denote the operator sending $X$-valued functions on $\mathbb{R}$ to $X$-valued functions on $\mathbb{R} \times \mathbb{R}_+$, defined by

where $\rho$ is a Schwartz function satisfying the same assumptions as $\psi$ and $T_{\hat\rho(t\cdot)}$ is the Fourier multiplier with symbol $\xi \mapsto \hat\rho(t\xi)$. This operator can be seen as a Fourier multiplier with symbol $m : \mathbb{R} \to L(X; \gamma_{dt/t}(\mathbb{R}_+; X))$ defined by

The derivative $m'$ of this symbol is given by

Thus for each $\xi \in \mathbb{R}$ and $x \in X$ we have

By Theorem 2.16, since UMD spaces have finite cotype,

controlling the $L^2$-norms by multiplicative invariance of the Haar measure and the fact that $\hat\rho$ and its derivative are both Schwartz and vanish near the origin. By the operator-valued Mihlin theorem (Theorem 2.17), this proves boundedness of $A_\psi$ from $L^p(\mathbb{R}; X)$ to $L^p(\mathbb{R}; \gamma_{dt/t}(\mathbb{R}_+; X))$, completing the proof.
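The role of the multiplicative invariance of the Haar measure in the last step can be illustrated by a scalar computation. The sketch below assumes only that $\hat\rho$ is Schwartz with $\hat\rho(0) = 0$; it is meant to show why the relevant $L^2_{dt/t}$ norms are bounded uniformly in $\xi \neq 0$, not to reproduce the authors' display.

```latex
% Substituting s = t|\xi| leaves dt/t invariant: ds/s = dt/t.
\begin{align*}
\int_0^\infty | \hat\rho(t \xi) |^2 \, \frac{dt}{t}
  &= \int_0^\infty | \hat\rho(s \operatorname{sgn} \xi) |^2 \, \frac{ds}{s}, \\
\int_0^\infty \big| \xi \, \partial_\xi [ \hat\rho(t \xi) ] \big|^2 \, \frac{dt}{t}
  &= \int_0^\infty | t \xi \, \hat\rho'(t \xi) |^2 \, \frac{dt}{t}
   = \int_0^\infty | s \, \hat\rho'(s \operatorname{sgn} \xi) |^2 \, \frac{ds}{s}.
\end{align*}
% Both right-hand sides are independent of |\xi| and finite:
% near s = 0 the integrands are O(s) because \hat\rho(0) = 0 (first line)
% and because of the explicit factor s^2 (second line);
% for large s the Schwartz decay of \hat\rho and \hat\rho' takes over.
```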

2.3. Outer Lebesgue spaces. In this section we give a brief overview of the definition and basic properties of abstract outer spaces and the associated outer Lebesgue quasinorms, which were introduced in [19]. For a topological space $\mathbb{X}$ we let $\mathcal{B}(\mathbb{X})$ denote the σ-algebra of Borel sets in $\mathbb{X}$, and for a Banach space $X$ we let $\mathcal{B}(\mathbb{X}; X)$ denote the set of strongly Borel measurable functions $\mathbb{X} \to X$.
Definition 2.19. Let $\mathbb{X}$ be a topological space.
• A σ-generating collection on $\mathbb{X}$ is a subset $\mathbb{B} \subset \mathcal{B}(\mathbb{X})$ such that $\mathbb{X}$ can be written as a union of countably many elements of $\mathbb{B}$. We write
• An ($X$-valued) outer space is a tuple $(\mathbb{X}, \mathbb{B}, \mu, S)$ consisting of a topological space $\mathbb{X}$, a σ-generating collection $\mathbb{B}$ on $\mathbb{X}$, a local measure $\mu$, and an $X$-valued local size $S$, all as above. We often do not make reference to the Banach space $X$.
Consider an outer space $(\mathbb{X}, \mathbb{B}, \mu, S)$. We extend $\mu$ to an outer measure on $\mathbb{X}$ via countable covers: for all $E \subset \mathbb{X}$,

We abuse notation and write $\mu$ for both the local measure and the corresponding outer measure. We define the outer size (or outer supremum) of $F \in \mathcal{B}(\mathbb{X}; X)$ by

We say that two local sizes

for all $F \in \mathcal{B}(\mathbb{X}; X)$. The conjunction of the notions of outer measure and outer size allows us to define the outer super-level measure of a function $F \in \mathcal{B}(\mathbb{X}; X)$ as

This quantity need not be the measure of any specific set; instead, it is an intermediate quantity between the outer measure $\mu$ and the outer size $S$. For any $F \in \mathcal{B}(\mathbb{X}; X)$ we define

Definition 2.20. Let $(\mathbb{X}, \mathbb{B}, \mu, S)$ be an $X$-valued outer space. We define the outer Lebesgue quasinorms of a function $F \in \mathcal{B}(\mathbb{X}; X)$, and their weak variants, by setting

It is straightforward to check that these are indeed quasinorms (modulo functions $F$ with $\mu(\mathrm{spt}(F)) = 0$). Some particularly useful outer spaces are constructed by using an outer Lebesgue quasinorm itself to define a local size. This construction results in iterated outer spaces.
Definition 2.21. Let $(\mathbb{X}, \mathbb{B}, \mu, S)$ be an outer space. Let $\mathbb{B}'$ be a σ-generating collection on $\mathbb{X}$, and let $\nu$ be a local measure on $\mathbb{B}'$. Then for all $q \in (0, \infty)$ define the iterated local size $\text{\L}^q_\mu S$ on $\mathbb{B}'$ (which depends on $\nu$) by

It is straightforward to check that $\text{\L}^q_\mu S$ is a local size on $\mathbb{B}'$. Thus $(\mathbb{X}, \mathbb{B}', \nu, \text{\L}^q_\mu S)$ is an iterated outer space.
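For readers who want the shape of the basic (non-iterated) definitions at a glance, here is a sketch following the conventions of Do and Thiele [19]. The blackboard-bold symbols for the underlying space and the generating collection, and the precise class of sets $A$ in the infimum, are assumptions; the versions used in this article may differ in such details.

```latex
% Sketch of the outer measure, super-level measure, and outer Lebesgue quasinorms.
\begin{align*}
\mu(E) &:= \inf \Big\{ \sum_{n} \mu(B_n) \;:\;
          B_n \in \mathbb{B},\ E \subset \bigcup_{n} B_n \Big\}, \\
\mu(\| F \|_S > \lambda) &:= \inf \big\{ \mu(A) \;:\;
          \| \mathbf{1}_{\mathbb{X} \setminus A} F \|_S \leq \lambda \big\}, \\
\| F \|_{L^p_\mu S} &:= \Big( \int_0^\infty p \, \lambda^{p-1} \,
          \mu(\| F \|_S > \lambda) \, d\lambda \Big)^{1/p}
          \qquad (0 < p < \infty), \\
\| F \|_{L^{p,\infty}_\mu S} &:= \sup_{\lambda > 0} \lambda \,
          \mu(\| F \|_S > \lambda)^{1/p},
\qquad
\| F \|_{L^\infty_\mu S} := \| F \|_S .
\end{align*}
% The third line mirrors the layer-cake formula for classical L^p norms,
% with the super-level measure in place of the measure of {|F| > \lambda}.
```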
We will use a few key properties of outer Lebesgue quasinorms. First, the following Radon-Nikodym-type domination result lets us compare classical Lebesgue integrals with outer Lebesgue quasinorms. The proof is a straightforward modification of [

Proposition 2.22. Let $(\mathbb{X}, \mathbb{B}, \mu, S)$ be an outer space such that the outer measure generated by $\mu$ is σ-finite. Let $m$ be a positive Borel measure on $\mathbb{X}$ such that

and

Then we have

The previous proposition is usually followed by the following 'outer Hölder inequality'. A slightly weaker version has been proven before; our version supports multiple outer spaces.
be an M -linear map, and suppose that for all i ∈ {1, . . . , M } and F i ∈ B(X i ; X i ), the outer sizes S i and outer measures µ i satisfy the bounds Then for all p i ∈ (0, ∞] we have Assume that the factors on the right hand side of (2.9) are finite and nonzero, for otherwise there is nothing to prove. By homogeneity we may assume that F i L p i µ i Si = 1 for each i ∈ {1, . . . , M }. For each such i and all n ∈ Z, let A n i ⊂ X be such that We may assume that A n i ⊂ A n−1 i by considering A n i = k≥n A k i and noticing that A n i satisfies the conditions above. Let .
By assumption we have that i , while for any = (<, <, . . . , <, <) it holds that which concludes the proof.
After using an outer Hölder inequality, one typically needs to estimate outer Lebesgue quasinorms. This can be done by interpolation, using either or both of the following two results. The first is proven in [19,Proposition 3.3], and the second in [19,Proposition 3.5] (see also [16,Proposition 7.4]).

Analysis on the time-frequency-scale space
By the time-frequency-scale space we mean $\mathbb{R}^3_+$, whose points parametrise the operators $\Lambda_{(\eta,y,t)} = \mathrm{Tr}_y \mathrm{Mod}_\eta \mathrm{Dil}_t$ representing the fundamental symmetries of $\mathrm{BHF}_\Pi$. It is natural to think of $\mathbb{R}^3_+$ as a metric space, equipped with the pushforward of the Euclidean metric on $\mathbb{R}^3$ by the map $(x, y, \tau) \mapsto (e^{\tau} x, e^{-\tau} y, e^{\tau})$. This metric does not play an important role in our analysis, but it is worth keeping in mind. For $(\xi, x, s) \in \mathbb{R}^3_+$ we define mutually inverse local coordinate maps $\pi_{(\xi,x,s)}$, $\pi^{-1}_{(\xi,x,s)}$, both mapping $\mathbb{R}^3_+$ to itself, by

With a view towards applications to the bilinear Hilbert transform, we fix a small parameter $b > 0$ and a bounded open interval $\Theta$ (a frequency band) with $B_{2b}(0) \subset \Theta$. The constructions below depend on both of these choices. In applications we will need multiple choices of $\Theta$, so we sometimes reference it in the notation, but only when the particular choice of $\Theta$ is important. We will only ever need one choice of $b$ (and $b = 2^{-4}$ will do), so we will always suppress it.
For a tree T = T (ξ,x,s),Θ we use the shorthand π T := π (ξ,x,s) and (ξ T , x T , s T ) = (ξ, x, s). We define the inner and outer parts of T by and we denote the family of all trees by T Θ . The set T Θ is a σ-generating collection on R 3 + , and we define a local measure µ Θ on T Θ by See Figure 1 for a sketch of the model tree T Θ , and Figure 2 for how two trees look in local coordinates with respect to one of them.
Figure 1. Top: the model tree $T_\Theta$ in the time-scale and frequency-scale planes. Bottom: the tree $T_{(\xi_T, x_T, s_T),\Theta}$ in the $\eta = \xi_T$ and $y = x_T$ planes.
Figure 2. Left: Two trees in the frequency-scale plane. Right: The same two trees, viewed in local coordinates with respect to $T$, in the frequency-scale plane.
A tree T represents a region of time-frequency-scale space in which frequency is localised around ξ T (with precision measured by the rescaled frequency band s −1 T Θ), time (or space, depending on the interpretation of the variable x) is approximately localised to B(x T , s T ), and the maximum scale is s T . Time-frequency analysis restricted to a single tree essentially corresponds to Calderón-Zygmund theory; handling the contributions of multiple trees is the main difficulty.
and define the strip with top (x, s) ∈ R 2 + to be the set We let (x D , s D ) := (x, s), and we denote the family of all strips by D. Of course D is a σ-generating collection on R 3 + , and we define a local measure ν on D by A strip D represents a region of R 3 + in which time is localised to B(x D , s D ) and the maximum scale is s D ; there is no frequency restriction. Note that we have the expression so in particular each strip can be written as a countable union of trees, and it follows that D ∪ ⊂ T ∪ Θ .

3.2. Wave packets and embeddings. Let $X$ be a Banach space. As discussed in the introduction, we will consider not only $X$-valued functions but also $L(\Phi; X)$-valued functions, where $\Phi$ is a finite-dimensional space of testing wave packets. This space is constructed in terms of a "mother" wave packet $\varphi_0$, which in applications will be a function as in the wave packet representation (1.5) of $\mathrm{BHF}_\Pi$.
where · Φ N is an arbitrary norm on the finite dimensional space Φ N . Generally we fix a large parameter N ∈ N, and whenever possible we write Φ := Φ N and Φ 1 := Φ N 1 . Having defined the testing wave packet space Φ, we view the embedding (1.4) as a map E : S (R; X) → L 1 loc (R 3 + ; L(Φ; X)).
This is analogous to the situation in [2], where we used X 3 -valued functions to represent wave packet coefficients corresponding to the three constituent tiles of each tritile. Now consider another Banach space Y (not necessarily X or L(Φ; X)). Given a tree T ∈ T Θ and a function F ∈ B(R 3 + ; Y ), we can look at F in the local coordinates with respect to T . The way we do this is modelled on the behaviour of embedded functions under change of coordinates. Given T ∈ T Θ and f ∈ S (R; X), notice that for all ϕ ∈ S (R). With this relation in mind, we make the following definition. .
, and 2πizϕ(z) are in Φ N . Thus differentiation of embedded functions corresponds to changing the wave packet, in such a way that the new wave packet is still in Φ, and if N is sufficiently large then these operations can be carried out multiple times. The identities (3.9) need not hold for general functions in B(R 3 + ; L(Φ; X)), so we use the right hand sides as a new definition.
for all F ∈ B(R 3 + ; L(Φ N ; X)) and ϕ ∈ Φ N . Thus for f ∈ S (R; X) we can write the equations (3.9) as T , (∂ σ − d σ )π * T , and (∂ θ − d θ )π * T quantify how much F differs from an embedded function on the tree T .

3.3. Local sizes on trees. Given a Banach space $X$, we define three classes of $X$-valued and $L(\Phi; X)$-valued local sizes on $\mathbb{T}_\Theta$. The first class is the same as that used in scalar-valued time-frequency analysis.
with the usual modification when p = ∞. For G ∈ B(R 3 + ; X) we abuse notation and write These local sizes have "inner" and "outer" variants given by . The scalar-valued Lebesgue local sizes satisfy the following local size-Hölder inequality, which has a straightforward proof.
are two Hölder triples of exponents, with $p_i, q_i \in (0, \infty]$. Then for any $T \in \mathbb{T}_\Theta$ and $F_1, F_2, F_3 \in \mathcal{B}(\mathbb{R}^3_+; \mathbb{C})$,

The next local size uses the γ-norm defined in Section 2.1. A discrete version of this local size, with Rademacher sums in place of γ-norms, was used in [2].

Definition 3.9 (γ local size). We define the local size $R_\Theta$ for $F \in \mathcal{B}(\mathbb{R}^3_+; L(\Phi; X))$ and $T \in \mathbb{T}_\Theta$ by

When $X$ is isomorphic to a Hilbert space we have $\|F\|_{R_\Theta(T)} \simeq \|F\|_{L^2_{\Theta,\mathrm{out}}(T)}$ by Proposition 2.4. In general, unless $X$ has type 2 or cotype 2 (see Section 7.1), there is no comparison between these two local sizes.
The final class of local sizes measures how far a function $F : \mathbb{R}^3_+ \to L(\Phi; X)$ differs from an embedded function $E[f]$. This local size is also exploited in the scalar-valued theory in [43]; a discrete version was used in [2].

Definition 3.10 (Defect local sizes). For $N \geq 2$, the σ-defect and ζ-defect local sizes $W^\sigma_\Theta$ and $W^\zeta_\Theta$ are defined for $F \in L^1_{\mathrm{loc}}(\mathbb{R}^3_+; L(\Phi^N; X))$ and $T \in \mathbb{T}_\Theta$ by

(3.14)

thus the defect sizes behave like $L^1$ in scale and $L^\infty$ in time and frequency. The derivatives $\partial_\sigma$ and $\partial_\zeta$ appearing above are taken in the distributional sense and the integral is an abuse of notation for the pairing of $X$-valued distributions with $X^*$-valued test functions. For $F \in \mathcal{B}(\mathbb{R}^3_+; L(\Phi^N; X))$ which is not locally integrable, we define $\|F\|_{W^\sigma_\Theta(T)} = \|F\|_{W^\zeta_\Theta(T)} = \infty$.

Remark 3.11. We do not quantify the defect in the frequency variable $\theta$, as we never actually need to.
The 'defect local sizes' are not actually local sizes: they fail global positive-definiteness, as they vanish on every embedded function $E[f]$. However, the 'complete' local size $S_\Theta$ defined below is an actual local size.
. Remark 3.13. The outer space structures introduced on R 3 + are invariant under translation, modulation, and dilation symmetries, in the sense that for any (η 0 , y 0 , t 0 ) ∈ R 3 + and E ⊂ R 3 + , and similarly for ν. Furthermore it holds that The local sizes defined above possess analogous invariance properties.

3.4. Local size-Hölder and the BHWF. Fix $b = 2^{-4}$ (anything sufficiently small will do) and $\varphi_0 \in \mathscr{S}(\mathbb{R})$ with Fourier support in $B_b(0)$. Then $\varphi_0$ is used to build the finite-dimensional wave packet spaces $\Phi^N$ and $\Phi^N_1$ for all $N \in \mathbb{N}$, as in Definition 3.3; from here on we fix $N$ large and suppress it from the notation wherever possible.
Theorem 3.14. Fix Banach spaces X 1 , X 2 , X 3 with finite cotype and a bounded trilinear form Π : X 1 × X 2 × X 3 → C. Then for any T ∈ T Θ , any A ∈ T ∪ , and so that I Θ,K + L ∞ is a local size. Furthermore it holds that Proof. From the boundedness of Π it follows immediately that It remains to control the I Θ part. By definition we havê (3.19) follows trivially by noticing that µ(V ± ) = µ i (V ±,i ).
Multiplying by 1 = e 2πi(α1+α2+α3)ξ T (x T +s T ζ) , we have where we write θ i := α i θ + β i and T i = T (αiξ T ,x T ,s T ),Θi to save space. It suffices to bound each summand individually; we concentrate only on the case j = 1, as the others are treated in the same way. We may assume that the functions F i are compactly supported, as the expression above depends only on the values of F i on T i and so that our claim reduces to showing (3.20) Let us argue that each F i can be assumed to be smooth. Fix a non-zero nonnegative bump function χ ∈ C ∞ c (B 1 (0)) with´χ = 1, and let where χ ε = Dil ε χ. Then the functions F ε i are smooth and compactly supported, and by dominated convergence we have ˆT For ε > 0 sufficiently small and i ∈ {2, 3}, F ε i (θ, ζ, σ) vanishes for θ ∈ B 2b (0). Furthermore, by the γ-dominated convergence theorem (Proposition 2.7), for ε > 0 sufficiently small we have Clearly we also have and by the definition of the defect local sizes in (3.14) it holds that for all i ∈ {1, 2, 3}, and likewise for the ζ-defect term. Putting all this together, we see that without loss of generality we may assume that each F i is smooth. We proceed towards the claimed bound (3.20). It holds that To estimate |B 1 |, fix θ ∈ Θ \ Θ 1,ov and ζ ∈ B 2 (0) and write ˆ1 We have by the γ-Hölder inequality (Proposition 2.6), exploiting that the first factor is independent of σ,

Now we deal with the term
Let ϕ 1 (z) = zϕ 0 (z), so that ϕ 1 ∈ Φ and This splitting decomposes M 1 into the sum of two terms, M 1 = B 2 +M 2 , as before. And indeed, exactly as before, we have For the term we integrate by parts in ζ: for θ ∈ Θ \ Θ 1,ov , the negative of the ζ-integral iŝ there are no boundary terms, as the integrand is compactly supported. Both of these terms are treated in the same way, so we will only do the first one. Write which decomposes the corresponding summand of M 2 into two parts, B 3 and M 3 . The integrand of B 3 (with θ ∈ Θ \ Θ 1,ov and ζ ∈ B 2 (0) fixed) is controlled bŷ which leads to the bound using Corollary 2.13 to pull out an R-bound (this uses finite cotype) and Corollary 2.15 to control the R-bound. This ultimately yields which completes the proof.
This local size-Hölder inequality has the following consequence for BHWF Π . The proof is a straightforward combination of Theorem 3.14 with Proposition 2.22; see [2, Corollary 4.13] for the argument for the Walsh model. Corollary 3.15. Fix Banach spaces X 1 , X 2 , X 3 with finite cotype and a bounded trilinear form Π : be two Hölder triples of exponents, with p i , q i ∈ (0, ∞]. Then for all F i ∈ B(R 3 + ; L(Φ; X i )) and all compact where µ i := µ Θi and the implicit constant is independent of K.

3.4.1. Technical remarks on Theorem 3.14. Note that we prove the estimate (3.21) rather than the stronger estimate in which the absolute value is inside the integral. This is because (3.22) would imply the estimate for all a ∈ L ∞ (R 3 + ), but such a multiplier theorem should not be true without additional regularity assumptions on a. In [2, Proposition 4.12] we prove the discrete analogue of (3.23), which basically models the situation where the multiplier a(η, y, t) satisfies $|t\,\partial_t a(\eta, y, t)| \lesssim 1$. It should be possible to prove (3.23) with regularity assumptions on a; this would lead to multilinear multiplier theorems along the lines of those proven by Muscalu, Tao, and Thiele [31] (such a result is proven in [14]). It should also be possible to handle more than three factors. These extensions are beyond the scope of this article.

Size domination. By Corollary 3.15 and the fact that
we see that we need to prove bounds of the form (here we do not make reference to the frequency band Θ, as the precise choice is no longer relevant). The local size S is defined in such a way as to make the size-Hölder inequality work; this is why we define it as the sum of four local sizes (Lebesgue, γ, and the two defect local sizes). However, when applied to embedded functions, it turns out that the Lebesgue and defect local sizes are controlled by the γ local size, so in proving (3.24) this is the only local size we need to consider. In this section we prove these statements.
Lemma 3.16. Let X be a Banach space and F ∈ C 1 (R 3 + ; L(Φ; X)). Then for all trees T ∈ T and all A, B ∈ T ∪ , . Proof. We only prove (3.25), as (3.26) has the same proof. By definition we have and g ∈ C ∞ c (R 3 + ; X * ) normalised in L 1 (R 2 ; L ∞ (R + ; X * )) we need to control where F := F • π T . By the product rule we have Since A and B are countable unions of trees, there exist two Lipschitz functions τ − , τ + : ζ)) where δ is the usual Dirac delta distribution on R (see Figure 3 for a sketch of the case A = T ). Substituting this into (3.28), we estimate (3.27) by the sum of three terms. The first two of these terms correspond to the choice of sign in (3.29): they are estimated by The following corollary comes from the fact that the defect local sizes vanish on embedded functions.
Next we control the Lebesgue local size L ∞ on embedded functions by the γ local size R. The proof is done by controlling pointwise values of E[f ] by γ-norms over an appropriate region; this uses a Sobolev embedding argument and the wave packet differential equations (3.11) for embedded functions.
In the proof below, note that we implicitly use both the L(Φ N ; X)-valued and L(Φ N −6 ; X)-valued embeddings. Taking N large, as we do by convention, means that this is not a problem.

.12, Part II]) to the scalar-valued function
.
using (3.9) and the definition of the wave packet differentials. By the γ-Hölder inequality and multiplier theorem (Proposition 2.6 and Theorem 2.12, which needs finite cotype) we have Taking the supremum over x * ∈ X * yields (3.30).
Figure 4. The set Ω used in the proof of Lemma 3.18.
Remark 3.19. In [2] we had a much stronger size domination result: there we could bound the discrete versions of both the γ and defect local sizes by the discrete version of L ∞ . We do not know whether we can prove such a strong result in the continuous setting; in particular, we do not know if the estimate If it does, then the rest of our arguments can be significantly simplified, as outer Lebesgue quasinorms with respect to L ∞ are easier to control than those with respect to R.

Embeddings into non-iterated outer Lebesgue spaces
In this section we fix a parameter b > 0 and a bounded interval Θ containing B 2b (0). We also fix a mother wave packet ϕ 0 with Fourier support in B b (0), and we use it to define the finite-dimensional wave packet space Φ = Φ N with some large irrelevant parameter N . All estimates implicitly depend on these choices, and we will not refer to Θ or N in the notation.
To prove Theorem 1.1, we need to prove embedding bounds of the form (3.24), into iterated outer Lebesgue spaces. Actually, it would be enough to prove bounds into non-iterated outer Lebesgue spaces; the problem is that we can only prove these bounds for p > r when X is r-intermediate UMD. This is good enough to prove Theorem 1.1 with the additional restriction that p i > r i for each i, but it is not enough for the full range. With additional work (done in Section 5) the estimates (4.1) for p > r can be 'localised' to prove the full range of estimates (3.24) that we need.
By Marcinkiewicz interpolation for outer Lebesgue spaces, Theorem 4.1 follows from the following endpoint bounds.
∀f ∈ S (R; X). , Theorem 4.1 is essentially already known in the case r = 2, i.e. the case where X is a Hilbert space. We state this as a separate theorem.
Theorem 4.4. If H is a Hilbert space, then for all p ∈ (2, ∞], Taking into account minor notational differences, the statement was proved in [19] for the case H = C. The proof of Theorem 4.4 follows from the same argument, using that S is equivalent to L 1 + L 2 out + W σ + W ζ for Hilbert spaces, and using Corollary 3.17 to handle the contribution from the defect local sizes. We use this theorem in the proof of Proposition 4.3; we do not reprove it.
We begin by proving the L ∞ endpoint, which amounts to estimates on a single tree.
Proof of Proposition 4.2. By Corollary 3.17 and Lemma 3.18, it suffices to control the contribution from R. Let f ∈ S (R; X) and T ∈ T. By modulation, translation, and dilation invariance, we may assume that T = T (0,0,1) (see Remark 3.13).
Decompose f into a local part and a tail part, $f = f_{\mathrm{loc}} + f_{\mathrm{tail}}$, where $f_{\mathrm{loc}} = f\,1_{B_2(0)}$.
For the local part we have (0) and ϕ has Fourier support in B b (0), ψ has mean zero. Thus by Theorem 2.18 we have using that f loc is supported on B 2 (0). For the tail part, having fixed θ and ϕ as before, we are faced with estimatinĝ By Lemma 2.10 we have for each ζ ∈ B 1 (0), and integrating in ζ and θ completes the estimate. We find that completing the proof.

Tree orthogonality for intermediate spaces.
In time-frequency analysis one has the following heuristic: if ϕ ∈ S (R), and we consider a sequence of points (η i , y i , t i ) ∈ R 3 + that are sufficiently separated, then the wave packets $\Lambda_{(\eta_i, y_i, t_i)}\varphi$ are essentially orthogonal. If f ∈ S (R; H) takes values in a Hilbert space, this essential orthogonality can be exploited to control weighted $\ell^2$-sums of the coefficients $\langle f, \Lambda_{(\eta_i, y_i, t_i)}\varphi\rangle$ by $\|f\|_{L^2(\mathbb{R};H)}$. More generally, one can control $\ell^2$-sums of square functions over disjoint regions E i of trees T i . This is one of the main techniques used in the proof of Theorem 4.4.
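As a model for this heuristic, here is a sketch under a simplifying assumption that is not made in the text: suppose the wave packets are replaced by an exactly orthonormal system $(\psi_i)_i$ in $L^2(\mathbb{R})$ (the notation $\psi_i$ is introduced only for this illustration). Writing $\langle f, \psi_i\rangle := \int_{\mathbb{R}} f(x)\overline{\psi_i(x)}\,dx \in H$, Bessel's inequality applied coordinatewise in an orthonormal basis of $H$ gives

\[
  \sum_i \big\|\langle f, \psi_i\rangle\big\|_H^2 \;\le\; \|f\|_{L^2(\mathbb{R};H)}^2
  \qquad \text{for all } f \in L^2(\mathbb{R};H).
\]

For wave packets that are only sufficiently separated, one expects a bound of the same shape up to a constant, with the off-diagonal inner products contributing an error term.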
This orthogonality is lost when working with general Banach spaces. However, by working with intermediate UMD spaces X = [H, Y ] θ , we can use some orthogonality from H to strengthen the UMD-derived estimates on Y (which hold only on a single tree, or at a single point); the result is Theorem 4.5. A technical discussion of the result and its relation with the notion of Fourier tile-type introduced by Hytönen and Lacey [24] follows the proof.
Theorem 4.5. Fix r ∈ [2, ∞) and suppose X is r-intermediate UMD. Let T be a finite collection of trees, and for each T ∈ T let E T ⊂ T be such that the sets (E T ) T ∈T are pairwise disjoint. Then for all p > r and all f ∈ S (R; X), (4.5) .
Proof. First fix a UMD space Y and a tree T . By Proposition 4.2 we have

and thus
T ∈T Now fix a Hilbert space H. By the equivalence of γ-and L 2 -norms for Hilbert spaces (Proposition 2.4), for each tree T we have Since the sets E T are disjoint, for each (η, y, t) ∈ T there is a wave packet ϕ (η,y,t) ∈ Φ such that Since Φ is finite-dimensional, without loss of generality we can take ϕ (η,y,t) = ϕ to be constant. By Proposition 3.8, using that p > 2, .
4.1.1. Technical remarks on Theorem 4.5. The estimate (4.5) and its proof are inspired by the notion of Fourier tile-type introduced by Hytönen and Lacey in [24], but there are some fundamental differences. Using the notation of [24, Definition 5.1], a Banach space X is said to have Fourier tile-type q if for every α ∈ (0, 1), whenever f ∈ L q (R; X) ∩ L ∞ (R; X) and T is a finite disjoint collection of finite trees with a certain disjointness property. They work on a discrete model space of tiles rather than R 3 + , and their trees are subsets of tiles (which correspond to subsets of our trees). The functions ϕ P here are L 2 -normalised wave packets.
The first obvious difference between (4.8) and (4.5) is that ours is an L p → L 2 estimate, while (4.8) is a range of L q ∩ L ∞ → L q estimates with an auxiliary parameter α. Hytönen and Lacey work with sub-indicator functions |f | ≤ 1 E where E ⊂ R is a bounded measurable set, so the space L q ∩ L ∞ is natural. However, our estimates must be in terms of $\|f\|_{L^p(\mathbb{R};X)}$ only, so (4.8) does not directly help us.
A second difference is the form of the left-hand sides of the two estimates. The functions being measured in (4.8) are the 'tree projections' which can be thought of as derived from E[f ]. On the other hand, (4.5) measures
Another difference is in the method of proof. Like Hytönen and Lacey, we argue by interpolation, based on an estimate for Hilbert spaces. Their fundamental Hilbert space estimate [24, Proposition 6.1] takes the form its proof is based on the orthogonality heuristic mentioned before Theorem 4.5.
The wave packet coefficients $\langle f, \varphi_P\rangle$ can be controlled since f is a sub-indicator function; we do not have this luxury. Instead, our Hilbert space estimate (4.6) is an L p → L 2 estimate that follows from the embedding bounds of Theorem 4.4. This theorem is proven by Marcinkiewicz interpolation for outer Lebesgue spaces; the weak endpoint is proven by a tree selection argument depending on the function f being analysed, and one then uses orthogonality arguments to prove the desired bounds. For the weak endpoint in the Banach case (Proposition 4.3) we also use a tree selection argument depending on f , but in place of orthogonality arguments we use (4.5). To prove (4.5) by complex interpolation, we need to see it as boundedness of a linear operator; such an operator is allowed to depend on the data T and (E T ) T ∈T , but to be well-defined and linear, it cannot depend on f . By proving the fundamental Hilbert space estimate (4.6) as a consequence of Theorem 4.4 we manage to embed an f -dependent tree selection argument where it shouldn't be allowed. This illustrates the strength of Theorem 4.4. The conclusion of Theorem 4.5 could be considered as a Banach space property that all r-intermediate UMD spaces have, as is done by Hytönen and Lacey with Fourier tile-type; we will not go as far as to make such a definition here.

4.2. Tree selection and the weak endpoint. The proof of the weak endpoint estimate (Proposition 4.3) uses a tree selection argument, which we place separately as a lemma.
Lemma 4.6. Let X be a Banach space with finite cotype, A ⊂ R 3 + a compact set, and suppose F ∈ B(R 3 + ; X) is supported in A. Then for all λ > 0 there exists a finite collection of trees T and distinguished subsets E T ⊂ T ∩ A such that for each x ∈ A there are at most two trees T ∈ T with x ∈ E T , satisfying in addition (4.9) Proof. First we replace R with an equivalent size: define Then by the γ-multiplier theorem (Theorem 2.12, which requires finite cotype) R is equivalent to R, while by definition we also have the monotonicity property for all B 1 ⊂ B 2 ⊂ R 3 + and all T ∈ T. In the rest of the proof we abuse notation and write R in place of R.
Define local sizes R + and R − by where T ± = π T (T Θ ∩ {±θ ≥ 0}). It suffices to prove the lemma with R + in place of R, with the additional restriction that the sets E T are disjoint. A symmetric proof handles R − , and we can simply add the results together. Fix a small parameter ε > 0 and define T latt = T (η,y,t),Θ : (η, y, t) ∈ R 3,latt + .
Since A is compact, A ∩ R 3,latt + is finite. Furthermore, since for any T ∈ T one needs at most 10 trees T ∈ T latt with s T ≈ s T to cover T out+ := T out ∩ T + (provided ε is sufficiently small). Given a finite collection of trees X ⊂ T latt , we say that T ∈ X is maximal if We proceed to choose trees iteratively. Start with the collection T 0 = ∅ and let A 0 := A. Suppose that at step j we have a collection T j ⊂ T latt and a subset A j ⊂ A. At step j + 1 let if X j+1 is empty, terminate the iteration. Otherwise choose a maximal element T j+1 of X j+1 ; this can be done since A j ⊂ A and thus X j+1 is finite.
This process terminates after finitely many steps and yields a collection of trees T and pairwise disjoint distinguished subsets E T ⊂ T ∈ T . The first two conditions of (4.9) hold by construction. Pairwise disjointness of the sets (E T ) T ∈T guarantees that it remains to show that (4.10) for any tree T ∈ T latt . Without loss of generality we may suppose T = T (0,0,1) . Fix ζ ∈ B 1 (0) and θ ∈ B 2b (0) and suppose that the integrand above doesn't vanish identically in σ. Let Recall that we are implicitly working with respect to a frequency band Θ, which we write as Θ = (θ * − , θ * + ). We claim that This would allow us to conclude that which would prove (4.10). To prove the claimed lower bound (4.11), argue by contradiction and suppose τ − (θ, ζ) < 1 2 2b−θ θ * + −θ τ + (θ, ζ), so that we can choose a σ such that Fix a tree T 0 ∈ T such that E T0 intersects arbitrarily small neighbourhoods of (θ, ζ, τ + (θ, ζ)) (see Figure 5). Then and, in particular, we have that (θ, ζ, σ) / ∈ π −1 T (T out+ 0 ). Notice that (θ, ζ, σ) ∈ π −1 T (T 0 ) and thus (θ, ζ, σ) / ∈ π −1 T (E Tα ) for any T α ∈ T selected after T 0 in the selection procedure. On the other hand suppose that T β ∈ T was selected before . It would then hold that by definition of σ, which contradicts the maximality condition on the construction since T β was selected before T 0 : by monotonicity of R (recall that we redefined this size to force this monotonicity) T 0 could have been selected earlier, but it was not.
Thus we cannot have (4.12), since the point (θ, ζ, σ) can belong neither to E T0 nor to any E T for any T ∈ T selected before or after T 0 . The proof is complete.
Figure 5. The trees T and T 0 appearing in the proof of Lemma 4.6.
Proof of Proposition 4.3. As with the L ∞ endpoint, by Corollary 3.17 and Lemma 3.18, it suffices to control the contribution from R. We need to show for all λ > 0 that there exists a set A λ ⊂ R 3 + such that Since f ∈ S (R; X) there exists a compact set K ⊂ R 3 + such that (4.14) Consider the collection T λ of trees and at-most-twice overlapping distinguished subsets E T ⊂ T for each T ∈ T λ given by Lemma 4.6 applied to 1 K E[f ]. Splitting T λ into two subsets, we may assume that the subsets E T are pairwise disjoint.
and combining this with (4.14) yields the second condition of (4.13). To prove the first condition, write using Theorem 4.5 (valid as X is r-intermediate UMD and p > r). Since we have that as required.

Embeddings into iterated outer Lebesgue spaces
As in the previous section, we fix a parameter b > 0, a bounded interval Θ containing B 2b (0), and a mother wave packet ϕ 0 with Fourier support in B b (0); these choices define Φ = Φ N , and as always, we fix a large irrelevant integer N . Everything below depends on these choices, and we do not mention them in the notation.
In this section we prove the main technical result of the article, which eventually leads to our bounds on the bilinear Hilbert transform.
Theorem 5.1. Fix r ∈ [2, ∞) and let X be an r-intermediate UMD space. Then for all p ∈ (1, ∞) and $q \in (\min(p, r)'(r - 1), \infty]$ we have
As explained at the start of Section 4, we obtain embedding bounds into iterated outer Lebesgue spaces for a larger range of p ∈ (1, ∞) than we get for the non-iterated spaces. Theorem 5.1 is proven by 'localisation to strips': we use Theorem 4.1 to provide refined information on the quasinorms $\|1_{D \setminus W} E[f]\|_{L^q_\mu S}$ when D is a strip and W is a countable union of strips. This takes a considerable amount of technical work, particularly in the tail estimates. A similar argument was implicitly used to prove [2, Theorem 5.3], the discrete version of this result, but in that result the tails are not present and life is simpler.

Localisation lemmas.
We have three localisation lemmas, corresponding to the three 'endpoints' needed in the proof of Theorem 5.1.

Lemma 5.2 (First localisation lemma). Let X be a UMD space. Then for all W ∈ D ∪ , T ∈ T, and M ∈ N, we have dz for all f ∈ S (R; X).
Lemma 5.4 (Third localisation lemma). Let X be an r-intermediate UMD space, and suppose p ∈ (1, r] and q ∈ (p (r − 1), ∞). Then for all D ∈ D, W ∈ D ∪ , and M ∈ N, Proof of Lemma 5.2. Without loss of generality assume M is large. By homogeneity we may assume that and without loss of generality we may assume that T = T (0,0,1) (see Remark 3.13). First note that by Lemmas 3.16 and 3.18, and the fact that W ∈ D ∪ ⊂ T ∪ , we have Split f into a local part and a tail part, f = f loc + f tail with f loc = 1 B2(0) f , so that by Lemma 2.10 we have It remains to control the contribution of the local part. We may assume that´B 2(0) f (z) X dz 1, as if this does not hold, then by (5.4) we have T ⊂ W and there is nothing to prove. Fix C > 0 and write as a disjoint union of open balls, where M is the Hardy-Littlewood maximal operator (the sum is over some nameless countable indexing set). For C sufficiently large we have D (xn,3sn) ⊂ W for each n, for otherwise (5.4) is contradicted. Fix such a large C. By disjointness of the balls, we have n s n 1.
Proof of Lemma 5.3. As in the previous proof, we are free to assume that M is huge, by homogeneity we assume and without loss of generality we assume D = D (0,1) (see Remark 3.13). Decompose f into dyadic annuli, i.e. write f = ∞ k=0 f k , where f 0 = 1 B2(0) f and f k = 1 B 2 k+1 (0)\B 2 k (0) f for all k ≥ 1. By the first localisation lemma (Lemma 5.2), for each k we have where we used that (z − y)/t 2 k on the support of f k . On the other hand, for any r < r 0 < q, by Theorem 4.1 (here we use that X is r-intermediate UMD) we have Assuming D ⊂ W (for otherwise there is nothing to prove) it holds that By logarithmic convexity of the outer Lebesgue quasinorms we find By quasi-subadditivity we have for some C ≥ 1 that provided M is taken to be sufficiently large.
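For orientation, the logarithmic convexity used above is the outer-space analogue of the classical interpolation inequality for Lebesgue norms. The following is a sketch of its general shape; the exponents $q_0, q_1$ and the parameter $\alpha$ are notation introduced here, and the implicit constant is allowed to depend on them. For $0 < q_0 < q < q_1 \le \infty$ and $\alpha \in (0,1)$ with $1/q = (1-\alpha)/q_0 + \alpha/q_1$,

\[
  \|F\|_{L^{q}_\mu S} \;\lesssim\; \|F\|_{L^{q_0}_\mu S}^{1-\alpha}\, \|F\|_{L^{q_1}_\mu S}^{\alpha}.
\]

In the proof above this is applied with one endpoint supplied by the first localisation lemma and the other by Theorem 4.1.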
We advise the reader to prepare a cup of tea and get comfortable before reading the next proof.
Proof of Lemma 5.4. We will assume that spt f ⊂ B 2 L (0) for some L > 1 and show for some constant C tail > 1. This will be enough to complete the proof; to see this, and observe that for any q 1 > q we have for some C tr ≥ 1, using the first localisation lemma (Lemma 5.2) with some sufficiently large M > M . The final sum in j converges and yields the required estimate provided that M is large enough. Since q > p (r − 1) is an open condition on q, we are free to start with a slightly smaller q so that q 1 is the 'goal' exponent. Now for the proof of (5.7), explicitly tracking the dependence on L. Once more, we may assume that D = D (0,1) (see Remark 3.13) and sup (η,y,t)∈D(0,2)\WˆR for some large parameter C > 1 to be determined later. Decompose the super-level sets of M p (intersected with B 2 L (0)) as disjoint unions of open balls as follows: For convenience we write B k,n := B s k,n (x k,n ) for all k, n. Fix r + ∈ (r, q) (noting that q > r follows from q > p (r − 1) and p ≤ r, which we have assumed). Since f −1 L ∞ (R;X) 1, we get For k ∈ N we have L ∞ and support bounds The penultimate bound is a consequence of the L p → L p,∞ boundedness of M p , and the bound f p L p (R;X) 2 M L follows from (0, 0, 1) / ∈ W and Thus we have obtained the two bounds By interpolation (using that r < r + < q) we find that by our assumptions on p and q. By quasi-subadditivity we obtain as long as C is large enough. This establishes (5.7) (with normalised supremum), completing the proof.

Proof of the embedding bounds.
Proof of Theorem 5.1. First we prove the bound for p ∈ (1, ∞) and q = ∞. Fix λ > 0 and represent the super-level set as a disjoint union of balls, and then define Since M is of weak type (1, 1) we have so it remains to show that (5.13) For each T ∈ T, by Lemma 5.2 we have which establishes (5.13) and in turn (5.12) with q = ∞.
Next we prove (5.12) for p > r and q > r. Fix r + ∈ (r, min(p, q)) and λ > 0, represent as a disjoint union of balls, and as before define Since M r+ is bounded on L q we have $\nu(K_\lambda) \lesssim \sum_n s_n \lesssim \lambda^{-q} \|f\|_{L^q(\mathbb{R};X)}^q$, and as in the previous case we deduce the estimate λ from the second localisation lemma (Lemma 5.3).
Finally we show (5.12) for p ∈ (1, r] and $q > p'(r - 1)$. The proof is the same as in the previous two cases, except this time we use the super-level set of M p and the third localisation lemma (Lemma 5.4). The result then follows by Marcinkiewicz interpolation for outer Lebesgue spaces.
6. Applications to BHF Π 6.1. Bounds for BHF Π . We are ready to prove L p -bounds for the trilinear form BHF Π associated with a trilinear form Π : X 1 × X 2 × X 3 → C on the product of intermediate UMD spaces. As in Section 3.4, we set α = (1, 1, −2) and β = (−1, 1, 0), we fix b = 2 −4 and Θ = B 2 (0), and we define Θ i := α i Θ + β i . where K runs over all compact sets V + \ V − with V ± ∈ T ∪ Θ . Then we can estimate via Corollary 3.15. The assumptions on the Banach spaces X i and the exponents p i , q i let us apply the embedding bounds of Theorem 5.1, yielding The proof is complete. is bounded. If r i ∈ [2, ∞) for each i then each ri is r i -intermediate UMD, and if in addition 3 i=1 r i > 1, then Theorem 1.1 applies. However, bounds for BHT Π in this case have already appeared in the works of Silva [37] and Benea and Muscalu [3,4], both of which allow for a larger range of sequence spaces (including the non-UMD space ∞ ). 6.2.3. Triples of Lebesgue spaces. Consider a Hölder triple of exponents (r i ) 3 i=1 , so that the spaces L ri (R) are max(r i , r i )-intermediate UMD, and there is no smaller exponent for which this holds. Consider the trilinear form Condition (6.1) holds if and only if max(r i , r i ) < r i for some i, which is impossible, so Theorem 1.1 does not apply to these spaces. Benea and Muscalu [3,4], on the other hand, obtain bounds in this case as long as r 1 , r 2 , r 3 ∈ (1, ∞]. Our results are far from optimal when applied to Banach lattices.

6.2.4. Triples of Schatten classes. The purpose of our results was to prove bounds for BHF Π when the spaces X i are not Banach lattices, and here we succeed. Consider for example the 'non-commutative Hölder form': for a Hölder triple (r i ) 3 i=1 satisfying 3 i=1 r i ≥ 1 as in the sequence space example above, the composition/trace map Π : C r1 × C r2 × C r3 → C, Π(F, G, H) := tr(F GH) is bounded. Here C r denotes the Schatten class of compact operators on a Hilbert space with approximation numbers in $\ell^r$; see [22, Appendix D] for the definition. Each C ri is $\max(r_i, r_i')$-intermediate UMD, and we obtain bounds for BHF Π exactly when we obtain them with C r replaced by the sequence space $\ell^r$. Naturally one can consider more general non-commutative L p spaces, as described in [34].
7. Appendices
7.1. Type and cotype. The γ-multiplier theorem (Theorem 2.12) is used at a few key points in the article. One of its hypotheses is that the Banach spaces in question have finite cotype. For the reader who is not familiar with the notions of type and cotype of a Banach space, we briefly discuss them.
is a sequence of independent Rademacher variables. One says that X has nontrivial type if it has type p for some p > 1, and finite cotype if it has cotype q for some q < ∞.
Every Banach space has type 1 and cotype ∞; at the other extreme, a Banach space has both type 2 and cotype 2 if and only if it is isomorphic to a Hilbert space [22, Theorem 7.3.1]. Every UMD space has nontrivial type and finite cotype [21, §4.3.b]. The Lebesgue space L p (R) with p ∈ [1, ∞) has type min(p, 2) and cotype max(p, 2). However, L ∞ (R) is a special case, having neither nontrivial type nor finite cotype. Spaces with type 2 or cotype 2 are special: in these spaces, one can compare γ-norms with L 2 -norms. In fact, this property characterises spaces with either type 2 or cotype 2 [22, Theorems 9.2.10 and 9.2.11].
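For orientation, we recall the standard definitions in the form usually found in the literature; the constants $C_p$ and $C_q$ are notation introduced here. A Banach space $X$ has type $p \in [1,2]$ if for all finite sequences $(x_k)_{k=1}^n$ in $X$

\[
  \Big( \mathbb{E} \Big\| \sum_{k=1}^n \varepsilon_k x_k \Big\|_X^2 \Big)^{1/2}
  \;\le\; C_p \Big( \sum_{k=1}^n \|x_k\|_X^p \Big)^{1/p},
\]

and cotype $q \in [2,\infty)$ if

\[
  \Big( \sum_{k=1}^n \|x_k\|_X^q \Big)^{1/q}
  \;\le\; C_q \Big( \mathbb{E} \Big\| \sum_{k=1}^n \varepsilon_k x_k \Big\|_X^2 \Big)^{1/2},
\]

with the usual modification (a supremum on the left-hand side) when $q = \infty$; here $(\varepsilon_k)_{k=1}^n$ are independent Rademacher variables. In a Hilbert space, orthogonality of the Rademacher variables gives $\mathbb{E}\|\sum_k \varepsilon_k x_k\|^2 = \sum_k \|x_k\|^2$, so both inequalities hold with $p = q = 2$ and constant $1$.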
Theorem 7.2. Let X be an infinite-dimensional Banach space and Ω a measure space such that L 2 (Ω) is infinite-dimensional. Then L 2 (Ω; X) → γ(Ω; X) if and only if X has type 2, and γ(Ω; X) → L 2 (Ω; X) if and only if X has cotype 2. In particular, L 2 (Ω; X) = γ(Ω; X) if and only if X is isomorphic to a Hilbert space.
7.2. γ-norms as stochastic integrals. The definition of the γ-norm as a supremum of Gaussian random sums over orthonormal sets does not lend itself well to function-based intuition. It can be easier to think of γ-norms as continuous analogues of random sums; this thinking is formalised in terms of stochastic integrals. In the following discussion we follow the exposition of [23], which builds on [44] and [35]. We do not use any of the following results in this article, but we have found them useful in guiding our reasoning.
Definition 7.3. Let (S, A, µ) be a σ-finite measure space, and let A 0 denote the subalgebra of A of sets of finite µ-measure. A Gaussian random measure on (S, A, µ) is a function W : A 0 → L 2 (Ω), where (Ω, Σ, P ) is some probability space, with the following properties:
• For all A ∈ A 0 , the random variable W (A) is centred Gaussian with variance µ(A).
• For all finite pairwise disjoint collections of sets $(A_k)_{k=1}^n$ in A 0 , the random variables $(W(A_k))_{k=1}^n$ are independent, and $W(\cup_{k=1}^n A_k) = \sum_{k=1}^n W(A_k)$.
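A standard example, included here only for orientation (the Brownian motion $B$ is notation introduced for this illustration): take $S = [0,\infty)$ with Lebesgue measure, and let $(B_t)_{t \ge 0}$ be a standard real Brownian motion on $(\Omega, \Sigma, P)$. Set

\[
  W((a,b]) := B_b - B_a, \qquad 0 \le a < b < \infty,
\]

and extend to Borel sets $A$ of finite measure by the Wiener integral $W(A) := \int_0^\infty 1_A \, dB$. Then $W((a,b])$ is centred Gaussian with variance $b - a$, and increments over disjoint intervals are independent and add up, so $W$ is a Gaussian random measure in the sense above.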
Let X be a Banach space. If $f = \sum_{k=1}^n a_k 1_{E_k}$ is a simple function with each $E_k \in A_0$ and $a_k \in X$, then the stochastic integral of f with respect to a Gaussian random measure W is defined by
\[
  \int_S f \, dW := \sum_{k=1}^n a_k W(E_k).
\]
When X = H is a Hilbert space, this integral satisfies an Itô isometry, which allows the stochastic integral to be extended by density to all f ∈ L 2 (S; H). For general Banach spaces, the γ-norm must be used in place of L 2 . A function f is stochastically integrable with respect to W if it is weakly L 2 and there exists a random variable Φ ∈ L 1 (Ω; X) such that for all $x^* \in X^*$ we have $\langle \Phi, x^* \rangle = \int_S \langle f, x^* \rangle \, dW$ P -almost surely, where the right-hand side is a scalar stochastic integral. In this case we write $\int_S f \, dW := \Phi$.
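Returning to the Hilbert space case, the isometry behind the extension by density can be checked directly on simple functions. The following computation is a sketch, assuming that the sets $E_k$ are pairwise disjoint and writing $\mathbb{E}$ for expectation on $(\Omega, \Sigma, P)$. For $f = \sum_{k=1}^n a_k 1_{E_k}$ with $a_k \in H$,

\[
  \mathbb{E} \Big\| \sum_{k=1}^n a_k W(E_k) \Big\|_H^2
  = \sum_{j,k=1}^n \langle a_j, a_k \rangle_H \, \mathbb{E}\big[ W(E_j) W(E_k) \big]
  = \sum_{k=1}^n \|a_k\|_H^2 \, \mu(E_k)
  = \|f\|_{L^2(S;H)}^2,
\]

since the variables $W(E_k)$ are independent and centred with variance $\mu(E_k)$. The middle step uses the inner product of $H$, which is why the argument does not carry over to general Banach spaces.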
The following theorem is proven in [44].
Theorem 7.4. Let X be a real Banach space, (S, A, µ) a σ-finite measure space, and f : S → X a weakly L 2 function. Let W be a Gaussian random measure on (S, A, µ). Then f is stochastically integrable with respect to W if and only if f ∈ γ(S; X), and in this case
\[
  \mathbb{E} \Big\| \int_S f \, dW \Big\|_X^2 = \|f\|_{\gamma(S;X)}^2.
\]
When considering complex Banach spaces, the result above is true up to a constant (by splitting functions and their stochastic integrals into real and imaginary parts).
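To see how this realises γ-norms as continuous analogues of random sums, consider a finite rank function. This is an illustrative sketch; the functions $g_k$ and vectors $x_k$ are notation introduced here. Let $g_1, \dots, g_n$ be real-valued and orthonormal in $L^2(S)$, let $x_1, \dots, x_n \in X$, and put $f = \sum_{k=1}^n g_k x_k$. The scalar stochastic integrals $\gamma_k := \int_S g_k \, dW$ are jointly Gaussian with covariance $\langle g_j, g_k \rangle_{L^2(S)} = \delta_{jk}$, hence independent standard Gaussian variables, and

\[
  \int_S f \, dW = \sum_{k=1}^n \gamma_k x_k,
  \qquad
  \mathbb{E} \Big\| \int_S f \, dW \Big\|_X^2 = \mathbb{E} \Big\| \sum_{k=1}^n \gamma_k x_k \Big\|_X^2 .
\]

The right-hand side is a Gaussian random sum of the kind that enters the definition of the γ-norm, evaluated on the orthonormal system $(g_k)_{k=1}^n$.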