The dynamic $\Phi^4_3$ model comes down from infinity

We prove an a priori bound for the dynamic $\Phi^4_3$ model on the torus which is independent of the initial condition. In particular, this bound rules out the possibility of finite time blow-up of the solution. It also gives a uniform control over solutions at large times, and thus allows one to construct invariant measures via the Krylov-Bogoliubov method. It thereby provides a new dynamic construction of the Euclidean $\Phi^4_3$ field theory on finite volume. Our method is based on the local-in-time solution theory developed recently by Gubinelli, Imkeller, Perkowski and Catellier, Chouk. The argument relies entirely on deterministic PDE arguments (such as embeddings of Besov spaces and interpolation), which are combined to derive energy inequalities.


Introduction
The aim of this paper is to prove an a priori bound for the dynamic $\Phi^4_3$ model on the torus. This model is formally given by the stochastic partial differential equation
$$\partial_t X = \Delta X - X^3 + mX + \xi \quad \text{on } \mathbb{R}_+ \times [-1,1]^3, \qquad X(0,\cdot) = X_0, \tag{1.1}$$
where $\xi$ denotes a white noise over $\mathbb{R} \times [-1,1]^3$, and $m$ is a real parameter. Our main result, Theorem 1.1 below, implies that for every $p < \infty$ and $\varepsilon > 0$ sufficiently small, we have Here and below, for $\alpha > 0$, we denote by $B^{-\alpha}_\infty$ the Besov-Hölder space of negative regularity $-\alpha$ (see Appendix A). This bound is not only strong enough to prove the global existence of solutions for (1.1), but can also be used to construct invariant measures via the Krylov-Bogoliubov method. This last point is particularly interesting, because equation (1.1) describes the natural reversible dynamics for the $\Phi^4_3$ quantum field theory, which is formally given by the expression The construction of this measure was a major result in the programme of constructive quantum field theory, accomplished in the late 60s and 70s [13,8,14,10,9]. Our main result yields an alternative construction through the dynamics (1.1). The construction of the dynamics (1.1) in two and three dimensions was proposed in [34], but in the more difficult three-dimensional case very little progress was made before Hairer's recent breakthrough results on regularity structures; the construction of local-in-time solutions to (1.1) was one of the two principal applications of the theory presented in [23]. Hairer's work triggered a lot of activity: Catellier and Chouk [5] were able to reproduce a similar local-in-time well-posedness result based on the notion of paracontrolled distributions put forward by Gubinelli, Imkeller and Perkowski in [18]. Yet another approach to obtain solutions for short times, based on Wilsonian renormalisation group analysis, was given by Kupiainen in [30]. The analysis presented in this article is based on the paracontrolled approach of [18,5].
The emphasis is on deriving an a priori estimate that complements the local solution theory and rules out the possibility of finite time blow-up. Our method relies solely on PDE arguments, such as energy inequalities and parabolic regularity theory.
The main difficulty in dealing with (1.1) or (1.2) is the irregularity of $X$, which in turn stems from the roughness of the white noise term $\xi$. Realisations of $X$ are distribution valued, so that there is a priori no canonical interpretation of the nonlinear terms $X^3$ in (1.1) and $X^4$ in (1.2). The construction ultimately involves a renormalisation procedure which amounts to subtracting some infinite counter-terms. The first important observation which is used to implement this renormalisation, and which lies at the foundation of all of the local solution theories, is the subcriticality of (1.1) in three dimensions. To explain this property, let us momentarily consider this equation over $\mathbb{R}^d$ for an arbitrary $d \ge 1$. Formally rescaling the equation via
$$\hat X(t,x) = \lambda^{\frac{d-2}{2}} X(\lambda^2 t, \lambda x), \qquad \hat \xi(t,x) = \lambda^{\frac{d+2}{2}} \xi(\lambda^2 t, \lambda x),$$
we obtain
$$\partial_t \hat X = \Delta \hat X - \lambda^{4-d} \hat X^3 + \lambda^2 m \hat X + \hat \xi,$$
where $\hat \xi$ is a space-time white noise with the same law as $\xi$. This suggests that for $d < 4$, the influence of the non-linear term should vanish as we consider smaller and smaller scales. This corresponds to the well-known fact that the $\Phi^4_d$ theory is superrenormalisable in dimension $d < 4$.
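The scaling exponents behind this subcriticality statement can be checked by a short computation. The following is only a sketch: the exponent $a$ is a name introduced here for illustration, and we assume the standard scaling property that $\xi(\lambda^2 t, \lambda x)$ has the same law as $\lambda^{-\frac{d+2}{2}} \xi(t,x)$ for a space-time white noise over $\mathbb{R} \times \mathbb{R}^d$.

```latex
% Sketch of the scaling computation; the exponent a is hypothetical notation.
\begin{align*}
\hat X(t,x) &:= \lambda^{a} X(\lambda^2 t, \lambda x), \\
(\partial_t - \Delta) \hat X(t,x)
  &= \lambda^{a+2} \big[ (\partial_t - \Delta) X \big](\lambda^2 t, \lambda x) \\
  &= \lambda^{a+2} \big[ -X^3 + mX + \xi \big](\lambda^2 t, \lambda x) \\
  &= -\lambda^{2-2a} \hat X^3(t,x) + \lambda^2 m \hat X(t,x)
     + \lambda^{a+2} \xi(\lambda^2 t, \lambda x).
\end{align*}
% The last term is again a white noise in law exactly when
%   a + 2 = (d+2)/2,  i.e.  a = (d-2)/2,
% and then the prefactor of the cubic term is \lambda^{2-2a} = \lambda^{4-d}.
```

With this choice of $a$, the cubic term carries the factor $\lambda^{4-d}$, which tends to zero as $\lambda \to 0$ precisely when $d < 4$.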
Based on this observation, the first step to implement the renormalisation in both the approaches using regularity structures or paracontrolled distributions is the explicit construction of several terms based on the solution of the linear stochastic heat equation 1 The renormalisation, that is, the subtraction of diverging counter-terms, is implemented at this stage. For example, the simplest stochastic objects constructed from are and , which formally play the role of " 2 " and " 3 ". These objects are constructed by considering a regularised version δ of , e.g. the solution obtained by replacing ξ by its convolution with a smoothing kernel on scale δ, and then taking the limits as δ tends to zero of for a suitable choice of diverging constant C δ . The proof of convergence of these objects makes strong use of explicit representations of the covariance of and of its Gaussianity.
In both theories, the full non-linear system (1.1) is only treated in a second step. This step is completely deterministic, with the random terms constructed in the first step treated as an input. The solution X is sought in a space of distributions whose small-scale behaviour is described in detail by the explicit stochastic objects. In both theories, this is implemented by replacing the scalar field X by a vector-valued function whose components correspond to the different "levels of regularity" of X. The scalar equation (1.1) then turns into a coupled system of equations. This point is at the heart of both methods. The approaches via regularity structures and via paracontrolled distributions then differ significantly. In the regularity structures approach, a local description of the solution X in "real space" is given, whereas the paracontrolled approach uses tools from Fourier analysis. However, in both approaches, local-in-time solutions X are found by performing a Picard iteration for the system of equations interpreted in the mild sense. We stress that the renormalisation is completely treated at the level of the construction of the stochastic objects based on (1.4), and that no "infinite constants" appear in the deterministic analysis.

1 Throughout the article, we adopt Hairer's convention to denote the terms in the expansion by trees: here the symbol should be interpreted as a graph with a single vertex at the top which corresponds to the white noise, and with a line below corresponding to a convolution with the heat kernel. This graphical notation is extremely useful to keep track of a potentially large number of explicit stochastic objects.
All three approaches mentioned above, i.e. regularity structures [23], paracontrolled distributions [18,5] and renormalisation group [30], focus on the problems arising in the analysis of (1.1) on small scales, and devise a powerful method to deal with the so-called ultra-violet divergences. However, extra ingredients are necessary to obtain information on large scales. This already becomes apparent from the fact that the "good" sign of the term −X 3 is not used in the construction of local solutions. In fact, the theories would allow for the construction of local-in-time solutions of (1.1) with the sign of the non-linear term reversed, and solutions of this modified equation are expected to blow up in finite time. Moreover, the scaling analysis above suggests that it is the non-linear term −X 3 which dominates the dynamics on large scales, so that it can no longer be treated as a perturbation.
In situations where the noise is less irregular, there are well-known tools available to obtain large scale information on non-linear equations such as (1.1). In the deterministic case $\xi = 0$, the non-linear term is known to have a strong damping effect, and the non-linear equation satisfies better bounds than the linearised version: for solutions of (1.1) with $\xi = 0$ (started with an $L^\infty$ initial datum, say), a simple argument based on the comparison principle and the behaviour of the ODE $\dot x = -x^3 + mx$ immediately yields $\|X(t)\|_{L^\infty} \lesssim t^{-\frac12} + 1$, where the implicit constant does not depend on the initial datum. Other standard tools to extract information on the non-linear term involve testing the equation against X or powers of X. In this paper, we show how comparable arguments can be implemented in the context of the system of equations arising in the paracontrolled solution theory of (1.1).
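The ODE part of this comparison argument can be sketched as follows. This is a worked computation under assumptions that are ours: $x \ge 0$ solves $\dot x = -x^3 + mx$, the notation $m_+ := \max(m, 0)$ is introduced here for illustration, and $y$ denotes an auxiliary comparison solution.

```latex
% Sketch; m_+ := \max(m,0) and the auxiliary function y are hypothetical notation.
\begin{align*}
x^2 \ge 2 m_+
  \;&\Longrightarrow\;
  \dot x = -x^3 + m x \le -x^3 + \tfrac12 x^3 = -\tfrac12 x^3, \\
\dot y = -\tfrac12 y^3, \quad y(0) = y_0 > 0
  \;&\Longrightarrow\;
  y(t) = \frac{y_0}{\sqrt{1 + y_0^2 t}} \le t^{-\frac12}, \\
\text{hence} \quad
x(t) \;&\le\; \max\big\{ t^{-\frac12}, \sqrt{2 m_+} \big\}
       \;\le\; t^{-\frac12} + \sqrt{2 m_+} .
\end{align*}
% The region \{ 0 \le x \le \sqrt{2 m_+} \} is invariant forward in time,
% since \dot x \le 0 at x = \sqrt{2 m_+}; above it, x is dominated by y.
```

The right-hand side does not involve $x(0)$. Applying the comparison principle with the spatially constant super-solution $x(t)$ (and, by symmetry, with $-x(t)$ as a sub-solution) then gives a bound of the stated form for the PDE with $\xi = 0$.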
1.1. Formal derivation of a system of equations.
The obvious difficulty in developing a solution theory for (1.1) is the fact that the solution X will be a distribution, and that it is unclear how to interpret the non-linear expression $-X^3$. However, as we have explained in the previous section, on small scales X is expected to "behave like" the Gaussian process ; more precisely, we expect that X − has better regularity than each of the terms separately. Moreover, the detailed knowledge of the covariance and the Gaussianity of can be used to define the "renormalised" products ( ) 2 and ( ) 3 , via (1.5). In this section, we present a formal computation in the spirit of [5] to reorganise (1.1) into a system that we are able to solve, assuming that we can define the products of the explicit stochastic terms, even if they are distributions of low regularity. For the moment, we will ignore the "infinite constants" and manipulate the equation formally, adopting the following rules:
• Every term has a regularity exponent associated with it. We will say, for example, that the terms X and have regularity $(-\frac12)^-$, i.e. regularity $-\frac12 - \varepsilon$ for $\varepsilon$ arbitrarily small. All regularities are derived from the regularity of the white noise $\xi$, which is $(-\frac52)^-$.
• A function of regularity $\alpha_1 > 0$ can be multiplied with a distribution of regularity $\alpha_2 < 0$ if $\alpha_1 + \alpha_2 > 0$, resulting in a distribution of regularity $\alpha_2$.
• Convolution with the heat kernel of $\partial_t - \Delta$ increases the regularity by 2.
• Explicit stochastic objects can always be multiplied, irrespective of their regularity. The product of stochastic objects of regularity $\alpha_1$ and $\alpha_2$ has regularity $\min\{\alpha_1, \alpha_2, \alpha_1 + \alpha_2\}$.
In Section 1.2, we will give a precise meaning to these statements and discuss in particular how the last of these rules has to be interpreted. There, we will give a rigorous link between the system we derive formally in this section and the original equation (1.1).
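The formal rules above amount to simple arithmetic on regularity exponents, which can be mechanised. The following Python sketch is only an illustration: all function names are hypothetical, only the leading exponent is tracked (the "$-\varepsilon$" losses are ignored), and the noise regularity $-(d+2)/2$ in dimension $d$ is an assumption chosen to match the values $-\frac52$ (for $d = 3$) and $-2$ (for $d = 2$) used in the text.

```python
from fractions import Fraction


def heat(alpha):
    """Convolution with the heat kernel of d_t - Laplacian gains 2 derivatives."""
    return alpha + 2


def stochastic_product(a1, a2):
    """Product of two explicit stochastic objects: min{a1, a2, a1 + a2}."""
    return min(a1, a2, a1 + a2)


def can_multiply(a1, a2):
    """Classical product criterion: the sum of regularities must be positive.

    Only leading exponents are tracked, so a sum of exactly 0 counts as
    undefined here (in the text such a sum would be 0 - epsilon < 0).
    """
    return a1 + a2 > 0


def diagram_regularities(d):
    """Leading regularity exponents of the first stochastic objects.

    Assumption: white noise in d space dimensions has regularity -(d+2)/2.
    """
    noise = Fraction(-(d + 2), 2)
    linear = heat(noise)                         # solution of the linear equation
    square = stochastic_product(linear, linear)  # renormalised square
    cube = stochastic_product(square, linear)    # renormalised cube
    return {
        "noise": noise,
        "linear": linear,
        "square": square,
        "cube": cube,
        "heat_cube": heat(cube),                 # heat kernel applied to the cube
    }


def first_order_ansatz_closes(d):
    """Does subtracting only the linear solution give a well-defined equation?

    The remainder inherits the regularity of heat(cube); its product with the
    renormalised square is the most singular product that has to be defined.
    """
    r = diagram_regularities(d)
    return can_multiply(r["heat_cube"], r["square"])


if __name__ == "__main__":
    for d in (2, 3):
        print(d, diagram_regularities(d), first_order_ansatz_closes(d))
```

In this bookkeeping the first-order ansatz closes for $d = 2$ but not for $d = 3$, where the remainder would only have regularity $\frac12$ against a square of regularity $-1$; this is the reason a further term of the expansion is needed in three dimensions.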
For illustration, we briefly show this calculation in the two-dimensional case d = 2, sketching a method introduced by Da Prato and Debussche in [7]. In dimension 2, the noise ξ has regularity (−2) − , so both X and have regularity 0 − . According to the rules above, we cannot define X 3 directly (the regularity being negative), but we can define the square and the cube of , both of which also have regularity 0 − . If we make the ansatz X = + Y , then Y solves Convolution with the heat kernel increases regularity by 2, so that we expect Y to have regularity 2 − , which in turn allows to define all the products on the right hand side. Hence, we can solve (1.6), at least locally in time. We define the solution we seek, as a replacement for (1.1), to be X := + Y . We now come back to our original problem, posed in three space dimensions. As stated above, in this case ξ has regularity (− 5 2 ) − , so that X and have regularity (− 1 2 ) − , has regularity (−1) − and has regularity (− 3 2 ) − . Therefore, the simple procedure leading to (1.6) does not suffice, as it would lead to Y being of regularity ( 1 2 ) − , which is not enough to define the products on the right-hand side of (1.6). The most irregular term we encounter in this approach, limiting the regularity of Y to ( 1 2 ) − , is the term , so we use it to define the next-order term in our expansion. We introduce , the solution of which has regularity ( 1 2 ) − , and postulate an expansion of the form (1.8) for some hopefully more regular u. Analogously to the two-dimensional case, we write the formal equation satisfied by u: where we introduced the notation where we write = > + = for concision. The idea is that v carries the same local irregularity as u, while w should have better regularity, namely ( 3 2 ) − instead of 1 − . The paraproduct in the right side of (1.10) contains the high-frequency modes of modulated by the low-frequency modes of (v + w − ). It is always well-defined and has regularity (−1) − . 
The paraproduct (v + w − ) > is also well-defined and has regularity (− 1 2 ) − . It remains to consider the resonant term which cannot be made sense of classically (the criterion being the same as for the product of course, that is, the sum of regularities should be strictly positive). As was pointed out above, this term should have regularity given by the sum of the regularities of each term, that is, regularity (− 1 2 ) − in our case. Since w is expected to have regularity ( 3 2 ) − , the term w = can be made sense of classically. In extension of our rules, we postulate that we can define = =: = as a distribution of regularity (− 1 2 ) − . It remains to treat the term v = . The key advantage of the decomposition using paraproducts lies in the following commutator estimates, which allow to rewrite this term using explicit graphical terms of low regularity and more regular objects involving v and w. As a first step, we denote by the solution of We also write (1.10) in the mild form The behaviour of the heat kernel suggests that the local irregularity of v is that of −3(v + w − ) < . In other words, the difference (1.14) has better regularity than v itself. (Justifying this relies on Proposition A.16 and on suitable time regularity of v, w and .) We The second of these terms is defined classically, and it only remains to control the first term. Recall that (v + w − ) < carries the high-frequency modes of , modulated by the low-frequency modes of (v + w − ). Hence, it is reasonable to expect (v + w − ) < = to have the same local irregularity as To be more precise, the domain of the commutation operator can be extended to cases for which the terms appearing in the definition are not well-defined separately (see Proposition A.9), so that is well-defined. Our renormalisation rule is thus given by To sum up, we are interested in solutions of the system where F and G are defined by with com defined by (1.16), (1.14) and (1.15). Table 1. 
The list of relevant diagrams, together with their regularity exponent, where ε > 0 is arbitrary.
1.2. Renormalised system.
We now turn to giving a precise meaning to the discussion of the previous section. From now on, we refer to processes represented by diagrams as "the diagrams". For such a process, we understand the notion of "being of regularity $\alpha$" as meaning that it belongs to $C([0,\infty), B^\alpha_\infty)$. This definition would have to be modified for ξ and , which only make sense as space-time distributions, but we will not refer to these any longer. We refer the reader to Appendix A for the definition and some properties of the Besov spaces $B^\alpha_p$. These spaces are more commonly denoted by $B^\alpha_{p,q}$, but since we do not make use of fine properties encoded by the second integrability index $q$, we will always set it equal to $\infty$ and drop it in the notation. For the diagram , some additional information on its time regularity will also be needed.
We now discuss briefly in which way the system (1.17) can be linked to the original equation rigorously, and in particular in which sense the products (and resonant terms) of the diagrams of low regularity should be interpreted. The diagrams entering our equations for v and w are (1.21) , , , as well as , which is defined as the solution of (1.12), that is, as a function of . These quantities, together with their regularity exponent, are summarized in Table 1.
The two remaining ambiguous terms in our formal derivation, namely ( ) 2 and , can be defined classically in terms of the more fundamental object = . For , we can set As for ( ) 2 , we only need to define = ( ) 2 . This term can be formally decomposed into 2 = < + = = , and only the first term is ill-defined. The commutator , ) is well-defined, and we can thus set In this way, the coefficients a 0 , a 1 and a 2 appearing in (1.20) can be re-expressed as Throughout the article, we will never make use of the explicit form of these coefficients, but only that they are of regularity (− 1 2 ) − . A natural approach to construct the diagrams in (1.21) is via regularisation: if ξ is replaced by a smooth approximation ξ δ , then these terms have a canonical interpretation: One can define δ as the solution to (1.4) with ξ replaced by ξ δ , δ := 2 δ , δ := 3 δ , and δ and δ as solutions of (1.12) and (1.7) with right hand sides δ and δ . Furthermore, one can then define However, these "canonical" diagrams fail to converge as the regularisation parameter δ is sent to zero. Given their low regularity, this is not surprising. Yet, the first striking fact about renormalisation is that these terms do converge in the relevant spaces if they are modified in a rather mild way. Indeed, if we set for a suitable choice of diverging constant C (1) δ , then define δ and δ as solutions of (1.12) and (1.7) with right hand sides δ and δ , and finally for another choice of diverging constant C (2) δ , then these terms converge to non-trivial limiting objects. This is shown in [5], and a very similar result is already contained in [23, Sec. 10] (see also [33] for a pedagogical presentation of these calculations). We stress once more that these results rely heavily on explicit calculations involving variances of the terms involved, which allow to capture stochastic cancellations.
The second striking fact is that the "renormalisation" of these diagrams translates into a simple transformation of the original equation. Indeed, if $(v_\delta, w_\delta)$ solves (1.17), with diagrams interpreted in the renormalised way, then X δ = δ − δ + v δ + w δ solves the identical equation (1.1), with $\xi$ replaced by $\xi_\delta$ but with renormalised massive term $m_\delta := m + 3C^{(1)}_\delta - 9C^{(2)}_\delta$. Since the solution theory for (1.17) is stable under convergence of the diagrams, we can conclude that the solution $X_\delta$ to this renormalised equation does converge to a non-trivial limit, denoted by $X$, as $\delta$ tends to 0.
The fact that we have modified the equation we intended to solve may be discomforting at first. That this modification is the "correct" one is ultimately justified by the fact that the solutions thus defined are indeed the physically relevant ones. In particular, these solutions arise as scaling limits of models of statistical mechanics near criticality. The connexion between renormalised fields and statistical mechanics has been studied at least since the 60s (see e.g. [15,21,16] and the references therein). We showed in [31] that the $\Phi^4_2$ model can be obtained as the scaling limit of Ising-Kac models near criticality, as anticipated in [12]. Related results were obtained for the KPZ equation, first in [2] via a Cole-Hopf transformation, and then, following [22], in a series of works including [17,11,29,20,19,27]. See also the survey articles [24,6] for a summary of the work on the $\Phi^4$ model with regularity structures.

1.3. Main result.
Our aim is to derive an a priori bound on solutions of (1.1). We will only be concerned with the analysis of the deterministic system. Before we do so, we make a modification to the system (1.17). We give ourselves a (large) constant $c \ge 0$, and consider instead the system with F and G as in (1.18) and (1.19) respectively, and with initial condition Naturally, this modification changes the definitions of v and w, but we stress that it does not change the sum v + w, and therefore the final solution X. This can easily be seen on the level of the regularised solution $(v_\delta, w_\delta)$ discussed in the previous section. Since (v, w) is the limit of the $(v_\delta, w_\delta)$, it follows that v + w itself does not depend on the choice of c. Therefore, it is ultimately enough to show the existence of a constant c for which the a priori bound holds. For the same reason, the solution X depends on $v_0$ and $w_0$ only through the sum $v_0 + w_0$. We seek solutions to (1.22) in the space X defined as the set of pairs (v, w) in for which the norm 1 8 is finite. Here is our main result.
as well as Assume furthermore that the constant c in (1.22) is chosen according to We set v 0 := 0. For every w 0 ∈ B We now explain how to apply this result to the renormalized solution of (1.1). Note first that the diagrams based on the solution to (1.4) unfortunately do not satisfy uniform bounds such as, for every p < ∞, However, this problem is very simple to solve: it suffices to add a massive term to the linear equation (1.4), that is, to redefine as the solution to The addition of a massive term in the definition of the diagrams only modifies the system (1.22) very superficially, and it is elementary to verify that Theorem 1.1 also applies to this modified system. Moreover, the diagrams defined with a massive term do satisfy (1.25)-(1.26) for every p < ∞. Indeed, this is an elementary extension of the results of Catellier and Chouk [5]; see also [23,Sec. 10] and [33]. We can then apply Theorem 1.1 iteratively to construct a solution to (1.1) over [0, ∞) as follows.
We first apply Theorem 1.1 to define with c sufficiently large and v 0 = 0, . This defines X up to time 1, and ensures that for every p < ∞, We thus obtain a solution X over [0, ∞) which satisfies, for every p < ∞, This bound can then be used as the basis for a Krylov-Bogoliubov procedure for the construction of an invariant measure, see [36,Section 4] for the implementation of this argument in the case of the two-dimensional torus.
The two-dimensional analysis in [36] actually yields a stronger statement, namely the exponential convergence to equilibrium with respect to the total variation norm, uniformly over all initial data. The key ingredients are a non-linear dissipative bound akin to (1.27), complemented by the strong Feller property as well as a support theorem. The strong Feller property for (1.1) has in the meantime been established in [26] in the framework of regularity structures, and a support theorem is part of the forthcoming work [28]. We expect that the combination of our main result with these two additional ingredients will indeed imply exponential convergence to equilibrium also in the three-dimensional case.
Remark 1.2. In the simpler two-dimensional case, a comparable analysis was performed in [32]. There, we were able to push the analysis further and show global existence of solutions if the equation is posed on the full space $\mathbb{R}^2$. The full-space setting is physically more relevant, but also more difficult to analyse, because the stochastic terms lack any decay at infinity, which mandates an analysis in weighted distribution spaces. It would be interesting to investigate whether the methods of [32] can be combined with those of the present article to yield a solution theory for the dynamical $\Phi^4$ equation in the full space $\mathbb{R}^3$.
Remark 1.3. At first glance, the choice of initial datum $(v_0, w_0) = (0, X_0)$ may seem surprising. However, we cannot expect to obtain a strong non-linear dissipative bound for the system (1.22) uniformly over all initial data in, say B . As we are ultimately only interested in the sum X = − + v + w, this does not impose any restrictions on the level of the process X.
Remark 1.4. Convergence of lattice approximations to (1.1) was shown in [25] and [37]. This was used in [25] to implement an argument in the spirit of Bourgain's work on non-linear Schrödinger equations (see e.g. [3]) to show that for almost every initial datum with respect to the measure (1.2), solutions to (1.1) do not explode. This result relies on the analysis of the measure (1.2) performed in [4]. It can then be upgraded using the strong Feller property shown in [26] to obtain the global well-posedness for any initial datum of suitable regularity. We stress however that the spirit of this method is completely different from the method presented here. There, a priori information on the invariant measures is used to rule out finite time blow-up of solutions. Our argument on the other hand relies only on the dynamics, and yields information on the invariant measure as a result.
Remark 1.5. The notion of solution derived in [5] is closely related to (1.17), but slightly different: there, our ansatz is replaced by X = − + Φ < + Φ , and a system of equations for Φ and the remainder Φ is solved. The term Φ < in this decomposition corresponds to v up to a commutator term. Although these approaches are very similar, ours makes the equations solved by v and w more explicit.
1.4. Sketch of proof and organisation of the paper.
We present a local existence and uniqueness result based on a Picard iteration in Section 2. This result is essentially contained in [5], although we use slightly different norms (see also Remark 1.5). The bulk of our argument is contained in Sections 3 to 7, and we now proceed to explain the strategy. We start by recalling the deterministic argument we aim to mimic. If X solves the deterministic PDE where · · · denotes a collection of lower order terms which is bounded, say in $L^\infty$, by $K \ge 1$, one can simply test the equation against $X^{3p-3}$ for an even integer p to get the differential inequality In fact, an additional "good term" $\|X^{3p-3} |\nabla X|^2 (t)\|_{L^1}$ which comes from the Laplacian $-\Delta X$ on the left-hand side also appears, but we can choose to ignore it. By Young's and Jensen's inequalities, the term $\|X(t)\|^{3p-3}_{L^{3p-3}}$ on the right-hand side of the inequality above can be absorbed into the term $\|X(t)\|^{3p}_{L^{3p}}$, and then a simple comparison argument for ODEs yields that for every t > 0, . This bound is uniform over all initial data $X_0$. A yet simpler manifestation of this phenomenon is the well-known fact that solutions of the ODE $\dot x = -x^3$ satisfy $x(t) \le (2t)^{-\frac12}$ uniformly over all initial data.
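The ODE statement at the end of this paragraph is easy to check numerically. The sketch below is an illustration only: the function names are hypothetical, and the closed form $x(t) = x_0 / \sqrt{1 + 2 x_0^2 t}$ is the standard separation-of-variables solution of $\dot x = -x^3$, which is not written out in the text. The code verifies by finite differences that the formula solves the ODE, and compares it with the bound $(2t)^{-\frac12}$ for initial data of very different sizes.

```python
import math


def cubic_ode_solution(x0, t):
    """Closed-form solution of x' = -x**3 with x(0) = x0."""
    return x0 / math.sqrt(1.0 + 2.0 * x0 * x0 * t)


def universal_bound(t):
    """Bound (2t)**(-1/2); it does not depend on the initial datum."""
    return 1.0 / math.sqrt(2.0 * t)


def ode_residual(x0, t, h=1e-6):
    """Central finite-difference residual of x' + x**3 for the closed form."""
    derivative = (
        cubic_ode_solution(x0, t + h) - cubic_ode_solution(x0, t - h)
    ) / (2.0 * h)
    return derivative + cubic_ode_solution(x0, t) ** 3


if __name__ == "__main__":
    for x0 in (1.0, 1e2, 1e4, 1e6):
        for t in (0.01, 0.1, 1.0, 10.0):
            x = cubic_ode_solution(x0, t)
            print(f"x0={x0:g} t={t:g} x(t)={x:.6f} bound={universal_bound(t):.6f}")
```

However large $x_0$ is taken, the printed values stay below the bound, and they approach it as $x_0$ grows; this is the sense in which the solution "comes down from infinity".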
We aim to implement a similar testing argument for the system (1.22), which we restate here in the form where we use the suggestive convention to write for a collection of lower order terms which do not cause any particular difficulty in the analysis. One quickly realises that the testing must be performed on the level of the equation for w. First of all, it is where the "good" cubic term, which is the crucial ingredient for the testing, appears. Second, testing the equation (1.29) against v would produce a "good" term proportional to $\|\nabla v(t)\|^2_{L^2}$ on the left-hand side, but this term is infinite, since the best regularity exponent we can expect for v is below 1. Moreover, as already hinted at in Remark 1.3, since the damping terms $(-\Delta + c)v$ in (1.29) are linear, we cannot expect v to relax to equilibrium faster than exponentially. This motivates our choice of initial condition $v_0 = 0$ (although in several steps of the argument, it will be useful to estimate the behaviour of v for arbitrary initial datum $v_0$).
We proceed to test the equation for w against $w^{3p-3}$ for some large even integer p. Ideally, we would like to get a closed expression which permits us to invoke an ODE comparison argument similar to the one sketched below (1.28). However, several problems present themselves. First, the equations for v and w are coupled, so we need to estimate the influence of v on w and vice versa. Second, even if we controlled all terms involving v on the right-hand side of (1.30), the testing would not lead to a closed expression: several terms involve higher order regularity information on w which is not controlled by the "good" term $\|w^{3p-2} |\nabla w|^2\|_{L^1}$ appearing when testing the equation. These are the terms left explicit on the right-hand side of (1.30), namely the terms com 1 (v, w) = and w = . Indeed, the estimation of com 1 (v, w) = requires information on the time regularity of w, while the term w = requires control of at least $1 + 2\varepsilon$ derivatives of w. The quadratic term $a_2 (v + w)^2$ also requires some care because it calls for a control of $\frac12 + 2\varepsilon$ derivatives of v and w, and it is quadratic rather than linear.
A bound on v is presented in Section 3. The key observation is that although the terms on the right-hand side of (1.29) contain paraproducts with , solutions are relatively easy to control, because both v and w only appear linearly. We thus use a Gronwall-type lemma to obtain several estimates on v in terms of the initial datum $v_0$ and w. These estimates are used in the following sections to replace all expressions involving v when manipulating the equation for w. The extra massive term $-cv$ appearing on the left-hand side of (1.29) makes it possible to get small constants in this argument. This feature is crucially useful in the testing argument to show that when testing $(v + w)^3$ against $w^{3p-3}$, the terms involving v are dominated by the "good term" $\|w\|^{3p}_{L^{3p}}$; see Lemma 6.3.

In order to address the appearance of higher regularity norms of w in the testing argument, which ultimately controls the large scale behaviour of solutions, we use parabolic regularity estimates. More precisely, the mild form of the equation is used in Sections 4 and 5 to derive bounds on $\delta_{st} w := w(t) - w(s)$ and $\|w(t)\|_{B^\gamma_p}$ for some $\gamma > 1$. Both sections aim to control the "small-scale behaviour" of solutions, and thus it is natural that the "good sign" of the cubic non-linearity is not used in these sections. The bounds on v derived in Section 3 are used in these two sections to replace terms involving v by terms involving w. In the end, both $\sup_{s \neq t} |t - s|^{-\frac18} \delta_{st} w$ and $\|w(t)\|_{B^\gamma_p}$ can be bounded in terms of , as well as a suitable norm for $w_0$.

In Section 6, the equation for w is tested against $w^{3p-3}$. We use the bounds on v and $\delta_{st} w$ from Sections 3 and 4 systematically to obtain a bound on $\int_s^t \|w(r)\|^{3p}_{L^{3p}} \, \mathrm{d}r$.

In the concluding Section 7, this bound is combined with the higher regularity bound on $\|w(t)\|_{B^\gamma_p}$ from Section 5 to finish the proof of our main result, Theorem 1.1. We first derive a self-contained bound on quantities involving w, see Lemma 7.2.
In this estimate, some norm of v appears on the right-hand side. In order to conclude by mimicking the ODE argument explained below (1.28), we rely on the assumption that $v_0 = 0$. This is the only place where this assumption is used. We apply the estimate from Lemma 7.2 up to the first time $\tau$ such that $\|v(\tau)\|_{B^{-3\varepsilon}_{2p}}$ exceeds a suitable norm of w. This argument then yields the desired estimate on w(t) for all $t \le \tau$. In order to remove the restriction on times to be less than $\tau$, we use that $t^{-\frac12}$ is integrable and Theorem 3.1 to get a bound on $\|v(\tau)\|_{B^{-3\varepsilon}_{2p}}$, and thus deduce that suitable norms of $w(\tau)$ must be small (irrespective of the possible smallness of $\tau$). This final part of the argument only works if v is measured in a low regularity norm (we work with $\|\cdot\|_{B^{-3\varepsilon}_{2p}}$; see (7.19)) and this is the reason why throughout the paper we measure the initial datum of the equation for v in this norm.

Local existence and uniqueness
The aim of this section is to provide a local existence and uniqueness result for the system (1.22). A similar local theory was already presented in [5] in a slightly different formulation (see Remark 1.5). The value of the constant c plays no role for the results presented in this section. We interpret the system (1.22) in the mild sense: and assume our initial condition (This choice is somewhat arbitrary. Any initial condition of regularity strictly better than $-\frac23$ would work.) is finite. The main result of this section is the following. be distributions such that for every pair (τ, α τ ) as in Table 1, we have as well as (1) For every pair of initial conditions . This time T can be chosen maximal, in the sense that either the solution is global, i.e. T = 1 and the solution can be extended to time t = 1 and takes values in The choice of maximal existence time T and solution (v, w) with these properties is unique. (1) is continuous at the initial time, in the sense that X T in the above statement can be replaced by We start by isolating a bound on the commutator com 1 defined in (1.14), which we will use again in subsequent sections. We introduce the difference operator where the implicit multiplicative constant depends on $\varepsilon$ and $p$.
Proof. Recall the definition of $\mathrm{com}_1$ in (1.14). We introduce the commutation operator We start by estimating the last term in the sum above. The contribution of can be estimated using Proposition A.16: By the same reasoning, we have We now turn to the first term in the right-hand side of (2.8), which we will combine with the last term in (1.14). Recalling (1.13), we observe that By Proposition A.7, the $\|\cdot\|_{B^{1+2\varepsilon}_p}$ norm of the integral above is bounded by a constant times where we used Proposition A.13 and the fact that (s) B −1−ε ∞ 1 in the last step. By the assumption of Hölder regularity in time on (with exponent $\frac18$), this last integral is bounded by a constant times which completes the proof.
Proof of Theorem 2.1. We follow the usual strategy to first solve the system for some small but strictly positive T ∈ (0, 1] using a Picard iteration. In a second step, solutions are restarted iteratively to obtain maximal solutions.
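The two-step strategy (Picard iteration on a short interval, then restarting from the endpoint) can be sketched on a scalar ODE. This is only a toy illustration under assumptions made here: the equation y' = y^2, the trapezoidal quadrature and all function names are hypothetical choices, unrelated to the spaces and norms used in the paper.

```python
from typing import Callable, List


def picard_solve(f: Callable[[float], float], y0: float, T: float,
                 n_grid: int = 2000, n_iter: int = 40) -> List[float]:
    """Iterate the map y -> y0 + int_0^t f(y(s)) ds on a uniform grid of [0, T].

    The time integral is approximated by the trapezoidal rule. For T small
    enough (depending on a bound on y0) the map is a contraction and the
    iterates converge to the solution of y' = f(y), y(0) = y0.
    """
    h = T / n_grid
    y = [y0] * (n_grid + 1)  # initial guess: the constant function y0
    for _ in range(n_iter):
        fy = [f(v) for v in y]
        new = [y0]
        acc = 0.0
        for k in range(n_grid):
            acc += 0.5 * h * (fy[k] + fy[k + 1])
            new.append(y0 + acc)
        y = new
    return y


def restart(f: Callable[[float], float], y0: float, T_step: float,
            n_restarts: int) -> float:
    """Solve on consecutive short intervals, restarting from the last value."""
    y = y0
    for _ in range(n_restarts):
        y = picard_solve(f, y, T_step)[-1]
    return y


if __name__ == "__main__":
    def square(y: float) -> float:
        return y * y

    # y' = y^2, y(0) = 1 has the exact solution 1 / (1 - t), which blows up at t = 1.
    print(picard_solve(square, 1.0, 0.25)[-1])  # close to 4/3
    print(restart(square, 1.0, 0.25, 2))        # close to 2
```

In this toy example the solution blows up at t = 1, so restarting can only continue as long as the restart value stays bounded, which mirrors the dichotomy between global existence and blow-up in the statement of Theorem 2.1.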
For every T > 0 and M > 0, we define the ball For dealing with the case of regular initial data we also introduce the ball where (v, w) X T is defined in an analogous way to (v, w) X T without allowing for blow-up near time 0, i.e.
(v, w) X T Furthermore, we denote by Ψ the fixed point map, i.e. the mapping which associates We now show that for a suitable M and for T small enough, Ψ maps the ball X T,M into itself and X T,M into itself. The core ingredients are the following bounds, which we formulate as a lemma.

Lemma 2.3. There exists a constant C depending only on c and K (defined in the assumption of Theorem 2.1) such that the following holds. For every
We momentarily admit this lemma, and first use it to establish that Ψ is a contraction from X T,M into itself, and also from X T,M into itself. We focus on the statement concerning X T,M , the proof for X T,M using (2.12)-(2.13) instead of (2.10)-(2.11) being similar, only simpler.
We start by deriving bounds on Ψ V . For every M satisfying (2.9), using Proposition A.13 and (2.10), we get that for every $t \le T$ and $\beta \in \{-\frac35, \frac12 + 2\varepsilon\}$, Note that the exponents appearing in these bounds are compatible with the exponents appearing in the definition , and the second t exponent evaluates to 47 To bound time differences, we make use of the identity which holds for any $0 \le s \le t$. This allows us to write, using Proposition A.13 and (2.10) again, is strictly larger than the exponent $-\frac12$ which appears in the definition of X M,T . The argument for Ψ W is similar: we get as well as where to bound the last integral we have made use of the simple estimate r − 99 Summarising, we conclude that there exists a constant C depending only on K and c, as well as an exponent θ > 0 such that for all $T \le 1$, The fact that it is also a contraction on this ball can be established with the same method and we omit the proof.
At this point, we can conclude that for every initial datum (v 0 , w 0 ) ∈ B , then a contraction argument in X M,T1 implies that this solution is continuous all the way to t = 0 without blowup. Furthermore, any upper bound on v 0 provides a lower bound on T 1 . Our argument also implies that v(T 1 ) In particular, we have v(T 1 ) for v and $B^{1+2\varepsilon}_\infty$ for w at time T 1 . However, one could also use the contraction mapping principle on X T2,M for some possibly smaller time T 2 to find a solution for which these norms are continuous. By uniqueness of solutions in X T2,M , these solutions coincide, which ensures the continuity at T 1 of the original solution. By induction, one can now iterate this construction. In this way, either eventually the whole interval [0, 1] is covered, or one has $T = \sum_{k=1}^{\infty} T_k \le 1$. By the previous observation, this can only happen if at least one of the quantities v(t) It remains to argue about uniqueness of solutions to the system (2.1)-(2.2). This follows from the local contractivity of the fixed point map by classical arguments (see e.g. Step 3 of the proof of [32, Theorem 6.2]).
Proof of Lemma 2.3. We only treat the case (v, w) ∈ X M,T , the case (v, w) ∈ X M,T being only simpler. Throughout the calculations we make extensive use of the fact that the bounds v(s) for all $-\frac35 \le \gamma \le \frac12 + 2\varepsilon$. We will in particular use this for γ = ε. By Remark A.3, this yields a bound on the $L^\infty$ norm of v: for ε small enough (of course this exponent is somewhat arbitrary; it is only important that it is less than a third). In the same way, we get and w(s) for ε small enough. According to the definition of F in (1.18) and Proposition A.7, we have (dropping the time argument s in the first expressions to lighten the notation) We now proceed to bound G(v(s) + w(s)) + cv(s) in (2.11). The term cv(s) can be estimated using (2.16). We now recall the definition of G in (1.19): where the polynomial P is defined in (1.20). We proceed by using the triangle inequality and bounding the terms on the right-hand side above one by one. The least regular term is $a_2 (v + w)^2$ arising in the polynomial P. We use Proposition A.7 and Corollary A.8 to bound this term: For the remaining terms in the polynomial P, we get Another rather irregular term is given by where we used Proposition A.7 once more. The remaining terms appearing in the definition of G can be bounded in stronger norms. Indeed, we have Note that it is here that it is crucial that the blowup exponent for the $L^\infty$-norm of v and w is strictly less than $\frac13$, which requires the initial conditions $v_0$ and $w_0$ to be better than B Finally, we recall that according to (1.16), we have and use Proposition A.7 and Proposition 2.2 to write and Note in particular that we have made use of the control on the Hölder regularity in time of (v, w) in order to treat the last integral. For the second commutator term (defined in (1.15)), we use Proposition A.9 to obtain This completes the argument.
We conclude this section by making two important observations, and then setting the stage for the derivation of the a priori bound.
Remark 2.4. The first observation is that the solution pair obtained in Theorem 2.1 depends continuously on the initial condition. This is indeed clear from the construction of the solution pair via a fixed point argument. As a consequence, it suffices to show Theorem 1.1 for smooth initial data. Indeed, once the theorem is established for $v_0 = 0$ and smooth $w_0$, one can recover the general case $v_0 = 0$ and $w_0 \in B^{-\frac35}_\infty$, by regularising $w_0$ and solving the system with this initial datum, then applying the result for this solution to get a bound which holds uniformly in the regularisation parameter, and finally passing to the limit.
Remark 2.5. The second point we wish to make is that the norms of the spaces X T and X T , although convenient to work with in order to show Theorem 2.1, can be improved a posteriori. Indeed, a slight modification of (2.14) yields the bound Similarly, a small modification of (2.15) gives ∞ , then we can conclude that for any (If the initial condition is only assumed to be in B . Note that we assume $K \ge 1$. This assumption is a convenience allowing us to simplify bounds by using inequalities such as $K \le K^2$, etc. For the same reason, we also assume throughout that the constant c appearing in (1.22) satisfies $c \ge 1$. As the argument proceeds, we will also assume a stronger lower bound on c, see (3.14), and then simply fix c according to (7.1).
We also give ourselves v 0 ∈ B and therefore that T = 1 and (v, w) ∈ X 1 .
In the course of the argument, various norms of v and w will be involved. We know beforehand that all these norms are finite. Indeed, by the assumed smoothness of the initial datum, we have (v, w) ∈ X T . Moreover, in view of Remark 2.5, we also have (2.17) for every compact interval I ⊆ [0, T ).

A priori estimate on v
In this section, we derive a priori estimates on v. Theorem 3.1 below will be used many times to replace quantities involving v by quantities involving w only (and the initial condition $v_0$). The estimate becomes better as c increases. The possibility to choose c sufficiently large is used crucially in Lemma 6.3 below. The constraint on c is then propagated to Theorem 6.1 and then throughout the concluding Section 7. Lemma 6.3 is part of an argument where we test the equation for w against $w^{3p-3}$, and focuses on the terms arising from the cubic non-linearity $-(v + w)^3$. This testing produces the quantities fixing c large allows us to argue that the term $w^{3p}$ (which has the right sign if p is an even integer) dominates the other terms. We recall our notation and that we assume $c \ge 1$.
and let where Γ is Euler's Gamma function. For every t < T , we have as well as we have for every $s \le t \in [0, T)$, In all estimates, the implicit constants depend on ε, p, q and β, but neither on $c \ge 1$ nor on $K \in [1, \infty)$.
Remark 3.2. In view of the proof below and of Remarks A.3 and A.14, we also have The proof of Theorem 3.1 relies on the following Gronwall-type lemma.
Then for every $t \ge 0$, Moreover,

Remark 3.4. We did not include a multiplicative constant $K_0$ in the definition of the convolution kernel $k_1$, contrary to the definition of $k_2$. This is because any multiplicative constant on $k_1$ can be incorporated into the definition of h, while such a manipulation is not possible with $k_2$.
Remark 3.5. By writing with an implicit constant independent of s, K 0 , and c.
Proof of Lemma 3.3. Note that by iterating the hypothesis once, We introduce some notation that will allow us to iterate further. For every integer $n \ge 0$, we let A change of variables enables us to rewrite this integral as s1+···+sn tn+1−t0 (the condition $s_i > 0$ is kept implicit). The latter integral is the (multivariate) beta function evaluated at (1 − σ, . . . , 1 − σ), and is equal to To sum up, we have shown that In the same way, it follows that This proves that the remainder term in (3.8) tends to 0 as N tends to infinity, and yields (3.6). In order to check (3.7), we use the fact that for $x \ge 1$, , this gives the upper bound for (3.7). Since we will never use the matching lower bound, we simply mention that it follows by evaluating the contribution of the summand indexed by n such that

Proof of Theorem 3.1. By Propositions A.13 and A.2, the first term in the right-hand side of (2.1) is estimated by For the second term in the right-hand side of (2.1), recall the definition of F in (1.18). Here we want to allow for v of negative regularity (but no worse than −3ε), so we decompose F into and by Proposition A.7 and Remark A.3, Hence, by Propositions A.13 and A.2, where σ := (β + 1 + 4ε)/2 and σ := β + 1 + ε 2 + 3 2 The assumption of $\beta \ge -3\varepsilon$ ensures that $\|v(t)\|_{B^{-3\varepsilon}_q} \lesssim \|v(t)\|_{B^{\beta}_q}$, while the assumption of β < 1 − 4ε ensures that σ < 1. Lemma 3.3 thus yields that, for c as in (3.2), and that σ < 1, we obtain (3.3).
In order to derive (3.4), we repeat the reasoning above with minor modification. Indeed, the argument above shows that The last integral is bounded by a constant times Inequality (3.4) then follows by another application of Lemma 3.3.
We now turn to (3.5). By homogeneity in time of the equation, it suffices to show (3.5) for s = 0. By Remark A.3, we have By Proposition A.13, Remarks A.3 and A.15 and the assumption of $c \ge 1$, we have For this bound we do not have to deal with v of negative regularity, so we simply bound Combining this estimate with (3.10) and using that We then apply Lemma 3.3 and conclude as above.
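The proof of Lemma 3.3 iterates a convolution inequality and evaluates the resulting integrals with a multivariate beta function at (1 − σ, ..., 1 − σ). As a hedged illustration (the displayed formulas of the lemma are not reproduced here, so the series below is an assumption about its standard form): for a kernel K t^{-σ}, the n-fold iterated integral is (K Γ(1−σ))^n t^{n(1−σ)} / Γ(n(1−σ)+1), and summing over n gives the Mittag-Leffler function E_{1−σ} evaluated at K Γ(1−σ) t^{1−σ}, which grows like exp((K Γ(1−σ))^{1/(1−σ)} t). The names `mittag_leffler` and `growth_rate` are illustrative.

```python
import math


def mittag_leffler(alpha: float, x: float, n_terms: int = 400) -> float:
    """Truncated series sum_{n < n_terms} x**n / Gamma(alpha * n + 1), for x > 0.

    Each term is computed through logarithms to avoid overflow in Gamma.
    """
    log_x = math.log(x)
    total = 0.0
    for n in range(n_terms):
        total += math.exp(n * log_x - math.lgamma(alpha * n + 1.0))
    return total


def growth_rate(K: float, sigma: float) -> float:
    """Exponential rate (K * Gamma(1 - sigma)) ** (1 / (1 - sigma))."""
    return (K * math.gamma(1.0 - sigma)) ** (1.0 / (1.0 - sigma))


if __name__ == "__main__":
    # alpha = 1 recovers the exponential series.
    print(mittag_leffler(1.0, 2.0), math.exp(2.0))
    # For alpha = 1/2 and large x, log E_alpha(x) is close to x**2 + log(2).
    print(math.log(mittag_leffler(0.5, 3.0)), 9.0 + math.log(2.0))
    # Rate for sigma = 7/8 and K = 1, i.e. Gamma(1/8) ** 8.
    print(growth_rate(1.0, 7.0 / 8.0))
```

Under this reading, a damping constant c larger than the rate `growth_rate(K, sigma)` compensates the exponential growth produced by the iteration, which is consistent with the role the shifted constant plays in the estimates (3.3)-(3.5).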
We conclude this section by fixing an important convention. We started the section by explaining that the possibility to choose c sufficiently large is only really useful in Lemma 6.3 to control the cubic non-linearity. While this is indeed the case if we aim for any bound on the solution, irrespective of its dependency on the constant K, here we are aiming for more: we want to make sure that the bound obtained in the end depends polynomially on K. In view of the definition of $\underline{c}$ in (3.2) and of the way it enters the estimates (3.3)-(3.5), we risk encountering terms that are super-exponential in K if c is chosen of order 1. This observation already suggests fixing c sufficiently large in terms of K, to ensure that $\underline{c} \ge 0$ and restore a polynomial dependence on K in the bounds (3.3)-(3.5).
How large c needs to be chosen depends on the exponent σ, which itself depends on the choices we will make of the parameters p, p , q and β appearing in Theorem 3.1.
In the overarching structure of the argument for our main result, we will fix an integrability exponent p ∈ [1, ∞) sufficiently large, and then ε > 0 sufficiently small in terms of p. We will then apply Theorem 3.1 a number of times, but always with $\beta \le \frac12 + 2\varepsilon$, and with every integrability exponent appearing there bounded from below by the exponent p fixed sufficiently large. Thus every appeal to Theorem 3.1 will produce an exponent σ satisfying, as per (3.1), In view of this, we fix from now on the following

Important convention. Throughout the rest of the paper, we impose (3.11) $p \ge 24$ and $\varepsilon \le 10^{-3}$.
In this way, every time we appeal to Theorem 3.1, we will do so with a choice of parameters ensuring the inequality In such instances, we can always replace the parameter-dependent value of $\underline{c}$ by the lower bound $\underline{c} \ge c - 1 - \left(K\,\Gamma\left(\tfrac18\right)\right)^{8}$.
With this new convention, the estimates (3.3)-(3.5) are valid provided that we make sure that $\sigma \le \frac78$, which will be the case every time we actually appeal to Theorem 3.1 thanks to (3.11). A more stringent lower bound on c will appear later in Theorem 6.1, by the requirements of Lemma 6.3. For convenience, we also impose that (3.15) p is an even integer.
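As a sanity check on the convention, the following sketch does the arithmetic for one of the two heat-kernel exponents appearing in the proof of Theorem 3.1, read here as (β + 1 + 4ε)/2. This reading is an assumption: the second exponent, which also involves the integrability exponents, is not recoverable from the text and is not checked. The name `first_exponent` is a label of convenience.

```python
from fractions import Fraction


def first_exponent(beta: Fraction, eps: Fraction) -> Fraction:
    """The exponent (beta + 1 + 4 * eps) / 2, in exact rational arithmetic."""
    return (beta + 1 + 4 * eps) / 2


# Largest eps allowed by the convention, and largest beta used with Theorem 3.1.
eps = Fraction(1, 1000)
beta_max = Fraction(1, 2) + 2 * eps

# Worst case of the exponent under these restrictions: 3/4 + 3 * eps.
worst = first_exponent(beta_max, eps)

if __name__ == "__main__":
    print(worst)                               # 753/1000
    print(worst <= Fraction(7, 8))             # True
    print(first_exponent(1 - 4 * eps, eps))    # 1: the threshold beta = 1 - 4 * eps
    print(1 / (1 - worst) <= 8)                # True: power 1 / (1 - sigma) is at most 8
```

The last line explains why an eighth power suffices in the lower bound on the shifted constant: as soon as the exponent is at most 7/8, the power 1/(1 − σ) is at most 8.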
where the implicit constant depends on p and ε, but neither on K nor on c satisfying (3.14).
We introduce The core of the proof of Theorem 4.1 focuses on the estimation of the L p norm of δ st w. We then derive an estimate of δ st w L p at the last step, which makes the term w(s) B 1+4ε p appear. We could replace this term by the weaker quantity w(s) +ε p , but this does not facilitate subsequent arguments.
Recall the definition of G in (1.19); see also (1.16). There are several terms in G which require special attention. The cubic term −(v + w) 3 has the highest degree. In the proof of Theorem 4.1, we cannot make use of the "good" sign of this term, but only treat it as a "bad" term. This makes the cubic non-linearities in (4.1) appear. The estimation of com 1 (v, w) involves δ st w L p itself; we will derive an estimate of the form where · · · are quantities that do not involve δ st w. An explicit estimate on δ st w L p follows, since we know from (2.17) and (4.29) below that for every t < T , The term involving w = is the only term which requires to control derivatives of w of order higher than one. This is the reason for the appearance of the term w B 1+4ε p on the right-hand side of (4.1). The term a 2 (v + w) 2 also requires attention, since it involves controlling the spatial regularity of non-linear quantities of v and w (recall that a 2 is a distribution with regularity exponent − 1 2 − ε). We summarize this decomposition as where [ . . . ] stands for the easier terms left out. We provide bounds on the terms listed in (4.2) in the following lemmas. Although we do not repeat it each time, the implicit constants in these lemmas depend neither on K nor on c satisfying (3.14).
where the implicit constant depends on p and ε.
Proof. We start with the simple estimate

By Theorem 3.1 and Remark
By Jensen's inequality, the quantity above is bounded by a constant times Summarizing, we obtain (4.3).
where the implicit constant depends on p and ε.
Remark 4.4. Keeping the left side of (4.4) in this form, as opposed to directly using the estimate will turn out to be useful for the proof of Lemmas 5.4 and 6.4 below.
Proof of Lemma 4.3. By Proposition 2.2, We now use the estimates of v(s) Note that the estimate holds uniformly over c, by the assumption of (3.14). Similarly, the second term of the upper bound for (4.5) is bounded by , this term is bounded by the right-hand side of (4.4).
As for the term with δ st v L p , we have The first term is (4.5) again, up to an extra exponent ε/2, while by the same reasoning as above, the double integral is bounded by t 0 K (t − u) 1 2 +3ε (K + w(u) L p ) du, and this completes the proof.
where |||w||| p,t is defined by The implicit constant in (4.6) depends on p and ε.
Proof. We start the proof by using Proposition A.7: The initial condition is easily dealt with: We now recall that by Lemma 4.3, The contribution of the first line above to the integral    We now analyse the more subtle term coming from (4.8): To begin with, we replace δ u u w by δ u u w. The difference is estimated by Proposition A.13: Hence, the difference between (4.10) and the same expression with δ u u replaced by δ u u is bounded by where we used Hölder's and Jensen's inequalities and p 8 7 . Note that this is the same error term as in (4.9). Moreover, by Remark A.14, Hence, the double integral in (4.10) with δ u u replaced by δ u u is bounded by , as well as Summarizing, we obtain (4.6).
The following lemma is the only place where we need to measure a derivative of index higher than 1 of w.
where the implicit constant depends on p and ε.
Proof. The estimate (4.12) follows easily by writing For the next lemma, we recall that a 2 is the coefficient in front of the quadratic term in P which was defined in (1.20), and that a 2 is a distribution with spatial regularity − 1 2 − ε controlled uniformly in time.
where the implicit constant depends on p and ε.
Proof. We start by bounding the term which is of highest order in w, using Remark A.3 and Propositions A.13 and A.7: By Proposition A.7 (specifically, (A.13)), we have

Moreover, by Proposition A.4 and Remark A.3, we have
so that, by Young's inequality, We deduce from this and Hölder's inequality that t s 1 (t − u) 1 4 +ε w 2 (u)

JEAN-CHRISTOPHE MOURRAT, HENDRIK WEBER
For p > 8 3 and ε > 0 sufficiently small, we have We now turn to the term involving $a_2 v^2$. Arguing as in (4.14) and then using Proposition A.7, we get Recall that by Theorem 3.1, The term containing the initial condition contributes The contribution of the second term in (4.18) to the integral on the right-hand side of (4.17) can be rewritten as Therefore, using Hölder's inequality in the first and Young's inequality in the second step we get where $q_1'$ is the adjoint exponent of $q_1$ and $q_2, q_3 \in (1, \infty)$ satisfy 1 q2 + 1 q3 = 1 + 1 2q 1 . We also impose $q_1$ and $q_2$ to be sufficiently small that the corresponding integrals are finite. That is, we impose Choosing $q_3 = 3p$ and $q_1 = \frac{2}{1+2\varepsilon}$ (which implies that the second condition in (4.21) is satisfied), one sees that the $q_2$ determined by the first condition in (4.21) satisfies $q_2 \le \frac{12}{11}$ for any p > 1, which implies in turn that for ε > 0 small enough the third condition holds. Therefore, using $\|w\|_{L^{2p}} \lesssim \|w\|_{L^{3p}}$ we can summarise We now analyse the term involving the product vw. As before, we write The second term is easily taken care of, since the inequality reduces the analysis to the sum of The contribution of the first two terms was already analysed, see (4.16) and following. The contribution of the last term is only simpler to analyse than the quantity on the right-hand side of (4.17). Indeed, appealing again to Theorem 3.1, the initial condition appearing there poses no difficulty, and it remains to bound (4.20), and $g(u) = u^{-\frac12-\varepsilon}$. We then note that, by Hölder's and Young's inequalities, this quantity is bounded by , and $q_2 < \frac{2}{1+2\varepsilon}$.
As before, we choose $q_3 = 3p$ and $q_1 = \frac{2}{1+2\varepsilon}$, and then the equality in the first condition above implies $q_2 \le \frac65$, which in particular satisfies the last condition if ε > 0 is sufficiently small. We therefore obtain that the integral above is bounded by and this completes the estimation of this term.
We now bound the terms which were not made explicit in (4.2).
where the dots . . . represent all the terms left out in (4.2) (spelled out explicitly in (4.25) below). The implicit constant depends on p and ε.
Proof. We need to estimate (4.25) and we proceed by bounding these terms one by one.
To begin with, we show that Indeed, by Remark A.14 and Proposition A.9, the left-hand side above is bounded by For the integral involving v, we apply Theorem 3.1 as before to obtain where we have first used Jensen's inequality to move the p-th power inside the duintegral, and then carried out the du integral. So (4.26) follows.
We now show that Indeed, on the one hand, by Proposition A.7, On the other hand, The contribution of the other term from Theorem 3.1 takes the form .
We also have which is bounded by the right-hand side of (4.24).
For the term involving v, we appeal again to Theorem 3.1 to write The first term is bounded as in (4.28). Using Hölder's inequality, we bound the second term by .
For the integral involving w, we write where in the second step, we have made use of the interpolation bound provided by Proposition A.4 and of Remark A.3. Note that 1 − 2 3p − 1 4 > 1 8 for p 8 7 , so this term is bounded by the right-hand side of (4.24) as well provided that ε > 0 is sufficiently small.
where we recall that |||w||| p,t is defined in (4.7), and that this quantity is finite by (2.17). Using that p 8 7 , the comparisons · To conclude, we observe that by Proposition A.13 and Remark A.3, we have and then use the crude bound ·

Higher regularity for w
In this section, we use the regularizing properties of the heat semigroup once more to estimate w in a norm with an exponent of regularity larger than 1. Such information is necessary to control the behaviour of the term w = .
Recall that we are mainly interested in the case γ > 1, in which case the power of t appearing in (5.1) is negative. However, we will use this theorem in the form of the following corollary, in which diverging powers of t no longer appear. Note that this requires a rather fine control on the excess of the exponent in the diverging power of t in (5.1), and particular attention needs to be paid to this aspect in the proof of Lemma 5.6 below.

Corollary 5.2. Let $p \ge 24$ and ε > 0 be sufficiently small. There exists an exponent κ < ∞ depending on p and ε such that for every t ∈ [0, T ), where the implicit constant depends on p and ε, but neither on K nor on c satisfying (3.14).
Proof of Corollary 5.2. We use Theorem 5.1 in the simplified version . By the definition of λ and the choice of ε > 0 small enough ($\varepsilon < \frac{4}{39p}$ is sufficient), we see that It only remains to remove the term involving $\|w(s)\|^p_{B^{1+4\varepsilon}_p}$ on the right-hand side. By Proposition A.4, Remark A.3 and Young's inequality, we have the interpolation bound Since $\frac{1+4\varepsilon}{1+6\varepsilon} < 1$, an application of Young's inequality then yields the result.

We now proceed to the proof of Theorem 5.1. We use again the decomposition (4.2) (setting s = 0 there), and proceed to bound the terms one by one in the following lemmas. Although we do not repeat it each time, the implicit constants in these lemmas depend neither on K nor on c satisfying (3.14).

Lemma 5.3. Let $p \ge 24$, ε > 0 be sufficiently small, and let $0 < \gamma < \frac32$. For every where the implicit constant depends on p, ε and γ.
Proof. We start by observing that by Proposition A.13, we have for any τ < T We proceed by bounding the integrals over the expressions involving w and v one by one. For w, we use Hölder's inequality in the form The first integral on the right-hand side is finite if and only if $\frac{\gamma p'}{2} < 1$, which amounts to $p > \frac{2}{2-\gamma}$. This condition is clearly satisfied, since $p \ge 4$. For the integral involving v, we use Hölder's inequality again, but this time in the form where $\frac1q + \frac6p = 1$. This time the condition for the first integral on the right-hand side to be finite reads $\frac{\gamma q}{2} < 1$, or equivalently $p > \frac{12}{2-\gamma}$. Using again that $\gamma < \frac32$, we see that this condition is satisfied for $p \ge 24$. For the second integral on the right-hand side, we use Theorem 3.1 (in the form given by Remark 3.2) to get The first integral on the right-hand side is finite as soon as $\varepsilon < \frac{1}{2p}$. We estimate the second expression using Jensen's inequality: The desired estimate thus follows.

Lemma 5.4. Let $p \ge 24$, ε > 0 be sufficiently small, and $0 < \gamma < \frac32$. For every t ∈ [0, T ), we have where the implicit constant depends on p, ε and γ.
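The two integrability conditions used in the proof of Lemma 5.3 are elementary manipulations of Hölder exponents, and can be checked in exact arithmetic. The sketch below assumes the reading given above (p' is the conjugate exponent of p, and q is defined by 1/q + 6/p = 1); the function names and the finite grid of test values are choices made here for illustration.

```python
from fractions import Fraction


def holder_conjugate(p: Fraction) -> Fraction:
    """p' with 1/p + 1/p' = 1 (requires p > 1)."""
    return p / (p - 1)


def q_exponent(p: Fraction) -> Fraction:
    """q with 1/q + 6/p = 1 (requires p > 6)."""
    return p / (p - 6)


def w_integral_finite(gamma: Fraction, p: Fraction) -> bool:
    """Condition gamma * p' / 2 < 1 for the integral involving w."""
    return gamma * holder_conjugate(p) / 2 < 1


def v_integral_finite(gamma: Fraction, p: Fraction) -> bool:
    """Condition gamma * q / 2 < 1 for the integral involving v."""
    return gamma * q_exponent(p) / 2 < 1


# Finite grid: 0 < gamma < 3/2 and 6 < p <= 50.
gammas = [Fraction(k, 20) for k in range(1, 30)]
ps = [Fraction(n, 2) for n in range(13, 101)]

# Each condition is equivalent to the stated lower bound on p.
equiv_w = all(w_integral_finite(g, p) == (p > 2 / (2 - g)) for g in gammas for p in ps)
equiv_v = all(v_integral_finite(g, p) == (p > 12 / (2 - g)) for g in gammas for p in ps)

if __name__ == "__main__":
    print(equiv_w, equiv_v)
    # For gamma < 3/2 the thresholds stay below 4 and 24 respectively.
    print(max(2 / (2 - g) for g in gammas), max(12 / (2 - g) for g in gammas))
```

Since γ < 3/2 gives 2/(2 − γ) < 4 and 12/(2 − γ) < 24, the two conditions hold as soon as p is at least 4 and 24 respectively, in line with the standing assumption on p.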
Proof. As before, we start by observing that by Proposition A.13 and Remark A.3, In order to bound this integral, we first observe that We proceed by bounding these terms one by one, starting with the integral involving e τ ∆ v 0 . We get thus resulting in the first term on the right-hand side of (5.3). It is worth observing here that as we are mostly interested in γ > 1, the resulting exponent of t is negative. However, as both exponents γ 2 and 1+5ε 2 individually are strictly less than 1, this does not affect the finiteness of the integral.
For the integrals involving each of the terms listed on the right-hand side of (5.4), we write where 1 p + 1 p = 1 and j = 1, 2, 3, 4. The first integral on the right-hand side is finite by our conditions on p and γ, and it thus remains to bound the temporal L p norms of the I j . For the first two terms, we get In the second identity we have used that 4ε < 1 where 1 p + 1 p = 1. As already seen, the first integral is finite under our assumptions on p and γ. In order to treat the second term on the right-hand side, we use the multiplicative inequality in Proposition A.7 to get that for every fixed τ , so that the conclusion follows.
Proof. We start by writing . We now apply Proposition A.7 in the form (5.10) so that the expression in (5.8) can be rewritten as

as well as the bounds (3.3) and Remark 3.2 which yield
We now plug these bounds into the right-hand side of (5.7) and treat the resulting terms one by one, using the shorthand notation (5.12) γ := 1 2 γ + 1 2 + ε .
We first get where we have set q := 3p 3p−4 , so that 1 = 1 q + 1 3p + 1 p . The first integral on the right-hand side above is finite if and only if p > 8 3 This condition is implied by our assumptions of γ < 4 3 and p 16, provided that ε > 0 is sufficiently small. Applying Young's inequality to the definitions (5.9) and (5.10), we get To control the last term on the right-hand side of (5.13), we first make use of Proposition A.4, in the form of the interpolation bound and then of Hölder's inequality to get We also observe that by Young's inequality, Combining these calculations with (5.13), we obtain It remains to bound the terms involving A 0 and B 0 (i.e. the initial datum v 0 ) on the right-hand side of (5.11). We write where 1 p + 1 p = 1, and where we used once more the interpolation bound (5.16). The first integral is bounded by t 1− 1 . Noting from the definition of γ in (5.12) that γ < 11 12 + ε, we see that this exponent satisfies Using (5.15) and Young's inequality, we thus conclude that Similarly, using the definition (5.10) of B 0 , then Hölder's inequality, and then (5.14), we get Recalling the definition of γ in (5.12), we see that the exponent in the power of t above can be rewritten as Finally, for the last term we write, recalling the definitions (5.9) and (5.10) of A 0 and B 0 , For ε > 0 sufficiently small, the exponent in the power of t above is smaller than that appearing in (5.17), so the proof is complete.
Using the definition (1.15) of com 2 and the bound provided in Proposition A.9, one can check that Similarly to the previous lemma, we use Theorem 3.1 to bound v(τ ) We then get and using Hölder's inequality for 1 so that the desired bound follows from the embedding · Proof of Theorem 5.1. The result is a straightforward consequence of the decomposition in (4.2) (with s = 0) and of the results of Lemmas 5.3 to 5.7.

Leveraging on the cubic non-linearity
In this section, we test the equation for w against suitable powers of w. This allows us to benefit from the "good" sign of the term $-w^3$ in the definition of G. In the course of the argument, we will use Section 3 to dispense with the terms involving v, and effectively reduce the analysis of the system (1.22) to that of a single equation on w; and Section 4 to control the time regularity of w and handle the commutator term $\mathrm{com}_1$. We postpone the incorporation of the results of Section 5 to the next section. Recall that the relationship between c and $\underline{c}$ is fixed by (3.13).

Theorem 6.1 (A priori estimate on w). Let $p \ge 24$ and ε > 0 be sufficiently small. There exist $c_0, \kappa < \infty$ depending only on p such that if then for every t ∈ [0, T ), we have where the implicit constant depends only on p and ε.
In order to isolate the "good term" $-w^3$, we let G be such that

Proposition 6.2 (Testing against $w^{3p-3}$). Let $p \ge 24$, which we recall is assumed to be an even integer, see (3.15). For every t ∈ [0, T ), we have

Proof. By classical arguments (see e.g. [32, Proposition 6.7]), w is a weak solution of (1.22), in the sense that for every $\varphi \in C^\infty_{\mathrm{per}}$, We proceed as in the proof of [32, Proposition 6.8]. We split the interval [0, t] into a subdivision $0 = t_0 \le \cdots \le t_n = t$, apply the identity above with $s = t_i$, $t = t_{i+1}$ and $\varphi = w^{3p-3}(t_i)$, take the sum over i, and study the convergence of the result as the subdivision gets finer and finer. In order to obtain the result, we need to show that in this limit, Indeed, the conclusion is then immediate from the decomposition of G in (6.2). We decompose the sum on the right-hand side of (6.4) into Each of the terms in the sum on the right side above is treated similarly. For notational simplicity, we only discuss the term $w^{3p-3}(t_{i+1})$. The difference between its contribution and the left-hand side of (6.4) is This difference tends to 0 as the subdivision gets finer and finer, since by (2.17), we have $w \in C^{\frac12+\varepsilon}([0, T), L^\infty)$. This completes the proof of (6.4). The convergence in (6.5) is a consequence of the fact that, by Theorem 2.1 and Proposition A.5, Finally, we obtain the convergence in (6.6) using Lemma 2.3 and the fact that

Similarly to (4.2), we now rewrite the right-hand side of (6.3) as We now proceed to estimate each of these terms. The first term has a cubic homogeneity. We need to control it with the contribution of the "good term" $-w^3$. This crucially relies on our ability to choose c sufficiently large.
Proof. We start with the bound which follows from Hölder's and Young's inequalities. It is therefore sufficient to bound the space-time L 3p -norm of v. By Theorem 3.1 (or more precisely Remark 3.2), we have By Jensen's inequality, we have uniformly over c 1 and s 0, We deduce that Fixing ε > 0 sufficiently small in terms of p, we can therefore enforce that I(c) c − 1 5 . Combining this with (6.9) and (6.11) completes the proof.
We now use the a priori estimate on δ st w derived in Section 4 to estimate the contribution of the first commutator term.
where the implicit constant depends only on p and ε.
Proof. We start by applying Hölder's inequality and then Proposition A.7 to get (dropping the time variable in the notation) We integrate in time the first term, use Propositions A.13 and A.2, Jensen's and Young's inequalities to get Moreover, since a second application of Young's inequality yields that, for some exponent κ > 0 depending only on p and every δ ∈ (0, 1], Integrating the second term in (6.13) and applying Hölder's inequality, we get We now focus on bounding the first integral on the right-hand side above. According to Lemma 4.3, for any fixed s, we have the bound The contribution of $\|v_0\|_{B^{-3\varepsilon}_p}$ is easily taken care of. We calculate the $L^p$ norm in time of the first integral, using Jensen's inequality and the bound · For the remaining integral, we first write for any δ > 0, We then use Theorem 4.1 to get where we have set Note that N(t) does not depend on the variables of integration, and that Finally, by Jensen's inequality, Summarizing, we have bounded the left side of (6.16) by Applying Young's inequality on each term (save the last one) then yields (6.12).
We now turn to the term involving w = , which can only be controlled by a norm of w with regularity index above 1.
where the implicit constant depends only on p and ε.
Proof. This bound follows directly from Hölder's inequality and the bound The quadratic non-linearity is rather delicate to handle.

Lemma 6.6. Let $p \ge 24$ and ε > 0 be sufficiently small. There exists an exponent κ > 0 depending only on p such that for every δ ∈ (0, 1] and t ∈ [0, T ), we have where the implicit constant depends only on p and ε.
Proof. Throughout the proof, the exponent κ > 0 may vary from one occurrence to another, but is only allowed to depend on p. We decompose the proof into three steps, treating the contributions of $w^2$, $v^2$ and vw successively.
Step 1. We first treat the term of highest homogeneity in w. Recall that a 2 is uniformly bounded in B . We write, using Propositions A.1 (dropping the time variable in the notation), Moreover, by Corollary A.8, By Proposition A.4 and Remark A.3, Moreover, by Young's inequality, so that integrating over time completes the estimate of this term.
Step 2. We now turn to the contribution of the term involving $v^2$. As above, our starting point is the observation that , and by Proposition A.7, For the first term on the right side above, we expect $v^2$ to have almost $L^4$ integrability in time. We may choose to bound $\|w\|_{L^{3p-2}}$ by $\|w\|_{L^{3p}}$; such a bound is interesting since the term involving $\|w\|_{L^{3p}}$ on the right side of (6.18) appears with a different homogeneity than the term involving $\|w\|_{L^{3p-2}}$. However, the term involving $\|w\|_{L^{3p}}$ only appears integrated in time, which is problematic for controlling the small-time divergence of $\|v^2\|_{B^{\frac{1}{2}+2\varepsilon}_\infty}$. We will therefore use an interpolation of these bounds, such as . This choice of exponents yields, by Hölder's and Young's inequalities, It remains to bound the first integral on the right side of (6.22), that is, and Theorem 3.1 and Remark 3.2 allow us to bound each of these two terms, that is, , provided that p > 12 and ε > 0 is sufficiently small. The contribution of the cross-term involving the integrals in (6.24)-(6.25) can be bounded by by Jensen's inequality and Fubini's theorem. By Hölder's inequality, the two mixed terms involving $\|v_0\|_{B^{-3\varepsilon}_p}$ and an integral from (6.24)-(6.25) are both bounded by , provided that p > 18 and ε > 0 is sufficiently small. This completes the analysis of the first term on the right side of (6.20). We use Young's inequality on the squared term above, and then bound v 0 , provided that ε > 0 is sufficiently small. Noting that
(6.28) $\frac{1}{3p}\left(2 + \frac{6p-7}{2} + \frac{3}{2}\right) = 1$,
and applying Young's inequality with these exponents, we conclude that the quantity above is bounded by We now bound the remaining term from (6.27), namely We use once more the identity (6.28) to apply Young's inequality and get that the integral above is bounded by The contribution of the inner integral is bounded using Jensen's inequality. This therefore completes the analysis of the contribution of the term $a_2 v^2$.
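The exponent identity (6.28) can be checked by direct arithmetic. The display below is a sketch: it reads (6.28) as $\frac{1}{3p}\big(2 + \frac{6p-7}{2} + \frac{3}{2}\big) = 1$, which is a reconstruction from the extracted text, and the names $q_1, q_2, q_3$ for the three resulting conjugate exponents are introduced here for illustration only.

```latex
\begin{align*}
\frac{1}{3p}\Big(2 + \frac{6p-7}{2} + \frac{3}{2}\Big)
  &= \frac{1}{3p}\cdot\frac{4 + (6p-7) + 3}{2}
   = \frac{1}{3p}\cdot\frac{6p}{2}
   = 1, \\
\text{so that}\quad
\frac{1}{q_1} + \frac{1}{q_2} + \frac{1}{q_3} &= 1
\quad\text{with}\quad
q_1 = \frac{3p}{2},\qquad
q_2 = \frac{6p}{6p-7},\qquad
q_3 = 2p .
\end{align*}
```

Under this reading, Young's inequality for three factors, $abc \le \frac{a^{q_1}}{q_1} + \frac{b^{q_2}}{q_2} + \frac{c^{q_3}}{q_3}$ for $a, b, c \ge 0$, is applicable with these exponents.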
Step 3. We finally analyse the contribution of the cross-term $vw$. As in the previous steps, our starting point is the inequality As above, we apply Proposition A.7 to note that (6.29) vw 3p−2 Similarly to (6.21), we use the upper bound to gain some integrability in time. That is, we apply Hölder's inequality to get where in the last step, we used Young's inequality with exponents 1 3p The first integral on the right side of (6.30) is very similar to that appearing in (6.23), and can be treated similarly. It remains to estimate the contribution of the last term on the right side of (6.29). By Corollary A.8 and (6.19), we have The analysis of p ds then proceeds along very similar lines to that for (6.27) above, and we therefore omit the details.
Lemma 6.7. Let p ≥ 24 and ε > 0 be sufficiently small. There exists an exponent κ > 0 depending only on p such that for every δ ∈ (0, 1] and t ∈ [0, T ), we have The dots ... represent all the terms left out in (6.7) (spelled out explicitly in (6.31) below).
Proof. We need to bound For the first term, by Proposition A.9. We then apply Young's inequality to bound and proceed as before to bound the last term, appealing to Theorem 3.1. For the second term in (6.31), we treat the initial condition for v separately, that is, we first bound We have already estimated this term, see (6.14). We now focus on bounding (dropping the time variable in the notation) The term w 3p−3 is the same as that appearing on the left-hand side of (6.26), up to a rescaling of ε. Hence, For the other term, by Proposition A.7, The term involving poses no difficulty. For the term involving w, we use the interpolation bound to get a bound of the form on which we then apply Young's inequality. The remaining term involving v −e ·(∆−c) in (6.32) is treated by an appeal to Theorem 3.1.
The other terms in the left-hand side of (6.31) are only simpler than the quadratic terms covered by the previous lemma, so we omit the details.
Proof of Theorem 6.1. Fix p ≥ 24 and ε > 0 sufficiently small, and then c such that where $c_1$ is given by Lemma 6.3. Combining Proposition 6.2 with the bounds derived in Lemmas 6.3-6.7 (and with Young's inequality and comparisons of norms), we obtain the existence of an exponent κ > 0 depending only on p, and of a constant C depending only on p and ε, such that for every t ∈ (0, T ], Letting t vary over an interval containing 0, we can absorb the supremum and thus obtain the announced result.

Conclusion
In this section, we combine the bounds derived in the previous sections to obtain the final estimate on v and w. As in the rest of the paper, we assume that p ≥ 24 and that ε is sufficiently small. From now on, recalling (3.13), where $c_0$ is the constant (depending on p) appearing in Theorem 6.1.
Theorem 7.1. Let p ≥ 24, ε > 0 be sufficiently small, and c be fixed according to (7.1). There exists an exponent κ < ∞ depending only on p and ε such that for where the implicit constant depends only on p and ε.
The next lemma combines the bounds obtained in Sections 5 and 6 into a single estimate, which we then use as the basis for an ODE-type argument similar to the one sketched below (1.28).

Lemma 7.2. Let p ≥ 24, ε > 0 be sufficiently small, and c be fixed according to (7.1). There exists an exponent κ < ∞ depending only on p and ε such that for every s, t ∈ [0, T ), we have where the implicit constant depends only on p and ε.
Proof. By Theorem 6.1 and Corollary 5.2, there exists an exponent κ < ∞ depending only on p and ε such that for every s < t ∈ [0, T ), we have as well as where the implicit constants depend only on p and ε. We start by simplifying these estimates and putting them in the most convenient possible form for subsequent analysis. In the estimate (7.3), a term involving $\|w(s)\|_{B^{1+6\varepsilon}_p}$ appears. This term can be estimated by a power strictly less than 1 of the quantity on the left-hand side of (7.4). More precisely, by the interpolation bound (5.2) and Young's inequality, we have L p , for some exponent κ < ∞ depending on ε. Setting $\sigma := \frac{3 + 18\varepsilon}{3 + 20\varepsilon}$, we deduce from (7.3), (7.4) and (7.5) that, after enlarging the exponent κ < ∞ as necessary, After enlarging again the exponent κ < ∞ as necessary, we infer that uniformly over δ ∈ (0, 1], we have We now estimate the term involving the initial datum w(s) in (7.4). We observe that, for γ := (1 + 7ε) ( and combine this with (7.4) to arrive at Multiplying the estimate (7.7) by $2K^\kappa$, summing it with (7.8) and simplifying, we obtain that for some exponent κ < ∞ sufficiently large, where C is the constant implicit in the last , then yields the announced result.
The next lemma exposes the general principle by which, with the help of Lemma 7.2, we obtain the sought-after power-law decay of $\|w(t)\|_{L^{3p-2}}$.
There exist an integer $N \geq 1$ and a sequence $0 = t_0 < t_1 < t_2 < \ldots < t_N = \tau$ such that for every n ∈ {0, . . . , N − 1}, where the implicit constant depends only on λ.
Proof. We define $t_0 = 0$ and $t_1^* = c2^\lambda F(0)^{1-\lambda}$. If $t_1^* \geq \tau$, then we set $N = 1$, $t_1 = \tau$, and we verify that (7.10) holds. Otherwise, we evaluate (7.9) for $s = 0$ and $t = t_1^*$, writing By the definition of $t_1^*$, this implies min We then denote by $t_1$ the smallest value of r for which this minimum is realised, and summarise this first step of our induction in the bounds We now iterate this construction, and construct $t_{n+1}$ assuming that $t_0 < t_1 < \ldots < t_n$ have been constructed and that $t_n < \tau$. We set $t_{n+1}^* = t_n + c2^\lambda F(t_n)^{1-\lambda}$. As before, if $t_{n+1}^* \geq \tau$, then we terminate the recursion and set $N = n + 1$ and $t_{n+1} = \tau$. Otherwise, we define $t_{n+1}$ as the smallest value of r for which the minimum $\min_{t_n \leq r \leq t_{n+1}^*} F^\lambda(r)$ is attained. As in the initial step, this implies and $t_{n+1} \leq t_n + c2^\lambda F^{1-\lambda}(t_n)$.
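The construction of the times t_n can be mimicked numerically. The sketch below is illustrative only. It assumes that hypothesis (7.9) has the form "the integral of F^λ over [s, t] is at most c F(s)", and it uses a hypothetical model function F(r) = (F0^(1−λ) + (λ−1) r)^(−1/(λ−1)), which solves F' = −F^λ and hence satisfies that inequality with c = 1. The names `model_F` and `build_times`, the grid-based search for the minimiser, and all parameter values are assumptions made for this example and do not come from the paper.

```python
def model_F(r, lam, F0):
    """Hypothetical model: the solution of F' = -F**lam with F(0) = F0 > 0."""
    return (F0 ** (1.0 - lam) + (lam - 1.0) * r) ** (-1.0 / (lam - 1.0))


def build_times(F, lam, c, tau, grid=1000):
    """Return [t_0, ..., t_N] following the recursion in the proof.

    At each step the proposed time is t_star = t_n + c * 2**lam * F(t_n)**(1 - lam).
    If t_star >= tau the recursion stops with t_N = tau; otherwise t_{n+1} is the
    smallest grid point of (t_n, t_star] at which F**lam is minimal.
    """
    times = [0.0]
    while True:
        t_n = times[-1]
        t_star = t_n + c * 2.0 ** lam * F(t_n) ** (1.0 - lam)
        if t_star >= tau:
            times.append(tau)
            return times
        # Discretised stand-in for "the smallest r realising the minimum";
        # min() returns the first minimiser, i.e. the smallest such grid point.
        candidates = [t_n + (t_star - t_n) * i / grid for i in range(1, grid + 1)]
        times.append(min(candidates, key=lambda r: F(r) ** lam))


# Example parameters (arbitrary choices for the illustration).
lam, c, tau, F0 = 3.0, 1.0, 5.0, 10.0
F = lambda r: model_F(r, lam, F0)
times = build_times(F, lam, c, tau)
for n in range(len(times) - 1):
    # Compare t_{n+1} with F(t_n)**(1 - lam), the quantity controlling it.
    print(n, times[n + 1], F(times[n]) ** (1.0 - lam))
```

In this model the value of F at least halves between consecutive times before the final step, so the proposed time-steps grow geometrically and the loop stops after a few iterations.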
This procedure necessarily terminates after finitely many steps. Indeed, the first of these estimates can be rewritten as
(7.12) $2^{\lambda-1} F^{1-\lambda}(t_n) \leq F^{1-\lambda}(t_{n+1})$,
and thus, in each iteration, the proposed time-step $t_{n+1}^* - t_n = c2^\lambda F(t_n)^{1-\lambda}$ is at least multiplied by a factor of $2^{\lambda-1} > 1$, so that it has to exceed τ after finitely many steps. In order to establish (7.10), we then write, for every n ∈ {0, . . . , N − 1}, The key observation is now that the sum appearing on the right-hand side of this identity is dominated by the term $F^{1-\lambda}(t_n)$. Indeed, by induction on (7.12), we see that for every $j \leq n$, Plugging this into the sum on the right-hand side of (7.13) yields Combining this with (7.13) yields $t_{n+1} \lesssim cF^{1-\lambda}(t_n)$, which is equivalent to (7.10) and thus completes the argument.
Due to our assumption of v(0) = 0 and the continuity properties of v and w, either $F(0) \leq 1$, or τ > 0. If τ > 0, then by Lemma 7.3, there exist a positive integer $N \geq 1$ and a sequence of times $0 = t_0 < t_1 < \ldots < t_N = \tau$ such that for every n < N , We now aim to extend this bound to get a control for arbitrary t ∈ (0, T ). We decompose the argument into four steps.
Step 1. We consider the case τ > 0 and t < τ . In this situation, there exists a positive integer n < N such that t n t < t n+1 , and moreover, for every s < t, we have v(s) 3p By Lemma 7.2 applied with s = t n and the previous display, we infer that and by (7.17), we deduce Step 2. Define τ := inf{s 0 : v(s) 3p We clearly have τ τ . In this step, we study the possibility that τ < τ , and aim to cover times t ∈ [τ, τ ). Under these conditions, we have F (τ ) 1 as well as, for every s < τ , An application of Lemma 7.2 with s = τ then yields Step 3. In this step, we consider the remaining case when t ∈ [τ , T ). By the result of the previous two steps, we have ∀s τ , w(s) L 3p−2 K κ s − 1 2 .
By Theorem 3.1 and the assumption of v 0 = 0, we get The estimate above is the reason why we were careful to measure v(τ ) in a Besov space with an integrability exponent 2p (3p − 2 would be sufficient, but 3p is more problematic). By continuity and the definition of τ , we deduce that and by an application of Lemma 7.2 with s = τ , we obtain Step 4. We now conclude. Combining the results of the previous steps yields that for every t ∈ (0, T ), Arguing as in Step 3, we deduce that for every t ∈ [0, T ), and this completes the proof. The mapping (f, g) → f, g (defined for f, g ∈ C ∞ per ) can be extended to a continuous bilinear form on B α p,q × B −α p ,q . In particular, we can think of Besov spaces as being all embedded in the space of Schwartz distributions.
Proposition A.2 (Besov embedding). Let $\alpha \leq \beta \in \mathbb{R}$ and $p \geq r \in [1, \infty]$ be such that There exists C < ∞ such that $\|f\|_{B^\alpha_{p,q}} \leq C \|f\|_{B^\beta_{r,q}}$.
Remark A.3. By [32, Remarks 3.5 and 3.6], there exists C < ∞ such that When α = 1, we also impose q = ∞. There exists C < ∞ such that We decompose the proof into two steps.
Step 1. We show the result for α ∈ (0, 1). By comparison of norms, it suffices to show the result for q = 1. We assume p < ∞, the case p = ∞ being similar. Let f be a smooth, one-periodic function. For 0, we define the projectors For the first term, recalling (A.5) and (A.6), we have where we used Young's convolution inequality on the torus and set (A.10) η k := y∈(2Z) d η k (· + y).
Recall that η k = 2 kd η(2 k ·). By scaling and rapid decay to 0 at infinity of η, we have On the other hand, using the fact that for k 0, the function η k has vanishing integral, we get By Hölder's inequality, the integral above is bounded by where we recall that · L1 denotes the L 1 norm in the full space R d . For every x, y ∈ R d , Therefore, Noting that |2 k · | η k L 1 is finite and independent of k 0 by scaling, we obtain so that uniformly over 0, The result then follows by optimizing over .
Step 2. We show the result for α = 1 and q = ∞. This is a minor modification of the arguments of the previous step. Indeed, we have and we have seen that the latter is bounded by a constant times $\|\nabla f\|_{L^p}$, so the proof is complete. + ct f B β p,q .
We now turn to our second commutator estimate, which extends Lemma 32 in the first arXiv version of [18] to more general Besov spaces (see also [5, Lemma 2.5]).
Proposition A.16 (commutation between e t∆ and < ). Let α < 1, β ∈ R, γ ≥ α+β, and p, p 1 , p 2 ∈ [1, ∞] such that 1/p = 1/p 1 + 1/p 2 . For every t ≥ 0, define [e t∆ , < ] : (f, g) → e t∆ (f < g) − f < (e t∆ g).  Any function h whose Fourier spectrum lies in 2 k A satisfies e t∆ h = G k,t h.
In particular, h k = G k,t (S k−1 f δ k g) − S k−1 f (G k,t δ k g) , that is, We can rewrite the difference of S k−1 f at two points in terms of its gradient: where G k,t (y) := y G k,t (y). Let us denote the inner integral above by h k,s (x). We now show that (A.22) h k L p G k,t L 1 ∇S k−1 f L p 1 δ k g L p 2 .
We will in fact show that (A.22) holds with h k,s in place of h k , uniformly over s.
(This inequality is a minor variant of Young's and Hölder's inequalities; in particular, it does not depend on the specific properties of the functions involved, and the implicit multiplicative constant would be 1 if all functions were real-valued.) We first observe that by Hölder's inequality, As a consequence, By Hölder's inequality, and we obtain (A.22). The remaining step consists in uncovering the size of G k,t L 1 in terms of k and t. By symmetry, it suffices to study the L 1 norm of the function y → y 1 G k,t (y). Up to a factor i, this function is the inverse Fourier transform of We learn from the proof of [1, Lemma 2.4] (or that of [32, Lemma 2.10]) that for every φ ∈ C ∞ with support in an annulus, there exists c > 0 such that As a consequence, there exists c > 0 such that G k,t L 1 2 −k 1 + t2 2k e −ct2 2k .