Quantization of Time-Like Energy for Wave Maps into Spheres

In this article we consider large energy wave maps in dimension 2+1, as in the resolution of the threshold conjecture by Sterbenz and Tataru (Commun. Math. Phys. 298(1):139–230, 2010; Commun. Math. Phys. 298(1):231–264, 2010), but more specifically into the unit Euclidean sphere $\mathbb{S}^{n-1} \subset \mathbb{R}^{n}$ with $n \geq 2$, and study further the dynamics of the sequence of wave maps that are obtained in Sterbenz and Tataru (Commun. Math. Phys. 298(1):231–264, 2010) at the final rescaling for a first, finite or infinite, time singularity. We prove that, on a suitably chosen sequence of time slices at this scaling, there is a decomposition of the map, up to an error with asymptotically vanishing energy, into a decoupled sum of rescaled solitons concentrating in the interior of the light cone and a term having asymptotically vanishing energy dispersion norm, concentrating on the null boundary and converging to a constant locally in the interior of the cone, in the energy space. Similar and stronger results have been recently obtained in the equivariant setting by several authors (Côte, Commun. Pure Appl. Math. 68(11):1946–2004, 2015; Côte, Commun. Pure Appl. Math. 69(4):609–612, 2016; Côte, Am. J. Math. 137(1):139–207, 2015; Côte et al., Am. J. Math. 137(1):209–250, 2015; Krieger, Commun. Math. Phys. 250(3):507–580, 2004), where better control on the dispersive term concentrating on the null boundary of the cone is provided, and in some cases the asymptotic decomposition is shown to hold for all time. Here, however, we do not impose any symmetry condition on the map itself and our strategy follows the one from bubbling analysis of harmonic maps into spheres in the supercritical regime due to Lin and Rivière (Ann. Math. 149(2):785–829, 1999; Duke Math. J. 111:177–193, 2002), which we make work here in the hyperbolic context of Sterbenz and Tataru (Commun. Math. Phys. 298(1):231–264, 2010).


Wave maps into spheres.
We discuss here some facts, important for our argument, regarding smooth wave maps with target the Euclidean sphere. For a broad introduction to the subject, we shall refer the reader to the monograph of Shatah and Struwe [24].
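Wave maps are smooth maps $\phi : I \times \mathbb{R}^2 \to \mathbb{R}^n$, defined on some time interval $I \subset \mathbb{R}$, taking values in the sphere $\mathbb{S}^{n-1} \subset \mathbb{R}^n$, which concretely means:
$$|\phi|^2 = 1, \qquad (1.1)$$
with the evolution $\phi[t] := (\phi(t), \partial_t \phi(t)) \in T(\mathbb{S}^{n-1})$, taking values in the tangent bundle and belonging to the space $C^0_t(I; \dot{H}^1_x) \cap C^1_t(I; L^2_x)$, governed by the equation:
$$\Box \phi = -\big( \partial^\alpha \phi \cdot \partial_\alpha \phi \big)\, \phi, \qquad (1.2)$$
where the D'Alembertian is given by $\Box := \partial^\alpha \partial_\alpha = -\partial_t^2 + \Delta_x$. Note our convention here is that we are summing over repeated indices, where $\alpha$ runs from 0 to 2, with $\partial_0 = \partial_t$ and $\partial^0 = -\partial_t$, as we will always be raising the indices with respect to the Minkowski metric $\mu = -dt^{\otimes 2} + dx_1^{\otimes 2} + dx_2^{\otimes 2}$ on $\mathbb{R}^{2+1}$ unless clearly stated otherwise. We recall that equation (1.2) is invariant with respect to the scaling:
$$\phi(t, x) \mapsto \phi(\lambda t, \lambda x),$$
for any $\lambda > 0$, and also with respect to any space-time translation.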
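The form of the non-linearity in the wave maps equation can be recovered by differentiating the sphere constraint twice. The following is a short sketch, assuming the constraint is written as $|\phi|^2 = 1$, that the dot denotes the Euclidean inner product on $\mathbb{R}^n$, and that the equation expresses the fact that $\Box \phi$ is normal to the sphere at $\phi$ (the usual geometric formulation, which is an assumption not spelled out in the text above).

```latex
\begin{align*}
|\phi|^2 = 1
  &\;\Longrightarrow\; \phi \cdot \partial_\alpha \phi = 0
  \;\Longrightarrow\; \phi \cdot \Box \phi
     = -\,\partial^\alpha \phi \cdot \partial_\alpha \phi, \\
\Box \phi \perp T_\phi \mathbb{S}^{n-1}
  &\;\Longrightarrow\; \Box \phi = (\phi \cdot \Box \phi)\, \phi
     = -\big( \partial^\alpha \phi \cdot \partial_\alpha \phi \big)\, \phi .
\end{align*}
```

The second implication in the first line follows by applying $\partial^\alpha$ to $\phi \cdot \partial_\alpha \phi = 0$ and summing over $\alpha$.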
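Let us mention a few important conservation laws associated to the above evolution. Firstly, recall that the energy of a wave map at time $t_0 \in I$, scale invariant in dimension 2+1, is given by:
$$E[\phi](t_0) := \frac{1}{2} \int_{\mathbb{R}^2} \big( |\partial_t \phi|^2 + |\nabla_x \phi|^2 \big)(t_0, x)\, dx,$$
and a conservation of energy law holds:
$$E[\phi](t_0) = E[\phi](t_1), \qquad (1.3)$$
for any $t_0, t_1 \in I$. Secondly, as the target is the Euclidean sphere $\mathbb{S}^{n-1}$, equation (1.2) is equivalent to the conservation law:
$$\partial^\alpha \big( \phi^i\, \partial_\alpha \phi^j - \phi^j\, \partial_\alpha \phi^i \big) = 0, \qquad 1 \leq i, j \leq n, \qquad (1.4)$$
which is a consequence of Noether's theorem and the symmetries of the sphere (and similarly for other homogeneous Riemannian manifolds, but we shall focus on the sphere here for simplicity), recalling that wave maps are formally critical points of the Lagrangian:
$$\mathcal{L}[\phi] := \frac{1}{2} \int_{\mathbb{R}^{2+1}} \partial^\alpha \phi \cdot \partial_\alpha \phi \; dt\, dx, \qquad (1.5)$$
of which (1.2) is the Euler-Lagrange equation. The use of (1.4) means, however, that some of our arguments do not directly generalize to the case when one has an arbitrary closed Riemannian manifold as a target. Another consequence of the variational point of view and Noether's theorem is that smooth wave maps enjoy the stress energy tensor:
$$T_{\alpha\beta}[\phi] := \partial_\alpha \phi \cdot \partial_\beta \phi - \frac{1}{2}\, \mu_{\alpha\beta}\, \partial^\gamma \phi \cdot \partial_\gamma \phi, \qquad (1.6)$$
which is divergence free:
$$\partial^\alpha T_{\alpha\beta}[\phi] = 0. \qquad (1.7)$$
Finally, closing our presentation of wave maps, we remark that the Lagrangian $\mathcal{L}$ is Lorentz invariant, which implies that, after composition with Lorentz transformations, the map still solves Eq. (1.2) and in particular the conservation law (1.4) also stays true.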
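Two of the claims in this paragraph can be verified in one line each. The sketch below assumes the wave maps equation in the form $\Box \phi = -(\partial^\beta \phi \cdot \partial_\beta \phi)\, \phi$, the energy normalized with a factor $\tfrac12$, and writes $\phi_\lambda(t, x) := \phi(\lambda t, \lambda x)$ for the rescaled map (this name is introduced here for illustration). The first line shows that the conservation law (1.4) follows from the equation, since the cross terms $\partial^\alpha \phi^i\, \partial_\alpha \phi^j$ cancel. The second shows the scale invariance of the energy in two space dimensions via the change of variables $y = \lambda x$.

```latex
\begin{align*}
\partial^\alpha \big( \phi^i\, \partial_\alpha \phi^j - \phi^j\, \partial_\alpha \phi^i \big)
  &= \phi^i\, \Box \phi^j - \phi^j\, \Box \phi^i
   = -\big( \partial^\beta \phi \cdot \partial_\beta \phi \big)
      \big( \phi^i \phi^j - \phi^j \phi^i \big) = 0, \\
E[\phi_\lambda](t)
  &= \frac12 \int_{\mathbb{R}^2} \lambda^2\, |\nabla_{t,x} \phi|^2 (\lambda t, \lambda x)\, dx
   = \frac12 \int_{\mathbb{R}^2} |\nabla_{t,x} \phi|^2 (\lambda t, y)\, dy
   = E[\phi](\lambda t).
\end{align*}
```

Conversely, the vanishing of $\phi^i\, \Box \phi^j - \phi^j\, \Box \phi^i$ for all $i, j$ says that $\Box \phi$ is parallel to $\phi$, which together with the constraint $|\phi|^2 = 1$ gives back the equation.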

Statement of the main result.
Before presenting our main result, let us set up some notation. As usual, for two positive quantities $A$ and $B$ we write $A \lesssim B$ if $A \leq C \cdot B$ for some implicit constant $C > 0$ whose dependence will be clarified when necessary. We also write $A \sim B$ whenever the additional estimate $B \lesssim A$ holds. Similarly, for the $O$-notation, we set $A = O(B)$, with $A$ not necessarily positive this time, if $|A| \leq C \cdot B$.
Regarding the asymptotic notation, arising in various statements of the soliton decomposition below, we write o X (A), as ν → +∞ in the background with X some Banach space (typically a Sobolev space), for a sequence of elements f ν ∈ X with f ν X ≤ c ν · A where c ν ↓ 0. In the same spirit, we will write A ν B ν whenever A ν /B ν → 0 holds. By B r 0 (x 0 ) ⊂ R 2 , we will be always referring to a spatial open ball of radius r 0 > 0 and center x 0 ∈ R 2 , whereas in space-time our basic domains should be light cones. We denote the forward light cone by: : t ∈ I, r = t} standing for the lateral boundary, to which we usually refer as the null boundary. Given some δ > 0, it will be convenient also to set C δ := (δ, 0) + C, with the convention that C 0 stays for ∪ δ>0 C δ , the open interior of C. Accordingly, we have C δ I := C I ∩ C δ , S δ t 0 := S t 0 ∩ C δ and if δ > 0, ∂C δ I for the lateral boundary of C δ I . We recall now the set-up from [27] (which of course holds for any closed Riemannian manifold as target, but we restrict ourselves to the case of S n−1 for the sake of consistency). By the finite speed of propagation, translation and scaling invariance properties, we shall restrict ourselves to the forward light cone C on which it is convenient to study at the same time both scenarios: the finite time blow-up at the tip of the cone, as well as the problem of scattering as t → +∞. Hence, we can assume that we are given a wave map φ on C, smooth up to but not necessarily including the origin (0, 0), and satisfying the energy bound: where E is an arbitrarily large but fixed for the rest of the paper bound on which most of our constants will depend. Let us introduce here the notation for the energy of the wave map φ over some domain U ⊂ R 2+1 at the time slice {t = t 0 } setting: or simply E U [φ] when there is no ambiguity, as for example with E S t 0 [φ] above. 
For the latter quantity, we recall the important monotonicity property: which is obtained, as the conservation of energy law (1.3), contracting the stress energy tensor T [φ] with ∂ t and using (1.7) with Stokes' theorem, this time however applied in C [t 0 ,t 1 ] , giving: (1.9) where F [t 0 ,t 1 ] [φ] is called the flux of the wave map from t 1 to t 0 , and L is part of the null frame: The monotonicity property and the global bound (1.8) enable us to define the limits: and imply that F [t 0 ,t 1 ] [φ] ↓ 0 as t 0 , t 1 both tend to zero or infinity. The latter can be used, together with the angular part of F [t 0 ,t 1 ] [φ] from (1.9), to construct, given any ε > 0, an extension of φ outside the cone C on (0, t 0 ] for t 0 = t 0 (ε) small enough, and on [t ∞ , ∞) for t ∞ = t ∞ (ε) large enough, solving the wave maps equation (which is possible by finite speed of propagation, hence we shall slightly abuse notation denoting those extensions by φ) such that: see Sections 6.1 and 6.2 in [27]. By the small energy theorem of Tao [28], if E[φ](t 0 ) can be chosen small enough, then E 0 = 0 and φ can be extended to a smooth wave map for all time (this guarantees also that the above extensions are smooth everywhere except possibly (0, 0), even if E S t [φ] is large, provided ε > 0 was chosen small enough initially). Moreover, via a continuity-iteration-renormalization argument, φ is proved in [28] to belong to a space S ⊂ C 0 t (I ;Ḣ 1 x ) ∩ C 1 t (I ; L 2 x ), implying control in all the Strichartz spaces amongst others, in which well-posedness for the Cauchy problem (1.2) can be established. We discuss this more precisely with further references later in Sect. 2.2. 
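The monotonicity of the energy on time slices of the cone recalled above can be checked by a direct computation. The following is a sketch under stated assumptions: $e := \tfrac12 (|\partial_t \phi|^2 + |\nabla_x \phi|^2)$ denotes the energy density, $d\sigma$ is the arc length on the circle $\{|x| = t\}$, and $L = \partial_t + \partial_r$. The normalization of the boundary term may differ from the one used for the flux in (1.9).

```latex
\begin{align*}
\partial_t e
  &= \sum_{j=1}^{2} \partial_j \big( \partial_t \phi \cdot \partial_j \phi \big)
     - \partial_t \phi \cdot \Box \phi
   = \sum_{j=1}^{2} \partial_j \big( \partial_t \phi \cdot \partial_j \phi \big), \\
\frac{d}{dt} \int_{|x| \leq t} e \, dx
  &= \int_{|x| = t} \big( e + \partial_t \phi \cdot \partial_r \phi \big)\, d\sigma
   = \frac12 \int_{|x| = t}
       \big( |L \phi|^2 + r^{-2} |\partial_\theta \phi|^2 \big)\, d\sigma \;\geq\; 0 .
\end{align*}
```

In the first line, $\partial_t \phi \cdot \Box \phi = 0$ because $\Box \phi$ is parallel to $\phi$ while $\partial_t \phi$ is tangent to the sphere. Integrating the second line from $t_0$ to $t_1$ expresses the energy difference between the two slices as a non-negative boundary integral over the null boundary, involving only the $L$ derivative and the angular derivative.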
Here, we should mention that, following the terminology of Sterbenz and Tataru [27], we will say that scattering holds if: noting that, strictly speaking, this means that φ behaves like a linear wave as t → ±∞ after applying the microlocal gauge (if small energy, see [28]) or the diffusion gauge (necessary if large energy, see [26]). We refer the reader to the structure theorem of Sterbenz and Tataru in [26], Proposition 3.9 there, for further information. Let us take the opportunity here to remark that, if the target manifold is a hyperbolic Riemann surface, then scattering in the classical sense was established by Krieger and Schlag [15] for wave maps in the Coulomb gauge. For the hyperbolic spaces, this was achieved by Tao [29] using the caloric gauge. Therefore, if E[φ](t ∞ ) could be chosen small enough for some extension we consider the scattering problem for φ as t → +∞ resolved.
Once energy gets large, blow-up can occur and the first examples of finite time singularity for equivariant wave maps into S 2 were constructed by Krieger, Schlag and Tataru [16], as well as Rodnianski and Sterbenz [22] and also Raphaël and Rodnianski [20], where, as for the harmonic map heat flow, the mechanism behind the singular behavior was concentration of a non-trivial harmonic map. More generally, the wave map φ could have concentrated at the origin at least one soliton: these are defined to be finite energy smooth maps ω : R 2+1 → S n−1 solving the wave maps equation (1.2) and satisfying: X ω = 0, for some constant time-like vector field X on R 2+1 . In particular, precomposing ω with a Lorentz transformation that takes ∂ t to X , we obtain a finite energy harmonic map from R 2 steady in the time direction which, upon extending over spatial infinity using the removable singularity theorem of Sacks and Uhlenbeck [23], gives a harmonic twosphere ω • : R × S 2 → S n−1 familiar from the bubbling analysis of harmonic maps and heat flows. Let us note here that this last point of view enables us to set ω(∞) := lim |x|→∞ ω(t, x), which is well-defined and independent of time t chosen.
The threshold conjecture, resolved by Sterbenz and Tataru [26,27] (for closed Riemannian manifolds), Krieger and Schlag [15] (for hyperbolic surfaces) and Tao [29] (for hyperbolic spaces of any dimension), predicts that concentration of solitons is the essential mechanism behind blow-up. That is, if E 0 , E ∞ are less than the energy threshold below which every harmonic two-sphere is constant, then one has regularity at t = 0 and scattering as t → +∞.
One of the central difficulties in establishing this conjecture in the general non-symmetric situation was that, relying only on standard Morawetz type estimates obtained from the stress energy tensor, it was not possible to get a non-trivial amount of energy concentrating within the light cone required to produce a non-constant soliton. As far as the program of Sterbenz and Tataru is concerned, the breakthrough was made in [26], where they obtain that, on top of concentrating energy, the map must concentrate a non-trivial amount $\epsilon(E) > 0$ of the BMO type energy dispersion norm.
That is, if: sup (1.10) where P k stands for the Littlewood-Paley projection, then: and the map extends smoothly to a neighborhood of t = 0 (we shall state a slightly more precise version of this theorem in Sect. 2.2). This is a large data result and is proved in [26] via an induction on energy argument. Let us note here, as an aside, that the program of Krieger and Schlag [15], as well as the one of Tao [29], proceeded via a different induction on energy argument and without any smallness assumption as (1.10). As there are no non-constant solitons for the targets considered there, one obtains global regularity and scattering for arbitrarily large data in those cases. We point out, on the other hand, that the concentration-compactness techniques used in [15] can also lead to a fruitful study of the formation of solitons, as was demonstrated so far for equivariant wave maps in [1][2][3][4]13]. In the present work however, we shall adopt a more direct approach, staying closer to [26,27], see Sect. 1.3 for a detailed summary of our strategy.
In Sect. 3.2, we will briefly discuss results from [27] that convert concentration of energy dispersion into concentration of a non-trivial amount of time-like energy, as this is how, arguing by contradiction, we get the energy dispersion norm of the term concentrating on the null boundary asymptotically vanishing. On the other hand, the fact that arguments in [27] give that only some energy is prevented from escaping into the null boundary at a finite time singularity is a serious obstacle to controlling null concentration further. In fact, techniques dealing with this phenomenon would have to strengthen [26,27] considerably in this situation, if not giving a wholly alternative proof to the threshold conjecture (which we shall not attempt in this paper). Theorem 1.1 (Sterbenz and Tataru [26,27]). Suppose that the wave map φ is singular at (0, 0), respectively φ / ∈ S[t ∞ , ∞) for any extension as discussed above, then there exists a sequence λ 0 ν ↓ 0, respectively λ ∞ ν ↑ ∞, the so-called final rescaling, such that setting: we can find a sequence of concentration points (t ν , x ν ) ∈ C for some non-constant soliton ω.
We shall describe in detail the final rescaling φ ν at the beginning of Sect. 3, see Lemma 3.1. In our main theorem, we study this sequence further, carrying out a blow-up analysis for it and establishing an analogue of the energy identity from the bubbling analysis of harmonic maps and heat flows (and many other geometric variational problems), see for example the works [5,17,32] and the references therein for the critical regime, and for a supercritical situation the papers of Lin and Rivière [18,19], which are of closer flavor to the arguments presented in this paper.
where we are writing i (t) := i ∩ S t , and the maps converge locally to a constant away from i in the interior of the light cone: • Dispersive property for null-concentration: The parts of the maps φ ν that get concentrated on the null boundary ∂C have asymptotically vanishing energy dispersion norm, that is fixing the constant: [1,2] ), , solving the wave maps equation on this short, but independent of ν, time interval and satisfying: where ω j : R 2+1 → S n−1 are solitons for which: , for a finite collection, q = 1, . . . , q(ω j , E), of parallel time-like geodesics j q . Remark 1.3. In other words, we have energy quantization in the interior of the light cone for wave maps into spheres. This is a little first step towards understanding the soliton resolution conjecture for the (2+1)-dimensional wave maps equation with target S n−1 . It states that in addition, such a decomposition should be unique holding for all time and that t 0 ,ν should have asymptotically vanishing energy in the case of finite time blow-up (we note that this is guaranteed in the equivariant case by the well-known exterior energy estimate, see [24]), or correspond to the scattering part of the wave map in the case of global existence. Some further estimates, following directly from the work of Sterbenz and Tataru [26,27], regarding the terms t 0 ,ν can be found in Remark 2.6 and Sect. 3.2 (for example, (3.17) there gives decay for the angular and the null L = ∂ t + ∂ r energy). We note in the end though that our techniques do not lead to any further information.
We mention here that the soliton resolution conjecture has recently been shown to hold for the 1-equivariant wave maps into S 2 ⊂ R 3 with initial data having topological degree one and energy strictly less than 3 times 4π (note that 4π is the energy threshold) by Côte, Kenig, Lawrie and Schlag at finite time singularity in [3], and in [4] for the case of global existence (more general surfaces of revolution are also considered). Note that in this situation, one knows a priori the uniqueness of the possible configurations of solitons that can be concentrated (in fact there is only one of them and it is the unique equivariant degree one harmonic map). The conjecture is also established for the examples constructed by Krieger et al. [16], as well as Raphaël and Rodnianski [20].
Without this restriction on the initial data, the soliton resolution along a sequence of times was obtained in the 1-equivariant setting by Côte [1,2] building upon [3,4], and more generally for the $\ell$-equivariant case for any integer $\ell \geq 1$ by Jia and Kenig [13] relying on a method different from [1,3,4] (in both works, the finite time singularity and the global existence case have been considered). We refer the reader to [13] for more references and an overview with some history of the various beautiful techniques used to tackle the soliton resolution conjecture in the radial/equivariant cases for a variety of non-linear wave equations initiated by Duyckaerts, Kenig and Merle, see for example [7]. We also note that those techniques have been very recently applied to prove the sequential soliton resolution conjecture without any symmetry assumptions for some focusing semi-linear wave equations by Duyckaerts et al. [6,8,12]. The strategy of the present paper will have a very different flavor though. An outline can be found in Sect. 1.3.
Let us say that the techniques we use to establish the above theorem leave completely open the question of uniqueness of the set of solitons. In fact, as suggested by an example of Topping [32] for the harmonic map heat flow, this, and therefore the soliton resolution conjecture, could fail for certain targets (in view of the work of Simon [25] however, such pathologies are believed to be excluded when working with real analytic targets like S n−1 ). Therefore, there is a notoriously difficult and long way from Theorem 1.2 to the full soliton resolution conjecture, as one should expect the former to hold for any closed Riemannian manifold as a target, and the only place where we use the fact that our target is a sphere is when relying on the conservation law (1.4) in the proof of the compensation estimates in Sect. 2.3. Establishing the analogue of those estimates for general targets is an important open question even in the elliptic theory, see the work of Rivière [21] for a further discussion.
1.3. Discussion of the strategy.
We should close the introduction by outlining the proof of Theorem 1.2, which is contained in Sect. 3.
The first point of Theorem 1.2 is obtained in Sect. 3.1. For the sequence of wave maps {φ ν } ν∈N at the final rescaling, Sterbenz and Tataru [27] obtain a decay estimate along the scaling vector field ∂ ρ = 1 (t 2 −r 2 ) 1/2 (t∂ t + r ∂ r ): for some sequences ς ν ↓ 0, 1 2 ν ς ν , see Lemma 3.1. If one uses a local version of the latter, by contracting the stress energy tensor (1.6) with ϕ∂ ρ , for some compactly supported cut-off ϕ on the unit hyperbolic plane H 2 , it is possible to spread a given energy control on some ball B r 0 (x 0 ) S 0 1 , at the time slice t = 1 say, along the flow of the vector field ∂ ρ for any finite amount of time; in other words the wave maps φ ν would have small energy, uniformly in ν, on the whole of: provided they did so initially at t = 1. This is a simple analogue of the fact, from the blow-up analysis of supercritical harmonic maps, that one must have the tangent Radon measures monotone under scaling (see the work of Lin [18], and Lemma 3.2 here).
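The scaling vector field $\partial_\rho$ is the radial coordinate field in hyperbolic coordinates on the interior of the light cone, which is why decay along it is tied to self-similarity. A brief sketch, assuming coordinates $(\rho, y, \theta)$ with $t = \rho \cosh y$ and $r = \rho \sinh y$ (these coordinate names are introduced here for illustration and are not taken from the text above):

```latex
\begin{align*}
& \rho = (t^2 - r^2)^{1/2}, \qquad
  \partial_\rho = \cosh y \, \partial_t + \sinh y \, \partial_r
    = \frac{1}{\rho} \big( t\, \partial_t + r\, \partial_r \big), \\
& \phi(t, x) = \phi(\lambda t, \lambda x) \ \text{ for all } \lambda > 0
  \;\Longrightarrow\; \big( t\, \partial_t + r\, \partial_r \big) \phi = 0
  \;\Longrightarrow\; \partial_\rho \phi = 0 .
\end{align*}
```

The middle implication is obtained by differentiating in $\lambda$ at $\lambda = 1$. In these coordinates the level sets $\{\rho = \text{const}\}$ are copies of the hyperbolic plane, which is consistent with the cut-off being chosen on $\mathbb{H}^2$.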
This way, relying as well on concentration-compactness at $t = 1$ and the small energy compactness result under control of a time-like direction due to Sterbenz and Tataru [27], see Lemma 2.3 here, we are able to obtain a subsequence of $\{\phi_\nu\}_{\nu \in \mathbb{N}}$ which converges on $C^0_{[1,2]}$, away from a finite set of time-like rays passing through the origin, to a regular self-similar wave map $\phi$. By homogeneity and the removable singularity theorem of Sacks and Uhlenbeck [23], the map $\phi$ extends to a smooth wave map on the whole of the open forward light cone $C^0$ (the details of this argument are contained in Lemma 3.3). We note that similar arguments also give the convergence to solitons statement (1.11) claimed in Theorem 1.2 (see Lemma 3.6 for this point). We recall, however, that self-similar wave maps of finite energy must be constant. This is a well-known result, the proof of which can be found in [27] (see also Proposition 3.4 here for a precise statement).
On the other hand, another crucial property of the wave maps at the final scaling of Sterbenz and Tataru [27] is that a non-trivial amount of energy is uniformly held at a fixed distance away from the null boundary. Hence, our configuration of time-like rays, along which the wave maps concentrate, must be non-trivial. At this stage of the proof, this yields the first point of Theorem 1.2.
Because only some time-like energy is obtained in [27] (and this should have been so almost surely, if one considers the non-scattering problem for example), the second point of Theorem 1.2, treated in Sect. 3.2, tries to address the issue of null concentration. By cutting the parts of the map concentrating at the time-like geodesics, we are able to solve the wave maps equation for a uniform amount of time, even though the energy of the initial data is a priori large (thanks to the finite speed of propagation property and the fact the configuration of time-like rays was fixed initially). Running the arguments of Sterbenz and Tataru [27] backwards yields then the claimed control for the energy dispersion norm (see Lemma 3.5).
The construction of the asymptotic decomposition and the proof of the energy quantization, the third point of Theorem 1.2, is contained in Sect. 3.3. Upon choosing a suitable sequence of time slices {t (1) ν } ν∈N ⊂ (1, 2) and scales δ ν ↓ 0, we study the wave maps: for each geodesic i , from the first point of Theorem 1.2. The maps φ i,ν converge to the constant c φ corresponding to the self-similar wave map φ mentioned previously, locally in L ∞ t (H 1 x × L 2 x ) away from i , and in fact strongly in L ∞ t (L 2 x ). The time slices {t (1) ν } ν∈N have been chosen such that: x , for the constant time-like vector field X i pointing in the direction of the ray i . The concentration scales {δ ν } ν∈N have been chosen decaying slowly enough, to avoid losing energy in the process: lim ν→∞ sup t∈ [1,2] From there, we appeal to the compensation type estimates from Sect. 2.3 (the only place where we use the fact that our target is the sphere S n−1 ), decomposing the gradient as: which is obtained in Proposition 2.7. To construct i,ν , we rely essentially on the timelike decay above, and for i,ν the div-curl type structure of the non-linearity: , coming from the conservation law (1.4). Furthermore, we obtain a decomposition for the higher order time-like derivatives of φ i,ν : where the first term is a linear combination of: (1.12) that we note being local in time and quadratic in the gradient, and the second one satisfies a favorable decay estimate: This is obtained in Lemma 2.8 of Sect. 2.3, relying crucially on the conservation law (1.4) again, and plays an important role in the proof of the Besov decay estimate for wave maps on neck domains of Lemma 3.8 in Sect. 3.3, to which we come in few moments here.
We proceed then by constructing the soliton decomposition for the wave maps φ i,ν , up to terms called necks in the literature on harmonic maps, which are given by φ i,ν restricted to a finite collection of conformally degenerating annuli: and k = 1, . . . , K i (E), satisfying the local energy decay estimate: for any positive integer ∈ N. This is the content of Lemma 3.6, and represents essentially a standard argument of concentration-compactness. The whole of Theorem 1.2 is then reduced to showing that those necks have asymptotically vanishing energy.
In doing so, upon picking up suitable time slices {t (2) ν } ν∈N ⊂ (− 1 2 , 1 2 ) before applying Lemma 3.6, and taking the fastest concentrating scale λ min,ν := min i {λ i ν }, we consider the maps: together with: and {t (2) ν } ν∈N was chosen in such a way that: We use then the second and third items of the decay statement above, to write for the gradient of φ ν,x k i,ν on the neck domain: ) and satisfying: This is proved in Lemma and this gives the desired energy collapsing result.

Technical Results
In this section we gather some of the technical results, mainly restricted to the regularity theory of wave maps, that we will be using in Sect. 3 to establish Theorem 1.2. The crucial compensation estimate is proved in Sect. 2.3.

Some harmonic analysis.
We will be mainly relying on the spatial Fourier transform. For φ(t, x) ∈ S(R 2 ), a Schwartz function on R 2 at some fixed time t, we define: together with the inverse transform given by: for a Schwartz function ϕ(t, ξ) on the frequency space. The space-time Fourier transform: with inverse denoted by F −1 , should however appear in Sect. 2.3 while treating high modulations. The use of Littlewood-Paley theory will be quite beneficial to our analysis and general references for it are the monographs of Taylor [31] and Grafakos [10]. We shall rely on the discrete version here only: the Littlewood-Paley projection P ≤k , with k ∈ Z, is defined to be a Fourier multiplier with symbol m ≤k (ξ ) := m ≤0 (2 −k |ξ |), i.e. via the convolution: for some radial non-negative function m ≤0 (|ξ |) in frequency space, identically 1 on |ξ | ≤ 1 and 0 for |ξ | ≥ 2.
We also set $P_k$ to be a multiplier with symbol $m_k(\xi) := m_0(2^{-k}|\xi|)$, where $m_0(|\xi|) := m_{\leq 0}(|\xi|) - m_{\leq 0}(2|\xi|)$, and the operators $P_{<k}$, $P_{k_1 \leq \cdot \leq k_2}$, $P_{\geq k}$, etc. are then defined in the usual way. Note that LP-projections make sense for functions defined only at some given time $t$, or restricted to any time interval, and more generally commute with time cut-offs. Furthermore, they are disposable multipliers, i.e. they have distributional convolution kernels of bounded mass, even when considered on the whole of space-time, which in practice means that they are bounded on any translation invariant Banach space of functions on $\mathbb{R} \times \mathbb{R}^2$ and can therefore be discarded from the estimates as one wishes.
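With these definitions the symbols $m_k$ form a dyadic partition of unity, which is what makes the Littlewood-Paley decomposition work. A short verification, using only the stated properties of $m_{\leq 0}$ (identically 1 on $|\xi| \leq 1$ and 0 for $|\xi| \geq 2$); the cut-off indices $N$ and $K$ are introduced here for the computation:

```latex
\begin{align*}
m_k(\xi) &= m_{\leq 0}(2^{-k}|\xi|) - m_{\leq 0}(2^{-k+1}|\xi|)
          = m_{\leq k}(\xi) - m_{\leq k-1}(\xi), \\
\sum_{k = -N+1}^{K} m_k(\xi) &= m_{\leq K}(\xi) - m_{\leq -N}(\xi)
   \;\longrightarrow\; 1
   \quad \text{as } N, K \to \infty, \text{ for every } \xi \neq 0, \\
\operatorname{supp} m_k &\subset \{\, 2^{k-1} \leq |\xi| \leq 2^{k+1} \,\}.
\end{align*}
```

The support statement holds because $m_{\leq k}$ vanishes for $|\xi| \geq 2^{k+1}$, while both $m_{\leq k}$ and $m_{\leq k-1}$ equal 1 for $|\xi| \leq 2^{k-1}$.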
Two elementary but important facts about LP-projections that we would like to mention here are the finite band property that states: and further: for any 1 ≤ p ≤ ∞, as well as Bernstein's inequality: for any 1 ≤ q ≤ p ≤ ∞. The latter is especially useful converting integrability into regularity at low frequencies.
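For orientation, the two facts named above are usually stated as follows in two space dimensions. This is a sketch of the standard textbook forms, assuming $\varphi$ is a Schwartz function on $\mathbb{R}^2$; the exact phrasing and the choice of projections in the source may differ.

```latex
\begin{align*}
\text{(finite band)} \qquad
  & \| \nabla_x P_{\leq k}\, \varphi \|_{L^p_x} \lesssim 2^{k}\, \| \varphi \|_{L^p_x},
  \qquad
  \| P_k\, \varphi \|_{L^p_x} \lesssim 2^{-k}\, \| \nabla_x \varphi \|_{L^p_x},
  \qquad 1 \leq p \leq \infty, \\
\text{(Bernstein)} \qquad
  & \| P_k\, \varphi \|_{L^p_x} \lesssim 2^{2k \left( \frac1q - \frac1p \right)}\,
    \| P_k\, \varphi \|_{L^q_x},
  \qquad 1 \leq q \leq p \leq \infty .
\end{align*}
```

The factor 2 in the Bernstein exponent is the space dimension. For instance, with $q = 2$ and $p = \infty$ one gains control of $\| P_k \varphi \|_{L^\infty_x}$ by $2^{k} \| P_k \varphi \|_{L^2_x}$, which illustrates how integrability is traded for regularity at a fixed frequency.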
We can decompose any Schwartz function using LP-projections, and as we typically consider maps taking values in the sphere, we will be considering affinely (i.e. upon adding a constant) Schwartz functions, obtaining: (2.5) While working with the gradient ∇ t,x φ, this will make no difference of course. By duality, the above decompositions hold also for tempered distributions and are used to define various Besov and Triebel-Lizorkin spaces, see [10]. Let us present here some examples important for our argument.
In this paper, we will be mainly working with the Besov spaces B s, p q (R 2 ), for s ∈ R and 1 ≤ p, q ≤ ∞, together with the homogeneous versionsḂ s, p q (R 2 ), defined as completions with respect to the norms: and taking the ∞ norm if q = ∞ instead, of subspaces of S(R 2 ) for which those norms are finite. We remark that the case p, q = 2 corresponds to the familiar Sobolev spaces H s x , and their homogeneous versionsḢ s x respectively. We introduce also the local Hardy space H 1 loc (R 2 ) with its homogeneous counterpart H 1 (R 2 ), as Triebel-Lizorkin spaces F 0,1 2 (R 2 ) = H 1 loc (R 2 ) andḞ 0,1 2 (R 2 ) = H 1 (R 2 ) (this characterization is obtained in [10]), both subspaces of L 1 x , defined as the completion of Schwartz functions with respect to the norms: and which admit the local and homogeneous BMO spaces as a duals, (H 1 loc ) = bmo and (H 1 ) = BMO respectively. Although the latter does not admit a Littlewood-Paley type characterization, the former does via the Triebel-Lizorkin space F 0,∞ 2 = bmo, which is defined to be the Banach space of all tempered distributions ϕ ∈ S (R 2 ) having the following norm finite: the series above required to hold in S , see the monograph of Taylor [31] for further information. Hardy spaces are especially useful in estimating paraproducts (see below), and let us mention here, with this in mind, that H 1 embeds into a Besov space with lower regularity but better summability: This fact, that we will enjoy exploiting in the proof of Proposition 2.7 later, is taken from Lemma 7.19 of Krieger and Schlag [15] (p. 250). For a related result in the Lorentz space setting see the monograph of Hélein [11] (Theorem 3.3.10 and also the references mentioned there). Littlewood-Paley decompositions are also very useful in studying non-linear expressions, and one central example is the product θϑ of two Schwartz functions θ and ϑ ∈ S. 
Applying the decomposition (2.5), we can write: but recalling that the Fourier transform of a product is a convolution leads to the so-called Littlewood-Paley trichotomy decomposition (also called paraproduct decomposition), which simplifies the above double sum into: • The high-high interactions Both θ and ϑ have Fourier support well above the scale |ξ | ∼ 2 k , but the only way the sum of two annuli at larger scales |ξ | ∼ 2 k 1 , 2 k 2 with k 1 , k 2 ≥ k +6 can intersect the small annulus at |ξ | ∼ 2 k , is if they are approximately at the same scale, we should have |k 1 − k 2 | ≤ 3. • The low-high interactions If θ has Fourier support in the ball of radius 2 k−6 , it will contribute to the frequency scale |ξ | ∼ 2 k if it is multiplied by ϑ frequency localized to the annuli |ξ | ∼ 2 k 2 with k − 3 ≤ k 2 ≤ k + 3. The rougher components of ϑ bring up the low frequency parts of θ . The sum in k of the low-high interactions is sometimes called a paraproduct in the literature. By symmetry, we have the same picture with the roles of θ and ϑ interchanged: these are the high-low interactions.
We are then left only with the contribution of θ k 1 ϑ k 2 where both terms are frequency localized at 2 k 1 , 2 k 2 ∼ 2 k , these are the low-low interactions and in our case it will be often convenient to incorporate them in the high-high interactions.
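The trichotomy just described can be summarized in a single schematic display. This is a sketch under stated assumptions: $\theta_{k_1} := P_{k_1} \theta$ and $\vartheta_{k_2} := P_{k_2} \vartheta$, the numerical thresholds are the ones quoted in the discussion above, and the last sum collects the remaining non-vanishing pairs, all of which have $k_1, k_2 = k + O(1)$.

```latex
\begin{align*}
P_k(\theta \vartheta)
  &= \sum_{\substack{k_1, k_2 \geq k + 6 \\ |k_1 - k_2| \leq 3}}
       P_k\big( \theta_{k_1} \vartheta_{k_2} \big)
     && \text{(high-high)} \\
  &\quad + P_k\big( \theta_{< k-6}\; \vartheta_{k-3 \leq \cdot \leq k+3} \big)
     && \text{(low-high)} \\
  &\quad + P_k\big( \theta_{k-3 \leq \cdot \leq k+3}\; \vartheta_{< k-6} \big)
     && \text{(high-low)} \\
  &\quad + \sum_{\substack{\text{remaining pairs} \\ k_1, k_2 = k + O(1)}}
       P_k\big( \theta_{k_1} \vartheta_{k_2} \big)
     && \text{(low-low)} .
\end{align*}
```

All other pairs $(k_1, k_2)$ contribute nothing, because the Fourier support of $\theta_{k_1} \vartheta_{k_2}$ lies in the sum of the two frequency annuli, which then misses the annulus $|\xi| \sim 2^k$.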
Finally, let us set up here the notation for some space-time function spaces and related tools that we use. We define the Sobolev spaces H s t,x = H s t,x (R × R 2 ), for s ∈ R, by using the space-time Fourier transform and taking the completion of S(R × R 2 ) with respect to the norm: We define the modulation projections Q ≤ j and Q j for j ∈ Z to be the Fourier multipliers with symbols: respectively (and similarly for Q < j , Q j 1 ≤·≤ j 2 and Q ≥ j ). We note that those are not disposable so that one needs to be careful when discarding them off from the estimates in general, but as their symbols are bounded and smooth, they are directly seen to be bounded on L 2 t,x by Plancherel. Otherwise, we have the following lemma due to Tao (Lemmata 3 and 4 in [28]).
Lemma 2.1. The operators $P_k Q_j$, $P_k Q_{\leq j}$, $P_{\leq k} Q_{\leq j}$ and $P_{\leq k} Q_j$ are disposable for any pair of integers $j$ and $k$ with $j \geq k + O(1)$. Moreover, for any $1 \leq p \leq \infty$ and $j, j_1, j_2 \in \mathbb{Z}$, the operators $Q_{\leq j}$, $Q_{j_1 \leq \cdot \leq j_2}$ and $Q_j$ are bounded on the spaces $L^p_t(L^2_x)$.
Using the modulation projections $Q_j$, we define, following Tao [28], the homogeneous $\dot X^{s,b,q}_k$ spaces associated to the cone $\{|\tau| = |\xi|\}$ at the spatial frequency scale $k$, for any fixed integer $k \in \mathbb{Z}$ and some given real $b \in \mathbb{R}$, to be the completion of the space of Schwartz functions $\psi$ on $\mathbb{R}\times\mathbb{R}^2$ with respect to the norm: provided the latter is finite for $\psi$, and adopting the usual convention if $q$ is infinite. For $q = 1$ we obtain an atomic space. As our methods here have in the end more of an elliptic than a dispersive character, we shall not use those spaces directly (other than in stating the estimates from regularity theory). However, the distinction between the high modulations regime $P_k Q_{>k+10}$ and the frequency space-like regime $P_k Q_{\leq k+10}$ is absolutely crucial for our analysis.
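As a reminder of what such a norm looks like, here is one common normalization of an $\dot X^{s,b,q}_k$-type norm. It is written under the assumption that the weights are $2^{sk}$ in spatial frequency and $2^{bj}$ in modulation, with an $\ell^q$ sum over modulations; the precise convention intended here is the one of Tao [28], which may differ in inessential ways.

```latex
\[
\|\psi\|_{\dot X^{s,b,q}_k}
  := 2^{sk}\Big(\sum_{j\in\mathbb{Z}}
      \big(2^{bj}\,\|P_k Q_j\psi\|_{L^2_{t,x}}\big)^{q}\Big)^{1/q},
\qquad
\|\psi\|_{\dot X^{s,b,\infty}_k}
  := 2^{sk}\sup_{j\in\mathbb{Z}} 2^{bj}\,\|P_k Q_j\psi\|_{L^2_{t,x}} .
\]
```

With this normalization, $q = 1$ gives the smallest space in the scale, since the $\ell^1$ sum dominates every $\ell^q$ sum.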
To close this section, let us recall here the convention that function spaces over domains are defined via minimal extensions. For example, we shall write $X(I)$, where $X$ is a function space over $\mathbb{R}\times\mathbb{R}^2$ and $I$ some time interval, for the Banach space of functions $f$ on $I\times\mathbb{R}^2$ admitting an extension $\tilde f \in X$ to the whole of $\mathbb{R}\times\mathbb{R}^2$, and set: $\|f\|_{X(I)} := \inf\{\|\tilde f\|_{X} : \tilde f|_{I\times\mathbb{R}^2} = f\}$.

Regularity theory for wave maps.
We shall not give here the full definition of the space DS, and its undifferentiated version S, used in the iteration arguments of the proofs of well-posedness for the wave maps equation, referring to [28,Section 10] or [26, Section 5.2], but we will briefly summarize here some characteristic properties. At a given frequency scale k ∈ Z, the space DS is defined as an intersection of several different spaces and for us it will be enough to note that we have the control: (2.7) for any Schwartz function ψ on R × R 2 (under frequency localization, for the space The first component is the natural energy component on which we should mainly rely in this work. The second one is the dispersive component to be used only indirectly here but being important in gaining extra regularity for the part of the wave map that has Fourier support away from the light cone. The latter observation is exploited by Sterbenz and Tataru [27] in their compactness result that we discuss below. The third component represents the standard Strichartz spaces. We note that we do obtain the null concentration terms t 0 ,ν lying in this space, see Remark 2.6. We note that, for the regularity theory, the Q 0 -null structure in the non-linearity of equation (1.2) is crucial and the components mentioned above are not enough by themselves to exploit it so that one needs to introduce further suitable null frame Strichartz spaces. However, as this structure will not play any direct role in our arguments we should not elaborate more on this point here. Let us simply remark in the end that DS contains the atomic Fourier restriction space: referring to Lemma 8 in Tao's paper [28] for the proof of this fact, ideas from which we should actually use later in the proof of Lemma 2.8. 
By default in [26], the authors then define the spaces DS and S as completions of Schwartz functions in $\mathbb{R}\times\mathbb{R}^2$ with respect to the norms obtained by $\ell^2$-summing the control on the LP-projections and adding the $L^\infty$ norm for S: In practice however, it is sometimes convenient to replace the $\ell^2$ summation in (2.9) with a control with respect to a frequency envelope. Following Sterbenz and Tataru [26], we call a sequence $c := \{c_k\}_{k\in\mathbb{Z}} \in \ell^2$ of positive numbers $c_k > 0$ a $(\sigma_0,\sigma_1)$-admissible frequency envelope if $0 < \sigma_0 < \sigma_1$ and for any $k_0 < k_1$ we have: Given some smooth initial data $\phi[0] = (\phi(0), \partial_t\phi(0))$ we can naturally attach to it an admissible frequency envelope by setting: for which we note that: so that given any function $\psi$ on $\mathbb{R}^2$, $\|P_k\psi\|_{L^2_x} \lesssim c_k$ implies: which is very useful in controlling the regularity of an evolution like the wave map. Well-posedness theory for the wave maps equation with small energy initial data is due to Tao [28] and Tataru [30], and also Krieger [14] who considered the hyperbolic plane as target. We will be using here a local version, stated below, which appears as Theorem 1.3 in [30]. Of course, all of the results stated in this section are true for general closed Riemannian manifolds as target, but we present them in the case of spheres for the sake of consistency.

Theorem 2.2 (Tao [28], Tataru [30]). There exists a constant $\epsilon_0 := \epsilon_0(\mathbb{S}^{n-1}) > 0$ such that:
outside a compact domain with energy: there exists a unique smooth wave map φ defined on the whole of Minkowski space R 2+1 such that: taking the frequency envelope c from (2.10) for φ[0] and where σ 0 = σ 0 (S n−1 ) is some fixed small positive constant but σ 1 can be chosen arbitrarily large; • Continuous dependence on initial data and rough solutions: given a sequence of smooth tuples φ ν [0] ∈ T (S n−1 ) of initial data equal to a fixed constant outside some fixed compact domain, with energy: , there exist smooth wave maps φ ν with the properties as stated in the first point above and a map: solving weakly the wave maps equation (1.2) x ) on bounded time intervals, and further for 0 < s < σ 0 : We state now a compactness result due to Sterbenz and Tataru [27] for a sequence of small energy wave maps which become constant in the direction of some smooth time-like vector field. The absence of such a result in the general small energy case is precisely what makes the study of wave maps near the null boundary of the light cone a very challenging affair, requiring global non-linear techniques going beyond the present article. We mention that the arguments in [27] rely on the elliptic flavor given to the situation by the assumption that the sequence is asymptotically constant along a timelike vector field, the use of the Fourier restriction component of DS to gain compactness and regularity for the limiting map, as well as the small energy weak stability theory developed by Tataru [30] (which we have presented in the second point of Theorem 2.2 here).

Lemma 2.3 (Sterbenz and Tataru [27]). Consider a sequence of smooth wave maps
where s > 0 depends only on 0 from Theorem 2.2, and such that: for some smooth time-like vector field X . Then there exists a wave map: for any 0 < < 1 2 , satisfying: after passing to a subsequence, and further: for almost every t that we can fix as close to 0 as we wish. Hence, assuming that s was chosen small enough initially, by the pigeonhole principle we have for σ ∈ (2, 5 2 ): away from a set of measure 1 10 say. Fixing such a σ , we would have φ(t, ∂ B σ ) contained in a single chart of S n−1 of diameter O( √ s ) around a point c ∈ S n−1 . Moreover, upon passing to a further subsequence, by the strong convergence (2.17) we can choose σ ∈ (2, 5 2 1 2 ), using Morrey's inequality. Hence, we would have φ ν (t, ∂ B σ ) contained in the chart around c ∈ S n−1 of diameter O( √ s ) as well, for all ν ∈ N large enough. Therefore, smooth as the latter are, with the energy bound: In the end, setting the constant s > 0 small enough and the time t close enough to 0, the convergence statements are justified by the continuous dependence on the initial data part of Theorem 2.2 and the finite speed of propagation property.
In particular, the assumption (2.14) gets upgraded to: and going further, the regularity theory of Theorem 2.2 tells us that in fact we have: for any $0 < \epsilon < \frac{1}{2}$, improving upon (2.15), although it is unfortunately impossible to obtain convergence in such a stronger space without further assumptions, especially regarding the decay (2.14).
Let us close this section by mentioning the result of Sterbenz and Tataru [26], see both Theorem 1.3 and Proposition 3.9 there, which relaxes the assumption of small energy in the work of Tao [28] and Tataru [30] to small energy dispersion. This represents a crucial technical ingredient in the proof by Sterbenz and Tataru [27] of the threshold conjecture. Let us consider an open interval $I = (t_0, t_1)$, which can be unbounded.

Theorem 2.5 (Sterbenz and Tataru [26]). Given an energy bound $E > 0$, there exist constants $0 < \epsilon(\mathbb{S}^{n-1}, E) \ll 1$ and $1 \ll F(\mathbb{S}^{n-1}, E)$ such that for any smooth wave map $\phi$ on $(t_0, t_1)$ with energy bounded by $E$ and $\nabla_{t,x}\phi$ spatially Schwartz, if we have:
Moreover, considering an admissible frequency envelope c attached to some φ[t] for t 0 < t < t 1 , as in (2.10) and σ 0 as in Theorem 2.2, we obtain: and the map φ extends to a smooth wave map on a neighborhood of the time interval (t 0 , t 1 ).
Remark 2.6. In this paper, the above theorem will be used indirectly only, but we can apply it immediately to the wave maps t 0 ,ν from Theorem 1.2 concentrating on the null boundary ∂C, to obtain the bound:

Compensation type estimates.
We prove here two compensation estimates for wave maps into spheres with a good bound in the direction of some constant time-like vector field, relying on the conservation law (1.4) to treat high-high frequency interactions (this phenomenon goes back essentially to Wente). These estimates will play a key role in the proof of no loss of energy in the formation of solitons and, as in the case of higher dimensional harmonic maps considered by Lin and Rivière [19], this is the only place where we use the fact that our target manifold is $\mathbb{S}^{n-1}$.
and X a constant time-like vector field, that we may take to be: satisfying: with the implicit constants depending only on n the dimension of R n , the energy bound E, the rapidity constant ζ and the cut-off χ (most notably on ∂ t χ L ∞ t ). Proof. We start by noting that, expressing ∂ t as a linear combination of X and ∂ x 1 via (2.19), it suffices to consider the spatial gradient χ ∇ x φ.
For low frequencies, we proceed claiming immediately: which simply follows from the finite band property (2.2), passing to L ∞ t (L 2 x ) as necessary. This is an acceptable contribution.
For high modulations, we claim: (2.24) and the idea here, as in [27], is to note that, the vector field $X$ being time-like, the Fourier multiplier $X^{-1}\nabla_x Q_{\geq k+10}\tilde P_k$, where $\tilde P_k = P_{k-1\leq\cdot\leq k+1}$, has a smooth symbol bounded uniformly in $k \in \mathbb{Z}$. By Plancherel in $L^2_{t,x}$, this gives rise to the favorable elliptic estimate: and so (2.24) follows by square-summing the above in $k$ and dropping the cut-off. This is again acceptable.
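The uniform symbol bound behind this elliptic estimate can be checked by hand. The sketch below assumes that (2.19) reads $X = \cosh(\zeta)\,\partial_t + \sinh(\zeta)\,\partial_{x_1}$ with $\beta = \tanh(\zeta)$, $|\beta| < 1$, and that on the Fourier support of $Q_{\geq k+10}\tilde P_k$ one has $|\xi| \leq 2^{k+2}$ and $\big||\tau| - |\xi|\big| \geq 2^{k+9}$; the names $q_{\geq k+10}$ and $\tilde p_k$ for the symbols of the projections, and the exact numerical constants, are assumptions made for illustration.

```latex
\begin{align*}
&\text{On the support: } |\xi| \le 2^{k+2} < 2^{k+9} \le \big||\tau|-|\xi|\big|
   \;\Longrightarrow\; |\tau| \ge |\xi| + 2^{k+9} \ge |\xi| ,\\
&|\tau + \beta\,\xi_1| \;\ge\; |\tau| - |\beta|\,|\xi| \;\ge\; (1-|\beta|)\,|\tau| ,\\
&m_k(\tau,\xi) := \frac{\xi\; q_{\ge k+10}(\tau,\xi)\,\tilde p_k(\xi)}
                      {\cosh(\zeta)\,(\tau+\beta\,\xi_1)} ,
\qquad
|m_k(\tau,\xi)| \;\le\; \frac{|\xi|}{(1-|\beta|)\cosh(\zeta)\,|\tau|}
               \;\le\; \frac{1}{(1-|\beta|)\cosh(\zeta)} ,\\
&\big\| \nabla_x Q_{\ge k+10}\tilde P_k\, u \big\|_{L^2_{t,x}}
  = \big\| m_k\, \widehat{Xu} \big\|_{L^2_{\tau,\xi}}
  \;\lesssim_{\zeta}\; \| X u \|_{L^2_{t,x}} .
\end{align*}
```

The bound is independent of $k$ and degenerates only as $|\beta| \to 1$, that is, as $X$ approaches a null direction, which is consistent with the implicit constants depending on the rapidity $\zeta$.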
The main term to consider is Q <k+10 P k (χ ∇ x φ) with k > 0, and for this we rely on the wave maps Eq. (1.2), that we trick slightly to make the vector X to appear, introducing the operator: which is elliptic in the frequency region considered. So, using (2.19), together with (1.1), we rewrite the wave maps equation (1.2) as: and inverting x,β we have: , hence let us treat each term in (2.27) one by one. Considering second line in (2.27), we control the first two terms by claiming, for any k ∈ Z: which follows immediately discarding, via Plancherel in L 2 t,x , the Fourier multiplier x,β Q <k+10 P k of symbol bounded uniformly in k ∈ Z, and dropping the time cut-off χ . For the third term, we have, for any k ∈ Z: where we discarded by Plancherel in L 2 t,x the Fourier multiplier 2 k ∇ x −1 x,β Q <k+10 P k , having here again the symbol bounded uniformly in k ∈ Z. Therefore, square-summing over k > 0, both (2.28) and (2.29) lead to acceptable contributions.
We consider now the non-linear term on the first line of (2.27). Let us introduce some notation for the connection matrices: by (1.4), respectively the global energy bound (2.18) and the boundedness of the wave map. We claim then the following compensation estimate: Thanks to the conservation law, this non-linear term exhibits a div-curl type structure, and we shall treat it using the Littlewood-Paley trichotomy in very much the same standard way as the actual div-curl structure, see Taylor's monograph [31]. We start by writing: where α,k 1 := P k 1 α and similarly for $\phi_{k_2}$, α,≤k 1 , etc. We are going to prove claim (2.31) for each of the terms in (2.32) separately. Note that the Fourier multipliers: are disposable, which is essentially contained in Lemma 2.1 (precomposing, for example, with the space-time LP-projections to $|\tau| + |\xi| \sim 2^k$ that we do not use here otherwise). This justifies the fact that we can work with the space x (on which, of course, (2.33) are bounded by Plancherel).
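To make the div-curl structure concrete, the following short computation shows how the sphere constraint produces divergence-free connection coefficients. It is a sketch under assumptions: we write $\Omega_\alpha^{pq}$ for the connection matrices (a hypothetical name, since the original symbol is not legible here), and we assume that (1.2) is the wave maps equation in the form $\Box\phi = -\phi\,\partial^\alpha\phi\cdot\partial_\alpha\phi$ with $\Box = \partial^\alpha\partial_\alpha$ and $|\phi| = 1$.

```latex
\begin{align*}
\Omega_\alpha^{pq} &:= \phi^p\,\partial_\alpha\phi^q - \phi^q\,\partial_\alpha\phi^p ,
\qquad 1\le p,q\le n ,\\
\sum_{q}\Omega_\alpha^{pq}\,\partial^\alpha\phi^q
  &= \phi^p\,(\partial_\alpha\phi\cdot\partial^\alpha\phi)
     - (\phi\cdot\partial^\alpha\phi)\,\partial_\alpha\phi^p
   = \phi^p\,(\partial_\alpha\phi\cdot\partial^\alpha\phi)
  \quad\text{since } \phi\cdot\partial^\alpha\phi = \tfrac12\,\partial^\alpha|\phi|^2 = 0 ,\\
\Box\phi^p &= -\sum_{q}\Omega_\alpha^{pq}\,\partial^\alpha\phi^q ,\\
\partial^\alpha\Omega_\alpha^{pq}
  &= \phi^p\,\Box\phi^q - \phi^q\,\Box\phi^p
   = -\phi^p\phi^q\,(\partial_\alpha\phi\cdot\partial^\alpha\phi)
     + \phi^q\phi^p\,(\partial_\alpha\phi\cdot\partial^\alpha\phi) = 0 .
\end{align*}
```

The non-linearity is thus a contraction of a divergence-free field with a gradient, which is the space-time analogue of the div-curl pairing; the antisymmetry of $\Omega_\alpha^{pq}$ in $(p,q)$ is specific to the round sphere.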
Let us start with the high-high interactions on the first and second lines of (2.32), for which we control (2.31), discarding the multipliers (2.33) and dropping 2 −k ∂ t χ for the first term, by: where we applied Bernstein's inequality (2.4), commuted the sum k>0 with L 1 t and discarded P k . Using Cauchy-Schwarz in L 1 x and recalling the finite band property (2.3) for φ k 2 , we can bound the contribution of (2.34) via: and summing this over k > 0, letting i := k 1 − k 2 and j := k 2 − k, we obtain: where we have used Cauchy-Schwarz in k. By the global energy bound, we get that high-high interactions make an acceptable contribution to (2.31).
Finally, let us consider the contribution of the paraproducts from lines three and four in (2.32). We focus on the latter, as the former is treated in the same way by symmetry (or in fact could already have been absorbed in the argument for high-high interactions). Here, the div-curl structure is not playing any role, and is actually counter-productive. Hence, discarding the second multiplier from (2.33) and commuting the discrete sum $\sum_{k>0}$ with $L^1_t$ as previously, it suffices to control: Recalling the embedding (2.6) we are reduced to showing: 1.
Using the duality $(F^{0,1}_2)^* = F^{0,\infty}_2$, as discussed in Sect. 2.1, we take an arbitrary $\varphi \in F^{0,\infty}_2$ together with a representation $\varphi = \sum_{k\geq 0}\varphi_k$ in $\mathcal{S}'_x$ such that each $\varphi_k$ has Fourier support in $|\xi| \sim 2^k$ ($|\xi| \lesssim 1$ for $\varphi_0$) and: Then, recalling the fact that LP-projections are self-adjoint, we must show that: with the convention that $\varphi_k$ with $k$ negative simply stands for $\varphi_0$. Using Cauchy-Schwarz we bound this via: It is a well-known fact from harmonic analysis, to which we shall refer as the Littlewood-Paley square function estimate, see e.g. [31], that: Hence, by the global energy bound, the contribution of the paraproducts is acceptable. Therefore we have shown the compensation estimate (2.31). Proposition 2.7 is proved.
We present now a compensation estimate for higher order time-like derivatives of wave maps as considered in the previous proposition. It holds up to a non-linear bulk, essentially quadratic in the gradient and local in time, that we shall consider on neck regions later in the proof of the weak Besov $\dot B^{1,2}_\infty$ decay estimate in Lemma 3.8. Parts of this estimate are non-linear, and will be established via a duality argument in the spirit of the energy collapsing result itself.
As for Proposition 2.7, the conservation law (1.4) is absolutely crucial, and so our arguments do not generalize directly to the case of a general target beyond the Euclidean sphere S n−1 .
Lemma 2.8. Consider a wave map φ : [−1, 1] × R 2 → S n−1 with the same set-up as in Proposition 2.7, then we have the following decomposition holding in S(R × R 2 ), using notation from (2.30) and recalling that β = tanh(ζ ): the error term satisfying: with the same dependence for the implicit constant as in Proposition 2.7.
Let us note here, for later use in Lemma 3.8, that we can rewrite the decomposition (2.35) as follows. Introducing the notation: we can write: For this, one relies on the conservation law (1.4), making the vector $X$ appear through (2.19) and adding some low-high interactions to both sides; see the proof below for more details.
Proof. Let us start with the frequency space-like region, that we can treat directly and for which we claim the stronger estimate: for any k ∈ Z. To see this, we simply commute X with the time cut-off χ , getting: where for the first term we discarded the multiplier 2 −k X P k Q <k+10 using Plancherel in L 2 t,x . Regarding the second one, passing to L ∞ t (L 2 x ), which is possible as we are working over a bounded time interval in (2.37), we can apply the inversion formula for the space-time Fourier transform F, to get: combining Minkowski's inequality and then Plancherel in L 2 x . But the integrand on the RHS has τ -support of length O(2 k ), hence we can bound this simply via: where we applied the inversion formula for F −1 this time (note that this argument is essentially a manifestation of Bernstein's one dimensional inequality). This gives claim (2.37) as desired.
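The one dimensional Bernstein inequality alluded to at the end of this argument can be written out in a few lines. This is a generic sketch, not the paper's exact display: $f$ stands for any function of $(t,x)$ whose temporal Fourier transform $\hat f(\tau,x) = \int e^{-it\tau} f(t,x)\,dt$ is supported in a set $A$ of $\tau$-measure $|A| \lesssim 2^k$.

```latex
\begin{align*}
\|f\|_{L^\infty_t(L^2_x)}
 &= \sup_t \Big\| \frac{1}{2\pi}\int_A e^{it\tau}\,\hat f(\tau,\cdot)\,d\tau \Big\|_{L^2_x}
  \;\le\; \frac{1}{2\pi}\int_A \|\hat f(\tau,\cdot)\|_{L^2_x}\,d\tau
  && \text{(Fourier inversion, Minkowski)}\\
 &\le \frac{|A|^{1/2}}{2\pi}\,\|\hat f\|_{L^2_\tau(L^2_x)}
  && \text{(Cauchy-Schwarz in } \tau)\\
 &= \frac{|A|^{1/2}}{(2\pi)^{1/2}}\,\|f\|_{L^2_{t,x}}
  \;\lesssim\; 2^{k/2}\,\|f\|_{L^2_{t,x}}
  && \text{(Plancherel in } t).
\end{align*}
```

The gain of $2^{k/2}$ is exactly what is lost when passing from $L^2_t$ to $L^\infty_t$ for a function localized in time frequency to a set of size $O(2^k)$.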
For high modulations, we use the wave maps equation as in (2.27). Following the Littlewood-Paley trichotomy (passing to the convention φ k := P k φ, etc. as before), we write: where we set: From there, we add and subtract the frequency space-like part of the terms on second and last lines above, and use the conservation law (1.4), that we rewrite as: This yields the following decomposition: where we define: (4) k := P k −sinh(ζ )X (χ x 1 φ >k+10 ) , as well as: We proceed proving the estimate (2.36) for the first line of (2.38) and each of the ψ (i) k and ϕ (i) k separately. For the Laplacian, inverting X , we have the stronger estimate: that follows immediately by discarding, via Plancherel in L 2 t,x , the Fourier multiplier 2 −k X −1 x,β P k Q ≥k+10 having symbol bounded uniformly in k ∈ Z, which leads to an acceptable contribution.
For the second term on the RHS of (2.38) we immediately have: 1] , by the finite band property (2.3), which is acceptable.
Regarding ψ (1) k , we remark that it has a paraproduct structure and so at least one of the factors will be frequency localized to |ξ | ∼ 2 k , which is favorable for square-summing. More precisely, discarding Q <k+10 before dropping the cut-off χ , and using Bernstein's inequality (2.4) to pass to L 2 t (L 1 x ), it is enough to note that for any 1 ≤ p, q, r ≤ n and any time slice t ∈ [−1, 1]: by Cauchy-Schwarz. Upon integrating in time, this is an acceptable contribution by the energy bound (2.18). For the expression ψ (2) k , it is already convenient to proceed via a duality argument: where we used Bernstein (2.4) for the first factor, and for the second one we proceeded as for the frequency space-like term (2.37), using time frequency localization to estimate it via the Fourier inversion formula: .
The first factor is universally bounded for us, as for any 1 ≤ p, q, r ≤ n: which follows directly from the analogous treatment of high-high interactions in the proof of Proposition 2.7. On the other hand, the second factor is controlled via: 1] , which yields an acceptable contribution to the non-linear part of (2.36).
Regarding ψ (3) k , it is a linear combination of: where we discarded via Plancherel in L 2 t,x the Fourier multiplier 2 −k ∇ t,x Q <k+10 P k having bounded symbol, passed from 2 to 1 summation in k after commuting time integration with the discrete sum k , and applied Bernstein (2.4) with Cauchy-Schwarz. This contribution is directly seen to be bounded by O( X φ 2 The terms ψ (4) k , ϕ (1) k and ϕ (2) k are similar and require a duality argument relying heavily on their compensated structure to obtain estimate (2.36) at 2 modulation.
First for ψ (4) k , using the self-adjointness of Q <k+10 and then commuting k with time integration, we have: 1] .
For the first factor we claim that it is universally bounded due to its compensated structure. Indeed, passing to the Hardy space on each time slice via the embedding (2.6), we estimate it by: where we relied on the Calderón-Zygmund theory for the Littlewood-Paley square function and the vector valued operator (2 −k X Q 2 <k+10 P k ) k∈Z , precomposing with the spacetime LP-projections to |τ | + |ξ | ∼ 2 k as necessary. From there, proceeding as previously, we immediately bound the latter by O( ∇ x φ 2 1] ) as required. The set-up is similar for ϕ (1) k and ϕ (2) k . Here however, being at high modulations, we start by inverting the time-like vector X for one of the factors. Then, using the skewadjointness of 2 k X −1 Q ≥k+10 , but proceeding identically to the above otherwise, we obtain: where Q k+ j = Q k+ j−1≤·≤k+ j+1 is the slightly enlarged modulation projection, and we relied as previously on Calderón-Zygmund theory to discard the vector valued operator (2 k+ j X −1 Q k+ j Q k+ j P k ) k∈Z , precomposing with the space-time LP-projections to |τ | + |ξ | ∼ 2 k+ j as necessary, for any integer j ≥ 10.
From there, we note that the first factor is bounded by 1] ) as required. This follows essentially from the arguments used to treat the high-high interactions and the paraproducts, for ϕ (1) k and ϕ (2) k respectively, in the proof of Proposition 2.7 that we shall not reproduce here.
Given this, to prove estimate (2.36) for the terms ψ (4) k , ϕ (1) k and ϕ (2) k , it is enough by (2.3) and (2.4) to establish the following couple of weak estimates: for any 1 ≤ p, q, r ≤ n and any time slice t ∈ [−1, 1]. Consider (2.39). For convenience, let us suppress the time t from the notation. Moving X inside the bracket, we first differentiate the time cut-off getting by Cauchy-Schwarz: where we relied on the finite band property (2.2) for φ r , which is a permissible bound for (2.39).
Next, if X falls on φ p , then we have: with again the finite band property (2.2) applied to φ r , but this time in L ∞ x , and this is an acceptable bound.
When X falls on ∇ x φ q , we shall first insert the projection P ≤k+O(1) in front of φ p X ∇ x φ q , which is possible by the localization of ∇ x φ r ≤k+10 , and untangle the highhigh interactions: Given this decomposition, we have for the low frequency interactions: where we used the finite band property (2.2) for φ q , and this is acceptable. For the high-high frequency interactions: where we have used the finite band property (2.2) for φ r in L ∞ x , and transferred the spatial gradient from φ q to φ p by relying on (2.3) this time and the fact that |k 1 − k 2 | ≤ O(1). This control is acceptable applying the discrete Cauchy-Schwarz inequality in The last case we need to consider, in order to finish with (2.39), is when X falls on φ r . This follows however at once, applying (2.2) to the latter: which is certainly acceptable and gives (2.39). The estimate (2.40) is very much similar to (2.39). As previously, we move X into the bracket, first estimating the term when the derivative falls on the time cut-off, passing initially to L 1 x via Bernstein's inequality (2.4): simply noting that P >k+10 φ = P >k+10 (φ − c) and then discarding the LP-projection. When X differentiates φ p , we pass again to L 1 x , and then immediately get: Both estimates are acceptable for (2.40).
We consider now the term with X falling on φ q , and untangling the high-high interactions in the product we should regroup together φ p and φ r , obtaining: Now, given this decomposition, we control the first term directly by applying the finite band property (2.2) to φ q : which is acceptable by the boundedness of wave maps. For the high-high interactions we proceed as for (2.39) above, passing initially to L 1 x via Bernstein's inequality (2.4) and transferring the spatial gradient ∇ x from φ q to φ p φ r >k+10 via the finite band property (2.3), which gives: and using the discrete Cauchy-Schwarz, we can bound this via: which is certainly acceptable. Lastly, if X differentiates φ r , we pass to L 1 x and this immediately yields the desired control: hence we have (2.40).
Lemma 2.8 is proved.

Bubbling Analysis
In this section we prove our main Theorem 1.2. We start by recording, in the lemma just below, some of the important properties of the wave map $\phi$ considered in the statement of the threshold Theorem 1.1, at the final rescaling obtained by Sterbenz and Tataru in Section 6.6 of [27].

1)
where ς ν ↓ 0 as ν → ∞, with the following properties: • There exists a sequence ν ↓ 0, with 1 2 ν ς ν , such that: • A decay to the self-similar mode holds: is the scaling vector field which we recall is uniformly time-like μ(∂ ρ , ∂ ρ ) = −1; • There is a uniform amount of energy E c > 0 getting concentrated by the maps φ ν in the interior of the light cone:
Let us write here a few lines of comments regarding the above lemma, referring the reader to [27] for more details. Given a sequence of concentration points (t ν , x ν ) for the energy dispersion norm: with t ν → 0 in the case of a finite time blow-up, or t ν → +∞ in a non-scattering scenario, the sequence ν ↓ 0 is chosen such that: In [27], Sections 6.3 and 6.4, the authors use the above lower bound to prove that there is a non-trivial amount of time-like energy concentrating on the time slice S t ν . As we shall later rely on those results in Sect. 3.2, we gathered them in Lemma 3.5 here. From there, a weighted energy estimate (see Lemma 3.4 in [27]) propagates this energy backwards in time, leading to (3.4) for any t ∈ [ 1/2 In parallel to this, a Morawetz type estimate (see Lemma 3.3 in [27]) and the pigeonhole principle enable Sterbenz and Tataru to find a sequence of time intervals , such that the following decay estimate holds: see Section 6.6 in [27]. Then for the final rescaling, the authors in [27] choose t ν τ ν for the scales λ 0 ν (or λ ∞ ν ), obtaining a sequence of wave maps φ(λ 0 ν ·) with the desired properties on the growing cones C [1,N ν ] . In our case, it will be more convenient (for notational purposes mainly, as to respect the CMC foliation in Sect. 3.1 below), to asymptotically cover all of forward light cone C 0 , so we should simply fix any: and choose then ς ν ↓ 0 decaying slowly enough, for Lemma 3.1 to hold. Finally, we bring reader's attention here to our convention that, in any of the results stated in this last section, we assume (3.1)-(3.4) holding without mentioning it. In fact, one might directly consider those as the assumptions under which claims of Theorem 1.2 are made.
3.1. Blow-up analysis for asymptotically self-similar sequences of wave maps. We start the proof of Theorem 1.2 with a study of the energy concentration sets. Our approach here will be close in spirit to the work of Freire et al. [9]. We will rely on a monotonicity lemma for asymptotically self-similar wave maps, see Lemma 3.2 below, which is a rough analogue of part (ii) from Lemma 1.7 in Lin's work [18], but mainly parallels the computations in the proof of Morawetz type estimates from Section 3 of [27]. Note that we do not use here the fact that our target manifold is a sphere.
It will be convenient to use hyperbolic coordinates, also known as CMC foliation of the (forward) light cone C 0 , where we recall that with respect to the Minkowski metric μ on R 2+1 . These formulae will be useful below applying Stokes' theorem in the hyperbolic annulus {ρ 1 ≤ ρ ≤ ρ 2 }. Let us also record here that, using the identities: one computes, for a smooth map φ into S n−1 : where ∇ H 2 denotes the gradient on the unit hyperboloid H 2 := H 2 1 : For every given ρ 0 > 0, let us define the Radon measures: We can naturally view them as measures on the unit hyperbolic plane H 2 since for any given test function ϕ on H 2 , that we should view as a function ϕ(y, θ) independent of ρ on the whole of the light cone C 0 , we have: Using the decay (3.3) to a self-similar mode, we can establish the following asymptotic monotonicity property for the family σ ν,ρ ρ>0 ⊂ R(H 2 ).
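For the reader's convenience, the standard formulas for hyperbolic coordinates on the forward light cone are collected below. This is a sketch under the assumption that the coordinates $(\rho, y, \theta)$ are normalized as in the first line; the original displays may use a different but equivalent parametrization.

```latex
\begin{align*}
&t = \rho\cosh y, \qquad x = \rho\sinh y\,(\cos\theta,\sin\theta),
  \qquad \rho = \sqrt{t^2-|x|^2}, \quad y\ge 0,\ \theta\in[0,2\pi),\\
&\mu = -dt^2 + dx^2 = -d\rho^2 + \rho^2\big(dy^2 + \sinh^2\! y\, d\theta^2\big),
  \qquad dx\,dt = \rho^2\sinh y\; d\rho\,dy\,d\theta,\\
&\partial_\rho = \frac{1}{\rho}\big(t\,\partial_t + x\cdot\nabla_x\big),
  \qquad \mu(\partial_\rho,\partial_\rho) = \frac{-t^2+|x|^2}{\rho^2} = -1,\\
&\partial^\alpha\phi\cdot\partial_\alpha\phi
   = -|\partial_\rho\phi|^2 + \frac{1}{\rho^2}\,|\nabla_{\mathbb{H}^2}\phi|^2,
  \qquad |\nabla_{\mathbb{H}^2}\phi|^2 = |\partial_y\phi|^2 + \frac{|\partial_\theta\phi|^2}{\sinh^2\! y} .
\end{align*}
```

The level sets $\{\rho = \rho_0\}$ are hyperboloids of constant mean curvature, isometric to the hyperbolic plane of curvature $-\rho_0^{-2}$, which explains the name CMC foliation.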
Proof. Given a continuously differentiable vector field ψ = ψ β ∂ β compactly supported in (y, θ), contracting the stress-energy tensor T [φ ν ] with ψ, we obtain the associated Noether current: Hence, if we set: where we relied on the conservation law (1.7), and: where our convention follows x 0 := t and x 0 = −t, so that x α = μ αγ x γ , applying Stokes' theorem over the region {ρ 0 ≤ ρ ≤ ρ 0 + λ} leads to the identity: Taking ψ = ϕ(y, θ)∂ ρ , we compute using the expression (1.6) for T αβ [φ ν ]: and for the boundary terms: where to pass to the second line we have used the identity (3.5). Therefore, plugging the above back into (3.7) we obtain: Integrating over ρ 0 ∈ [ρ 1 , ρ 2 ] and using Cauchy-Schwarz for the second term on RHS above, appealing to the decay (3.3) and the global energy bound (3.1), we obtain (3.6). Hence Lemma 3.2 is proved.
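The divergence identity used at the start of this proof is the usual one for Noether currents. The sketch below assumes, as is standard, that the stress-energy tensor $T_{\alpha\beta}[\phi_\nu]$ of (1.6) is symmetric and that the conservation law (1.7) reads $\partial^\alpha T_{\alpha\beta}[\phi_\nu] = 0$; the name $J_\alpha[\psi]$ for the current is introduced here for illustration, and the computation is carried out in Cartesian coordinates.

```latex
\begin{align*}
J_\alpha[\psi] &:= T_{\alpha\beta}[\phi_\nu]\,\psi^\beta ,\\
\partial^\alpha J_\alpha[\psi]
  &= \big(\partial^\alpha T_{\alpha\beta}[\phi_\nu]\big)\,\psi^\beta
     + T_{\alpha\beta}[\phi_\nu]\,\partial^\alpha\psi^\beta
   = T_{\alpha\beta}[\phi_\nu]\,\partial^\alpha\psi^\beta
   = \tfrac12\, T_{\alpha\beta}[\phi_\nu]\,
      \big(\partial^\alpha\psi^\beta + \partial^\beta\psi^\alpha\big) .
\end{align*}
```

Integrating this identity over the hyperbolic annulus $\{\rho_0 \leq \rho \leq \rho_0 + \lambda\}$ and applying Stokes' theorem turns the left-hand side into the difference of the fluxes of $J[\psi]$ through the two hyperboloids, while the right-hand side only involves the deformation tensor of $\psi$.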
From now on we restrict ourselves to the time interval 1 ≤ t ≤ 2. We will study there the sets in space-time where our wave maps concentrate a non-trivial amount of energy as in the work of Freire et al. [9], where some general statements about the structure of energy concentration loci can be found (for instance, it is shown in Proposition 4.1 and Theorem B.1 of [9] that, upon passing to a suitable subsequence, the concentration set of an energy threshold will be contained in a finite union of Lipschitz curves). Our assumptions however enable us to go beyond [9] via more elementary arguments and prove that picking a suitable subsequence will lead to an energy concentration set which is in fact given by a finite collection of time-like geodesics, relying on Lemmata 2.3 and 3.2.
To use the latter, we remark that for a fixed open domain U with closure U ⊂ C 0 3] )), and this will enable us to transfer control back and forward between the Radon measures σ ν,ρ and the energy densities ∇ t,x φ ν 2 dxdt of which we want to study the concentration sets (with the small energy compactness Lemma 2.3 enabling us to obtain some uniformity in time).

Lemma 3.3.
There exists a subsequence of {φ ν } ν∈N restricting to which, without changing notation, we can find a finite collection of time-like geodesics 1 , . . . , I passing through the origin in Minkowski space such that defining the energy concentration set by: we have: and away from , there exist a wave map φ satisfying: for any 0 < < 1 2 , of finite energy on C 0 [1,2] , [1,2], such that:
Proof. In view of the asymptotic monotonicity provided by Lemma 3.2, let us denote for a set U ⊂ S t=1 the cone over U by: and by C I (U ) := C(U ) ∩ C I the corresponding truncation to a time interval I . Considering the time slice S 0 1 , given the global energy bound (3.1) we can pass to a subsequence for {φ ν } ν∈N , without changing notation, such that for some Radon measure ι ∈ R(S 0 1 ) we have: from where we also see that there exist only finitely many points {x i } I i=1 ⊂ S 0 1 such that: and we set i := C({x i }). Let us start by showing that: obtaining on the way claim (3.9). Fix any point x 0 ∈ S 0 1 \ , then there exists a radius r 1 = r 1 (x 0 ) > 0 such that for all ν ∈ N: hence by the energy-flux identity (1.9), shrinking r 1 to r 2 > 0 as necessary, we obtain that: By the decay assumption (3.3), we can apply the compactness Lemma 2.3 obtaining that on a subsequence {φ ν } ν ∈N we have convergence in C 0 to a wave map φ in [1 − r 2 , 1 + r 2 ] × B r 2 (x 0 ), satisfying ∂ ρ φ = 0 and having regularity as dictated by (2.15) there.
Hence, given any positive constant η > 0 there exist a radius r η > 0 such that: Therefore, using (3.8) we get for any test function ϕ(y, θ) on the hyperboloid H 2 , having support in C(B r η (x 0 )) ∩ H 2 and satisfying 0 ≤ ϕ ≤ 1, the bound: for some suitably chosen 0 < ρ 1 < ρ 2 . The implicit constant here does not depend on the parameter η, and in fact depends only on the distance of the point (1, x 0 ) to the null boundary.
Recalling Lemma 3.2, we obtain by (3.6) for every fixed λ > 0 the estimate: Given this, shrinking r 2 to r 3 = r 3 (x 0 , η) > 0 and picking a suitable cut-off function ϕ on H 2 as necessary, we can rely on the other inequality in (3.8) this time and the energy-flux identity (1.9) to find, arguing via the pigeonhole principle, a finite cover of: and such that: where the implicit constant is independent of η. Hence, choosing η > 0 small enough we can claim: with N = N (x 0 ) and r 3 = r 3 (x 0 ) now. Proceeding this way for a countable dense set of points x 0 ∈ S 0 1 \ , we obtain ultimately a countable cover of C 0 [1,2] \∪ i i that we can use together with the compactness Lemma 2.3 to construct a subsequence for {φ ν } ν∈N via the diagonal process, to which we restrict ourselves without changing notation this time, such that (3.9) hold for a wave map φ ∈ (H By construction, it can be seen immediately that the obtained map φ has energy bounded by E and we note the argument also yields (3.12) as desired.
To finish the proof of the lemma, we need to get the reverse inclusion to (3.12). This follows however from a simple argument by contradiction: suppose that there exists a point (s i , y i ) ∈ i which is not contained in . We can then run the above proof with (s i , y i ) instead of (1, x 0 ) and obtain that the full ray i is not contained in , but that contradicts the definition of x i from (3.11). Lemma 3.3 is therefore proved.
To close the proof of the first part of Theorem 1.2 it is enough now to prove that the wave map φ obtained above must in fact be constant. For this point, we will rely on the folklore fact that non-constant finite energy self-similar wave maps do not exist in dimension 2 + 1, which we state in Proposition 3.4 below. A self-contained proof of this proposition can be found in the work of Sterbenz and Tataru [27] (see Section 4 there). Consider the wave map φ from Lemma 3.3. By homogeneity, we can extend it to: t,x and satisfying ∂ ρ φ = 0. Let us note here that we were considering the unit time interval [1,2] in (3.9) just in order to simplify the task of keeping track of the dependence of implicit constants. It is easy to see that the arguments above lead to local convergence of the sequence φ ν to the map φ on all of C 0 \ ∪ i i . This is however a purely qualitative statement.
Restricting φ to the unit hyperbolic plane H 2 gives rise to a harmonic map of locally finite energy, by (3.8), defined away from a finite set of points given by H 2 ∩ I i=1 i . By the regularity theory due to Hélein [11], we obtain in fact a smooth harmonic map away from the above collection of points. But then, by the removable singularity theorem of Sacks and Uhlenbeck [23] we can extend φ to a smooth harmonic map on the whole of the hyperbolic plane H 2 , which in turn means that, by homogeneity again, we could have extended φ across the rays i to a smooth finite energy self-similar wave map on C 0 . By Proposition 3.4, φ has to be a constant.
The first point of Theorem 1.2 is therefore established, given that must be nontrivial by the concentration of time-like energy assumption (3.4).

Dispersive property for null-concentration.
This short section is devoted to the description of the parts of the sequence that escape into the null boundary. We proceed first, borrowing arguments from Section 6.1 of [27], by constructing extensions for the maps φ ν outside the light cone with asymptotically vanishing energy there (we note that, if considering the non-scattering problem, those have been already constructed in Section 6.2 of [27]).
Relying on the flux decay estimate (3.2) and using the angular part of F [ς ν ,ς −1 ν ] [φ ν ], see the expression in (1.9), we can find by the pigeonhole principle a sequence τ ν ∈ [2,3] such that: Hence, as in Remark 2.4, we get that φ ν (∂ S τ ν ) is contained in a chart of radius O( 1/4 ν ) and so we can build smooth spatial extensions φ ν [τ ν ] ∈ T (S n−1 ) of φ ν [τ ν ], satisfying the energy control: We solve then the wave maps equation with initial data φ ν [τ ν ] backwards in time for t ∈ [ς ν , τ ν ]. By the finite speed of propagation property, the solution agrees with φ ν on C [ς ν ,τ ν ] , hence let us denote it by φ ν (abusing slightly notation). Moreover, relying again on the assumption (3.2) and using the conservation of energy law (1.3) together with the energy-flux identity (1.9), we propagate to all of the time interval [ς ν , τ ν ] the smallness of the energy exterior to the light cone: which in particular guarantees smoothness of the extension on all of [ς ν , τ ν ] × R 2 .
Another consequence of the flux decay estimate (3.2) that we record here is the following weighted control: (3.13) which is a direct consequence of Lemma 3.2 in [27] and constitutes an important ingredient in the elimination of sharp pockets of null energy (see Section 6.3 of [27]). Regarding the interior of the cone, by the previous section we can pick a monotonically decreasing sequence of scales δ ν ↓ 0, starting with δ 0 := (1/10) dist(∪ i i , ∂C [1,2] ), such that: lim (3.14) which are in some sense the slowest concentration scales, i.e. have the property that: where the constant c φ corresponds to the wave map φ from (3.9), for any given t 0 ∈ (1, 2) and i = 1, . . . , I . This can be obtained upon taking δ ν tending to 0 more slowly, which will not break condition (3.14). Hence, by pigeonholing, we can choose a sequence of radii σ ν = σ ν (t 0 , i) ∈ (3, 4) such that: which enables us, as before, to construct extensions into B σ ν that have asymptotically vanishing energy. That is, we cut off the bubbles from the body of the map. More precisely, we choose a sequence of maps ( i,t 0 ,ν , ∂ t i,t 0 ,ν ) ∈ T (S n−1 ) defined on B σ ν such that: and performing this surgery for each i = 1, . . . , I , we obtain smooth maps: satisfying by construction: Moreover, fixing t 0 ∈ [1 + δ 0 , 2 − δ 0 ], we can naturally view t 0 ,ν [t 0 ] as defined on the time slice S t 0 , and solve the wave maps equation with initial data t 0 ,ν [t 0 ] obtaining a smooth solution on [t 0 − δ 0 , t 0 + δ 0 ] provided we work with ν large enough, relying on the finite speed of propagation property (which tells us that t 0 ,ν agrees with φ ν near and beyond the null boundary, at least away from C 2δ 0 [t 0 −δ 0 ,t 0 +δ 0 ] ), and the small energy regularity via (3.16). The choice of δ 0 is not optimal, but here we are rather concerned with its independence from ν.
It is immediate then that, as desired in Theorem 1.2, and furthermore the weighted estimate (3.13) is inherited by the maps t 0 ,ν : (3.17) giving us the possibility to apply the following lemma of Sterbenz and Tataru from [27] (see Sections 6.3 and 6.4 there), to get the energy dispersion norm of t 0 ,ν asymptotically vanishing and conclude the second point of Theorem 1.2.

Lemma (Sterbenz and Tataru [27]). Consider tuples {(ϕ ν , ∂ t ϕ ν )} ν∈N of Schwartz functions on R 2 satisfying, for some sequence ν ↓ 0 and a bound E > 0:

Asymptotic decomposition.
We have reduced the proof of Theorem 1.2 to carrying out the bubbling analysis for our sequence of wave maps {φ ν } ν∈N near the set of time-like energy concentration: recalling the set-up from Sect. 3.2, where δ 0 > 0 controls the distance to the null boundary ∂C of the light cone, on which the dependence of our constants will be considered universal. The dynamics of the maps φ ν near distinct rays i are completely disjoint, and to get the claimed asymptotic decomposition from Theorem 1.2 we will have to select the time slices t ν rather carefully. To start, in order to obtain from the decay assumption (3.3) the asymptotic stationarity at all scales for some suitably chosen time slices, we consider a sequence of positive functions on the time interval [1,2] defined by: so that $\|\zeta_\nu\|_{L^1_t[1,2]} \to 0$ by (3.3). Then, looking at the corresponding Hardy-Littlewood maximal functions: the well-known maximal inequality of Hardy-Littlewood tells us that for any λ > 0: Therefore, taking a sequence $\lambda_\nu \sim \|\zeta_\nu\|^{1/2}_{L^1_t} \downarrow 0$ decaying slowly enough compared to $\|\zeta_\nu\|_{L^1_t}$, we can select a sequence of time slices {t ν } ν∈N ⊂ (1 + δ 0 , 2 − δ 0 ) such that: We should note here that this will not be quite the final sequence of time slices on which we will claim the soliton resolution, as we might need to perturb it a little at scales δ ν . From there, we have to study for each i = 1, . . . , I , a sequence of wave maps obtained from φ ν , upon translating by (t ν , i (t ν )) and rescaling by δ ν , which gives us by (3.15): Moreover, from (3.19), denoting by X i the unit constant time-like vector field pointing in the direction of the line i , we have: Proceeding as in Remark 2.4, we interpolate smoothly between φ i,ν [0] and the constant initial data (c φ , 0) ∈ T (S n−1 ) on B 4 \B 3 , replacing the map φ i,ν with a wave map φ i,ν agreeing with the latter on [−3/2, 3/2] × B 3/2 and constant outside B 6 (at most) for t ∈ [−3/2, 3/2] by finite speed of propagation.
This introduces an error of asymptotically vanishing energy on this time interval, by (3.20). In fact, from the construction it is immediate that: away from i , and we still have decay in a time-like direction: Let us fix a smooth time cut-off 1], so that we are now in a position to apply Proposition 2.7, obtaining from (2.20) the following decomposition: Furthermore, applying Lemma 2.8, we get from (2.35) a decomposition for the second order time-like derivative of φ i,ν : where the first item is a linear combination of: while the second one satisfies (2.36): (3.26) We note that the implicit constants, including the factors in the linear combination for i,ν , depend only on the energy bound E from (2.18) and the distance δ 0 to the null boundary ∂C from (3.18), hence can be considered universal for the rest of the argument.
With this understood, we define non-negative functions ϑ i,ν , ξ i,ν , ζ i,ν , and π i,ν for i = 1, . . . , I and t ∈ [−1, 1], setting: as well as: We will now choose a sequence of time slices where we uniformly control θ i,ν and have all of the other functions above asymptotically decaying. This will be used to prove decay of the weak Besov norm Ḃ 1,2 ∞ on the neck regions, and ultimately get the energy collapsing there via the control on θ i,ν . At the same time, to start this argument, we shall first build the weak bubble tree decomposition. To do so, one relies on the small energy compactness result from Lemma 2.3 (which, for example, enables one to extract solitons from the standard concentration-compactness procedure). For that reason, we will need to control the maximal function Mζ i,ν corresponding to $\|X_i \phi_{i,\nu}(t)\|^2_{L^2_x}$ as well.
Let us take λ θ i ν ∼ θ i,ν for some arbitrarily small > 0 to be fixed according to (3.27) below, as well as λ Hence, applying Chebyshev's inequality and the maximal inequality of Hardy-Littlewood for Mζ i,ν , we get: (3.27) Therefore, we can choose a sequence of time slices {t ν } ν∈N ⊂ [−1/2, 1/2], that we may assume simply to be t ν = 0 upon translating the maps φ i,ν by (t ν , i (t ν )) without changing notation for φ i,ν (and working on [−1/2, 1/2] × B 6 ), such that for all i = 1, . . . , I we have the following control: These are the final time slices that we will consider, and on which we obtain the asymptotic decomposition claimed in our main theorem. We start the bubbling analysis on them just below. Here we just add the remark that, upon working in (3.27)-(3.28) with the maximal functions for θ i,ν , ξ i,ν and π i,ν as well, it should be clear by the end of the argument that we can also get the energy collapsing result for almost every time slice strictly within the lifespan of the fastest concentrating solitons.
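The selection of time slices above combines two standard measure estimates. As a reminder, and only as a generic sketch, they can be stated as follows; the function $f$, the interval $J$ and the constant $C$ are placeholders introduced here for illustration, not the paper's functions or constants.

```latex
% Generic placeholders: J a bounded interval, f >= 0 in L^1(J), C an absolute constant.
% Hardy-Littlewood maximal function on J:
\[
  Mf(t) := \sup_{r>0} \frac{1}{2r} \int_{(t-r,\,t+r)\cap J} f(s)\, ds ,
\]
% weak-type (1,1) maximal inequality:
\[
  \bigl|\{ t \in J : Mf(t) > \lambda \}\bigr| \;\le\; \frac{C}{\lambda}\, \|f\|_{L^1(J)} ,
  \qquad \lambda > 0 ,
\]
% Chebyshev's inequality:
\[
  \bigl|\{ t \in J : f(t) > \lambda \}\bigr| \;\le\; \frac{1}{\lambda}\, \|f\|_{L^1(J)} ,
  \qquad \lambda > 0 .
\]
```

In particular, if $\|f_\nu\|_{L^1(J)} \to 0$ and one takes $\lambda_\nu = \|f_\nu\|^{1/2}_{L^1(J)}$, each exceptional set has measure at most $C\|f_\nu\|^{1/2}_{L^1(J)} \to 0$. For ν large, finitely many such sets cannot cover $J$, so a common good time slice exists.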
In the following lemma we present a preliminary version of the soliton decomposition. It is essentially the one that we aim towards from Theorem 1.2, but it contains errors that we shall call necks: these are wave maps on conformally degenerating annuli which, once localized in space, converge to a constant, but which, when considered on the whole annulus, might a priori carry a non-trivial amount of energy. Ruling out such a scenario will be the last step in the proof of the main theorem.
We note that the proof of this lemma relies on a covering argument which goes back at least to Ding and Tian [5] and is by now standard in the literature on bubbling analysis of harmonic maps (and related areas, where some authors refer to it as weak bubble tree convergence). The lemma of course holds for any closed Riemannian manifold as a target.

(3.29)
as ν → ∞ for j and j′ distinct, such that: we have the following asymptotic decomposition holding for t ∈ [−λ min,ν , λ min,ν ]: (3.30) where N i,ν stands for the wave map φ i,ν restricted to a collection of K i ≲ E 1 sequences of degenerating annuli: holding for each k = 1, . . . , K i .
Proof. Let us fix i = 1, . . . , I , and suppress this subscript in the argument below to lighten the notation. In the same spirit, we also never change notation here whenever passing to a subsequence for {φ ν } ν∈N while using Lemma 2.3 as it will be clear from the construction that we obtain in the end a countable cover of a suitable neighborhood of {t = 0} × B 3 on which we can rely to build via the diagonal process a final subsequence that satisfies the claims of Lemma 3.6. Pick a sequence of points a 1 ν ∈ B 1 with radii λ 1 ν ↓ 0 such that: Consider the sequence of balls B 2 k λ 1 ν (a 1 ν ) with k a positive integer, and choose the lowest K 0 = K 0 ({a 1 ν } ν∈N ) ∈ N such that the functions: which are continuous as the wave maps φ ν are smooth, admit a collective positive lower bound r := lim inf x,ν r ν (x) > 0 (assuming K 0 exists, the case when it does not is treated later when we describe convergence to solitons at infinity). As a preliminary step, relying on (3.28) and the compactness Lemma 2.3, we can obtain for the rescalings of the maps φ ν at a 1 ν , upon passing to a subsequence, that: for some wave map ω 1 with regularity as in (2.15) and satisfying X ω 1 = 0, with X standing for the constant time-like vector field X i from (3.21). Therefore, the map ω 1 is part of a soliton.
The time interval [− r 3 , r 3 ] for the convergence above will be improved considerably below by recalling the methods from Sect. 3.1, see the proof of (3.37). For now, we proceed instead to describe ω 1 further in space. Slightly abusing terminology, let us already refer to ω 1 as a soliton from here on, bearing in mind that we will prove it is one shortly.
By construction, we can find at least one sequence of concentration points: bubbling off on the top of the soliton ω 1 in the sense that: where it is quite important to note that we have an equality above, a fact that must hold by the compactness Lemma 2.3. Let us consider a new sequence of concentration points satisfying (3.35) and (3.36) like a 2 ν ν∈N , in other words forming itself above the scales λ 1 ν and converging, upon passing to a subsequence, in the closure of B 2 K 0 λ 1 ν (a 1 ν ), so that it suffices to work in There are of course uncountably many of those, given the existence of a single one, a 2 ν ν∈N , but we are going to consider equivalent all those for which the orthogonality condition (3.29) holds and pick only one representative per equivalence class. That is, if a sequence a ν ν∈N satisfies (3.35) and (3.36) but in addition also has λ ν ∼ λ 2 ν with: then one can see that the maps φ ν (λ 2 ν t, a 2 ν + λ 2 ν x) and φ ν (λ ν t, a ν + λ ν x) would converge on [−2 −1 , 2 −1 ] × B 2 −1 , upon passing to a subsequence directly by Lemma 2.3, to the same soliton up to translation that we should denote by ω 2 as it was initially obtained from a 2 ν once the procedure we are describing now for the soliton ω 1 is completed and applied to the soliton ω 2 . Hence the sequence a ν ν∈N should be discarded keeping a 2 ν ν∈N . Given the orthogonality relations (3.29) holding between any two sequences of concentration points as above, we note that we are left with only finitely many possibilities, say {a j ν } ν∈N with j = 2, . . . , J . This follows from the fact we are considering a sequence of functions ∇ t,x φ ν ν∈N ⊂ L 2 x , bounded by the global energy control assumption (2.18), and with ∇ t,x φ ν concentrating definite amounts of its L 2 x norm, namely √ s , note the equality in (3.36), at different frequency and/or spatial scales so that we can conclude that, since L 2 x is a Hilbert space, we should have: which is a universal bound for us as desired.
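The counting step above is an instance of a general Hilbert space principle, which can be sketched as follows. The notation here is purely illustrative and not the paper's: $H$ is a real Hilbert space, $M$ is a bound on $\|f_\nu\|_H^2$, $\epsilon_0 > 0$ is a hypothetical lower bound on the amount concentrated by each piece, and $g^j_\nu$, $r_\nu$ are the concentrating pieces and the remainder, with all cross inner products assumed to tend to zero as $\nu \to \infty$.

```latex
% Illustrative assumptions: f_\nu = \sum_{j=1}^{J} g^j_\nu + r_\nu in H,
% \|f_\nu\|_H^2 \le M, \|g^j_\nu\|_H^2 \ge \epsilon_0 > 0,
% \langle g^j_\nu, g^{j'}_\nu \rangle \to 0 for j \ne j', and \langle g^j_\nu, r_\nu \rangle \to 0.
\[
  \|f_\nu\|_H^2
  = \sum_{j=1}^{J} \|g^j_\nu\|_H^2 + \|r_\nu\|_H^2
  + 2 \sum_{j<j'} \langle g^j_\nu , g^{j'}_\nu \rangle
  + 2 \sum_{j=1}^{J} \langle g^j_\nu , r_\nu \rangle ,
\]
\[
  M \;\ge\; \limsup_{\nu\to\infty} \|f_\nu\|_H^2 \;\ge\; J\,\epsilon_0
  \quad\Longrightarrow\quad
  J \;\le\; \frac{M}{\epsilon_0} .
\]
```

Under these assumptions the number of asymptotically orthogonal concentrating pieces is bounded in terms of the total bound and the concentration threshold only, which is the kind of universal bound the argument needs.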
The collection {a j ν } ν∈N , j = 2, . . . , J , gives rise to solitons ω j , one for each j, by the same procedure as described for ω 1 and so from now on we should be running for each of them the same construction as we are currently considering for ω 1 .
From the point of view of ω 1 , we can subdivide the above collection of sequences of energy concentration points into disjoint families by considering the limit points ), indexed by q = 1, . . . , Q for some integer Q ≤ J , to which the sequences converge once rescaled by λ 1 ν . So for any r > 0 small but fixed, we have by Lemma 2.3: since the functions r ν from (3.34) extended to B 2 K 0 \ ∪ q B r (b 1 q ) admit a collective lower bound r := lim inf x,ν r ν (x) > 0 (provided r > 0 is fixed of course as r depends on it). Understanding the behavior of the maps φ ν as r ↓ 0 is linked to the convergence of φ ν to solitons at the spatial infinity and this is when the neck domains enter into our picture. We shall discuss this straight after we finish the construction of the soliton ω 1 (and so for the other ones, ω j above, in parallel).
Considering the annuli B 2 K 0 +k λ 1 ν (a 1 ν )\B 2 K 0 +k−1 λ 1 ν (a 1 ν ) one after the other and studying as above whether there are new sequences of concentration points satisfying (3.36), updating the collection {a j ν } ν∈N , j = 2, . . . , J , accordingly upon checking that the orthogonality relation (3.29) holds for each new member (we do not change the notation for the updated version), we must reach an integer K 1 = K 1 ({a 1 ν } ν∈N , s , E) ∈ N such that for any k ≥ K 1 the functions r ν from (3.34) once considered on B 2 k (a 1 ν )\B 2 k−1 (a 1 ν ) would admit a positive collective lower bound there. Note that this situation could have occurred without passing through the previous bubbling analysis induced by the existence of the integer K 0 , e.g. if we had picked the fastest concentrating soliton initially for ω 1 .
From there, we let k → ∞ with r ↓ 0 and fully construct the soliton ω 1 in the sense that we claim: for a finite collection of geodesics 1 q , q = 1, . . . , Q , each passing through the corresponding point b 1 q , all with direction X , and such that X ω 1 = 0 there. To prove (3.37), we note that by (3.28), used already above, we have for any fixed bounded time interval the following decay estimate: and so denoting by the Lorentz boost taking ∂ t to X , if one considers the foliation induced by ({t} × R 2 ) t∈R on the whole of Minkowski space R 2+1 instead of the CMC foliation in the interior of the forward light cone as in Lemmata 3.2 and 3.3, the very same arguments would lead to the convergence claimed in (3.37). Let us present some details, setting ϕ ν = φ ν (λ 1 ν ·, a 1 ν + λ 1 ν ·). Working on −1 (R 2+1 ) we denote the coordinates there by xᾱ, or (t,x 1 ,x 2 ), and writingφ ν := ϕ ν • we get by the Lorentz invariance of smooth wave maps that the associated stress energy tensor Tᾱβ [φ ν ] enjoys the conservation law ∂ᾱ Tᾱβ [φ ν ] = 0. So, contracting T [φ ν ] with the vector field χ(x)∂t , for some continuously differentiable test function χ with ∂t χ = 0, and integrating the divergence of the Noether current ∂ᾱ( (χ (x)∂t ) Pᾱ) over the stript ∈ [t, t + λ] for any t ∈ R and positive constant λ > 0 (similar considerations apply when λ < 0), we get by Stokes' theorem and the mentioned conservation law: Hence, integrating the above identity over t ∈ [t 0 , t 1 ] for given t 0 , t 1 ∈ R, using the decay (3.38) we obtain: analogously to (3.6) from Lemma 3.2. To use this asymptotic monotonicity formula to propagate small energy control, we note that we have ∇t ,xφν ∼ ∇ t,x ϕ ν with the implicit constant depending only on X , which is constant and fixed. 
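For orientation, the identity produced by this Noether current argument can be written out in the simplest unboosted situation, where the time-like direction is ∂ t itself. The following is a sketch under assumed conventions that this section does not state: metric signature diag(−1, 1, 1), a smooth wave map $\phi$ into the unit sphere, and a compactly supported $C^1$ weight $\chi = \chi(x)$ with $\partial_t \chi = 0$.

```latex
% Assumed conventions: metric diag(-1,1,1); \phi a smooth wave map into the unit sphere;
% \chi = \chi(x) in C^1_c(\mathbb{R}^2), independent of t.
\[
  T_{\alpha\beta}[\phi]
  = \langle \partial_\alpha \phi , \partial_\beta \phi \rangle
  - \tfrac12\, g_{\alpha\beta}\, \langle \partial^\gamma \phi , \partial_\gamma \phi \rangle ,
  \qquad
  \partial^\alpha T_{\alpha\beta}[\phi] = 0 ,
  \qquad
  T_{00}[\phi] = \tfrac12 \bigl( |\partial_t \phi|^2 + |\nabla_x \phi|^2 \bigr) ,
\]
% divergence of the weighted current: only the derivative of the weight survives
\[
  \partial^\alpha \bigl( \chi\, T_{\alpha 0}[\phi] \bigr)
  = \sum_{j=1}^{2} \partial_j \chi \, \langle \partial_j \phi , \partial_t \phi \rangle ,
\]
% integrating over the strip [t, t+\lambda] \times \mathbb{R}^2 (Stokes' theorem)
\[
  \int_{\mathbb{R}^2} \chi\, T_{00}[\phi](t+\lambda)\, dx
  - \int_{\mathbb{R}^2} \chi\, T_{00}[\phi](t)\, dx
  = - \int_t^{t+\lambda} \!\! \int_{\mathbb{R}^2}
      \sum_{j=1}^{2} \partial_j \chi\, \langle \partial_j \phi , \partial_t \phi \rangle \, dx\, ds .
\]
```

By Cauchy-Schwarz, the right-hand side is bounded by $\|\nabla_x \chi\|_{L^\infty}$ times the product of the space-time $L^2$ norms of $\nabla_x \phi$ and $\partial_t \phi$ over the strip, so the localized energy is asymptotically constant in time whenever the time-like derivative decays in $L^2_{t,x}$. In the boosted coordinates used in the text, the same computation applies with ∂ t replaced by the coordinate field corresponding to X.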
Therefore, proceeding as in Lemma 3.3, given any point y ∈ R 2 \ ∪ q b 1 q and a positive constant η > 0, there exists a radius r 1 = r 1 (y, η) > 0 such that: which leads to the control: that in turn gives us, precomposing with and shrinking suitably the radius to r 1 > r 2 r 1 : where (s,ȳ) := −1 (0, y). By the decay estimate (3.39), we get that given any λ ∈ R: lim sup and so going back to ϕ ν by precomposing with −1 , shrinking further the radius to r 2 > r 3 r 2 we obtain by the pigeonhole principle, using the energy-flux identity (1.9), the estimate: for any given λ ∈ R, viewing naturally X ∈ R 2+1 . All the implicit constants above being independent of η (and of λ, the dependence on which of our construction is hidden in the limsup), we can choose η small enough obtaining the small energy control for any fixed λ ∈ R: with the radius r 3 = r 3 (y). Therefore, picking suitable collections of points y ∈ R 2 \ ∪ q b 1 q and constants λ ∈ R, we construct a countable cover of R 2+1 \ ∪ q 1 q such that relying on the estimates (3.38) and (3.40) we can apply Lemma 2.3 to get a subsequence via the diagonal process for which the local convergence claim (3.37) holds as desired.
Note that by construction ω 1 has energy bounded by E, and so precomposing it with the Lorentz boost we get a time-independent finite energy harmonic map from R 2 minus a finite set of points (note that the energy of this harmonic map will be less than or equal to E[ω 1 ], since nothing travels faster than light). By the regularity theory of Hélein [11] the latter has to be smooth, and by the removable singularity theorem of Sacks and Uhlenbeck [23] it extends smoothly across the singular points. The outcome of this argument is therefore that ω 1 is a smooth finite energy wave map defined on the whole of R 2+1 with X ω 1 = 0, i.e. a genuine soliton as desired.
The same holds of course for the solitons ω j , j = 2, . . . , J , but these do not constitute all the members of the decomposition (3.30), as parts of the maps φ ν can a priori get lost at spatial infinity and in between the solitons we are considering. We shall address this issue now.
Consider the scales λ 1 ν ν∈N corresponding to the soliton ω 1 . Fix an arbitrarily small 0 < ε < s ; then by the pigeonhole principle there exists an integer K (ε) ≥ K 1 such that for any k ∈ N fixed: for all ν large enough. Suppose that there exists a sequence of smallest integers k ν (ε) ≥ K (ε), as ν gets large, such that the above inequality fails: and note that by construction we must have k ν (ε) → ∞; then we have found a new soliton on the top of which our previous ω 1 is concentrating, that we should denote by ω J +1 so that setting λ J +1 ν := 2 k ν (ε)−1 λ 1 ν we can apply directly Lemma 2.3, by the choice of k ν (ε) and (3.41), to get: and the analysis we carried out for ω 1 so far should also be applied to ω J +1 now.
It should be clear that if no k ν (ε) as above exists, i.e. (3.41) is not violated for any k ∈ N for ν large, then choosing 0 < ε < s small enough initially, by equality in (3.33) we must have been working with ω 1 and there should then exist a sequence of integers k ν such that 2 k ν λ 1 ν ∼ 1 and (3.41) holds for any k = 1, . . . , k ν − K (ε), with any 0 < ε′ < ε for larger k ≥ k ν − K (ε) by (3.20) as ν → ∞. The map ω J +1 would stand for the constant c φ in this case.
For the other solitons ω j , with j ≥ 2, k ν (ε) must exist and we could of course end up with ω 1 , or also a constant (which some authors refer to as a ghost bubble, i.e. a soliton on the top of which two or more non-constant solitons are concentrating but which is itself constant), in which case we obviously do not consider this as a new soliton. This brings us to the final steps in the proof of Lemma 3.6.
In fact, in the above construction the constant ε > 0 could be arbitrarily small but was initially fixed and we would like now to let it degenerate to 0. We claim that in fact we can put ourselves in a situation where for any smaller 0 < ε′ < ε the choice of the integers k ν (ε′) ∈ N is uniform in the sense that there exist positive integers L(ε′) ∈ N independent of ν such that k ν (ε′) = k ν (ε) − L(ε′), that is: for ν large enough. If this were to fail for some ε′ > 0, we could find a sequence of scales, that we denote by λ J +2 ν , such that: (3.43) and that would give rise to new non-constant solitons at scale λ J +2 ν or above, in which case we have to redefine ε as ε′. Note that we can have only finitely many non-constant solitons forming by the global energy bound (3.1), since those cannot have arbitrarily small energy as this is not possible for harmonic 2-spheres, and by (3.43) they are asymptotically orthogonal in Ḣ 1 x × L 2 x . Hence our procedure, applied to every single soliton we have found so far, detects all of the solitons in the claimed decomposition (3.30) and we are just left to characterize the regions in between the domains of convergence to solitons as neck regions, but this can be obtained directly from (3.42) as follows.
Upon changing notation, by the above remarks we can assume that (3.42) holds. Now, we simply choose sequences 0 < r 1 ν ≤ R 1 ν tending to 0 slowly enough so that for any ε > 0 small enough: then by (3.42) there exists a sequence ε 1 ν = ε 1 ν (r 1 ν , R 1 ν ) ↓ 0 such that: If we know a priori that r 1 ν ∼ R 1 ν , then we can immediately absorb this part of the wave map φ ν into the error term o L ∞ t (Ḣ 1 x ×L 2 x ) (1) in the decomposition (3.30) and there is no loss of energy between the considered solitons. Otherwise we should have r 1 ν ≪ R 1 ν , i.e. the annulus is conformally degenerating, and this is precisely a neck in our terminology, as required. To prove Theorem 1.2 we must show that those terms can also be absorbed into o Ḣ 1 (1) upon picking a suitable time slice, but that is the next and final step of the whole argument. So far we have established Lemma 3.6.
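One standard way to read the phrase "conformally degenerating", recalled here only as an illustration with a generic map $u$ and generic radii $0 < r < R$ (not the paper's specific sequences), is through log-polar coordinates, in which the planar Dirichlet energy on an annulus becomes the Dirichlet energy on a flat cylinder:

```latex
% Log-polar coordinates x = e^{s}(\cos\theta, \sin\theta) on B_R \setminus B_r, 0 < r < R;
% dx = e^{2s} ds d\theta and |\nabla_x u|^2 = e^{-2s}( |\partial_s u|^2 + |\partial_\theta u|^2 ).
\[
  \int_{B_R \setminus B_r} |\nabla_x u|^2 \, dx
  = \int_{\log r}^{\log R} \int_0^{2\pi}
    \bigl( |\partial_s u|^2 + |\partial_\theta u|^2 \bigr)\, d\theta\, ds .
\]
```

The cylinder has length $\log(R/r)$, which tends to infinity exactly when $r/R \to 0$. Local convergence to a constant on pieces of bounded length therefore does not by itself control the energy on the whole cylinder, which is why the necks require a separate argument.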
Remark 3.7. We note here that our techniques cannot say anything more about the decomposition beyond the scales O(λ min,ν ) ν∈N which is a central issue to address if one were to try understanding the full soliton resolution conjecture.
Let us also remark that there is quite some freedom in fixing the radii R k i,ν and r k i,ν defining the neck domain, as for any positive integer, which can be arbitrarily large but fixed, we still have: which follows directly from the characterization (3.42) in the proof of Lemma 3.6 above.
Our aim now is to show energy collapsing for the necks N i,ν , that is, a decay to zero for the L 2 x norm of ∇ t,x φ ν as ν → +∞ on the degenerating annuli (3.31). We shall start by obtaining a decay in the weaker Besov Ḃ 1,2 ∞ norm for N i,ν , as a consequence of the property (3.32), up to an error whose Ḣ 1 x norm is controlled by the L 2 x norm of X φ ν for some time-like vector field X that we will fix according to (3.28) later. This is the content of the following lemma.
Moreover, we assume the maps are asymptotically steady in the direction of a constant time-like vector field X , standing for one of the X i 's from (3.21) which we can take to be given by (2.19): (3.46) and the second order time-like derivatives satisfy: Both assumptions are justified by (3.28).
Then on the neck region, we can write for the map φ ν : 1] E, and satisfying the following weak decay estimate on t = 0:

The strategy of our argument is roughly to replace, by using the decay in the direction of the time-like vector field X , the sequence of wave maps on neck domains under consideration with another one, differing by an error of vanishing energy and converging locally to a constant on the neck domain with more regularity than Ḣ 1 x × L 2 x for φ ν . However, because we need to obtain estimates that are uniform in time, working on very short intervals, we should not rely on the small energy regularity theory from Theorem 2.2 and the direct use of Fourier restriction spaces, as in the proof of the compactness result by Sterbenz and Tataru [27] (Proposition 5.1 there), but proceed directly via the wave maps equation (1.2), proving a weak Ḃ −1,2 ∞ decay estimate for its quadratic structure in the gradient at high frequency (without any null-structure involved, hence having target S n−1 is not specifically necessary for this part of the argument), and then using Lemma 2.8 to control the second order time-like derivatives (the latter though does involve the conservation law (1.4) for wave maps into spheres).
Proof. As usual, having the required control in a time-like direction, it is enough to consider the spatial gradient only. Working now on the domain [−1, 1]×(B 2 Nν \B 2 nν ), we note that it becomes arbitrarily rough in time as it degenerates with n ν , N ν → +∞. This is an additional difficulty, to be dealt with in the present proof, in comparison to the analogous estimate for harmonic maps, where ε-regularity is used on the domains [−2 −1 , 2 −1 ] × (B 2 +1 \B 2 ) instead, see for example the paper of Lin and Rivière [19] on p. 188.
Before taking the main line of the argument, let us start with some preliminaries, fixing the decay rates for the assumptions of Lemma 3.8, that is sequences ι ν ↓ 0, σ ν ↓ 0 and ε ν ↓ 0 for which: corresponding to (3.47), (3.46) and (3.45) respectively. Next, we consider, for an arbitrary choice of integers ν between n ν and N ν , the sequence of wave maps: We build an extension ψ ν, ν of φ ν, ν , as in Remark 2.4, by smoothly interpolating on and (c ν , 0) ∈ T (S n−1 ), for some suitably chosen sequence of constants c ν = c ν (φ ν, ν ), solving the wave maps equation for ψ ν, ν with initial data of ψ ν, ν [0], such that scaling back and setting ψ ν ν (·) := ψ ν, ν (2 − ν ·), we have (denoting by 1 ν the characteristic function of B 2 ν +1 \B 2 ν −1 over the time interval [−2 ν −3 , 2 ν −3 ]): ε ν and 1 ν φ ν = 1 ν ψ ν ν , (3.52) by (3.50) and the finite speed of propagation property respectively. From there, we construct a partition of unity over [−1, 1] × (B 2 Nν \B 2 nν ) paralleling the Littlewood-Paley decomposition in frequency space. For the spatial directions, we recall the non-negative radial bump functions m 0 and m ≤0 used in the definition of the LP-projections P 0 and P ≤0 , but which this time, we will use on the physical space setting:m We get then the following "physical LP-decomposition": where η(t) stands for the rough cut-off to the time interval [−1, 1], and of course it is immediate that ϒ ν 1] E. Moreover we note that, recalling the extensions (3.52), we have ηm ν φ ν = ηm ν ψ ν ν . Writing φ c ν := φ ν − c ν , for an arbitrary sequence of maps corresponding to (3.51), and similarly for φ c ν, ν , together with the extensions ψ ν ,c ν and ψ c ν, ν from (3.52) which become compactly supported by construction, we consider the commutator (denoting the cut-off functions by χ ν := ηm ν ): and start by treating the second term, for which we claim: for any k ∈ Z. To see this, we rescale by 2 ν . 
For high frequency scales 2 k ≳ 1, we can use the extra regularity, the spatial derivative falling on the cut-off instead of the map, available from: introducing the extensions ψ c ν, ν , so that applying Poincaré's inequality in L 2 x for the first term, given the spatial localization of ψ c ν, ν at any given time slice in the support of η ν (·) := η(2 ν ·), we get by the finite band property (2.3) and the bound (3.52): as desired. For low frequency scales 2 k ≲ 1, by Cauchy-Schwarz and Poincaré's inequalities, we have: x ) , dropping ∇ x m̃ 0 , and so using Bernstein's inequality (2.4) we obtain here an exponential gain as well: by the energy bound (3.52). Hence, claim (3.55) follows.
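The two Littlewood-Paley tools invoked in this step, the finite band property and Bernstein's inequality, are recalled below in their standard form on $\mathbb{R}^2$. This is a sketch for the reader's convenience: the precise formulations labelled (2.3) and (2.4) are not reproduced in this section and may differ in normalization, and $f$ is a generic Schwartz function.

```latex
% Standard forms on \mathbb{R}^2, 1 \le p \le q \le \infty; implicit constants independent of k \in \mathbb{Z}.
\[
  \| \nabla_x P_k f \|_{L^p_x} \;\lesssim\; 2^{k}\, \| f \|_{L^p_x} ,
  \qquad
  \| P_k f \|_{L^p_x} \;\lesssim\; 2^{-k}\, \| \nabla_x f \|_{L^p_x}
  \qquad \text{(finite band)} ,
\]
\[
  \| P_k f \|_{L^q_x} \;\lesssim\; 2^{2k\left(\frac1p - \frac1q\right)} \| f \|_{L^p_x}
  \qquad \text{(Bernstein)} .
\]
```

The second finite band estimate is what converts a derivative landing on the cut-off into a gain of $2^{-k}$ at high frequency, while Bernstein's inequality with $p < q$ yields a positive power of $2^{k}$, hence a gain at low frequency.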
We remark that, by the same argument, we get also control for the low frequencies of the first term ∇ x (χ ν φ c ν ) in the commutator: (3.56) and so it remains to treat now the main terms, that is the LHS above when ν ≥ −k, for which we should rely on the wave maps equation, the time-like control assumption (3.49), as well as the favorable decay (3.48) we already have. Recalling the expression for the operator (2.26), we compute then: . Let us treat first the smooth terms on the first line of (3.57), of which there are two types, (∇ 2 x χ ν )ψ ν ,c ν and ∇ x χ ν ∇ t,x ψ ν ν , the cut-off differentiated in a spatial direction, claiming for both the control: To show this, relying on Plancherel in L 2 x , we discard the Fourier multiplier 2 k ∇ x −1 x,β P k (where P k = P k−1≤·≤k+1 ), having symbol bounded uniformly in k ∈ Z. Rescaling by 2 ν we are brought to estimate for k ≥ O(1): where the second term is directly seen to have the desired control by (3.50), whereas for the first one, given the spatial support of the extension ψ c ν, ν , we apply Poincaré's inequality in L 2 x as before, which allows us to conclude by (3.52). The second line of (3.57) is an error term controlled thanks to the time-like decay (3.49) we have. We first write: ν , and note that the second term here was already treated in (3.58), and so we just need to show: but this follows at once by Plancherel in L 2 x , as the Fourier multiplier ∇ 2 x −1 x,β P k has a bounded symbol, dropping the cut-offs and relying on (3.49).
Finally, we shall consider the delicate second order time-like derivatives and the nonlinear terms on the third line of (3.57). As was already required for (3.59), we restrict ourselves from now on to work exclusively over the time slice t = 0. And to lighten the notation, we shall not mention this explicitly anymore.
Thanks to the assumption (3.48), we already have partial control on them through X,ν , which however we need to localize to the neck region B_{2^{N_ν+1}} \ B_{2^{max(−k, n_ν)−1}}. In doing so, we first note that since m_{≤0} was initially fixed spatially Schwartz, we have: given that the above norm is scale invariant. Hence applying the Littlewood-Paley trichotomy to m k,N ν X,ν , we get: From there, using (3.48), we estimate the low-high interactions by: the high-low ones by: whereas for the high-high cascade we have: where we have used Bernstein's inequality (2.4) passing to L^1_x, and then Cauchy-Schwarz with the fact that k_1 = k_2 + O(1).
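The Littlewood-Paley trichotomy used above can be sketched schematically as follows. The constant C and the O(1) windows are assumptions that depend on the exact shape of the frequency cut-offs, and f, g are generic placeholders for the two factors of the product, with f_j := P_j f and g_j := P_j g.

```latex
% Schematic trichotomy for a product localized at frequency 2^k.
% C is a fixed constant depending only on the choice of Littlewood-Paley cut-offs.
\begin{align*}
  P_k(f g)
  &= \sum_{k_1 < k - C} P_k\bigl(f_{k_1}\, g_{k+O(1)}\bigr)
     && \text{(low-high)} \\
  &\quad + \sum_{k_2 < k - C} P_k\bigl(f_{k+O(1)}\, g_{k_2}\bigr)
     && \text{(high-low)} \\
  &\quad + \sum_{\substack{k_1,\,k_2 \ge k - C \\ |k_1 - k_2| \le O(1)}}
       P_k\bigl(f_{k_1}\, g_{k_2}\bigr)
     && \text{(high-high)}.
\end{align*}
```

All other frequency pairs (k_1, k_2) contribute nothing, because the Fourier support of f_{k_1} g_{k_2} then misses the region |ξ| ∼ 2^k. Only the high-high sum involves infinitely many output-relevant pairs, which is why it is the term handled by passing to L^1_x and using Cauchy-Schwarz in k_1 = k_2 + O(1).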
Putting those estimates together we get the required control for m k,N ν X,ν : by discarding the multiplier 2 k ∇ x −1 x,β P k and relying on the bounds for the cut-offs m k,N ν discussed above.
We now treat the non-linear bulk left from Lemma 2.8, decomposing it into: introducing the convenient notation φ k ν := P k φ ν (also later φ k ν, ν := P k φ ν, ν for the rescaled maps), etc. We want to treat this term perturbatively, as in elliptic regularity theory, and so we proceed claiming first the following Ḃ^{−1,2}_∞ estimate: where the sums are such that both ν and ν + j range between max(−k, n ν ) and N ν .
Discarding the Fourier multiplier 2 k ∇ x −1 x,β P k via Plancherel in L^2_x, we note that the Littlewood-Paley projection P_k in front of the sum in (3.61) is crucial to handle the remaining factor 2^{−k}. However, by the uncertainty principle, frequency localization spreads the physical support, so we are not allowed to use a square-summing trick relying on the finitely overlapping supports of χ ν B ν i . On the other hand, this leakage is very much controllable given the fact that k ≥ − ν + O (1), which corresponds to high frequency here.
More precisely, let us bound the LHS of (3.61) via: with both ν and μ ν ranging between max(−k, n ν ) and N ν . By the self-adjointness of P k , the summand above can be estimated by: Now, looking at the convolution kernel for P_k^2, analogous to (2.1), we can estimate the first factor on the RHS above by: for μ ν ≥ ν ≥ −k, a refined version of Bernstein's inequality (2.4). Hence, this leads us to estimate the LHS of (3.61) by: as required. Given (3.61), we remark that summing over one of the factors, we get a universal bound. This follows from the global energy control (3.44) since, by the finitely overlapping supports of χ ν B ν i : and in fact we have the stronger control: where for the former we have: applying initially the finite band property (2.3), and then once again for φ k 2 ν , and this can be summed over k ∈ ℤ using discrete Cauchy-Schwarz in k_1 = k_2 + O(1). Whereas for the latter, we note that by the Littlewood-Paley trichotomy: and so the first two terms correspond to paraproducts, already localized to |ξ| ∼ 2^k, and therefore their sum in k ∈ ℤ lies in the homogeneous Hardy space Ḟ^{0,1}_2 with bound O(E), and for the last term the stronger estimate in Ḃ^{0,1}_1 with bound O(E) as for B ν 1 holds, since the sum under P k is finite and we can apply the discrete Cauchy-Schwarz inequality. Hence, rescaling by 2 ν and setting B ν, ν i (·) = 2 2 ν B ν i (2 ν ·), to obtain decay for (3.61) it suffices to prove: This is a direct manifestation of the perturbative nature of quadratic non-linearities on neck regions, thanks to local energy decay (3.50). In our case, the argument is however slightly more involved because our product structure is non-local. This is only a minor technicality, and we shall treat it analogously to the previous instances of physical support leakage. Let us introduce two auxiliary parameters.
Setting ν, ν x,β (·) := 2 ν ν x,β (2 ν ·), by the local energy estimate (3.50), we can find sequences κ_ν → +∞ and ε_ν ↓ 0 such that: where we use the convention m_{k_1≤·≤k_2} := m_{≤k_2} − m_{≤k_1−1}, and similarly for m_{≥k_1} := 1 − m_{≤k_1−1}. Let us first treat the annulus so determined, and then the outer and inner regions separately.
For the annulus we can discard the cut-off m_0. Regarding B ν, ν 1 , we have: x,β φ >k+10 ν, ν ) L 1 x k∈Z k 1 ,k 2 ≥k+5:|k 1 −k 2 |≤O(1) where we have used the finite band property (2.3) as usual, and we control this by O(ε_ν E^{1/2}) relying on the discrete Cauchy-Schwarz and k_1 = k_2 + O(1), which is acceptable for (3.63 x,β · ∇ x φ ≤k+10 ν, ν ) and relying on the Littlewood-Paley square function estimate for the first two terms, and simply the discrete Cauchy-Schwarz for the last, we can bound the above by O(ε_ν E^{1/2}) again. Therefore this is a permissible contribution to (3.63). Now we treat the error terms. First, let us consider the outer region defined by the cut-off m_{>κ_ν}. Writing: x,β φ >k+10 ν, ν ), by considering the convolution kernel for the Fourier multiplier ∇_x P_k P_{k'}, with k' = k + O(1), which gives: for any positive integer N ∈ ℕ, bearing in mind the physical support of m_{>κ_ν} ν, ν x,β φ >k+10 ν, ν . Using this estimate, for high frequency scales, we choose N = 3, getting the following bound for the sum in k ≥ 0 from (3.64): by the finite band property (2.3) for φ ν, ν . This is immediately seen to be o(E) as κ_ν → +∞, hence this contribution is acceptable. For the low frequency scales, if we set N = 1 above, we have for the sum over k < 0 in (3.64): x,β · ∇ x φ ≤k+10 ν, ν ).
Proceeding similarly to the above, we look at the convolution kernel of P_k P_{k'}, with k' = k + O(1), and given the spatial support of m_{>κ_ν} ν, ν x,β φ ≤k+10 ν, ν , we get the analogous estimate for N ∈ ℕ: so that choosing N = 3 when k ≥ 0, and N = 1 if k < 0 as previously, yields the control for (3.64): as desired, and this completes the treatment of the contribution to (3.63) of the outer region.
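The kernel argument used for the outer region rests on a standard off-support decay estimate, sketched here in generic form. The functions f and g, the separation d, and the kernel K_k are illustrative notation that does not appear in the paper; the estimate is stated on ℝ² for a Littlewood-Paley projection with Schwartz kernel.

```latex
% P_k f = K_k * f with K_k(x) = 2^{2k} K_0(2^k x), K_0 Schwartz on R^2.
\[
  |K_k(x)| \lesssim_N 2^{2k}\,\bigl(1 + 2^{k}|x|\bigr)^{-N}
  \qquad \text{for every } N \in \mathbb{N}.
\]
% If dist(supp f, supp g) >= d > 0, only the tail of K_k beyond radius d
% contributes on supp g, and Young's inequality gives:
\[
  \|g\, P_k f\|_{L^2_x}
  \le \|g\|_{L^\infty_x}\,
      \bigl\| K_k \mathbf{1}_{\{|x| \ge d\}} \bigr\|_{L^1_x}\,
      \|f\|_{L^2_x}
  \lesssim_N \bigl(2^{k} d\bigr)^{-N}\,\|g\|_{L^\infty_x}\,\|f\|_{L^2_x}.
\]
% With an extra derivative, \nabla_x P_k, the same bound holds
% with an additional factor 2^{k} on the right-hand side.
```

The freedom in N is what allows different choices at high and low frequencies: a larger N gives summability over k ≥ 0, while a smaller N keeps the negative power of 2^k under control for k < 0.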
Finally, we need to study the contribution of the interior region defined by the support of m_{<−10}, which, we note, lies at a definite distance from the support of m_0. First, we remark that we have: To establish (3.65) it is enough to consider ϕ ν ∇ x ϕ ν , where ϕ ν := m_{<−10} ϕ ν . For low frequencies: where for the first term we have used (2.2) to discard ∇_x, and for the second we passed initially to L^1_x applying (2.4), and then transferred ∇_x from ϕ c ν to ϕ ν via (2.3). Both items are acceptable by (3.66). For high frequencies, we apply precisely the same argument, but with a slightly more refined Littlewood-Paley trichotomy decomposition: where for the first term we applied (2.2) and for the other two we passed first to L^1_x via (2.3), then used Cauchy-Schwarz, from where for the second term we used (2.2) for P_{≤k−7} ϕ c ν and (2.3) for ϕ ν transferring ∇_x from one to the other, whereas for the third term this transfer of ∇_x happened at once via (2.3) since k_1 = k_2 + O(1), and then multiplied P_{k_2} ϕ c ν simply by 2^{−k_2/2} 2^{k_2/2}, which led to the exponential gain 2^{−k/2} in front of the sum since k_2 ≥ k + O(1). Square-summing the above estimate over k > 0, and applying discrete Cauchy-Schwarz for the third item, gives an acceptable bound by (3.66), and therefore claim (3.65) follows.
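The discrete Cauchy-Schwarz step for near-diagonal sums, invoked repeatedly above, can be sketched as follows. The sequences a, b, c and the constant C are generic placeholders standing for dyadic pieces of the two factors; they are not notation from the paper.

```latex
% Near-diagonal sums: for non-negative sequences a, b in \ell^2(\mathbb{Z}),
% shift one index by j and apply Cauchy-Schwarz for each of the 2C+1 shifts.
\[
  \sum_{\substack{k_1, k_2 \in \mathbb{Z} \\ |k_1 - k_2| \le C}} a_{k_1} b_{k_2}
  = \sum_{|j| \le C} \sum_{k \in \mathbb{Z}} a_{k}\, b_{k+j}
  \le (2C+1)\, \|a\|_{\ell^2}\, \|b\|_{\ell^2}.
\]
% Exponentially weighted tails: by Young's inequality
% \ell^1 * \ell^2 \subset \ell^2 on \mathbb{Z},
\[
  \Bigl\| \sum_{k_2 \ge k - C} 2^{-(k_2 - k)/2}\, c_{k_2} \Bigr\|_{\ell^2_k}
  \lesssim_C \|c\|_{\ell^2}.
\]
```

In applications of this type one typically takes a_k and b_k to be L^2_x norms of frequency-localized gradients, so that the ℓ² norms on the right are controlled by the energy.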
With this understood, we can control the contribution of the inner region to (3.63) for the low frequencies. Given any positive integer K > 0, we have regarding B ν, ν 1 : which is o(E) for the first term and o K (E) for the second by (3.65 x,β · ∇ x φ ≤k+10 ν, ν ) and this yields the decay of slowly growing frequencies for the inner region, as desired.
Note that the cut-off m_0 has not played any role in the above argument. However, for the high frequencies k > K_ν, having m_0 will be crucial as we are going to pass through (3.64) as before, first with: x,β φ >k+10 ν, ν ).
Considering the convolution kernel for ∇_x P_k P_{k'}, with k' = k + O(1), as previously, we estimate: noting the fixed positive distance of the physical support of m_{<−10} ν, ν x,β φ >k+10 ν, ν to the annulus {2^{−1} ≤ |x| ≤ 2}. Using this, we can bound (3.64) in this case by: which is certainly acceptable, given that K_ν → +∞. Finally, the last contribution to treat is when: in (3.64), and here we proceed in complete analogy to the above, getting the following estimate: by looking at the convolution kernel of P_k P_{k'}, with k' = k + O(1), and the location of the spatial support of m_{<−10} ν, ν x,β · ∇ x φ ≤k+10 ν, ν with respect to the annulus {2^{−1} ≤ |x| ≤ 2}. This, in turn, yields the following control for (3.64): which, as noted above, is permissible. That concludes the treatment of the contribution of the inner region, and therefore we have obtained claim (3.63).
In the end, going back to the physical Littlewood-Paley decomposition (3.53) and expressing the time derivative ∂ t via X and ∂ x 1 using expression (2.19), we have for any k ∈ ℤ: where the first sum arises from the low frequencies (3.56) and the regular part involving spatial derivatives falling on the cut-offs from (3.55) and (3.58), the second term comes from errors having good time-like control (3.59), the third one arises from treating the higher-order time-like derivative in (3.60), and finally the last term is due to the perturbative Ḃ^{−1,2}_∞ estimate of the non-linearity for the wave maps equation at high frequency (3.61), combined with (3.62) and (3.63).
Lemma 3.8 is proved.
We are now at the concluding stage of the proof of Theorem 1.2, for which, going back to the weak bubble tree decomposition (3.30), we must show that the energy of the necks N i,ν is asymptotically vanishing as ν → +∞. Recall that these come with corresponding neck domains, that is the conformally degenerating annuli from (3.31), so that setting: we can apply Lemma 3.8, by (3.32) and (3.28), to write: and satisfying the decay: From there, we can estimate the energy at time t = 0 on a neck region by: and by the previous estimates this tends to 0 as ν → +∞. Theorem 1.2 is proved.