Quadratic Sparse Domination and Weighted Estimates for Non-integral Square Functions

We prove a quadratic sparse domination result for general non-integral square functions $S$. That is, we prove an estimate of the form \begin{equation*} \int_{M} (S f)^{2} g \, \mathrm{d}\mu \le c \sum_{P \in \mathcal{S}} \left(\frac{1}{\lvert 5P \rvert}\int_{5 P} \lvert f\rvert^{p_{0}} \, \mathrm{d}\mu\right)^{2/p_{0}} \left(\frac{1}{\lvert 5P \rvert} \int_{5 P} \lvert g\rvert^{q_{0}^*}\,\mathrm{d}\mu\right)^{1/q_{0}^*} \lvert P\rvert, \end{equation*} where $q_{0}^{*}$ is the H\"{o}lder conjugate of $q_{0}/2$, $M$ is the underlying doubling space and $\mathcal{S}$ is a sparse collection of cubes on $M$. Our result will cover both square functions associated with divergence form elliptic operators and those associated with the Laplace-Beltrami operator. This sparse domination allows us to derive optimal norm estimates in the weighted space $L^{p}(w)$.


Introduction
Recent years have seen a surge of activity in the weighted theory of singular integrals that has resulted in the resolution of some major conjectures such as the $A_{2}$ conjecture [30], the Muckenhoupt-Wheeden conjecture [48] and the two weight problem for the Hilbert transform [34,36]. Accompanying these achievements is the development of new core techniques such as the representation of singular integrals by dyadic operators [30] or the sparse domination of singular integrals [17,43].
The sparse domination of an operator provides, at a glance, a rich picture of the unweighted and weighted estimates with precise tracking of the dependence on the weight characteristic. Its use in harmonic analysis was introduced by Lerner in [39], where a decomposition of an arbitrary measurable function was obtained in terms of its local mean oscillations. It has since been extended to a great number of different contexts, spanning and reproducing a large portion of harmonic analysis. To call attention to some of the most celebrated results, there have been articles published covering the domination of Calderón-Zygmund singular integral operators [17,35,41-43], multilinear singular integrals [21], rough singular integrals, variational Carleson operators, Bochner-Riesz multipliers, Walsh-Fourier multipliers, the spherical maximal function and also the $T1$ sparse domination of singular integrals. For more details on these and other applications we refer the reader to Sect. 8 of the survey paper [47] and the references therein. In this article we are interested in the sparse domination of square function operators.
The sparse domination of classical square function operators was first considered in [40]. In that article it was discovered that in order to obtain sharp weighted estimates for square functions from a sparse domination result, the sparse techniques applied to singular integral operators had to be adjusted to account for the quadratic nature of the square function. Thus, instead of a "linear" sparse domination result, one must aim for a stronger "quadratic" sparse domination theorem. This idea was also explored in [11], where a quadratic result with minimal $T1$-type assumptions is proved. Similar ideas are also investigated in the work of Lorist [44], where sparse domination is obtained for general vector-valued operators.
Since the turn of the century, fuelled by applications to boundary value problems and the epic contest of ideas surrounding the Kato conjecture, there has been a sustained and pronounced interest in weighted estimates for non-integral singular operators that are beyond the realm of Calderón-Zygmund theory. Some of the most prominent examples are operators attached to the divergence form elliptic operator $L = -\mathrm{div}(A\nabla)$, where $A$ is bounded and elliptic with complex coefficients. For instance, neither the Riesz transforms $\nabla L^{-1/2}$ nor the constituent operators $\{\sqrt{t}\,\nabla e^{-tL}\}_{t>0}$ of the square function \begin{equation*} G_{L} f := \left(\int_{0}^{\infty} \lvert \sqrt{t}\,\nabla e^{-tL} f\rvert^{2} \, \frac{\mathrm{d}t}{t}\right)^{1/2} \tag{1.0.1} \end{equation*} possess integral kernels in general that satisfy any meaningful estimates and, as such, are deserving of the title "non-integral". As a result of this characteristic, and in contrast to the classical setting of the Laplacian operator $\Delta$, these operators will fail to be bounded on $L^{p}(\mathbb{R}^{n})$ for $p$ in the entire range $(1,\infty)$. Instead, as proved in [1], boundedness will occur if and only if $p$ is contained within a restricted subinterval of $(1,\infty)$ that will depend on the perturbation $A$, see also [9] and [28]. Similarly, for boundedness on the weighted space $L^{p}(w)$, one must also consider a restricted range of $p \in (1,\infty)$. For a detailed investigation into such results the reader is referred to the seminal series of papers by P. Auscher and J. M. Martell [2-5].
The sparse domination methods developed for Calderón-Zygmund operators in [35,41,42] automatically imply boundedness on $L^{p}(\mathbb{R}^{n})$ for $p$ in the full range $(1,\infty)$. It then follows that the classical sparse domination is particularly ill-suited to non-integral singular operators. In the article [8], the authors F. Bernicot, D. Frey and S. Petermichl introduced a linear sparse domination framework that was adapted to non-integral singular operators in the sense that the sparse object dominating the operator would only be bounded on a restricted range. This linear sparse domination allowed for sharp weighted estimates to be produced for a wide range of operators associated with $L$ that included the Riesz transforms $\nabla L^{-1/2}$. As stated earlier for the classical setting of the Laplacian, the linear sparse domination in [8] does not imply the best weighted bounds for square functions for $p > 2$. The ultimate objective of this article is, thus, to prove a quadratic sparse domination theorem for non-integral square functions. This, in turn, will yield weighted estimates for $G_{L}$ and other similar square functions. These estimates will also reproduce the optimal weighted estimates for $G_{L}$ when $L = -\mathrm{div}(A\nabla)$ and $A$ is real valued with smooth coefficients, a result that was first proved by T. A. Bui and X. Duong in [13]. When the square function is bounded in the full range $(1,\infty)$, we recover the sparse form in [11], which implies weighted estimates that are known to be optimal for several classical square functions [40].
Motivated by finding a uniform setting that will include several examples of square functions, we consider the following general framework. The underlying space (M, d, μ) is a locally compact separable metric space (M, d) equipped with a Borel measure μ that is finite on compact sets and strictly positive on any non-empty open set. For a measurable subset B ⊂ M, we denote |B| := μ(B).
The measure $\mu$ will be assumed to satisfy the doubling property, \begin{equation*} \lvert B(x,2r)\rvert \lesssim \lvert B(x,r)\rvert \end{equation*} for all $x \in M$ and $r > 0$, where $B(x,s)$ denotes the ball of radius $s > 0$ centred at a point $x \in M$ and $X \lesssim Y$ will be used throughout the paper to signify that there exists a constant $C > 0$ such that $X \le CY$. There will then exist some $\nu > 0$ for which for all $x \in M$ and $r, s > 0$, where $X \simeq Y$ means that both $X \lesssim Y$ and $Y \lesssim X$ hold. This technical condition has been imposed in order to prove boundedness of a certain maximal operator that is essential to our proof. This point will be elaborated upon further in Remark 1.4 and Sect. 4.
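As a brief sketch of how an exponent $\nu$ arises from doubling alone, suppose the doubling property is written with an explicit constant $C_{D} \ge 1$, that is, $\lvert B(x,2r)\rvert \le C_{D}\lvert B(x,r)\rvert$ (the name $C_{D}$ and the restriction to $r \ge s$ are assumptions made here purely for illustration). Choosing the integer $k \ge 1$ with $2^{k-1}s \le r < 2^{k}s$ and applying doubling $k$ times gives \begin{equation*} \lvert B(x,r)\rvert \le \lvert B(x,2^{k}s)\rvert \le C_{D}^{k}\,\lvert B(x,s)\rvert \le C_{D}^{1+\log_{2}(r/s)}\,\lvert B(x,s)\rvert = C_{D}\left(\frac{r}{s}\right)^{\log_{2}C_{D}}\lvert B(x,s)\rvert, \end{equation*} so that any $\nu > 0$ with $\nu \ge \log_{2}C_{D}$ is admissible in an estimate of this type.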
We will consider an unbounded operator $L$ on $L^{2}(M,\mu)$ satisfying the below assumption.
Assumption 1.1. $L$ is $\omega$-accretive for some $0 \le \omega < \pi/2$ and there exist some $1 \le p_{0} < 2 < q_{0} \le \infty$ and $c > 0$ such that for all balls $B_{1}$, $B_{2}$ of radius $\sqrt{t}$,
From Assumption 1.1, it follows that $L$ is a maximal accretive operator on $L^{2}(M,\mu)$, $L$ possesses a bounded holomorphic functional calculus on $L^{2}(M,\mu)$ and $-L$ is the generator of an analytic semigroup $(e^{-tL})_{t>0}$ on $L^{2}(M,\mu)$.
Throughout this article, we consider square function operators associated with L. These will be defined to be operators S that satisfy the following set of assumptions.
where $\{Q_{t}\}_{t>0}$ is a collection of bounded operators on $L^{2}(M,\mu)$ which satisfy the property that there exists some $1 \le p_{0} < 2 < q_{0} \le \infty$ such that for all balls .
(c) (Cancellation with respect to $L$) There exist $A_{0} > 0$ and $N_{0} \in \mathbb{N}$ such that for all integers $N \ge N_{0}$, r } r >0 is a collection of bounded operators on $L^{2}(M,\mu)$ that satisfies off-diagonal estimates at all scales in the sense that and for a ball $B = B(x,r)$ and $t > 0$ we will use the notation $tB$ to represent the $t$-dilate of $B$, $tB := B(x,tr)$.
(d) (Cotlar type inequality) There exists an exponent $p_{1} \in [p_{0},2)$ such that for all $x \in M$ and $r > 0$ where we define $\fint_{B} f \, \mathrm{d}\mu := \lvert B\rvert^{-1}\int_{B} f \, \mathrm{d}\mu$ for $f \in L^{1}_{\mathrm{loc}}(M,\mu)$ and we denote by $\mathcal{M}$ the uncentered Hardy-Littlewood maximal function and $\mathcal{M}_{p} f := (\mathcal{M}\lvert f\rvert^{p})^{1/p}$ for any $p \ge 1$.

Remark 1.3
In general, the exponents $p_{0}$ and $q_{0}$ are determined by the off-diagonal estimates for the constituent operator $Q_{t}$, rather than by the off-diagonal estimates for $\{e^{-tL}\}_{t>0}$. For our aim, it is enough to assume that the range in which one has off-diagonal estimates for $\{e^{-tL}\}_{t>0}$ contains the range $(p_{0},q_{0})$ in Assumption 1.2.

Remark 1.4
As our work is intended to build upon the article [8], it will be instructive to compare our assumptions with the hypotheses of [8]. In both our article and [8], the assumptions imposed upon the underlying operator $L$ are identical. For the operator $S$, we have also assumed $L^{2}$-boundedness and a Cotlar type inequality. However, we have included the additional assumption that $S$ is of the form of a square function composed of operators $Q_{t}$ that satisfy off-diagonal bounds. Also, the cancellative condition of $S$ with respect to $L$, Assumption (b) of [8], has instead been replaced by a cancellative condition of the constituent operators $Q_{t}$.
In Sect. 4, using the growth condition imposed upon our metric space (1.0.4), it will be proved that the assumed cancellative condition for the $Q_{t}$ operators does in fact imply the cancellative condition of $S$ with respect to $L$. This allows us to deduce that the operators under consideration in our article are a restricted subclass of the operators considered by [8]. Indeed, the additional growth condition of our metric space (1.0.4) has been assumed with the sole purpose of ensuring that we are working strictly within the setting of [8]. This will allow us to utilise some of the intermediary results from [8] without having to reprove them under a different cancellation condition. This will be of particular use to us in Sect. 4 when we come to prove the boundedness of a certain maximal function operator that is essential to our proof.

Remark 1.5
One does not have to search for long before encountering examples of square function operators that satisfy the previous set of assumptions. For instance, the square functions associated with an elliptic operator $L = -\mathrm{div}(A\nabla)$, such as $G_{L}$ from (1.0.1) and $g_{L}$, and square functions associated with the Laplace-Beltrami operator satisfy the above conditions. We discuss these examples in detail in Sect. 3.
In order to make sense of the concept of sparse domination and precisely state our main theorem we need to define the notion of a sparse family of cubes. We consider a system of dyadic cubes D on the metric space (M, d).
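For orientation, one common way of formulating sparseness is sketched here; the constant $\eta$ and the exact normalisation are assumptions and may differ from the definition adopted in this article. A collection $\mathcal{S} \subset \mathcal{D}$ is called $\eta$-sparse, for some $0 < \eta < 1$, if every $P \in \mathcal{S}$ contains a measurable subset $E_{P} \subset P$ such that \begin{equation*} \lvert E_{P}\rvert \ge \eta\,\lvert P\rvert \quad \text{and} \quad E_{P} \cap E_{P'} = \emptyset \ \text{ whenever } P \ne P'. \end{equation*} Under this formulation the cubes may overlap heavily, yet their total mass is controlled by their union, since $\sum_{P \in \mathcal{S}} \lvert P\rvert \le \eta^{-1}\sum_{P \in \mathcal{S}} \lvert E_{P}\rvert \le \eta^{-1}\bigl\lvert \bigcup_{P \in \mathcal{S}} P\bigr\rvert$.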
where $q_{0}^{*} := (q_{0}/2)'$ is the dual exponent of $q_{0}/2$, and $c$ is a positive constant independent of $f$ and $g$.
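To make this exponent concrete, recall that the Hölder conjugate $r'$ of $r$ satisfies $1/r + 1/r' = 1$. Taking $r = q_{0}/2$ gives \begin{equation*} \frac{1}{q_{0}^{*}} = 1 - \frac{2}{q_{0}}, \qquad q_{0}^{*} = \frac{q_{0}}{q_{0}-2}. \end{equation*} For instance, $q_{0} = 4$ gives $q_{0}^{*} = 2$, while in the limiting case $q_{0} = \infty$ one has $q_{0}^{*} = 1$, so that $g$ enters the sparse form through plain averages.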
The right hand side of (1.0.5) is the sparse form natural to the square function. We observe that the bilinear sparse form obtained differs from the linear sparse domination results where the $L^{q_{0}'}$ average of $g$ is used instead (cf. [8]). This is due to the nonlinear nature of the problem at hand. Analogous sparse forms appear when controlling vector-valued operators, as seen in the work of Lorist [44]. In fact, as the operators we consider satisfy the hypotheses from [8], it follows that [8, Thm. 5.7] will be valid for $S$. This result states that for any $f$ and $g$ in $C^{\infty}_{c}(M)$ there exists a sparse family $\mathcal{S} \subseteq \mathcal{D}$ for which
The essence of our sparse domination result is that, under the additional square function hypotheses assumed above, the previous sparse bound can be improved to a quadratic sparse domination bound that is uniquely suited to square function operators.
Our proof strategy requires the weak boundedness at the endpoint of a "grand maximal function" operator associated with the square function. This strategy is an adaptation of Lerner's work on singular integrals [42] to our setting, which itself is an elaboration of Lacey's elementary proof from [35]. The weak-type boundedness of our grand maximal operator will be obtained by demonstrating that our operator is pointwise controlled from above by a related maximal operator that was introduced in [8]. The weak boundedness of this alternative grand maximal operator was proved in [8] under their setting. Since, as will be shown in Sect. 4, we are working strictly within their setting, this will then allow us to conclude that our grand maximal operator is also weakly bounded at the endpoint.
Next we give an account of the weighted estimates that we obtain for our square functions via the sparse domination (Theorem 1.7). It is understood that if the operator at hand maps $L^{p}$ to $L^{p}$ only for a restricted range of exponents $p$, the relevant classes of weights will involve the intersection of Muckenhoupt and reverse Hölder weights [3]. We define them precisely.
A weight $w$ is a positive locally integrable function. We say that a weight $w$ is in the Muckenhoupt $A_{p}$ class for $1 < p < \infty$, and we denote it by $w \in A_{p}$, if and only if where $p' = p/(p-1)$ is the dual exponent of $p$. We say that a weight $w$ belongs to the reverse Hölder class $RH_{p}$ for $p > 1$ if
We can now state our second result.
Theorem 1.8 Fix $p_{0} < 2 < q_{0}$. For any sparse family $\mathcal{S} \subset \mathcal{D}$, functions $f, g \in L^{1}_{\mathrm{loc}}(\mathrm{d}\mu)$, $p \in (2,q_{0})$ and weight $w \in A_{p/p_{0}} \cap RH_{(q_{0}/p)'}$,
The constant $C_{0}$ is independent of both the weight and the sparse collection, and the dependence of this estimate on the weight characteristic is sharp.
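For reference, the two weight characteristics are usually normalised as follows. This is the standard textbook form, stated here as an assumption: in particular, taking the suprema over all balls $B \subset M$ (rather than, say, dyadic cubes) is our choice for illustration and may differ from the convention of this article. \begin{equation*} [w]_{A_{p}} := \sup_{B}\left(\frac{1}{\lvert B\rvert}\int_{B} w \, \mathrm{d}\mu\right)\left(\frac{1}{\lvert B\rvert}\int_{B} w^{1-p'} \, \mathrm{d}\mu\right)^{p-1}, \qquad [w]_{RH_{q}} := \sup_{B}\left(\frac{1}{\lvert B\rvert}\int_{B} w^{q} \, \mathrm{d}\mu\right)^{1/q}\left(\frac{1}{\lvert B\rvert}\int_{B} w \, \mathrm{d}\mu\right)^{-1}. \end{equation*} With this normalisation both quantities are at least $1$ by Hölder's inequality, and $w \in A_{p}$ (respectively $w \in RH_{q}$) means precisely that the corresponding supremum is finite.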
Expanding further upon the above theorem, the result is sharp in the sense that the dependence on the weight characteristic $[w]_{A_{p/p_{0}}}[w]_{RH_{(q_{0}/p)'}}$ can be matched at least asymptotically with the right choice of functions, weights and sparse form. A detailed proof of this sharpness will be presented in Sect. 7. The above theorem, when combined with our other main result, Theorem 1.7, allows us to obtain as a corollary the following sharp weighted result for non-integral square functions. It is important to note that the combination of Theorems 1.7 and 1.8 only produces the below weighted bounds for $p \in (2,q_{0})$. The weighted estimates for the full range $p \in (p_{0},q_{0})$ follow from this on applying a quantitative version of the limited range extrapolation theorem by Auscher and Martell in [3, Thm. 4.9]. See also [46, Thm. 2.2].
Corollary 1.9 Let $p_{0} < 2 < q_{0}$ and consider operators $L$ and $S$ that satisfy Assumptions 1.1 and 1.2 for this choice of exponents. For $p \in (p_{0},q_{0})$ and $w \in A_{p/p_{0}} \cap RH_{(q_{0}/p)'}$,
where $\gamma(p)$ is as defined in Theorem 1.8.
The result is sharp for certain square functions, see [13,38,40]. Sharpness can be deduced from the asymptotic behaviour of the unweighted estimates [26]. Unfortunately, these asymptotics are not easy to compute exactly for our non-integral square functions. However, the estimate (1.0.6) implies an upper bound on the asymptotic behaviour of the unweighted norm $\lVert S\rVert_{L^{p} \to L^{p}}$, see Sect. 7.1. In particular, when such asymptotic behaviour is known to match the upper bound, the weighted estimates in Corollary 1.9 are sharp.

Structure of the Paper
The paper is organised as follows. Section 2 contains some preliminary results that will be of use later in the paper. Section 3 discusses the examples that fit the assumptions and that one should keep in mind as references. The proof of Theorem 1.7 requires us to understand the boundedness properties of a grand maximal operator associated with the corresponding square functions. These boundedness properties are included in Sect. 4. Section 5 is dedicated to the proof of our main result, Theorem 1.7. Section 6 considers weighted estimates for the sparse forms found in Sect. 5 and, in particular, proves Theorem 1.8. Finally, Sect. 7 is dedicated to the proof of the sharpness of Theorem 1.8 when $p > 2$.

Preliminaries
In this section we gather a collection of useful results concerning dyadic analysis in metric measure spaces, off-diagonal estimates for a family of operators, and properties of Muckenhoupt and reverse Hölder weight classes.

Dyadic Analysis on a Doubling Metric Space
We recall some well-known definitions and facts from dyadic harmonic analysis as written in [8]. For detailed information on the construction of dyadic systems of cubes in doubling metric spaces, the interested reader is referred to [31] and references therein.
• If $l \ge k$, $\alpha \in \mathcal{A}_{k}$ and $\beta \in \mathcal{A}_{l}$ then either $Q^{l}_{\beta} \subseteq Q^{k}_{\alpha}$ or $Q^{k}_{\alpha} \cap Q^{l}_{\beta} = \emptyset$;
• For every $l \in \mathbb{Z}$ and $\alpha \in \mathcal{A}_{l}$, there exists a point $z^{l}_{\alpha}$ with the property that
The point $z^{l}_{\alpha}$ can be seen as the centre of the cube $Q^{l}_{\alpha}$ and the side length is defined by $\ell(Q^{l}_{\alpha}) := \delta^{l}$. The below theorem asserts the existence of adjacent systems of dyadic cubes for a doubling metric space. For a proof of this result, refer to [31]. From this point forward we fix a dyadic collection $\mathcal{D} := \bigcup_{b=1}^{K} \mathcal{D}^{b}$ as in the previous theorem. The following covering lemma will be useful in Sect. 5.
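Let $w$ be a weight on $M$. The uncentered dyadic maximal function $M^{\mathcal{D}}_{p,w}$ of exponent $p$ is defined by \begin{equation*} M^{\mathcal{D}}_{p,w} f(x) := \sup_{Q \in \mathcal{D}} \left(\frac{1}{w(Q)}\int_{Q} \lvert f\rvert^{p}\, w \, \mathrm{d}\mu\right)^{1/p} 1_{Q}(x), \end{equation*} where the notation $1_{E}$ is used to denote the characteristic function of a set $E \subset M$ and $w(E) := \int_{E} w \, \mathrm{d}\mu$. When $w \equiv 1$, $M^{\mathcal{D}}_{p,w}$ will just be the usual dyadic maximal function of exponent $p$ and the shorthand notation $M^{\mathcal{D}}_{p} = M^{\mathcal{D}}_{p,1}$ will be employed. Similarly, we will also use the notation $M^{\mathcal{D}}_{w} = M^{\mathcal{D}}_{1,w}$. It is known that $M^{\mathcal{D}}_{p}$ is of weak-type $(p,p)$ and strong-type $(q,q)$ for all $q > p$, see [16]. Moreover, $M^{\mathcal{D}}_{w}$ is bounded on $L^{p}(w)$ for all $p \in (1,\infty)$ with a constant independent of the weight,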
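The weak-type $(p,p)$ bound for $M^{\mathcal{D}}_{p}$ can be read off from the weak-type $(1,1)$ bound for $M^{\mathcal{D}}_{1}$; the short computation below is a sketch, in which the constant $C$ denotes an assumed weak-type $(1,1)$ constant of $M^{\mathcal{D}}_{1}$. Since $M^{\mathcal{D}}_{p} f = (M^{\mathcal{D}}_{1}\lvert f\rvert^{p})^{1/p}$, for every $\lambda > 0$ \begin{equation*} \bigl\lvert\{x \in M : M^{\mathcal{D}}_{p} f(x) > \lambda\}\bigr\rvert = \bigl\lvert\{x \in M : M^{\mathcal{D}}_{1}(\lvert f\rvert^{p})(x) > \lambda^{p}\}\bigr\rvert \le \frac{C}{\lambda^{p}}\int_{M} \lvert f\rvert^{p} \, \mathrm{d}\mu. \end{equation*}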

Off-Diagonal Estimates
In this section, we define three different notions of off-diagonal estimates that will be used throughout this article. For an extensive and detailed account of off-diagonal estimates for operator families, the reader is referred to [4]. Throughout this section, we will consider exponents 1 ≤ p 0 < 2 < q 0 ≤ ∞.

Remark 2.5 Some comments are in order.
• Examples of $\rho$ that we will use are the Gaussian function $\rho(x) = e^{-c\lvert x\rvert^{2}}$ and It should be noted, however, that the value of $c$ or $s$ in the above examples of $\rho$ may change for the composition.
• For $p_{0} \le p \le q \le q_{0}$, Hölder's inequality implies that if an operator family satisfies $(p_{0},q_{0})$ off-diagonal estimates at scale $\sqrt{t}$ then it will also satisfy $(p,q)$ estimates.
• Off-diagonal estimates for $p \le q$ do not imply $L^{p}$-$L^{q}$ boundedness of $T_{t}$, see [4].
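The Hölder step in the second comment above can be sketched as follows. We assume, for illustration only, that $(p_{0},q_{0})$ off-diagonal estimates at scale $\sqrt{t}$ are formulated through normalised averages over balls $B_{1}$, $B_{2}$ of radius $\sqrt{t}$, for $f$ supported in $B_{1}$; the precise form in Definition 2.4 may differ. For $p_{0} \le p \le q \le q_{0}$, Hölder's inequality on normalised averages (used in the first and last steps) gives \begin{equation*} \left(\frac{1}{\lvert B_{2}\rvert}\int_{B_{2}} \lvert T_{t} f\rvert^{q} \, \mathrm{d}\mu\right)^{1/q} \le \left(\frac{1}{\lvert B_{2}\rvert}\int_{B_{2}} \lvert T_{t} f\rvert^{q_{0}} \, \mathrm{d}\mu\right)^{1/q_{0}} \lesssim \rho\!\left(\frac{d(B_{1},B_{2})}{\sqrt{t}}\right)\left(\frac{1}{\lvert B_{1}\rvert}\int_{B_{1}} \lvert f\rvert^{p_{0}} \, \mathrm{d}\mu\right)^{1/p_{0}} \le \rho\!\left(\frac{d(B_{1},B_{2})}{\sqrt{t}}\right)\left(\frac{1}{\lvert B_{1}\rvert}\int_{B_{1}} \lvert f\rvert^{p} \, \mathrm{d}\mu\right)^{1/p}. \end{equation*}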
In order to apply off-diagonal estimates, we often need to decompose the support of a function $f$ into finitely overlapping balls with radius to match the scale.
Definition 2.6 We say that a collection of balls B has finite overlap if there exists a finite constant B such that

Remark 2.7
Let B be a collection of finite overlapping balls covering a set . Then Since m R ⊃ for all R ∈ R, the doubling property implies that The case p 0 = 1 is even simpler since it does not require the use of Hölder's inequality nor an estimate on the cardinality #R.
for balls B(r ) of radius r ≥ √ s and B 1 of radius √ s.

Proof of (2.2.2)
It is enough to cover the larger ball $B(r)$ with a collection of smaller, finitely overlapping balls of radius $\sqrt{s}$.
We can use off-diagonal estimates at scale √ s to obtain The estimate then follows from the fact that the supremum of ρ(d( We denote the semigroup by P t := e −t L . This is used as an approximation of the identity at scale For N > 0, we also consider the family of operators Q s . These operators will satisfy an adapted Calderón reproducing formula for functions f ∈ L p with p ∈ ( p 0 , q 0 ), namely t . We also have that as L p -bounded operators,

Remark 2.10
It is known that for any integer N ∈ N \ {0} the operators P

Definition 2.11 (Off-diagonal estimates at all scales) A family of operators
where B i, It is trivial to see that off-diagonal estimates at all scales implies off-diagonal estimates at scale √ t. This stronger condition is used in our cancellation hypothesis, uniformly for all x ∈ M and r > 0. Notice that this condition is stronger than (1.0.4). For spaces of ψ-growth, one encounters another notion of off-diagonal estimate. These types of estimates are studied in [4].

Remark 2.13
It is not difficult to show that for spaces of ψ-growth, the three different notions of off-diagonal estimates, Definitions 2.4, 2.11 and 2.12, are all equivalent for a particular choice of ρ.

Weight Classes
We recall some basic properties of the Muckenhoupt and reverse Hölder weight classes as defined in the introduction. Refer to [33] for further information.
Lemma 2.14 The following properties of the weight classes A p and R H q are true.
(i) For $p \in (1,\infty)$, a weight $w$ will be contained in the class $A_{p}$ if and only if $w^{1-p'} \in A_{p'}$. Moreover,
The dependence of $\varphi$ on $p_{0}$ and $q_{0}$ will be kept implicit. From the previous lemma, we get that a weight $w$ will be contained in the class A p
In the article [3], the authors P. Auscher and J. M. Martell proved a restricted range extrapolation result that allowed one to obtain $L^{p}(w)$-boundedness for the full range of $p \in (p_{0},q_{0})$ and $w \in A_{p/p_{0}} \cap RH_{(q_{0}/p)'}$. In their result, they do not state the dependence of the bound on the weight characteristic of $w$. However, a quantitative version of the extrapolation theorem by Auscher and Martell can be obtained through [46, Thm. 2.2] in the scalar case ($m = 1$), as their weight
Here we recall this result using the notation of [3, Thm. 4.9] and the weight characteristic introduced earlier. As in [50], $\mathcal{F}$ denotes a family of ordered pairs of non-negative, measurable functions $(f,g)$.
for some α > 0 and C > 0 independent of the weight. Then, for all p 0 < p < q 0 and

Applications
In this section, we consider two distinct applications of our quadratic sparse domination result and Corollary 1.9. For the first application, weighted estimates for square functions associated with divergence form elliptic operators will be proved. For the particular case of the Laplacian operator $\Delta$, this will allow us to recover some estimates from [13]. The second application concerns square functions associated with the Laplace-Beltrami operator on a Riemannian manifold.

Elliptic Operators
Fix $n \in \mathbb{N}\setminus\{0\}$ and consider the Euclidean space $\mathbb{R}^{n}$ equipped with the Lebesgue measure. This is a space of $\psi$-growth, so all definitions of off-diagonal estimates are equivalent, see Remark 2.13. Let $A$ be an $n \times n$ matrix-valued function on $\mathbb{R}^{n}$ that is bounded and elliptic in the sense that for some $\lambda > 0$, for all $\xi, x \in \mathbb{R}^{n}$. Consider the divergence form elliptic operator $L = -\mathrm{div}(A\nabla)$, defined through its corresponding sesquilinear form as a densely defined and maximally accretive operator on $L^{2}(\mathbb{R}^{n})$. The operator $L$ generates an analytic semigroup $\{e^{-tL}\}_{t>0}$. Let $g_{L}$ and $G_{L}$ denote the square function operators associated with $L$ defined by and
In the articles [4] and [2], off-diagonal estimates for the constituent operators of $g_{L}$ and $G_{L}$ were studied in great detail. The below proposition outlines some properties of such off-diagonal estimates that will be required in order to apply Corollary 1.9 to these two square functions.
• The interiors $\mathrm{int}\,J_{m}(L)$ and $\mathrm{int}\,K_{m}(L)$ are independent of $m$.
It is also not difficult to see that $J_{0}(L) \subset J_{1}(L)$. Indeed, consider the expression \begin{equation*} tLe^{-tL} = e^{-\frac{t}{3}L}\left(tLe^{-\frac{t}{3}L}\right)e^{-\frac{t}{3}L}. \end{equation*} For $p_{0}, q_{0} \in J_{0}(L)$ with $p_{0} < 2 < q_{0}$, Proposition 3.1 tells us that the operator $e^{-\frac{t}{3}L}$ will satisfy both $(p_{0},2)$ and $(2,q_{0})$ full off-diagonal estimates. It is also well known that $tLe^{-\frac{t}{3}L}$ satisfies $(2,2)$ full off-diagonal estimates. The stability of full off-diagonal estimates under composition then implies that $tLe^{-tL}$ satisfies $(p_{0},q_{0})$ full off-diagonal estimates.
Applying Corollary 1.9 to the operators L and g L will produce the following weighted result.
where γ (p) is as defined in Corollary 1.9.
Proof To prove the proposition, it is sufficient to check that the hypotheses of Corollary 1.9, namely Assumptions 1.1 and 1.2, are valid for the operators $L$ and $g_{L}$ and the indices $p_{0}$, $q_{0}$. Assumption 1.1 is clearly valid since the definition of $J_{0}(L)$ implies that the semigroup $e^{-tL}$ will satisfy $(p_{0},q_{0})$ full off-diagonal estimates. It remains to prove the validity of Assumption 1.2. Part (a), the $L^{2}$-boundedness of $g_{L}$, follows from the fact that $L$ possesses a bounded holomorphic functional calculus on $L^{2}$. Assumption 1.2(b), the off-diagonal estimates of the operator family $tLe^{-tL}$, is given by Remark 3.2. Assumption 1.2(c) follows on observing that and that since $p_{0}, q_{0} \in J_{0}(L)$ the operator family for some sequence of numbers $c(j) > 0$ that satisfies $\sum_{j \ge 1} c(j) \lesssim 1$. It should be noted that this argument was written for the square function with constituent operators (t L)
Similarly, Corollary 1.9 can be applied to the square function $G_{L}$.
Proof In order to apply Corollary 1.9, it is sufficient to show that $G_{L}$ satisfies Assumptions 1.1 and 1.2. Assumption 1.1 is implied by $p_{0}, q_{0} \in K_{0}(L) \subset J_{0}(L)$.
Let us now demonstrate the validity of Assumption 1.2. The $L^{2}$-boundedness of $G_{L}$, Assumption 1.2(a), follows from the ellipticity condition of $A$ and a straightforward integration by parts argument that can be found in [1, pg. 74]. Assumption 1.2(b) is implied by the condition $p_{0}, q_{0} \in K_{0}(L)$. For Assumption 1.2(c), notice that Also observe that As $p_{0}, q_{0} \in K_{0}(L)$, Proposition 3.1 tells us that the operator family $\sqrt{r}\,\nabla e^{-rL/2}$ will satisfy $(2,q_{0})$ full off-diagonal estimates. Similarly, since $K_{0}(L) \subset J_{N}(L)$ for any $N \ge N_{0} = 0$, the family $(rL)^{N}e^{-rL/2}$ satisfies $(p_{0},2)$ full off-diagonal bounds. It then follows from the stability of full off-diagonal bounds under composition that the family (N ) r will satisfy $(p_{0},q_{0})$ full off-diagonal bounds. This proves that Assumption 1.2(c) is satisfied.

Remark 3.5
If $A$ is real valued, then it is known that $J_{0}(L) = [1,\infty]$ (cf. [2]). Proposition 3.3 will then imply that for all $w \in A_{p}$. When $A$ has smooth coefficients, this result was proved by Bui and Duong in [13].
In the same work, the authors showed that square functions associated with $\sqrt{L}$ are dominated by the corresponding ones associated with $L$ [13, Thm. 1.4].
In particular, our bounds for $g_{L}$ in Proposition 3.3 imply the same bound for the square function $g_{\sqrt{L}}$. If, in addition to being real valued, $A$ is also smooth then it is known that $K_{0}(L) = [1,\infty]$. Proposition 3.4 then implies that which reproduces a result in [13].

Remark 3.6
For $A = I$ we have $L = \Delta$ and it is then known that $J_{0}(L) = K_{0}(L) = [1,\infty]$. We can then take $p_{0} = 1$ and $q_{0} = \infty$ in Propositions 3.3 and 3.4. This will produce the weighted estimates For both square functions, it is known that these estimates are optimal in the sense that they will not hold for an exponent of $[w]_{A_{p}}$ any smaller than the above exponent. This provides a new proof of weighted boundedness of the standard square functions associated with $\Delta$ with optimal dependence on the constant $[w]_{A_{p}}$.

Laplace-Beltrami
Let $M$ be a complete, connected, non-compact Riemannian manifold. It will be assumed that the Riemannian measure $\mu$ satisfies the volume doubling property. In addition, it will also be assumed that there exists a function $\psi : (0,\infty) \to (0,\infty)$ for which \begin{equation*} \lvert B(x,r)\rvert = \mu(B(x,r)) \simeq \psi(r) \end{equation*} uniformly for all $x \in M$ and $r > 0$. That is, the manifold is of $\psi$-growth. Enforcing this stronger growth condition will allow us to interchange our different notions of off-diagonal estimates (cf. Remark 2.13). Consider the Laplace-Beltrami operator $\Delta$ defined as an unbounded operator on $L^{2}(M,\mu)$ through the integration by parts formula where $\nabla$ is the Riemannian gradient. The positivity of $\Delta$ implies that it will generate an analytic semigroup $e^{-t\Delta}$ on $L^{2}(M,\mu)$.
Recall that the heat kernel $k_{t}(x,y)$ of $\Delta$ is said to satisfy Gaussian upper bounds if there exists $c > 0$ such that \begin{equation*} \lvert k_{t}(x,y)\rvert \lesssim \frac{1}{\lvert B(x,\sqrt{t})\rvert} \exp\left(-c\,\frac{d(x,y)^{2}}{t}\right) \end{equation*} for all $x, y \in M$ and $t > 0$. This is a very common assumption that is imposed when considering the boundedness of singular operators on Riemannian manifolds. For further information refer to [19], [6] or [5]. Consider the square function $g_{\Delta}$ defined through
The boundedness for square functions of this form on unweighted $L^{p}(M)$ with $1 < p < \infty$ is known to hold in the general symmetric Markov semigroup setting [49, pg. 111]. Let us consider the weighted case on the full range of $p \in (1,\infty)$.

Proposition 3.7 Suppose that the heat kernel for M satisfies Gaussian upper bounds.
Then, for any $p \in (1,\infty)$ and $w \in A_{p}$,
Proof This result will follow from Corollary 1.9 provided that Assumptions 1.
Observe that since the semigroup $e^{-t\Delta}$ satisfies $(1,\infty)$ full off-diagonal estimates, $e^{-t\Delta}$ will satisfy both $(1,2)$ and $(2,\infty)$ full off-diagonal bounds. At the same time, $t\Delta e^{-t\Delta}$ is well known to satisfy ( ) and the fact that the operator family $\{(r\Delta)^{N+1}e^{-r\Delta}\}_{r>0}$ satisfies $(1,\infty)$ full off-diagonal bounds by an argument similar to that of Remark 3.2.
Finally, the validity of Assumption 1.2(d) can be proved in an identical manner to the argument used to obtain (3.1.1). This argument can be found in [2, §7] on pages 729-730. In the elliptic setting it follows from a combination of the off-diagonal estimates of the constituent operators, the fact that the constituent operators are expressible in terms of the semigroup and a variation of the Marcinkiewicz-Zygmund theorem [27, Thm. 5.5.1]. All three of these components will hold for our square function in this Riemannian manifold setting, and thus the argument will be valid.
Next, we will apply our sparse result to the square function
The weighted boundedness of the Riesz transform $\nabla\Delta^{-1/2}$ on $L^{p}(M, w\,\mathrm{d}\mu)$ was considered for $p \in (1,q_{+})$ in [5]. Owing to the strong connection between the Riesz transform and the square function $G_{\Delta}$, the range $(1,q_{+})$ will also be a natural interval over which to consider the boundedness of $G_{\Delta}$. From the definition of $q_{+}$ and the $L^{2}$-boundedness of $\nabla\Delta^{-1/2}$, it is clear that $q_{+} \ge 2$. In the below proposition we assume this inequality to be strict.

Proposition 3.8 Assume that the heat kernel of $M$ satisfies Gaussian upper bounds and that $q_{+} > 2$. Let $2 < q_{0} < q_{+}$ and $p \in [1,q_{0})$. Then for any $w \in A_{p} \cap RH_{(q_{0}/p)'}$,
Proof Once again, let us apply Corollary 1.9. Assumption 1.1 will be true for the same reason as in Proposition 3.7. Assumption 1.2(a) is well known and can be obtained by combining the $L^{2}$-boundedness of $\nabla\Delta^{-1/2}$ together with the bounded holomorphic functional calculus of $\Delta$ on $L^{2}$.
Let us show that the family of operators $Q_{t} = \sqrt{t}\,\nabla e^{-t\Delta}$ satisfies $(1,q_{0})$ off-diagonal estimates at scale for all $t > 0$ and $y \in M$, where $c > 0$ is dependent on $q_{0}$. This immediately implies that where the last line follows from the uniform $\psi$-growth condition imposed upon our manifold. For $f$ supported in $B_{1}$, Minkowski's inequality followed by the previous estimate produces
Let us now prove that Assumption 1.2(c) is valid. Observe that Observe that the operator family $\{(r\Delta)^{N}e^{-r\Delta}\}_{r>0}$ satisfies $(1,\infty)$ full off-diagonal estimates. Recall that for spaces of $\psi$-growth the three different forms of off-diagonal estimates, Definitions 2.4, 2.11 and 2.12, are all equivalent. This, when combined with Hölder's inequality, implies that this operator family satisfies $(1,2)$ off-diagonal estimates at scale $\sqrt{r}$. Similarly, the family $\{\sqrt{r}\,\nabla e^{-r\Delta}\}_{r>0}$ satisfies $(2,q_{0})$ off-diagonal estimates at scale $\sqrt{r}$. The stability of off-diagonal estimates under composition then implies that the operator family r satisfies $(1,q_{0})$ off-diagonal estimates at scale $\sqrt{r}$, which implies $(1,q_{0})$ off-diagonal estimates at all scales. This proves Assumption 1.2(c).
Finally, the validity of Assumption 1.2(d) can be proved in an identical manner to the argument used to obtain (3.1.2). This argument can be found in [2, §7] on page 732. In the elliptic setting it follows from a combination of the off-diagonal estimates of the constituent operators, the fact that the constituent operators are expressible in terms of the semigroup and a variation of the Marcinkiewicz-Zygmund theorem [27, Thm. 5.5.1]. All three of these components will hold for our square function in this Riemannian manifold setting, and thus the argument will be valid.

Boundedness of the Maximal Function
Throughout this section, fix $p_0, q_0 \in [1, \infty]$, $N_0 \in \mathbb{N}$ and operators $L$ and $S$ satisfying Assumptions 1.1 and 1.2 for such a choice of $p_0$, $q_0$. For a ball $B$ we denote by $r(B)$ its radius. Define the following maximal operator associated with the square function, In this section, our aim is to prove the following boundedness result for $S^*$.
The boundedness of this maximal function constitutes an important part of our sparse domination argument. Reliance on an associated maximal function is a well-known method for obtaining sparse bounds; it originates in the work of Lacey [35] and was later streamlined by Lerner [42]. Quite often, proving sparse domination for a particular operator can be reduced to determining an appropriate associated maximal operator, proving its (weak) boundedness and then applying a stopping time argument that utilises this boundedness.

A Pointwise Estimate
In order to prove the boundedness of the operator $S^*$ we will require a couple of preliminary lemmas. Given a ball $B$, we define the average of a function $f$ over the annulus $S_k(B) := 2^{k+1}B \setminus 2^k B$ for $k \in \mathbb{N}$ as the integral over $S_k(B)$ normalised by $|2^k B|$.
Recall that $A_0$ is a positive number defined in Assumption 1.2(c).

Lemma 4.2
For any $0 < s < r^2 < t$ and $N \in \mathbb{N}$,
Proof Fix $B$ a ball of radius $r$. For $j \ge 0$, let $\mathcal{R}_j$ denote a collection of finitely overlapping balls of radius $\sqrt{t}$ that covers the set $S_j(B)$. Then, Assumption 1.2(c) together with the triangle inequality produces This, together with the fact that |R| ≤ R √ s+t gives and $\mathcal{R}_j$ as defined above in this proof. The inclusion ⊂ 2 j+1 B ⊂ 2 j+2 R holds for any $R \in \mathcal{R}_j$ and $j \in \mathbb{N}$. Thus Lemma 2.8 implies that Applying this estimate together with (4.1.3) and (4.1.4) gives us our result.
Using the previous lemma, the following result can then be proved using an argument identical to the first estimate of [8,Lem. 4.1].
Let S # denote the maximal operator This operator was introduced in [8] and formed an important part of their sparse domination argument.

Proposition 4.4 For every x ∈ M,
Proof For x ∈ M and ball B ⊂ M containing x, the triangle inequality implies For the first term, apply Minkowski's inequality followed by Lemma 4.3 to obtain For the second term, We thus obtain the pointwise estimate (4.4).

Cancellation of S with respect to L
As the operator $M_{p_0}$ is $L^2$-bounded and weak-type $(p_0, p_0)$, the pointwise bound of the previous section implies that in order to prove Theorem 4.1 it is sufficient to show that $S^{\#}$ is $L^2$-bounded and weak-type $(p_0, p_0)$. According to [8, Prop. 4.6], $S^{\#}$ will be $L^2$-bounded and weak-type $(p_0, p_0)$ if $S$ satisfies the assumptions of [8]. The only assumption from [8] that is not included in our hypotheses is Assumption (b) of [8], the cancellative property of $S$ with respect to $L$. Instead, for us, the cancellation has been imposed upon the constituent operators $Q_t$. In this section it will be proved that cancellation on $Q_t$ with respect to $L$ implies cancellation on $S$ with respect to $L$.

Proposition 4.5
There exists N 0 ≥ N 0 such that for all integers N ≥ N 0 , s > 0 and balls B 1 , B 2 of radius √ s, for all f ∈ L p 0 (B 1 ).
Proof For I ⊂ [0, ∞), define the operator In order to prove (4.2.1), it is sufficient to show that a similar estimate holds for the operators S [0,s] and S [s,∞) .
The property that ϕ(a) ≤ 1 for a ≤ 1 then gives In order to prove the desired off-diagonal estimate, it is then sufficient to prove For t contained in [0, s] we will have t + s ≤ 2s, and therefore, .

This gives
.
Applying this to (4.2.2) produces the desired off-diagonal bounds for the operator $S_{[0,s]}$. Next, let us prove off-diagonal bounds for the operator $S_{[s,\infty)}$. Suppose first that $s > d(B_1, B_2)^2$. When this occurs, note that
Applying this to (4.2.2) produces the desired off-diagonal estimates for $S_{[s,\infty)}$. Finally, we must prove off-diagonal decay for $S_{[s,\infty)}$ in the case $s \le d(B_1, B_2)^2$. We have, where the last line follows from the condition $s \le d(B_1, B_2)^2$. Applying this to (4.2.2) completes our proof.
The corollary below, in combination with the pointwise estimate of Proposition 4.4, completes the proof of Theorem 4.1.

Sparse Bounds
In this section we prove Theorem 1.7. Since $f$ has compact support, without loss of generality we can assume that its support is contained in a bounded set $E \subset M$. By Lemma 2.3, there exist $\alpha \ge 1$ and a partition $\mathcal{P}$ of $M$ into dyadic cubes such that $\alpha Q \supseteq \operatorname{supp} f$ for every $Q \in \mathcal{P}$. Then We are not concerned with the particular value of $\alpha$, so we will fix $\alpha = 5$ in the following and assume that this value works for the covering lemma. Then, it is enough to show the existence of a sparse collection $\mathcal{S}_0$ inside a fixed cube $Q_0$ such that
We will decompose our quantity into several terms, all but one of which will be controlled by the averages of $f$ and $g$. The remaining term is where $f$ assumes a large value; it is similar to the original quantity but lives on a smaller scale. We can then iterate the decomposition, which terminates since the measure of the set we are decomposing shrinks geometrically at each iteration.

Decomposition
Denote by $\ell(P)$ the side length of the dyadic cube $P$. Let us consider the (localised) dyadic version of the operator introduced in Sect. 4, For a positive $\eta$ to be fixed later, consider the set Since the operators $M^*_{Q_0, p_0}$ and $S^*_{Q_0}$ are weak-type $(p_0, p_0)$, as shown in Sect. 4, there exists $\eta > 0$ such that $|E(Q_0)| \le \frac{1}{2}|Q_0|$. Decompose our form as Term I is controlled by using the Lebesgue differentiation theorem as in [8, Lem. 4.4] Consider term II. Let $\mathcal{E} := \{P\}_{P \in \mathcal{D}}$ be a covering of $E(Q_0)$ with maximal dyadic cubes. Then For each $P$ in the covering, we write $f = f_{\mathrm{in}} + f_{\mathrm{out}}$, where $f_{\mathrm{in}} := f 1_{5P}$ and $f_{\mathrm{out}} := f 1_{(5P)^c}$. Then each term in $\mathrm{II}_{<}$ is itself decomposed into three terms Term $(\mathrm{II}_{\mathrm{in}})$ goes into the iteration. Terms $(\mathrm{II}_{\mathrm{out}})$ and $(\mathrm{II}_{\mathrm{cross}})$ are controlled by using Fubini and applying off-diagonal estimates as in the following lemma.
Lemma 5.1 For a given dyadic cube $P$, let $S_k(P) := 2^{k+1}P \setminus 2^k P$ for $k \ge 2$. Then for any $t > 0$,
Proof of Lemma 5.1 The proof follows the one in [8, Thm. 5.7]. For $f_{\mathrm{in}} = f 1_{5P}$, let $\mathcal{R}_0$ be a collection of finitely overlapping balls $R$ of radius $\sqrt{t}$ covering $5P$. By linearity of the operators, the triangle inequality, off-diagonal estimates for $Q_t$ with $\rho(x) = (1 + |x|^2)^{-(\nu+1)}$ and Remark 2.9 we have For $f_{\mathrm{out}} = f 1_{(5P)^c}$, decompose $f$ on the annuli $S_k = S_k(P)$. Let $\mathcal{R}_k$ be a covering of $S_k$ by finitely overlapping balls $R$ of radius $\sqrt{t}$. Linearity of the operators $Q_t$, the triangle inequality and off-diagonal estimates for $Q_t$ imply that where we used that the function $\rho$ is monotone decreasing and $d(P, R) \ge d(P, S_k)$. The last inequality follows by applying Lemma 2.8, since $S_k(P) \subseteq 2^k P \subseteq 2^{k+1}\frac{\ell(P)}{\sqrt{t}} R$. Finally, we have enough decay from the remaining product, since This follows because $d(P, S_k) = d(P, 2^{k+1}P \setminus 2^k P)$ is comparable with $2^k \ell(P)$ and the function $\rho(x) = (1 + |x|^2)^{-(\nu+1)}$ decays faster than $x^{\nu}$ grows for $x \gg 1$. This proves estimate (5.1.2).
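The decay of $\rho$ invoked at the end of the proof can be checked by hand. The following elementary computation is only a sketch: it uses nothing beyond the definition $\rho(x) = (1+|x|^2)^{-(\nu+1)}$, assumes $\nu \ge 0$ and $x \ge 1$, and is not a reconstruction of the precise estimate (5.1.2).
\begin{equation*}
x^{\nu}\rho(x) = \frac{x^{\nu}}{(1+x^{2})^{\nu+1}} \le \frac{x^{\nu}}{x^{2\nu+2}} = x^{-(\nu+2)}, \qquad \text{so that} \qquad \sum_{k \ge 0} 2^{k\nu}\rho(2^{k}) \le \sum_{k \ge 0} 2^{-k(\nu+2)} < \infty .
\end{equation*}
In particular, a polynomial factor of order $\nu$ evaluated at the dyadic scales $2^{k}$ is absorbed by $\rho$, which is the mechanism that makes a sum over annuli converge geometrically.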
We will use Lemma 5.1 to control the different terms left in the decomposition.

Remark 5.2
The geometric sum in (5.1.2) is controlled using the stopping condition: the integral over S k is bounded by the integral over the ball 2 k+1 P, so where we used that P is a maximal cube covering E. Similarly for the average on 5P: The sum of the q * 0 -averages of g is controlled by using Hölder's inequality in , summing over all cubes P in E we obtain

Out Term
Consider (II out ). Applying Fubini and Hölder's inequality, we have The average of g is controlled as in (5.1.3). Apply Lemma 5.1 to the first factor: which is controlled as in Remark 5.2. This case is concluded.

Cross Term
Consider (II cross ). We exchange the integrals, then an application of Hölder's and Cauchy-Schwarz inequality give The off-diagonal estimates for Q t in Lemma 5.1 applied to f in and f out imply that where the last estimate follows as in Remark 5.2.

Large Scales
Consider $\mathrm{II}_{>}$. Let $P^a$ be the dyadic parent of $P$, so that $\ell(P^a) = 2\ell(P)$. Then In the first term, we exchange the integrals and apply Hölder's inequality Applying Lemma 5.1 and using that $\sqrt{t}$ is comparable with $\ell(P)$, we obtain which again is controlled as in Remark 5.2. The average of $g$ is estimated as in (5.1.3).
The second term in (5.4.1), after applying Hölder's inequality, is controlled by the maximal truncation We have shown that Let $\mathcal{S} = \{Q_0\}$. We add all $P$ in the sum to $\mathcal{S}$ and we repeat the argument on each term in the sum. This iteration gives the desired bound: a sum of averages of $f$ and $g$ on cubes in the collection $\mathcal{S}$. We can choose $\eta > 0$ such that $|E(Q)| \le \frac{1}{2}|Q|$ for each $Q \in \mathcal{S}$. Then $\mathcal{S}$ is sparse, since each $Q \in \mathcal{S}$ has a subset $F_Q := Q \setminus E(Q)$ with the property that $\{F_Q\}_{Q \in \mathcal{S}}$ is a disjoint family and $|F_Q| \ge \frac{1}{2}|Q|$ by construction.
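The geometric shrinking behind this iteration can be made explicit. The following is a sketch under stated assumptions, not part of the original argument: we write $\mathcal{S}_k$ (a hypothetical notation) for the cubes added to $\mathcal{S}$ at the $k$-th step, with $\mathcal{S}_0 = \{Q_0\}$, and we assume that the cubes selected inside a given $Q$ are pairwise disjoint and contained in $E(Q)$, where $|E(Q)| \le \frac{1}{2}|Q|$.
\begin{align*}
\sum_{P \in \mathcal{S}_{k+1}} |P| &\le \sum_{Q \in \mathcal{S}_{k}} |E(Q)| \le \frac{1}{2} \sum_{Q \in \mathcal{S}_{k}} |Q|, \\
\sum_{P \in \mathcal{S}_{k}} |P| &\le 2^{-k} |Q_0|, \qquad \sum_{P \in \mathcal{S}} |P| = \sum_{k \ge 0} \sum_{P \in \mathcal{S}_{k}} |P| \le 2|Q_0| .
\end{align*}
Under these assumptions the total measure of the cubes selected at step $k$ decays like $2^{-k}$, which is the sense in which the measure of the set being decomposed shrinks geometrically.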

Proof of Theorem 1.8
Fix $p \in (2, q_0)$. Notice that by (2.3.1), . This tells us that in order to prove the estimate in Theorem 1.8, it is sufficient to demonstrate the stronger estimate (6.1.1) By Theorem 2.2, for each $P \in \mathcal{S}$ there will exist $\widehat{P} \in \mathcal{D}$ for which $5P \subset \widehat{P}$ and $|\widehat{P}| \lesssim |5P|$. Then $|\widehat{P}| \lesssim |P|$ by the doubling property of dyadic cubes. As the collection $\mathcal{S}$ is sparse, there must exist a collection of disjoint sets $\{E_P\}_{P \in \mathcal{S}}$ such that $E_P \subset P$ and $|P| \lesssim |E_P|$ for all $P \in \mathcal{S}$. We, therefore, have Define the weight $v := w^{(q_0/p)'}$ and $r := \varphi(p) = (q_0/p)'\left(\frac{p}{p_0} - 1\right) + 1$. Set $u$ to be the dual weight of $v$ in $A_r$, $u := v^{1-r'}$. We have Applying these two relations to our sparse form leads to
Case 1: $p \ge \mathfrak{p}$. Note that by Remark 6.1 this assumption is equivalent to assuming then our assumption is also equivalent to the condition $\kappa(p) \le 0$. The fact that $u$ is the conjugate weight of $v$ in $A_r$ implies that for $P \in \mathcal{S}$, . This estimate can be applied to (6.1.2) to produce |P| .
Since $|\widehat{P}| \lesssim |E_P|$ and $\kappa(p) \le 0$, For $\lambda := (1 - \kappa(p))^{-1}$ notice that Also, it is straightforward to check by substituting in the definition $u := v^{1-r'}$ that the constant function 1 can be decomposed as From this, Hölder's inequality implies , and, therefore, raising to the power $1/\lambda$ produces Applying this estimate to (6.1.3) and Hölder's inequality leads to Combining this with (6.1.2) gives |P| .
Also, it is straightforward to check by substituting in the definition $u = v^{1-r'}$ that the constant function 1 can be decomposed as Hölder's inequality then implies .

Proof of Corollary 1.9
We start by noting that where $\sigma := w^{1-(p/2)'} = w^{1-p^*}$ is the $A_{p/2}$-conjugate weight of $w$. Thus, in order to prove the desired result, it is sufficient to demonstrate the estimate For the critical index $\mathfrak{p}$ this is an easy consequence of Theorem 1.7, estimate (6.1.1) and a density argument. Applying the sharp restricted range extrapolation (Theorem 2.15) yields that for any $p \in (p_0, q_0)$ and weight $w \in A_{p/p_0}$ where $\beta(p, \mathfrak{p}) = \max\left(1, \frac{(q_0 - p)(\mathfrak{p} - p_0)}{(q_0 - \mathfrak{p})(p - p_0)}\right)$. We check that this matches the power $\gamma(p)$ in Corollary 1.9. Let $\omega(p) := (q_0 - p)/(p - p_0)$ for $p \in (p_0, q_0)$. Then $\beta(p, \mathfrak{p}) = \max(1, \omega(p)/\omega(\mathfrak{p}))$. Since $\omega(p)$ is decreasing in $p$ and $\beta(\mathfrak{p}, \mathfrak{p}) = 1$, we have $\beta(p, \mathfrak{p}) = 1$ for $p \in [\mathfrak{p}, q_0)$. In this range of exponents we have where the last inequality is the bound on the weight characteristic given in (2.3.1). When $p < \mathfrak{p}$, instead $\beta(p, \mathfrak{p}) = \omega(p)/\omega(\mathfrak{p})$. Using the identity (6.0.3) for $\mathfrak{p}$, one can see that $\omega(\mathfrak{p}) \cdot 2q_0^* = q_0$. This immediately gives Then (6.2.2) followed by (2.3.1) implies that for $p \in (p_0, \mathfrak{p})$ The exponent in the above inequality matches the hypothesised exponent of (2.3.1), allowing us to conclude our proof.
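The monotonicity of $\omega$ used above can be verified directly. The computation below is a short sketch that assumes only $p_0 < q_0$ and the definition $\omega(p) = (q_0 - p)/(p - p_0)$; the symbol $\mathfrak{p}$ stands for the critical index of this section.
\begin{align*}
\omega'(p) &= \frac{-(p - p_0) - (q_0 - p)}{(p - p_0)^{2}} = -\frac{q_0 - p_0}{(p - p_0)^{2}} < 0 \qquad \text{for } p \in (p_0, q_0), \\
\frac{\omega(p)}{\omega(\mathfrak{p})} &= \frac{(q_0 - p)(\mathfrak{p} - p_0)}{(q_0 - \mathfrak{p})(p - p_0)} .
\end{align*}
Hence $\omega(p)/\omega(\mathfrak{p}) \le 1$ exactly when $p \ge \mathfrak{p}$, so the maximum defining $\beta(p, \mathfrak{p})$ equals $1$ on $[\mathfrak{p}, q_0)$ and equals $\omega(p)/\omega(\mathfrak{p})$ on $(p_0, \mathfrak{p})$.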

Sharpness of the Sparse form for p > 2
In this section we will use the notation $\sim$ to indicate asymptotic behaviour and we will work in $\mathbb{R}$ with the Lebesgue measure. The sharpness in Theorem 1.8 is a consequence of the following proposition. The proof, although different, follows the reasoning in [8, §7].
Proposition 7.1 For $p \in (2, q_0)$, there exists a sparse collection $\mathcal{S}$ and for every $0 < \epsilon < 1$, there exist sequences of functions f and g and weights w such that
Proof The proof is divided into two cases, the case where $p \le \mathfrak{p}$ and the case where $p \ge \mathfrak{p}$. In both of them, the sparse collection considered is $\mathcal{S} = \{I_n := [0, 2^{-n}] : n \in \mathbb{N}\}$.
We conclude that the right hand side of (7.0.3) behaves as Using the definition of $p^*$ and $q_0^*$, we note − p We conclude the right hand side of (7.0.3) behaves as $\epsilon^{-1/q_0^* - 2/p - 1/p^*} = \epsilon^{-1-1/q_0^*}$ as $\epsilon \to 0$, which is exactly the asymptotic for the left hand side of (7.0.3), as desired.
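The exponent arithmetic in the last step can be spelled out. The following is a sketch that assumes $p > 2$ and reads $p^*$ as the Hölder conjugate of $p/2$, in analogy with the definition of $q_0^*$ as the Hölder conjugate of $q_0/2$.
\begin{align*}
p^* = \left(\frac{p}{2}\right)' = \frac{p}{p - 2} \quad &\Longrightarrow \quad \frac{1}{p^*} = 1 - \frac{2}{p}, \\
-\frac{1}{q_0^*} - \frac{2}{p} - \frac{1}{p^*} &= -\frac{1}{q_0^*} - \frac{2}{p} - \left(1 - \frac{2}{p}\right) = -1 - \frac{1}{q_0^*} .
\end{align*}
The dependence on $p$ therefore cancels, leaving the exponent $-1 - 1/q_0^*$ that the text compares against the left hand side of (7.0.3).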

Upper Bound on Asymptotic Behaviour
In this section we discuss the connection between sharp weighted estimates for an operator T and the asymptotic behaviour of its unweighted norm T L p →L p . We recall the definition of γ (q 0 ) from [26, Definition 5.1]. Let T be a bounded operator on L p for p ∈ ( p 0 , q 0 ).