Average distances on self-similar sets and higher order average distances of self-similar measures

The purpose of this paper is twofold: (1) we study different notions of the average distance between two points of a self-similar subset of R, and (2) we investigate the asymptotic behaviour of higher order average moments of self-similar measures on self-similar subsets of R.

We study average distances on self-similar sets and higher order average moments of self-similar measures in R. In particular, we investigate the following two problems: (1) We compute the "natural geometric" average distance between two points in a self-similar Cantor subset $C$ of R satisfying the so-called Open Set Condition. If $C_k$ denotes the $k$'th order approximation to $C$ (the precise definition of $C_k$ will be given in Sect. 1.1) and $\mathcal{L}^1$ denotes the 1-dimensional Lebesgue measure, then the number $\frac{1}{\mathcal{L}^1(C_k)^2}\int_{C_k\times C_k}|x-y|\,d(\mathcal{L}^1\times\mathcal{L}^1)(x,y)$ may be interpreted as the average distance between two points chosen uniformly from $C_k$. We show that the limiting average distance, namely, $\lim_k \frac{1}{\mathcal{L}^1(C_k)^2}\int_{C_k\times C_k}|x-y|\,d(\mathcal{L}^1\times\mathcal{L}^1)(x,y)$ (1.1), exists and we provide an explicit value for it; this is the content of Corollary 2.2 (this result is not new: it was first proved by Leary et al. [18] in 2010; see below for a more detailed discussion).
There is another, and perhaps equally natural, way to define the average distance between two points from $C$, namely, the average distance between two points in $C$ chosen with respect to the "natural" uniform distribution on $C$, i.e. chosen with respect to the normalised Hausdorff measure on $C$. More precisely, if $s$ denotes the Hausdorff dimension of $C$ (i.e. $s$ is the unique solution to the equation $\sum_i r_i^s = 1$) and $\mathcal{H}^s$ denotes the $s$-dimensional Hausdorff measure, then we compute the average distance between two points in $C$ chosen with respect to the normalised $s$-dimensional Hausdorff measure on $C$, i.e. we compute the integral $\frac{1}{\mathcal{H}^s(C)^2}\int_{C\times C}|x-y|\,d(\mathcal{H}^s\times\mathcal{H}^s)(x,y)$ (1.2). In fact, we compute far more general averages than those in (1.1) and (1.2). Namely, if $\mu_p$ and $\mu_q$ are self-similar measures on $C$ associated with the probability vectors $p$ and $q$ (the precise definitions will be given in Sect. 1.1), then we compute the average distance between two points in $C$ where the first point is chosen with respect to the measure $\mu_p$ and the second point is chosen with respect to the measure $\mu_q$, i.e. we compute the average distance defined by $\int |x-y|\,d(\mu_p\times\mu_q)(x,y)$.
The average distance between two points in a self-similar subset of R has recently been investigated by Leary et al. [18] and Bailey et al. [1]. In particular, Leary et al. proved the formula in Corollary 2.2 for the limiting "geometric" average distance in (1.1). Leary et al. also provide formulas for the average distance (1.1) between points belonging to some self-similar subsets of $R^n$ for $n \ge 2$, and for points belonging to certain families of fat Cantor subsets of R. Averages similar to (1.1) between points belonging to self-similar subsets of $R^n$ have also been studied in Bailey et al. [1]. In particular, Bailey et al. are interested in developing numerical methods that allow for high-precision approximation of the integrals in (1.1).
Finally, we note that other notions of average distances on fractals (different from the ones considered in this paper and in [1,18]) have been studied by Bandt and Kuschel [2] and Hinz and Schief [14].
There is also a connection between the results in this paper and recent studies of the Kantorovich-Wasserstein distance between two self-similar measures. We first recall the definition of the Kantorovich-Wasserstein distance $W(\mu,\nu)$ between two Borel probability measures $\mu$ and $\nu$ on a compact metric space $X$. We say that a Borel probability measure $\gamma$ on $X \times X$ is a coupling of $\mu$ and $\nu$ if $\mu = \gamma\circ P^{-1}$ and $\nu = \gamma\circ Q^{-1}$, where $P, Q : X\times X\to X$ are the projections given by $P(x,y) = x$ and $Q(x,y) = y$, and we denote the family of couplings of $\mu$ and $\nu$ by $\Gamma(\mu,\nu)$. The Kantorovich-Wasserstein distance $W(\mu,\nu)$ between $\mu$ and $\nu$ is now defined by $W(\mu,\nu) = \inf_{\gamma\in\Gamma(\mu,\nu)}\int |x-y|\,d\gamma(x,y)$; see, for example, [20]. It is well-known that convergence with respect to the Kantorovich-Wasserstein distance is equivalent to weak convergence. The Kantorovich-Wasserstein distance between self-similar measures has recently been studied by Fraser [11] and investigated further by Cipriano and Pollicott [4] and Cipriano [5]. In particular, Fraser [11] found an explicit formula for the Kantorovich-Wasserstein distance between two self-similar measures on the real line generated by Iterated Function Systems of two maps with a common contraction ratio. For the self-similar measures $\mu_p$ and $\mu_q$, the product measure $\mu_p\times\mu_q$ is clearly a coupling of $\mu_p$ and $\mu_q$ (but typically not a coupling that realises the Kantorovich-Wasserstein distance between $\mu_p$ and $\mu_q$), and our results are therefore related to the study of the Kantorovich-Wasserstein distance between self-similar measures in [4,5,11]. For example, in order to derive his main results, Fraser [11, Theorem 2.1] first finds an explicit formula for the average $\int |x-y|\,d\gamma_r(x,y)$ for a certain family of self-similar measures $\gamma_r$ indexed by a parameter $r$, and a special case of this result is a special case of Theorem 2.1 providing an explicit formula for the average $\int |x-y|\,d(\mu_p\times\mu_q)(x,y)$.
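The remark that the product measure is a coupling, but typically not an optimal one, can be checked numerically. The following is a minimal sketch, not taken from the paper: it replaces $\mu_p$ and $\mu_q$ by finitely supported approximations (atoms at the points $S_{\mathbf i}(0)$, a discretisation we assume purely for illustration), computes the Kantorovich-Wasserstein distance through the one-dimensional identity $W = \int |F - G|\,dx$ for distribution functions $F$ and $G$, and compares it with the average taken over the product coupling. The helper names are hypothetical.

```python
def level_k_atoms(r, p, k):
    """Atoms S_i(0) with weights p_i for all words i of length k, for the
    two maps S_1(x) = r*x and S_2(x) = r*x + 1 - r (illustrative discretisation)."""
    atoms = [(0.0, 1.0)]
    for _ in range(k):
        atoms = [(r * x + shift, weight * p_i)
                 for shift, p_i in ((0.0, p[0]), (1.0 - r, p[1]))
                 for x, weight in atoms]
    xs, ws = zip(*atoms)
    return list(xs), list(ws)


def product_coupling_average(xs, ps, ys, qs):
    """Average of |x - y| when x and y are drawn independently."""
    return sum(p * q * abs(x - y)
               for x, p in zip(xs, ps)
               for y, q in zip(ys, qs))


def wasserstein_1d(xs, ps, ys, qs):
    """W_1 distance between two finitely supported measures on the line,
    computed as the integral of |F - G| over the real line."""
    mass_mu, mass_nu = {}, {}
    for x, p in zip(xs, ps):
        mass_mu[x] = mass_mu.get(x, 0.0) + p
    for y, q in zip(ys, qs):
        mass_nu[y] = mass_nu.get(y, 0.0) + q
    points = sorted(set(mass_mu) | set(mass_nu))
    F = G = total = 0.0
    for left, right in zip(points, points[1:]):
        F += mass_mu.get(left, 0.0)   # distribution function of mu on [left, right)
        G += mass_nu.get(left, 0.0)   # distribution function of nu on [left, right)
        total += abs(F - G) * (right - left)
    return total


xs, ps = level_k_atoms(1 / 3, (0.3, 0.7), 6)
ys, qs = level_k_atoms(1 / 3, (0.6, 0.4), 6)
print("W_1 distance      :", wasserstein_1d(xs, ps, ys, qs))
print("product coupling  :", product_coupling_average(xs, ps, ys, qs))
```

The first number can never exceed the second, since the product measure is one admissible coupling in the infimum defining $W$.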
(2) We find the exact asymptotic behaviour of the higher order average moments of a self-similar measure on a self-similar subset of R; this is the content of Theorems 2.4 and 2.5. More precisely, if $C$ is a self-similar subset of R generated by an Iterated Function System of the form $(S_1, S_2)$ where $S_1(x) = rx$ and $S_2(x) = rx + 1 - r$ for $r\in(0,\frac12)$, i.e. $C$ is the unique non-empty compact subset of R such that $C = S_1(C)\cup S_2(C)$ (the precise definitions are given in Sect. 1.1), and $\mu_p$ is the self-similar measure on $C$ associated with the probability vector $p = (p_1, p_2)$, then we establish the exact asymptotic behaviour of the higher order moments $M_n = \int |x-y|^n\,d(\mu_p\times\mu_p)(x,y)$ as $n$ tends to infinity. In particular, in Theorems 2.4 and 2.5 we show that there is a multiplicatively periodic function $\Pi : (0,\infty)\to R$ with period equal to $r$ and a sequence $(\varepsilon_n)_n$ of real numbers with $\varepsilon_n\to 0$, such that $n^{\Delta}M_n = \Pi(n) + \varepsilon_n$ (1.5), where $\Delta = \frac{\log(p_1p_2)}{\log r}$.
The proofs of Theorems 2.4 and 2.5 use techniques from number theory and dynamical systems involving Tauberian theory and "zeta-functions". Using a Tauberian argument (namely, the Mellin transform theorem), we first show that the $n$'th moment $M_n$ can be written as the sum of a complex contour integral of an appropriate "zeta-function" and an error term. Next, we compute the complex contour integral using the residue theorem. In particular, we show that the contour integral can be written as the sum of a multiplicatively periodic function $\Pi(n)$ of $n$ and another error term. Finally, combining these two results leads to (1.5).
We are, of course, not the first to investigate the asymptotic behaviour of different types of moments. In particular, the moments $J_n$ defined below have been studied: if $C$ is a self-similar subset of the unit interval satisfying the Open Set Condition and $\mu_p$ is the self-similar measure on $C$ associated with the probability vector $p$, then the $n$'th moment $J_n$ is defined by $J_n = \int x^n\,d\mu_p(x)$ for $n\in N$. For example, within the past 15 years several authors [1,7,8] have outlined arguments suggesting that Tauberian theory and "zeta-functions" can be used to investigate the asymptotic behaviour of the moments $J_n$ and related quantities of special classes of self-similar measures, and have developed numerical methods that allow for high-precision approximations.
More rigorous studies of special cases of the above constructions have also been considered. For example, Cristea and Prodinger [6] and Grabner and Prodinger [13] have outlined how the same techniques, involving Tauberian theory and "zeta-functions", can be used to study the asymptotic behaviour of the moments $J_n$ of so-called binomial measures, i.e. the measures obtained by putting $r = \frac12$ and $p = (p_1, p_2)$, and Goh and Wimp [12] use Tauberian theory to study the asymptotic behaviour of the moments $J_n$ of the Cantor measure, i.e. the measure obtained by putting $r = \frac13$ and $p = (p_1,p_2) = (\frac12,\frac12)$, providing full and rigorous proofs of their results.
More recently Mellin transform ideas and "zeta-functions" have been introduced into fractal geometry by Lapidus et al. [16,17]. In particular, due to work by Lapidus and his collaborators [16,17], it has now been recognized that the study of different types of moments is deeply related to the understanding of the geometry of many fractal sets and measures. This viewpoint is also illustrated by the following observation: the drop-off rate of the moments $M_n$ equals the sum of the local dimension $\dim_{loc}(\mu_p; 0) = \frac{\log p_1}{\log r}$ of $\mu_p$ at 0 and the local dimension $\dim_{loc}(\mu_p; 1) = \frac{\log p_2}{\log r}$ of $\mu_p$ at 1, i.e. $\lim_n \frac{\log M_n}{-\log n} = \dim_{loc}(\mu_p;0) + \dim_{loc}(\mu_p;1)$;
recall that if $\mu$ is a Borel measure on $R^n$ and $x\in R^n$, then the local dimension of $\mu$ at $x$ is defined by $\dim_{loc}(\mu;x) = \lim_{\delta\searrow 0}\frac{\log\mu(B(x,\delta))}{\log\delta}$, provided the limit exists; see, for example, [9,10].

The setting: self-similar sets and self-similar measures in R
Let $N$ be a positive integer with $N > 1$, and fix numbers $a_1,\dots,a_N$ and contraction ratios $r_1,\dots,r_N$ defining similarities $S_i : [0,1]\to[0,1]$ of the form $S_i(x) = r_i x + a_i$. It follows (see, for example [9]) that there is a unique non-empty compact subset $C$ of $[0,1]$ such that $C = \bigcup_i S_i(C)$ (1.9); the set $C$ is known as the self-similar set associated with the list $(S_i)_i$. The set $C$ can also be constructed as follows. In order to describe this construction, we introduce the following notation. For a positive integer $n$, let $\Sigma^n = \{1,\dots,N\}^n$ denote the set of words of length $n$, and for $\mathbf i = i_1\dots i_n\in\Sigma^n$ write $|\mathbf i| = n$, $S_{\mathbf i} = S_{i_1}\circ\dots\circ S_{i_n}$, $r_{\mathbf i} = r_{i_1}\cdots r_{i_n}$, $I_{\mathbf i} = S_{\mathbf i}([0,1])$ and $C_n = \bigcup_{|\mathbf i| = n} I_{\mathbf i}$. Then $C_0\supseteq C_1\supseteq C_2\supseteq\cdots$ and $C$ equals the intersection of the $C_n$'s, i.e. $C = \bigcap_n C_n$ (1.14). Loosely speaking (1.13) says that the $C_n$ may be thought of as approximations to the set $C$; this interpretation will be useful in Sect. 1.3.
In this paper, we will consider average distances (and higher order average moments) with respect to self-similar measures on $C$. Self-similar measures have attracted enormous interest in the literature during the past 30 years and are defined as follows. Let $p = (p_1,\dots,p_N)$ be a probability vector. Then there is a unique Borel probability measure $\mu_p$ supported on the self-similar set $C$ defined in (1.9) [or equivalently in (1.14)] such that $\mu_p = \sum_i p_i\,\mu_p\circ S_i^{-1}$; the measure $\mu_p$ is known as the self-similar measure (or the self-similar multifractal) associated with the list $(S_i, p_i)_i$, see, for example [10].
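The construction of the approximations $C_k$ and of the weights that a self-similar measure assigns to their intervals can be sketched as follows. This is an illustration only: it assumes that the maps have the affine form $S_i(x) = r_i x + a_i$ and that the level-$k$ intervals have disjoint interiors, so that $\mu_p(I_{\mathbf i}) = p_{i_1}\cdots p_{i_k}$; the function name is hypothetical.

```python
from fractions import Fraction


def cylinder_intervals(r, a, p, k):
    """Return (left endpoint, length, weight) for every word i of length k,
    where I_i = S_{i_1} o ... o S_{i_k}([0, 1]) and weight = p_{i_1} * ... * p_{i_k}.
    Assumes S_i(x) = r_i * x + a_i."""
    cells = [(Fraction(0), Fraction(1), Fraction(1))]
    for _ in range(k):
        # Prepending the letter i to a word maps its interval by S_i.
        cells = [(r_i * left + a_i, r_i * length, p_i * weight)
                 for r_i, a_i, p_i in zip(r, a, p)
                 for left, length, weight in cells]
    return cells


# Middle-third Cantor set with the Cantor measure.
third, half = Fraction(1, 3), Fraction(1, 2)
for left, length, weight in sorted(cylinder_intervals((third, third), (0, 2 * third), (half, half), 2)):
    print(f"[{left}, {left + length}]  weight {weight}")
```

Exact rational arithmetic is used here so that the endpoints can be compared with the familiar description of the Cantor set without rounding noise.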

Average distances and average moments: the measure theoretic approach
For two Borel probability measures $\mu$ and $\nu$ on $C$, we define the average distance with respect to the measures $\mu$ and $\nu$ by $A(\mu,\nu) = \int |x-y|\,d(\mu\times\nu)(x,y)$. We are also interested in higher order average moments defined as follows. Namely, for a positive integer $n$, we define the $n$'th order average moment (or the $n$'th order average distance) with respect to the measures $\mu$ and $\nu$ by $A^n(\mu,\nu) = \int |x-y|^n\,d(\mu\times\nu)(x,y)$. (1.17)

Average distances and average moments: the geometric approach
There is a (perhaps) more intuitive approach for defining average distances and higher order average moments. In order to introduce this approach, we first introduce the following notation. If $\mu$ is a Borel measure on R and $E$ is a Borel subset of R, then we write $\mu|_E$ for the restriction of $\mu$ to $E$, i.e. $\mu|_E(B) = \mu(E\cap B)$ for any Borel subset $B$ of R. Also, for words $\mathbf i,\mathbf j$ of length $k$ (i.e. $\mathbf i,\mathbf j\in\Sigma^k$), write $\lambda^2_{\mathbf i,\mathbf j}$ for the normalized 2-dimensional Lebesgue measure restricted to $I_{\mathbf i}\times I_{\mathbf j}$, i.e. $\lambda^2_{\mathbf i,\mathbf j} = \frac{1}{r_{\mathbf i}r_{\mathbf j}}\,\mathcal{L}^2|_{I_{\mathbf i}\times I_{\mathbf j}}$, where $\mathcal{L}^2$ denotes the 2-dimensional Lebesgue measure.
We can now describe the alternative (and perhaps more intuitive) geometric approach for defining average distances and higher order average moments. For Borel probability measures $\mu$ and $\nu$ on $C$ and a positive integer $k$, we define the $k$'th approximative $n$'th order average moment with respect to $\mu$ and $\nu$ by $A^n_{geo,k}(\mu,\nu) = \sum_{|\mathbf i|=|\mathbf j|=k}\mu(I_{\mathbf i})\,\nu(I_{\mathbf j})\int |x-y|^n\,d\lambda^2_{\mathbf i,\mathbf j}(x,y)$. Finally, we define the geometric average $n$'th order moment with respect to $\mu$ and $\nu$ by $A^n_{geo}(\mu,\nu) = \lim_k A^n_{geo,k}(\mu,\nu)$, provided the limit exists. If $n = 1$, then we will write $A_{geo,k}(\mu,\nu)$ and $A_{geo}(\mu,\nu)$ for $A^1_{geo,k}(\mu,\nu)$ and $A^1_{geo}(\mu,\nu)$, respectively. The number $A_{geo,k}(\mu,\nu)$ has a clear geometric interpretation. Namely, two players A and B, say, throw darts at the $k$'th approximation $C_k = \bigcup_{|\mathbf i|=k} I_{\mathbf i}$ to the Cantor set $C$. If for each word $\mathbf i$ of length $k$ we make the following two assumptions, namely:
Assumption 1 Player A has the probability $\mu(I_{\mathbf i})$ of hitting $I_{\mathbf i}$.
Assumption 2 Player B has the probability $\nu(I_{\mathbf i})$ of hitting $I_{\mathbf i}$.
then the number $A_{geo,k}(\mu,\nu)$ is the average distance between a dart thrown by A and a dart thrown by B; of course, this game of darts is most likely not very realistic since the distribution of someone throwing darts at a line is more likely to be Gaussian than modelled by the measures $\mu$ and $\nu$.
The next result shows that this approach leads to the same notion of average distance as the measure theoretical approach in (1.17); more precisely, the result shows that the limit $A^n_{geo}(\mu,\nu) = \lim_k A^n_{geo,k}(\mu,\nu)$ always exists and equals $A^n(\mu,\nu)$.
Proposition 1.1 Let $\mu$ and $\nu$ be non-atomic Borel probability measures on $C$ and let $n$ be a positive integer. Then the limit $A^n_{geo}(\mu,\nu)$ exists and $A^n_{geo}(\mu,\nu) = A^n(\mu,\nu)$.
Proof For a word $\mathbf i$ of finite length, let $\lambda^1_{\mathbf i}$ denote the normalized Lebesgue measure restricted to $I_{\mathbf i}$, i.e. $\lambda^1_{\mathbf i} = \frac{1}{r_{\mathbf i}}\mathcal{L}^1|_{I_{\mathbf i}}$ where $\mathcal{L}^1$ denotes the 1-dimensional Lebesgue measure. Next, for a positive integer $k$, define measures $\tilde\mu_k$ and $\tilde\nu_k$ by $\tilde\mu_k = \sum_{|\mathbf i|=k}\mu(I_{\mathbf i})\,\lambda^1_{\mathbf i}$ and $\tilde\nu_k = \sum_{|\mathbf i|=k}\nu(I_{\mathbf i})\,\lambda^1_{\mathbf i}$. Since $\mu$ and $\nu$ are non-atomic, it is not difficult to see that $\tilde\mu_k\to\mu$ weakly and that $\tilde\nu_k\to\nu$ weakly, and it therefore follows from [3, Section 3.4] that $\tilde\mu_k\times\tilde\nu_k\to\mu\times\nu$ weakly. In particular, since clearly $A^n_{geo,k}(\mu,\nu) = \int |x-y|^n\,d(\tilde\mu_k\times\tilde\nu_k)(x,y)$ and $A^n(\mu,\nu) = \int |x-y|^n\,d(\mu\times\nu)(x,y)$, this now implies that $A^n_{geo,k}(\mu,\nu)\to A^n(\mu,\nu)$ as $k\to\infty$. This completes the proof.
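The dart-throwing interpretation can be turned into a direct computation for $n = 1$. The sketch below is illustrative and rests on assumptions not spelled out in the surviving text: the maps are taken to be $S_i(x) = r_i x + a_i$, the level-$k$ intervals are assumed to have disjoint interiors, and $\mu$, $\nu$ are self-similar measures with weight vectors `mu` and `nu`. It uses two elementary facts: for two independent uniform points in the same interval of length $\ell$ the mean distance is $\ell/3$, and for uniform points in two intervals with disjoint interiors it is the distance between the midpoints.

```python
def geometric_average(r, a, mu, nu, k):
    """First order geometric average A_geo,k for self-similar measures with
    weight vectors mu and nu, assuming S_i(x) = r_i*x + a_i and level-k
    intervals with disjoint interiors."""
    # Each cell is (left endpoint, length, mu-weight, nu-weight).
    cells = [(0.0, 1.0, 1.0, 1.0)]
    for _ in range(k):
        cells = [(r_i * left + a_i, r_i * length, m_i * w_mu, n_i * w_nu)
                 for r_i, a_i, m_i, n_i in zip(r, a, mu, nu)
                 for left, length, w_mu, w_nu in cells]
    total = 0.0
    for i, (x, len_x, w_mu, _) in enumerate(cells):
        for j, (y, len_y, _, w_nu) in enumerate(cells):
            if i == j:
                mean_distance = len_x / 3.0          # two uniform points in one interval
            else:
                mean_distance = abs((x + len_x / 2) - (y + len_y / 2))
            total += w_mu * w_nu * mean_distance
    return total


# Middle-third Cantor set, both players using the Cantor measure.
for k in range(7):
    print(k, geometric_average((1 / 3, 1 / 3), (0.0, 2 / 3), (0.5, 0.5), (0.5, 0.5), k))
```

For this example the printed values start at $1/3$ (two uniform points in $[0,1]$) and settle quickly, in line with the convergence asserted by Proposition 1.1; the limit $2/5$ agrees with the value for the middle-third Cantor set obtained from the self-similarity relation.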

First order moments
We first compute the average distance $A(\mu_p,\mu_q)$ with respect to two self-similar measures $\mu_p$ and $\mu_q$ associated with two (not necessarily identical) probability vectors $p$ and $q$; this is the content of the next theorem. Below we use the following notation, namely, for $i, j = 1,\dots,N$, we write $s_{i,j}$ for the sign of $i - j$, i.e. $s_{i,j} = 1$ if $i > j$, $s_{i,j} = 0$ if $i = j$, and $s_{i,j} = -1$ if $i < j$. (2.1)
Theorem 2.1 Let $p = (p_1,\dots,p_N)$ and $q = (q_1,\dots,q_N)$ be probability vectors. Then we have
The proof of Theorem 2.1 is given in Sect. 3. We remark that the proof of Theorem 2.1 is not difficult. Indeed, we first derive a first order linear difference equation for the sequence $(A_{geo,k}(\mu_p,\mu_q))_k$. Using standard methods, this equation can then be solved, giving the limiting behaviour of $A_{geo,k}(\mu_p,\mu_q)$ as $k\to\infty$.
If $p = q$ and all the contraction ratios coincide, i.e. if $r_1 = \dots = r_N = r$, then the formula in Theorem 2.1 for the average $A(\mu_p,\mu_q)$ simplifies considerably.
Below we consider two corollaries of Theorem 2.1. By applying Theorem 2.1 to the vectors $p = q = u$ where $u = (\frac{r_1}{S},\dots,\frac{r_N}{S})$ and $S = \sum_i r_i$, we obtain the first corollary, i.e. Corollary 2.2. This corollary shows that the natural geometric limiting average distance, namely, $\lim_k \frac{1}{\mathcal{L}^1(C_k)^2}\int_{C_k\times C_k}|x-y|\,d(\mathcal{L}^1\times\mathcal{L}^1)(x,y)$, exists and provides an explicit value for it. This result was first obtained by Leary et al. [18] in 2010.

Corollary 2.2 [18] We have
and the result therefore follows immediately from Theorem 2.1.
The second corollary, i.e. Corollary 2.3, computes the average distance between two points in $C$ with respect to the natural uniform distribution on $C$, namely, the normalised Hausdorff measure. To state this formally, we introduce the following notation. For a positive number $t$, let $\mathcal{H}^t$ denote the $t$-dimensional Hausdorff measure. Corollary 2.3 now gives an explicit value for the average distance between two points in $C$ with respect to the normalised Hausdorff measure.

Corollary 2.3 Let $s$ denote the Hausdorff dimension of $C$, i.e. $s$ is the unique real number such that $\sum_i r_i^s = 1$ (see [9]). Then we have
Proof Since $\sum_i r_i^s = 1$, we can define the probability vector $h$ by $h = (r_1^s,\dots,r_N^s)$. It is well-known that the measure $\mu_h$ equals the normalised $s$-dimensional Hausdorff measure on $C$, i.e. $\mu_h = \frac{1}{\mathcal{H}^s(C)}\mathcal{H}^s|_C$ (see, for example [9,10]), whence $\frac{1}{\mathcal{H}^s(C)^2}\int_{C\times C}|x-y|\,d(\mathcal{H}^s\times\mathcal{H}^s)(x,y) = A(\mu_h,\mu_h)$, and the result therefore follows immediately from Theorem 2.1.
If all the contraction ratios coincide, i.e. if $r_1 = \dots = r_N = r$, then it is easily seen that the two "natural" averages in (2.2) and (2.3) coincide. However, it is interesting to note that the two "natural" averages in (2.2) and (2.3) do not, in general, coincide. For example, let $C$ denote the self-similar set obtained by letting $N = 2$, $r_1 = \frac14$, $r_2 = \frac12$, $a_1 = 0$ and $a_2 = \frac12$. The value of the geometric average (2.2) in this case follows easily from Corollary 2.2. For the average (2.3), the dimension $s$ satisfies $(\frac14)^s + (\frac12)^s = 1$, so that $2^s = \varphi$ where $\varphi = \frac{1+\sqrt5}{2}$ is the golden ratio, whence $r_1^s = \varphi^{-2}$ and $r_2^s = \varphi^{-1}$. Using this (and the fact that $\varphi^{-2} + \varphi^{-1} = 1$), it now follows from Corollary 2.3 that the average in (2.3) is approximately 0.385.
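The two averages in this example can be evaluated with a short computation. The sketch below does not quote the formula of Theorem 2.1; it is our own reconstruction from the self-similarity relation, under the assumptions that $S_i(x) = r_i x + a_i$ and that the intervals $S_i([0,1])$ are ordered from left to right with disjoint interiors, so that $x - y$ has sign $s_{i,j}$ whenever $x\in S_i([0,1])$ and $y\in S_j([0,1])$ with $i\ne j$. Splitting the integral according to the first-level intervals then gives a fixed point equation for $A(\mu_p,\mu_q)$ involving the means $B_p = \int x\,d\mu_p$ and $B_q = \int x\,d\mu_q$. The function names are hypothetical.

```python
from math import sqrt


def mean_position(weights, r, a):
    """Mean of the self-similar measure: B = sum(w_i*a_i) / (1 - sum(w_i*r_i))."""
    return (sum(w * a_i for w, a_i in zip(weights, a))
            / (1.0 - sum(w * r_i for w, r_i in zip(weights, r))))


def average_distance(p, q, r, a):
    """Fixed point of A = sum_i p_i q_i r_i A + (cross terms), assuming the
    first-level intervals are ordered left to right with disjoint interiors."""
    B_p, B_q = mean_position(p, r, a), mean_position(q, r, a)
    cross = 0.0
    for i in range(len(r)):
        for j in range(len(r)):
            if i != j:
                sign = 1.0 if i > j else -1.0        # s_{i,j}, the sign of i - j
                cross += p[i] * q[j] * sign * ((r[i] * B_p + a[i]) - (r[j] * B_q + a[j]))
    return cross / (1.0 - sum(p_i * q_i * r_i for p_i, q_i, r_i in zip(p, q, r)))


r, a = (0.25, 0.5), (0.0, 0.5)

S = sum(r)
u = tuple(r_i / S for r_i in r)                      # weights of Corollary 2.2
phi = (1 + sqrt(5)) / 2
h = (phi ** -2, phi ** -1)                           # weights of Corollary 2.3

print("geometric average :", average_distance(u, u, r, a))
print("Hausdorff average :", average_distance(h, h, r, a))
```

Under these assumptions the sketch returns $8/21\approx 0.381$ for the geometric average and a value that rounds to the 0.385 quoted above for the Hausdorff average, which illustrates that the two notions differ when the contraction ratios differ.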

Higher order moments
The second main result in this paper establishes the exact asymptotic behaviour of the average moments $A^n(\mu_p,\mu_q)$ as $n\to\infty$ in the special, but important, case when $p = q$ and all of the contraction ratios $r_i$ coincide. More precisely, in this section we will assume that $N = 2$ and that the contraction ratios $r_1$ and $r_2$ are equal and we will denote the common value by $r$, i.e. $r_1 = r_2 = r$.
Also, let $p = (p_1,p_2)$ be a probability vector and write $\Delta = \frac{\log(p_1p_2)}{\log r}$. We will now analyse the moments $A^n(\mu_p,\mu_p)$. It is not difficult to find a recursive formula for the moments $A^n(\mu_p,\mu_p)$. Indeed, in Lemma 4.3 we prove a recursive formula for $A^n(\mu_p,\mu_p)$ involving a sum over indices up to $[\frac n2]$, where $[\frac n2]$ denotes the integer part of the real number $\frac n2$. While this recursive formula provides an expression for $A^n(\mu_p,\mu_p)$, the expression is not easy to analyse. For this reason, it seems more meaningful to find explicit formulas describing the asymptotic behaviour of $A^n(\mu_p,\mu_p)$ for large $n$. We first note that it is clear that $A^n(\mu_p,\mu_p)\to 0$ as $n\to\infty$, and it is therefore interesting and natural to ask how fast $A^n(\mu_p,\mu_p)$ tends to 0 as $n\to\infty$. We answer this question in Theorem 2.5. In particular, we prove that $n^{\Delta}A^n(\mu_p,\mu_p) = \Pi(n) + \varepsilon_n$ (2.6), where $\Pi$ is a multiplicatively periodic function and $\varepsilon_n\to 0$; this result clearly provides information about how fast $A^n(\mu_p,\mu_p)$ tends to 0 as $n\to\infty$. Indeed, loosely speaking, (2.6) says that $A^n(\mu_p,\mu_p)$ decays like $n^{-\Delta}$. In fact, Theorem 2.5 provides significantly more detailed information. Not only does Theorem 2.5 show that $A^n(\mu_p,\mu_p)$ behaves like $\frac{1}{n^{\Delta}}$ for large $n$, but it gives an exact and explicit asymptotic expression for $A^n(\mu_p,\mu_p)$, namely, it shows that $n^{\Delta}A^n(\mu_p,\mu_p)$ equals a multiplicatively periodic function of $n$ plus an error term that tends to 0 as $n\to\infty$.
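The decay of the moments can be explored numerically. The statement of Lemma 4.3 did not survive in the text above, so the recursion in the following sketch is our own derivation and may differ in form from the paper's. It assumes $S_1(x) = rx$, $S_2(x) = rx + 1 - r$ with $r < \frac12$, and uses the fact that for independent $x, y$ with law $\mu_p$ the difference $x - y$ is symmetric, so its odd signed moments vanish. Writing $M_n = A^n(\mu_p,\mu_p)$, $p = p_1^2 + p_2^2$ and $q = p_1p_2$, splitting according to the first-level intervals gives $M_n = p\,r^n M_n + 2q\sum_{j=0}^{[n/2]}\binom{n}{2j} r^{2j}(1-r)^{n-2j} M_{2j}$. The exponent $\Delta = \log(p_1p_2)/\log r$ is taken from the surrounding discussion.

```python
from math import comb, log


def moments(r, p1, n_max):
    """M_0, ..., M_{n_max} for the measure mu_p with p = (p1, 1 - p1), assuming
    S_1(x) = r*x and S_2(x) = r*x + 1 - r with r < 1/2 (recursion derived here)."""
    p2 = 1.0 - p1
    q = p1 * p2
    p = p1 * p1 + p2 * p2
    M = [1.0]
    for n in range(1, n_max + 1):
        # Terms of the sum with 2j < n only involve moments already computed.
        known = sum(comb(n, 2 * j) * r ** (2 * j) * (1 - r) ** (n - 2 * j) * M[2 * j]
                    for j in range((n + 1) // 2))
        # For even n the term 2j = n contains M_n itself; move it to the left.
        coefficient = 1.0 - p * r ** n - (2 * q * r ** n if n % 2 == 0 else 0.0)
        M.append(2 * q * known / coefficient)
    return M


r, p1 = 1 / 3, 0.5
delta = log(p1 * (1 - p1)) / log(r)
M = moments(r, p1, 243)
for n in (1, 3, 9, 27, 81, 243):
    print(n, M[n], n ** delta * M[n])
```

If the asymptotic statement holds as described, the last column should stabilise along $n = 3^m$, since a function that is multiplicatively periodic with period $r = \frac13$ takes the same value at all these points.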

Theorem 2.4
There is a function $\Pi : (0,\infty)\to C$ and a sequence $(\varepsilon_n)_n$ of real numbers satisfying the following two conditions: (1) $\Pi$ is multiplicatively periodic with period equal to $r$, i.e. $\Pi(ru) = \Pi(u)$ for all $u$; (2) $\limsup_n n|\varepsilon_n| < \infty$; in particular $\varepsilon_n\to 0$, such that $n^{\Delta}A^n(\mu_p,\mu_p) = \Pi(n) + \varepsilon_n$ for all $n$, where $\Delta = \frac{\log(p_1p_2)}{\log r}$.
In fact, our methods allow us to obtain an explicit expression for the periodic function $\Pi$. This is the content of the next theorem. In Theorem 2.5 we use the following notation, namely, we write $[x]$ for the integer part of a real number $x$.
Theorem 2.5 Define the sequence (λ k ) k recursively by Then the series ∞ converges for all s ∈ C, and we can define the function : C → C by For n ∈ Z, write s n = + 1 − log r 2πin. Then the trigonometric series n∈Z Z (s n ) e 2π i log u log r converges for all u > 0, and The proofs of Theorems 2.4 and 2.5 are given in Sects. 4, 5, 6 and 7. The proofs are divided into three parts. To briefly describe this, we introduce the following notation, namely, write M n = A n (μ p , μ p ) and define L : C → C by Section 4 contains a number of useful technical estimates of the auxiliary function L and the function in Theorem 2.5. The remaining part of the proofs are now divided into the following three parts.
Part 1, Sect. 5 We first show that there is constant K such that for all n. The proof of (2.7) follows from Cauchy's formula applied to the function L and is presented in Sect. 5.
Part 2, Sect. 6 Next, we show that for each real number d with d > there is a constant K d such that for all u > 0 where is the function defined in Theorem 2.5. The proof of (2.8) is presented in Sect. 6 and is divided in the following two sub-parts: Part 2.1 Using the Mellin transform theory, we show that L can be written as a complex curve integral involving Z , namely, we show that for 0 < c < we have for all u > 0; this is done in Theorem 6.3. Part 2.2 Next, using the residue theorem, we compute the complex curve integral in (2.9). In particular, we show that if 0 < c < < d, then for all u > 0; this is done in Theorems 6.4, 6.5 and 6.6: in Theorems 6.4 and 6.5 estimates for |Z (s)| and |1 − qr −s | are obtained and in Theorem 6.6 we use the residue theorem together with the estimates from Theorems 6.4 and 6.5 to derive formula (2.10).

Proof of Theorem 2.1
For brevity, we write $A_k = A_{geo,k}(\mu_p,\mu_q)$ for positive integers $k$. Our aim now is to find an explicit formula for $\lim_k A_k$; observe that it follows from Proposition 1.1 that the limit $\lim_k A_k$ exists. We first introduce the following notation. For a probability vector $\pi = (\pi_1,\dots,\pi_N)$ and a positive integer $k$, write $B_{\pi,k} = \sum_{|\mathbf i|=k}\frac{\pi_{\mathbf i}}{r_{\mathbf i}}\int_{I_{\mathbf i}} u\,du$. Below we show that the elements in the sequence $(A_k)_k$ satisfy a recursive formula involving the $B_{p,k}$'s and the $B_{q,k}$'s. The limiting behaviour of the $A_k$'s can then be established from this recursive formula using the following well-known (and easily proven) result.
Lemma 3.1 Let $t\in R$ with $|t| < 1$ and let $(y_n)_n$ be a sequence of real numbers such that $y_n\to y$. Let the sequence $(x_n)_n$ be defined by $x_{n+1} = t x_n + y_n$ for all $n$. Then $x_n\to\frac{y}{1-t}$.
We will now obtain recursive formulas for the $B_{\pi,k}$'s and the $A_k$'s; this is done in Propositions 3.2 and 3.3 below. Proposition 3.2 Let $\pi = (\pi_1,\dots,\pi_N)$ be a probability vector.
Proof (1) For all positive integers $k$, we have Using the fact that $\sum_{|\mathbf i|=k}\frac{\pi_{\mathbf i}}{r_{\mathbf i}}\int_{I_{\mathbf i}} u\,du = B_{\pi,k}$ and $\sum_{|\mathbf i|=k}\frac{\pi_{\mathbf i}}{r_{\mathbf i}}\int_{I_{\mathbf i}} du = \sum_{|\mathbf i|=k}\frac{\pi_{\mathbf i}}{r_{\mathbf i}}\,r_{\mathbf i} = \sum_{|\mathbf i|=k}\pi_{\mathbf i} = 1$, we now deduce from (3.5) that $B_{\pi,k+1} = \sum_i \pi_i r_i\,B_{\pi,k} + \sum_i \pi_i a_i$.
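The recursion for the $B_{\pi,k}$'s is a first order linear difference equation with constant coefficients, so its limit can be read off from the classical fact that $x_{n+1} = t x_n + y_n$ with $|t| < 1$ and $y_n\to y$ forces $x_n\to y/(1-t)$. A minimal numerical sketch follows; the starting value $\frac12$ (the mean of the uniform distribution on $[0,1]$, corresponding to $k = 0$) is our assumption, and the function name is hypothetical.

```python
def mean_sequence(weights, r, a, steps):
    """Iterate B_{k+1} = t*B_k + y with t = sum(w_i*r_i) and y = sum(w_i*a_i);
    return the iterates together with the predicted limit y / (1 - t)."""
    t = sum(w * r_i for w, r_i in zip(weights, r))
    y = sum(w * a_i for w, a_i in zip(weights, a))
    B = 0.5                      # assumed start: mean of the uniform law on [0, 1]
    iterates = [B]
    for _ in range(steps):
        B = t * B + y
        iterates.append(B)
    return iterates, y / (1.0 - t)


iterates, limit = mean_sequence((1 / 3, 2 / 3), (0.25, 0.5), (0.0, 0.5), 60)
print("last iterate    :", iterates[-1])
print("predicted limit :", limit)
```

The distance to the limit shrinks by the factor $t$ at every step, which is why a modest number of iterations already reproduces the limit to machine precision in this example.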
Proof (1) For all positive integers k, we have However, it is clear that if i, j ∈ k and i, j = 1, . . . , N , then I ii = S i I i and (u, v), and it therefore follows from (3.6) that

]. This and (3.7) imply that
Using the fact that |i|=|j|=k Since clearly |i|=|j|=k p i q j = 1, this simplifies to We will now compute U k . In particular, we will express U k in terms of B p,k and B q,k . To do so we note that Using the fact that |i|=k (3.11) Finally, combining (3.9) and (3.11) shows that This completes the proof.
We can now prove Theorem 2.1.

Proof of Theorem 2.1 It follows immediately from Lemma 3.1 and Proposition 3.3 that
This completes the proof of Theorem 2.1.

Proof of Theorems 2.4 and 2.5: the auxiliary functions L and Λ
The proofs of Theorems 2.4 and 2.5 are given in this and the next three sections. The main purpose of this section is to introduce the two key auxiliary functions $L$ and $\Lambda$, and to provide estimates for the derivatives and the integral of $\Lambda$; this is done in Propositions 4.5 and 4.6, respectively. However, we first state and prove the following simple lemma that will be used several times in this and the next sections when estimating $L$ and $\Lambda$. Proof Let $s$ be a complex number. Repeated use of (4.1) shows that $f(s) = a^n f(\rho^n s) + \sum_{k=0}^{n-1} a^k g(\rho^k s)$ for all positive integers $n$, whence $\big|f(s) - \sum_{k=0}^{n-1} a^k g(\rho^k s)\big| = |a|^n\,|f(\rho^n s)|$ (4.2) for all positive integers $n$. Since $|a| < 1$ and $|\rho| < 1$, and $f$ is bounded in an open neighbourhood of 0, it follows that $|a|^n |f(\rho^n s)|\to 0$ as $n\to\infty$, and we therefore deduce from (4.2) that $f(s) = \sum_{k\ge 0} a^k g(\rho^k s)$.
Next, we derive a recursive equation for the $n$'th moments $A^n_{geo,k}(\mu_p,\mu_p)$; this is done in Lemmas 4.2 and 4.3. The recursive equation in Lemma 4.3 plays a key role in proving the estimates in Propositions 4.5 and 4.6. For brevity we introduce the following notation. Namely, for non-negative integers $n$ and $k$, we write $M_{n,k} = A^n_{geo,k}(\mu_p,\mu_p) = \sum_{|\mathbf i|=|\mathbf j|=k}\frac{p_{\mathbf i}\,p_{\mathbf j}}{r_{\mathbf i}\,r_{\mathbf j}}\int_{I_{\mathbf i}\times I_{\mathbf j}}|x-y|^n\,d(x,y)$. Also, recall that $r_1 = r_2 = r$ and that $p = (p_1,p_2)$. We also write $p = p_1^2 + p_2^2$ and $q = p_1p_2$. Finally, we write $[x]$ for the integer part of a real number $x$.
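The series produced by the iteration argument can be checked numerically. The sketch below is illustrative: it assumes a functional equation of the form $f(s) = a f(\rho s) + g(s)$ with $|a| < 1$ and $|\rho| < 1$ (which is what the proof iterates), picks an arbitrary entire function $g$, builds $f$ from the truncated series $\sum_k a^k g(\rho^k s)$, and verifies that the functional equation holds up to rounding. The function name and the choice of $g$ are hypothetical.

```python
import cmath


def series_solution(g, a, rho, s, terms=200):
    """Truncation of f(s) = sum_{k>=0} a^k g(rho^k s)."""
    return sum(a ** k * g(rho ** k * s) for k in range(terms))


a, rho = 0.25, 1 / 3
g = cmath.exp                      # arbitrary test function, bounded near 0
s = 2 + 3j

f_s = series_solution(g, a, rho, s)
f_rho_s = series_solution(g, a, rho, rho * s)
print("residual of f(s) = a f(rho s) + g(s):", abs(f_s - (a * f_rho_s + g(s))))
```

The residual equals the single discarded tail term $a^{200} g(\rho^{200}s)$ up to rounding, which is far below machine precision here.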

Lemma 4.2 For all positive integers n and k, we have
Proof For all positive integers n and k, we have However, it is clear that if i, j ∈ k and i, j = 1, . . . , N , then I ii = S i I i and (u, v), and it therefore follows from (4.3) that Using the fact that |i|=|j|=k p i p j r i r j I i ×I j |u − v| n d(u, v) = M n,k , (4.4) now simplifies to We will now compute m n,k . Since i, j ∈ {1, 2}, r 1 = r 2 = r , a 1 = 0, a 2 = 1 − r , s 1,2 = −1 and s 2,1 = 1, we conclude that Finally, using the fact that |i|=|j|=k

Lemma 4.3 For all positive integers n, we have
Proof Since M n,k = A n geo,k (μ p , μ p ) → A n (μ p , μ p ) = M n for all n (by Proposition 1.1), the statement follows immediately from Lemma 4.2.
We now turn towards the definitions of the auxiliary functions L and . We first define the moment generating function M : C → C by for s ∈ C; since |M k | ≤ 1 for all k, it follows that the series   Below we use the following notation. If f : C → C is a differentiable function and n is a positive integer with n ≥ 0, then D n f denotes the n'th derivative of f .  In particular, we conclude from this that if u ≥ 0, then It is not difficult to see that |D i M e (s)| ≤ e |s| for all positive integers i and all complex numbers s. We deduce from this and the previous inequality that if u ≥ 0, then Since r < 1 2 , it follows that −(1 − 2r )r k u < 0, whence e −(1−2r )r k u ≤ 1, and we therefore deduce from (4.12) that where we have used the fact that 1 − p = 2q. This completes the proof of Claim 3. We can now estimate |D n (u)| for all positive integers n and all u ≥ 0. Indeed, for positive integers n and u ≥ 0, we have (4.14) The desired result follows immediately from the above inequality since r < 1 − r . Proof Fix s ∈ C with Re s > 0. We conclude from Proposition 4.5 that there is a constant c such that | (u)| ≤ ce −ru for all u ≥ 0. This implies that Next, introducing the substitution v = ρr k u into the integral ∞ 0 e −r k+1 u u Re s−1 du in (4.15) shows that This completes the proof.

Proof of Theorems 2.4 and 2.5: the proof of (2.7)
The purpose of this section is to prove (2.7), namely, that there is a constant K such that for all positive integers n (recall, that = log q log r where q = p 1 p 2 ). The key tool for proving this inequality is Theorem 5.1 below. For s ∈ C with s = 0, let arg s denote the unique argument of s with arg s ∈ [−π, π), and for θ ∈ [−π, π), write = s ∈ C\{0} | arg s| ≤ θ . Assume that there are positive constants R 0 , R 1 , A 0 , A 1 , D, θ and δ with θ < π 2 and δ < 1 such that the following hold: Then there is a constant K such that for all n. There are positive constants R 0 , R 1 , A 0 , A 1 and δ with δ < 1 such that the following hold: (1) If s ∈ π 4 and |s| > R 0 , then |L(s)| ≤ A 0 1 |s| ; (2) If s / ∈ π 4 and |s| > R 1 , then |L(s)e s | ≤ A 1 e δ|s| . Proof (1) We first prove the following three claims. We now note that if s ∈ π 4 , then Re s = |s| cos arg s where | arg s| ≤ π 4 , whence (since r ≤ 1 2 ) r −cos arg s ≤ 1 2 −cos π 4 = 1 2 − √ 2 2 ≤ − 1 8 , and so r |s|−Re s = r |s|−|s| cos arg s = |s|(r − cos arg s) ≤ − 1 8 |s|. It finally follows from this and (5.3) that if s ∈ π 4 , then This completes the proof of Claim 3.
Combining Claims 1 and 2 we deduce that if s ∈ C, then Next, we observe that if s ∈ π 4 , then r m s ∈ π 4 for all integers m. Using Claim 3 we therefore deduce from (5.4) that if s ∈ π 4 , then where U 0 (s) = ∞ l=0 q l e − 1 8 r l+1 |s| , We will now estimate U 0 (s) and U 1 (s); this is done in Claims 4 and 5 below.

Claim 4
There is a constant $k_0$ such that $U_0(s)\le k_0\frac{1}{|s|^{\Delta}}$ for all $s\in C$ (recall that $\Delta = \frac{\log q}{\log r}$).

Proof of Claim 4 It is easily seen that
Introducing the substitution y = − 1 8 r x+1 |s| into the integral ∞ 0 q x e − 1 8 r x+1 |s| dx in (5.6) yields ∞ 0 y −1 e −y dy denotes the Gamma-function evaluated at ). This completes the proof of Claim 4.
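The size of $U_0$ can be examined numerically. The sketch below assumes the series $U_0(s) = \sum_{l\ge 0} q^l e^{-\frac18 r^{l+1}|s|}$ as displayed earlier in this proof, and assumes that the exponent lost from the statement of Claim 4 is $\Delta = \log q/\log r$, so that $q = r^{\Delta}$. Under these assumptions $|s|^{\Delta}U_0(s)$ is a sum of the values of $y\mapsto y^{\Delta}e^{-ry/8}$ along the geometric progression $y = r^l|s|$, and replacing $|s|$ by $|s|/r$ merely adds one term that is negligible for large $|s|$. The function names are hypothetical.

```python
from math import exp, log


def U0(x, r, q, terms=400):
    """Truncated series sum_{l>=0} q^l * exp(-r^(l+1) * x / 8) with x = |s|."""
    return sum(q ** l * exp(-(r ** (l + 1)) * x / 8.0) for l in range(terms))


def scaled_U0(x, r, q):
    """x^Delta * U0(x) with Delta = log(q)/log(r); bounded if Claim 4 has the assumed form."""
    delta = log(q) / log(r)
    return x ** delta * U0(x, r, q)


r, q = 1 / 3, 0.25
for x in (1e2, 1e3, 1e4, 1e4 / r, 1e4 / r ** 2):
    print(x, scaled_U0(x, r, q))
```

The printed values remain of moderate size, and those at $10^4$, $10^4/r$ and $10^4/r^2$ coincide to many digits, reflecting the same multiplicative periodicity that drives the main theorems.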

Claim 5 There is a constant k
Proof of Claim 5 Fix s ∈ π 4 . Since Re s ≥ 0 (because s ∈ π 4 ), we conclude that (1 − r k ) Re s ≥ (1 − r ) Re s, whence where This completes the proof of Claim 5.

Theorem 5.3 There is a constant K such that
Proof This follows immediately from Theorems 5.1 and 5.2.

Proof of Theorems 2.4 and 2.5: proof of (2.8)
The purpose of this section is to prove inequality (2.8), namely, that for each real number d with d > (recall that = log q log r = log p 1 p 2 log r ) there is a constant K d such that for all u > 0 where is the function defined in Theorem 2.4 (or, alternatively, in Theorem 6.5 below). The proof of (6.1) is divided in the following four parts: Part 1 We first define the moment zeta-function this is done in Theorem and Definition 6.1.
Part 2 Next, using the Mellin transform theory, we show that L can be written as a complex curve integral involving Z , namely, we show that for 0 < c < we have for all u > 0 (recall that q = p 1 p 2 ); this is done in Theorem 6.3.

Part 3
Finally, using the residue theorem, we compute the complex curve integral in (6.2). In particular, we show that if $0 < c < \Delta < d$ (where $\Delta = \frac{\log q}{\log r}$), then for all $u > 0$; this is done in Theorems 6.4, 6.5 and 6.6: in Theorems 6.4 and 6.5 estimates for $|Z(s)|$ and $|1 - qr^{-s}|$ are obtained and in Theorem 6.6 we use the residue theorem together with the estimates from Theorems 6.4 and 6.5 to derive formula (6.3).
Part 4 The desired inequality [i.e. (6.1)] follows immediately from combining (6.2) and (6.3). We now define the moment zeta-function $Z$.
Theorem and Definition 6.1 (The moment zeta-function) For s ∈ C with 0 < Re s, we have In particular, the moment zeta function Z : {s ∈ C | Re s > 0} → C defined by Proof This follows from Proposition 4.6.
Next, using the Mellin transform theory, we show that the function L can be expressed as a complex curve integral involving the moment zeta-function Z . For the benefit of the reader we first state the Mellin transform theorem. (i) The function f is piecewise continuous on all compact subintervals of (0, ∞), and at all discontinuity points Then we have: It follows from (1) that the function M f : {s ∈ C | a < Re s < b} → C given by The function M f is called the Mellin transform of f .
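For readers less familiar with the Mellin transform $(\mathcal{M}f)(s) = \int_0^\infty f(u)\,u^{s-1}\,du$, the following sketch evaluates it numerically for $f(u) = e^{-u}$, where it equals the Gamma function, and checks the scaling rule $\mathcal{M}[f(r\,\cdot)](s) = r^{-s}(\mathcal{M}f)(s)$. The midpoint rule, the truncation of the integral and the restriction to real $s > 1$ are simplifications chosen for illustration and are not part of the paper's argument.

```python
from math import exp, gamma


def mellin(f, s, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the Mellin transform of f at a real s > 1,
    with the integral truncated to the interval [0, upper]."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) * ((i + 0.5) * h) ** (s - 1.0) for i in range(steps))


s, r = 2.5, 1 / 3
print("M[exp(-u)](s)   :", mellin(lambda u: exp(-u), s), " Gamma(s):", gamma(s))
print("M[exp(-r*u)](s) :", mellin(lambda u: exp(-r * u), s, upper=300.0),
      " r^(-s)*Gamma(s):", r ** (-s) * gamma(s))
```

The scaling rule is what turns a functional equation of the form $L(s) = qL(rs) + \Lambda(s)$ into a quotient with denominator $1 - qr^{-s}$ on the transform side, which is the source of the poles used later in the residue computation.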
(3) For c ∈ R with 0 < c < and u > 0 the integral c+i∞ c−i∞ u −s (ML)(s) ds is well-defined and we have In particular, for c ∈ R with 0 < c < and u > 0, we have Proof (1)-(2) We first note that a simple calculation shows that L(s) = q L(rs) + (s) for s ∈ C, and it therefore follows from Lemma 4.1 that the series ∞ k=0 q k (r k s) converges and that Now fix s ∈ C with 0 < Re s < , and define the functions f n , f, g : (0, ∞) → C for positive integers n by Since Re s < = log q log r , we conclude that q r Re s < 1, whence ∞ k=0 ( q r Re s ) k < ∞. This and Proposition 4.6 imply that We also note that f n (u) → f (u) for all u ∈ (0, ∞) and that | f n | ≤ g for all n. Since ∞ 0 g(u) du < ∞ [by (6.5)], we now conclude from this and the dominated convergence theorem that (3) This statement follows from Theorem 6.2.
Finally, using the residue theorem, we compute the complex curve integral in (6.4). This is done in Theorems 6.4, 6.5 and 6.6: in Theorems 6.4 and 6.5 estimates for |Z (s)| and |1 − qr −s | are obtained and in Theorem 6.6 we use the residue theorem together with the estimates from Theorems 6.4 and 6.5 to compute the curve integral in (6.4).  We conclude immediately from this that

Theorem 6.4 For a real number d with 0 < d, write
Again, recalling that it follows from Proposition 4.5, that for each integer n, there is a positive constant c n such that |D n (u)| ≤ c n e −ρu for all u ≥ 0. In particular, this implies that there is a constant c 2 such that if s ∈ C, with Re s > 0, then | (u)u s+1 | ≤ c 2 e −ρu u Re s+1 . We conclude from this and (6.8) that if s ∈ C, with Re s > 0, then We deduce from this that if s ∈ H d , then (2) Define f : C → R by f (s) = 1 − qr −s and write I = {z ∈ C | − π − log r ≤ Im z ≤ π − log r }. It is clear that f is periodic with period equal to 2π − log r , and so inf s∈K d | f (s)| = inf s∈K d ∩I | f (s)|. It is also clear that f (s) = 0 if and only if s ∈ + 2π − log r Z. Since d = , we deduce from this that f (s) = 0 for s ∈ K d ∩ I, and the compactness of K d ∩ I therefore shows that there is a real constant k d such that | f (s)| ≥ 1 k d for all s ∈ K d ∩ I, whence inf s∈K d | f (s)| ≥ inf s∈K d | f (s)| = inf s∈K d ∩I | f (s)| ≥ 1 k d . This implies that 1 | f (s)| ≤ k d for all s ∈ K d .
Below we use the following notation, namely, if f is a holomorphic function, then P( f ) denotes the set of poles of f , and if ω is a pole of f , then res( f ; ω) denotes the residue of f at ω. Theorem 6.5 For n ∈ Z, write s n = + 1 − log r 2πin.
(2) For n ∈ Z, we have (3) Note that it follows from Theorem 6.4 that there is a constant h such that |Z (s)| ≤ h 1 |s| 2 for all complex numbers s with 0 < Re s ≤ . In particular, since Re s n = for all n ∈ Z, this implies that |Z (s n )| ≤ h 1 |s n | 2 for all n. We now conclude from this and parts (1) and (2)  Proof Write a n = Im( s n +s n−1 2 ) for n ∈ N. Fix a real numbers c and d with 0 < c < < d and let d,n , γ − c,d,n and γ + c,d,n denote the following paths in C: d,n is the directed line segment from d + ia n to d − ia n ; γ − c,d,n is the directed line segment from d − ia n to c − ia n ; γ + c,d,n is the directed line segment from c + ia n to d + ia n .
Fix u > 0. Let G c,d,n denote the region enclosed by the paths d,n , γ − c,d,n , γ + c,d,n and the directed line segment from c−ia n to c+ia n . Since G c,d,n ∩ P(s → Z (s) 1−qr −s u −s ) = {s k | |k| < n}, it now follows from the residue theorem applied to the function s → Z (s) 1−qr −s u −s where s ∈ C with Re s > 0 that 1 2πi We now prove the follows claims.
Claim 1 For all c with 0 < c < , we have v c,n (u) → 0 as n → ∞ for all u > 0 and w c,n (u) → 0 as n → ∞ for all u > 0.