Estimation of stopping times for stopped self-similar random processes

Let X=(Xt)t≥0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X=(X_t)_{t\ge 0}$$\end{document} be a known process and T an unknown random time independent of X. Our goal is to derive the distribution of T based on an iid sample of XT\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X_T$$\end{document}. Belomestny and Schoenmakers (Stoch Process Appl 126(7):2092–2122, 2015) propose a solution based the Mellin transform in case where X is a Brownian motion. Applying their technique we construct a non-parametric estimator for the density of T for a self-similar one-dimensional process X. We calculate the minimax convergence rate of our estimator in some examples with a particular focus on Bessel processes where we also show asymptotic normality.


Introduction
considered the problem of recovering the distribution of an independent random time T based on iid samples from a one-dimensional Brownian motion B at time T . Comte and Genon-Catalot (2015) already considered this problem for Poisson processes. Here we use the method of Belomestny and Schoenmakers (2015) and derive corresponding results for self-similar processes. We particularly focus on Bessel processes. As a consequence, we extend results from Belomestny and Schoenmakers (2015) to multi-dimensional Brownian motion. This is accomplished by considering the two-norm of the multi-dimensional Brownian motion, thus reducing the problem to the case of a Bessel process which is a one-dimensional process and can be treated similarly to the case of onedimensional Brownian motion. More specifically, we give a non-parametric estimator for the density f T of T . We show consistency of this estimator with respect to the L 2 risk and derive a polynomial convergence rate for sufficiently smooth densities f T . Moreover, we B Viktor Schulmann viktor.schulmann@tu-dortmund.de 1 Fakultät für Mathematik,Technische Universität Dortmund,44221 Dortmund,Germany show that this rate is optimal in the minimax sense. The constructed estimator is also shown to be asymptotically normal. The paper is organized as follows: In Sect. 2 we recapitulate the Mellin transform which is our main tool throughout this paper. Using this transform we construct our estimator in Sect. 3 by solving a multiplicative deconvolution problem which is related to the original problem through self-similarity of the underlying process. The use of the Mellin transform in multiplicative deconvolution problems proposed by Belomestny and Schoenmakers (2015) is different to the standard approach which consists in applying a log-transformation and thus reducing the problem to an additive deconvolution problem which is usually addressed by the kernel density deconvolution technique. In Sect. 4 we give bounds on bias and variance of the estimator in the general self-similar case. In the following two sections we lay our focus on Bessel processes and give the convergence rates of our estimator for this case (Sect. 5) and show its asymptotic normality (Sect. 6). Section 7 is devoted to two further examples of self-similar processes where our method yields consistent estimators. Their convergence rates are provided there. In the following Sect. 8 we show optimality in the minimax sense of all previously obtained rates. Some numerical examples are given in Sect. 9. Finally, we collect some of the longer proofs in Sect. 10.

Mellin transform
In this section we recapitulate some properties of the Mellin transform from Butzer and Jansche (1997). This integral transform will be our main tool in estimation procedures of the next sections. For an interval I define the space If f is the density function of an R + -valued random variable, then we have at least f ∈ M [1,1] .
is well defined and holomorphic on the strip {s ∈ C| Re(s) ∈ (a, b)} according to Butzer and Jansche (1997).

Example 2 (i) Consider gamma densities
for x, σ, r > 0. For all s ∈ C with Re(s − σ + 1) > 0 we have For Re(s) > 1 − d elementary calculus shows that Similar to the well-known relation of the classical Fourier transform to sums of independent random variables, the Mellin transform behaves multiplicatively with respect to products of independent random variables: In the setting of Theorem 3 it is easy to see that f XY is identical to Given the Mellin transform of a function f we can reconstruct f : Another important result in the theory of Mellin transforms is the Parseval formula for Mellin tranforms (see (Bleistein and Handelsman 1986, page 108) for the proof):

Construction of the estimator
We consider a real-valued stochastic process (Y t ) t≥0 with càdlàg paths which is self-similar with scaling parameter H (for short, H -ss), that is Here, d = denotes identity of all finite dimensional distributions. Let T ≥ 0 be a stopping time with density f T independent of Y . Let X 1 , . . . , X n be iid samples of Y T . In order to construct a non-parametric estimator for f T we use the simple consequence of (5) that We take the absolute value on both sides and assume that f T ∈ M (a,b) with 0 ≤ a < b and that the density of Y 1 is in M (0,∞) , so we can apply the Mellin transform on both sides of (6) and obtain If the Mellin inversion formula (Lemma 4) is applicable to T , we may write for a < γ < b. Combining (7) and (8) we obtain the representation for max{1 − H , a} < γ < b. In order to obtain an estimator of f T based on (9) we would like to replace M[|Y T |] by its empirical counterpart However, this substitution may prevent the integral in (9) from converging. Thus, we introduce a sequence (g n ) n∈N with g n → ∞ (chosen later) in order to regularize our estimator. In view of (9) definef for x > 0 and max{1 − H , a} < γ < b as an estimator for f T .

Convergence analysis
For the sake of brevity we introduce the notation For 0 ≤ a < b and β ∈ (0, π) consider the class of densities For the bias of the estimator (10) we have: for all x > 0, γ ∈ (max{a, 1 − H }, b).
Having established an upper bound on the bias off n , we now shall do the same for the variance of our estimator.
In order to get a bound on Var which (together with (15)) gives the desired bound on Var[f n (x)].

Application to Bessel processes
In this section we choose Y to be a Bessel process ∞). Note that the case d = 1 leads to the absolute value of the onedimensional Brownian motion and was already considered in Belomestny and Schoenmakers (2015). We refer to Revuz and Yor (1999) for detailed information about Bessel processes. It is well-known, that Bessel processes are 1 2 -ss and have continuous paths. Marginal densities are given by: as an estimator for the density f T (x) of a stopping time T ≥ 0 for x > 0 and max{1/2, a} < γ < b, where a, b are such that f T ∈ M (a,b) and X 1 , . . . , X n are independent samples of B E S T . With our major result Theorem 8 we shall derive the convergence rates for (16).
for some C L,d,γ > 0 depending only on L, γ, d as well as T . Moreover, taking g n = log n π + 2β (18) in (17), one has for all x > 0 the polynomial convergence rate Proof Let x > 0. We use the upper bound on variance obtained in Lemma 7 with H = 1/2 to get for some C 0 (γ , d) > 0. By Example 2(ii) and Lemma 21(ii) we have for some constants C 1 (d, γ ) and C 2 . Adding (21) and (11) gives for some C L,d,γ > 0. The choice (18) yields the rate (19).

Asymptotic normality for Bessel processes
Note that the estimator (16) can be written aŝ Sincef n is a sum of iid variables, we can show that (under mild assumptions on f T )f n is asymptotically normal. In fact, we have: for all x > 0, where with some c > 0 given by (47).
We present the proof in Sect. 10.1. As we mentioned in the end of Sect. 5, we can often assume (a, b) = (0, ∞), so that the choice of γ is only restricted by γ > max{1/2, (4−d)/4}. In this case a suitable δ can always be found: It is possible to give a Berry-Esseen type error estimate for the convergence in (26). This is a new result even for dimension d = 1.
is given by (27)) and by Φ the distribution function of the standard normal distribution. If we choose g n ∼ log(n) in (16) then we have Proof Let x > 0 and n ∈ N. Consider the representation (23) off n (x). Berry-Esseen Theorem (see Gänssler and Stute 1977) states We choose j = 3 in Lemma 11 to get for n → ∞. By Theorem 9 we have (27). Choose g n ∼ log(n). Plugging (30) and (27)  The following observation about the absolute moments of Z n,1 is useful in the proof of Theorem 9 but also holds some insights in itself.

Lemma 11
as n → ∞ for all j ∈ R + . In particular, all absolute moments of Z n,1 exist for all n ∈ N greater than some n 0 ∈ N.
For the special case (a, b) = (0, 1) and d = j = 1 this result is mentioned in Belomestny and Schoenmakers (2015) but without an extensive proof which we provide here. Note that for d ≥ 2 the assumption γ > (4 − d)/4 is redundant. Moreover, we always have the smaller bound of the second case in (31).

Normally distributed processes
Let Y = (Y t ) t≥0 be H -ss with càdlàg paths and Y 1 standard normally distributed. As example consider a fractional Brownian motion. This setting is easily generalized to the case where Y 1 ∼ N (0, σ 2 ) with σ 2 > 0 by considering the process (Ỹ t ) t≥0 := (Y t /σ ) t≥0 and modifying our observations toX i := X i /σ . Taking d = 1 in Example 2(ii) we see that estimator (10) assumes the formf for x > 0 and max{1 − H , a} < γ < b. We can prove a convergence result for this estimator, similar to Theorem 8.
for n → ∞ and all x > 0. Taking we obtain for all x > 0 the polynomial convergence rate for n → ∞.

Proof
The proof is analogous to the one of Theorem 8 except for the upper bound on variance which is in this case for some C 0 (γ , H ) > 0. Combining this with the bound on the bias from Lemma 6(i) we obtain (33). Plugging (34) into (33) gives the rate (35).
Taking H = 1/2 in Theorem 12 we obtain the same rates as for Bessel processes (see Theorem 8). For smaller H the rate is worse and for greater H it is better. Note that we work with observations of |Y T | rather than Y T .

Gamma distributed processes
Let Y = (Y t ) t≥0 be H -ss with càdlàg paths such that Y 1 has Gamma density (1) with r = 1. We can easily generalize to the case r > 0, by considering the process (Ỹ t ) t≥0 := (rY t ) t≥0 and modifying our observations toX i := r X i . As an example consider the so-called square of a Bessel process with dimension d starting at 0 (see Revuz and Yor 1999, Chapter XI, §1). Considering Example 2(i) estimator (10) takes the form for x > 0 and max{1 − σ H , a} < γ < b. We can prove a convergence result for this estimator, that is similar to Theorems 8 and 12. ∈ (a, b), then for n → ∞ and all x > 0. If then for all x > 0 we have

Proof In this case he upper bound on variance becomes
Var Rest is again analogue to the proof of Theorem 8.

Optimality
The rates from Theorems 8, 12 and 13 are optimal in the minimax sense.
Theorem 14 For all β ∈ (0, π) and 0 < a < b < π/β there is x > 0 such that (ii) a H -ss. Gaussian process Y (H ∈ (0, 2)) and ψ n = n − β π 2H +2β ; (iii) a H -ss. Gamma distributed process Y (H ∈ (0, 2)) and ψ n = n See Sect. 10.2 for the proof of this theorem. A similar optimality result was obtained in Belomestny and Schoenmakers (2015) for the case where the absolute value of a onedimensional Brownian motion is observed. (40) means that for each estimatorf n , that we may construct with our observations, there is a true density f ∈ C(β, a, b) such that for some x > 0, i.e. it is impossible to construct an estimator with a convergence rate (w.r.t. L 2 -distance) faster than ψ n for all f ∈ C(β, a, b) and all x > 0.

Simulation study
In this Section we test our estimator (16) with some simulated data. Consider a Bessel process with dimension d = 5 and a Gamma(2, 1) distributed stopping time T , i.e. T has the density In order to evaluate the estimator (16) we choose γ = 0.7. Take the cut-off parameter g n = log(n) (π +2β) (in accordance with (18)) and β = 0. To choose β small appears counterintuitive at first because we showed in Theorem 8 that the convergence rate is better for large β. However, in our examples the choice β = 0 delivers the best results. This can be explained as follows: Our bound on the bias of estimatorf n contains the constant L (see (14)) as a factor. This constant is growing in β and seems to make a crucial contribution to the overall error. We refer to Belomestny and Schoenmakers (2015) and Schulmann (2019) for an alternative choice of g n based purely on the data.
In order to test the performance off n we compute it based on 100 independent samples of B E S T of size n ∈ {1000, 5000, 10,000, 50,000}. In Fig. 1 we see the resulting box-plots of the loss.
Let us demonstrate the performance of our estimator for different distributions of T . As examples we consider Exponential, Gamma, Inverse-Gaussian and Weibull distributions. To construct the estimate (16) we choose d = 5, γ = 0.8, n = 1000 and g n as before. Figure 2 shows the densities of the four distributions and their 50 respective estimates based on 50 independent samples of B E S T .
We can see that the error is particularly large in the neighborhood of 0. That is because our estimator is not defined in 0 and lim x→0f n (x) does not exist for fixed n. Note also that the variance of our estimator is large for small x (see (27)). Conversely, we obtain better results for large x.

Proof of Theorem 9
We roughly imitate the proof of an analogous result for the special case d = 1, (a, b) = (0, 1) found in Belomestny and Schoenmakers (2015). In distinction from Belomestny and Schoenmakers (2015) we do not restrict ourselves to the case x = 1 in the proof and provide the specific form of ν n for all x > 0.
Let x > 0. It suffices to show the Lyapunov condition, i.e. for a δ > 0: The claim (26) follows from (41) In any case of Lemma 11 (for j = δ + 2) we have for all δ ∈ R + and some c > 0. Now we investigate the asymptotic behavior of Var [Z n,1 ].
Looking at (24) we use Fubini's theorem to obtain By Example 2(ii) we can estimate for some C > 0 and further Our strategy now is to decompose the double integral defining R 1 into pieces that are easy to estimate. To that end let ρ n := g α n , where 0 < α < 1/2 and define By Lemma 20 there are C 1 , C 2 > 0 such that and K 1 , K 2 > 0 such that With the help of these inequalities we deduce and g n −g n g n −g n for some l ≥ 0. Combine (44) and (45) to obtain dvdu + O(g l n e π(g n −ρ n ) ) =: I 3 n + O(g l n e π(g n −ρ n ) ).
Next, we examine the asymptotic behavior of the integral I 3 n . To this end, we take advantage of Stirling's formula (Lemma 19) for v → ∞. Consider the integrand of I 3 n . In the denominator it holds by means of the identity Note that due to the choice of ρ n , we have ρ n g −1 n → 0 and ρ 2 n g −1 n → 0. We use the asymptotic decomposition × e −π g n e π(r +s)/2 (1 + O(ρ 2 n g −1 n )). Analogously, on the set × e −π g n e π(r +s)/2 (1 + O(ρ 2 n g −1 n )).
Hence, I 3 n can be decomposed as follows: The integral in (46) allows a series representation via Lemma 22. In fact, uniformly in v. Thus, holds with Summing up the auxiliary quantities introduced above we get (1) and thus (27). If g n ∼ log(n), then (27) and (43) imply (42) and hence the claim.

Proof of Theorem 14
The basic construction used in this proof is due to Belomestny and Schoenmakers (2016), where it is used in the context of an observed Brownian motion. Define the χ 2 -divergence between two probability measures P 0 and P 1 with densities q 0 and q 1 . The following general result forms the basis for the subsequent steps (see Tsybakov 2008 for a proof).
holds for some α > 0 independent of n, then holds for some c > 0, where the infimum is over all estimators.
Let β ∈ (0, π) and 0 < a < b < π/β. Define for M > 0 The following lemma provides some properties of the functions q and ρ M .

Lemma 16
The function q is a probability density on R + with Mellin transform The Mellin transform of the function ρ M is given by Proof Formula (52) can be found in Oberhettinger (2012) and (53) is shown in (Belomestny and Schoenmakers 2016, Lemma 6.2).
Set now for any M > 0 and some δ > 0, for x ≥ 0, where q ρ M is defined by (4). The following lemma will help us verify condition (49).

Lemma 17
For any M > 0 and some δ > 0 not depending on M the function f 1,M is a probability density satisfying Moreover, f 0,M and f 1,M are in C(β, a, b) for all β ∈ (0, π) and 0 < a < b < π/β. Proof For (55) see (Belomestny and Schoenmakers 2016, Lemma 6.3) where it is also shown that for δ small enough: It is easy to see that f 0,M ∈ C(β, a, b) for all β ∈ (0, π), 0 < a < b < π/β and (56) implies the same for f 1,M .
Looking further towards applying Theorem 15 let us consider the densities p M,0 and p M,1 of an observation associated with the hypotheses f 0,M and f 1,M , respectively. At this point we have to differentiate between the models we discussed so far. We will only present the proof for the Bessel case, parts (ii) and (iii) of Theorem 14 can be showed along the same lines.