On the empirical estimator of the boundary in inverse first-exit problems

First-exit problems for Brownian motion $(W(t))$ or general diffusion processes have important applications. Given a boundary $b(t)$, the distribution of the first-exit time $\tau$ has to be computed, in most cases numerically. In inverse first-passage-time problems, the distribution of $\tau$ is given and the boundary $b$ has to be found. The boundary and the density of $\tau$ satisfy a Volterra integral equation. Again, numerical methods approximate the solution $b$ for a given distribution of $\tau$. We propose and analyze estimators of $b$ for a given sample $\tau_1, \ldots, \tau_n$ of first-exit times. The first estimator, the empirical estimator, is the solution of a stochastic version of the Volterra equation. We prove that it is strongly consistent and derive an upper bound for its asymptotic convergence rate. Finally, this estimator is compared to a Bayesian estimator, which is based on an approximate likelihood function. Monte Carlo experiments suggest that the empirical estimator is simple, computationally manageable and outperforms the alternative procedure considered in this paper.


Introduction
The analysis of first-passage problems of diffusion processes, and especially of Brownian motion, is a rich field in applied probability with important applications in areas such as mathematical finance, statistics, physics, engineering and biology. For instance, in mathematical finance, a default event can be modelled by the first time a stochastic process representing firm value crosses a possibly time-varying barrier, and a barrier option can be exercised if the underlying value process reaches a predefined boundary. In biology, extinction of a population can be described by the event that the number of individuals first passes a threshold value (for more applications in biology, we refer the reader to Ricciardi et al. 1999). Another area of application arises in statistics, in particular in sequential analysis and change-point problems.
Correspondence: Sercan Gür, guer.sercan@gmail.com
Let $(W(t))_{t \ge 0}$ denote a standard Brownian motion, i.e. a Gaussian process with continuous paths, $W(0) = 0$, $E(W(t)) = 0$ and $E(W(s)W(t)) = s \wedge t$. Let $b : [0, \infty) \to \mathbb{R}$ denote a function with $b(0) \ge 0$, the upper boundary. Define the first-exit time
$$\tau = \inf\{t > 0 \mid W(t) \ge b(t)\}. \qquad (1)$$
The first-exit time $\tau$ is a stopping time. Denote the distribution of $\tau$ by $F$. Given regularity conditions, $F$ is absolutely continuous with a density $f$, which is continuous and strictly positive on $(0, \infty)$.
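The definition of the first-exit time can be illustrated by simulation. The following is a minimal sketch, not a procedure taken from this paper: it assumes that the Brownian path is monitored only on an equidistant grid with step `dt`, so crossings between grid points are missed and the simulated exit times are slightly too large. The function name `simulate_exit_times` and all its parameters are hypothetical.

```python
import numpy as np


def simulate_exit_times(boundary, n_paths, T=1.0, dt=1e-3, seed=0):
    """Approximate first-exit times of W over `boundary` on [0, T].

    Assumption: the path is observed on the grid dt, 2*dt, ..., T only.
    Paths that do not reach the boundary by time T get the value np.inf.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    t = dt * np.arange(1, n_steps + 1)
    # Independent N(0, dt) increments, cumulated to Brownian paths.
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.cumsum(increments, axis=1)
    crossed = W >= boundary(t)          # True where the path is at/above b
    hit = crossed.any(axis=1)           # did the path cross before T?
    first = crossed.argmax(axis=1)      # index of the first crossing
    return np.where(hit, t[first], np.inf)


# Usage: constant boundary b(t) = 1 on [0, 1].
tau = simulate_exit_times(lambda t: np.ones_like(t), n_paths=2000, seed=1)
print("fraction of paths with exit before T:", np.mean(np.isfinite(tau)))
```

For the constant boundary $b = 1$ the reflection principle gives $P(\tau \le 1) = 2\bar\Phi(1) \approx 0.317$, which the printed fraction should approximate up to Monte Carlo and discretization error.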
The direct first-passage-time problem identifies $F$ for given $b$. The inverse problem computes $b$ for given $F$ or $f$. The problem of finding the boundary function $b$ is of importance in many fields, and the application areas are similar to those of the direct first-passage-time problem, e.g. mathematical finance, in particular credit risk modelling, or biology, in particular neural activity (for more details see Abundo 2015). For both problems an extensive literature exists. Since the distribution $F$ of $\tau$ can be computed in closed form for only a few boundaries, numerical procedures, approximate solutions of corresponding partial differential equations, or Monte Carlo methods are necessary. For details we refer to Durbin (1971), Lerche (1986), Salminen (1988), Novikov et al. (1999) or Pötzelberger and Wang (2001).
Inverse problems are often based on a Volterra integral equation, the so-called master equation: for $z \ge b(t)$,
$$\bar\Phi\left(\frac{z}{\sqrt{t}}\right) = \int_0^t \bar\Phi\left(\frac{z - b(s)}{\sqrt{t - s}}\right) F(ds), \qquad (2)$$
with $\bar\Phi = 1 - \Phi$ the survival function of the standard normal distribution. Two consequences of (2) are used below: the special case $z = b(t)$ and the equation obtained by differentiating (2). See Peskir (2002), Zucca and Sacerdote (2009) or Abundo (2015) for a thorough discussion of the inverse problem. If $f$ is given, an approximation of $b(t)$ on the grid $\{t_i \mid t_i = hi\}$ is the solution of a system of equations, the Euler scheme (5).

In this paper we analyze the statistical inverse first-passage-time problem: given a sample $\tau_1, \ldots, \tau_n$ of independent first-exit times, we approximate the unknown boundary $b(t)$ by an estimator $\hat b_n(t)$. We propose the empirical estimator, which is the solution of (2) when $F$ is replaced by the empirical distribution $\hat F_n$.

This paper is organized as follows. In Sect. 2 we prove that the empirical estimator is strongly consistent with rate $o((\log n + \eta \log\log n)^{1/2} n^{-1/2})$ for every $\eta > 1/2$, uniformly in $t \in [0, T]$, for all $T > 0$.
We compare the performance of the empirical estimator to an approximate conditional likelihood method, namely a Bayes estimator. The approximate conditional likelihood is the density of the first-exit time $\tau$ when the boundary $b$ is approximated by a piecewise linear boundary $b_m$, i.e. a boundary that is linear on the intervals $[t_{i-1}, t_i]$, with $0 = t_0 < t_1 < \cdots < t_m = T$ a partition of $[0, T]$. For $b_m$, the density of $\tau$, given $W(t_1), \ldots, W(t_m)$, can be computed in closed form. In Sect. 3 we compute this approximate and conditional density. Section 4 concludes with the results of Monte Carlo experiments for the empirical estimator and a Bayes estimator derived from the approximate likelihood.

Empirical estimator
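Let $\tau_1, \ldots, \tau_n$ be an i.i.d. sample of first-exit times corresponding to the boundary $b$. Note that $\tau_i = \infty$ if the Brownian motion $(W(t))$ never crosses the boundary $b$. We denote the empirical distribution of $\tau_1, \ldots, \tau_n$ by $\hat F_n$, i.e. $\hat F_n(t) = \frac{1}{n} \sum_{i=1}^n 1\{\tau_i \le t\}$. The empirical estimator $\hat b_n$ is defined by Eq. (6), the master equation with $F$ replaced by $\hat F_n$. The empirical estimator is consistent. Equation (6) has a solution for all $t$. However, it is convenient to solve a corresponding system of equations at the sample points. Let $\tau_{(1)} \le \tau_{(2)} \le \cdots \le \tau_{(n)}$ denote the order statistics of the sample. Note that for finite order statistics $\tau_{(i)}$, $\tau_{(i)} < \tau_{(i+1)}$ a.s.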
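Solving the master equation at the order statistics can be sketched as follows. This is one possible discretization under stated assumptions, not necessarily the exact form of Eq. (6): the integral with respect to $\hat F_n$ is taken as an average over the sample points up to $\tau_{(k)}$, the sample point $\tau_{(k)}$ itself contributes $\bar\Phi(0) = 1/2$, only exit times up to a horizon `T` are used, and all exit times are assumed distinct. The names `empirical_boundary` and `bracket` are hypothetical.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def empirical_boundary(tau, T=1.0, bracket=40.0):
    """Solve a discretized master equation at the order statistics <= T.

    Assumptions (not from the paper): distinct exit times, T <= 1 so that
    the root bracket [-bracket, bracket] contains a sign change, and a
    weight of 1/2 for the sample point tau_(k) itself.
    """
    tau = np.asarray(tau, dtype=float)
    n = tau.size                       # full sample size, including tau_i > T
    knots = np.sort(tau[tau <= T])     # order statistics used as knots
    b_hat = np.empty(knots.size)
    for k, t_k in enumerate(knots):
        prev_t, prev_b = knots[:k], b_hat[:k]

        def g(x):
            # Right-hand side: average of Phi-bar terms over earlier exits.
            earlier = norm.sf((x - prev_b) / np.sqrt(t_k - prev_t)).sum()
            return norm.sf(x / np.sqrt(t_k)) - (earlier + 0.5) / n

        # g(-bracket) > 0 and g(bracket) < 0, so a root exists in between;
        # brentq returns one such root.
        b_hat[k] = brentq(g, -bracket, bracket)
    return knots, b_hat


# Usage: exact exit times for the constant boundary b = 1 are 1 / Z**2.
rng = np.random.default_rng(0)
sample = 1.0 / rng.standard_normal(300) ** 2
knots, b_hat = empirical_boundary(sample, T=1.0)
```

At the first knot the equation has the closed-form solution $\sqrt{\tau_{(1)}}\,\bar\Phi^{-1}(1/(2n))$, which gives a quick check of the root finder.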
Theorem 1 Let $b$ be continuously differentiable with $b(0) > 0$. The empirical estimator is strongly consistent: for $\eta > 1/2$, let $\epsilon_n = (\log n + \eta \log\log n)^{1/2} n^{-1/2}$. Then for all $T > 0$, almost surely and uniformly in $t \in [0, T]$, $\hat b_n(t)$ converges to $b(t)$ with rate $o(\epsilon_n)$. (7)

Proof The empirical estimator can be considered as a discretization scheme of the master equation with the order statistics $\tau_{(i)}$ as random knots. Zucca and Sacerdote (2009) proved the consistency of the Euler scheme for the (deterministic) master equation. We follow their lines, with the necessary modifications indicated. Define for $T > 0$ fixed and knots $0 = t_0 < t_1 < \cdots < t_n = T$ the solution of the Euler scheme (5) by $b^*(t_k)$ and the local consistency error by […]. The proof of Theorem 6.2 in Zucca and Sacerdote (2009) shows that there is a constant $c > 0$ (depending on $b$ and $T$ only) such that for all $i$, […]. Define for $\theta > 0$, […]. Since $|Z_j(\theta)| \le 1$ for all $j$, Hoeffding's inequality gives for all $\epsilon > 0$, […]. For $\epsilon_n = (\log n + \eta \log\log n)^{1/2} n^{-1/2}$ we get, for $\eta' = 2\eta - 1 > 0$ and fixed $\theta$, […]. The Borel-Cantelli lemma implies (7).
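To get a feeling for the size of the rate in Theorem 1, the sequence $(\log n + \eta \log\log n)^{1/2} n^{-1/2}$ can simply be evaluated. The sketch below does only that; the function name `epsilon_n` and the restriction to $n \ge 3$ (so that $\log\log n > 0$) are choices made for this illustration.

```python
import math


def epsilon_n(n, eta=1.0):
    """Rate sequence (log n + eta * log log n)^(1/2) * n^(-1/2), eta > 1/2."""
    if n < 3 or eta <= 0.5:
        raise ValueError("need n >= 3 and eta > 1/2")
    return math.sqrt((math.log(n) + eta * math.log(math.log(n))) / n)


# Usage: the rate for the sample sizes used in the experiments and beyond.
for n in (10**2, 10**3, 10**4):
    print(n, round(epsilon_n(n), 4))
```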

Remark
In case of censoring at $T$, the empirical estimator is still consistent if a consistent estimator of $F(T)$ is available. If this estimator is even strongly consistent with rate $\tilde\epsilon_n$, then the empirical estimator is strongly consistent with rate $\epsilon_n \vee \tilde\epsilon_n$, with $\epsilon_n$ defined in Theorem 1.
Let us briefly comment on the asymptotic distribution of the residuals $(\hat\epsilon_n(t))$ defined as […]. Denote by $\hat U_n(t)$ the empirical process, $\hat U_n(t) = \sqrt{n}\,(\hat F_n(t) - F(t))$, and let $U_F = U \circ F$ with $U$ a Brownian bridge, i.e. a Gaussian process with continuous paths, $E(U(t)) = 0$ and $\mathrm{Cov}(U(s), U(t)) = s \wedge t - st$. There are processes $(\hat U^*_n)$ and a Brownian bridge $(U^*_F)$ with the same distributions as $(\hat U_n)$ and $(U_F)$, such that with probability 1, $\|\hat U^*_n - U^*_F\| \to 0$ (see Shorack and Wellner 2009). To simplify the exposition, and since we are interested in the asymptotic distribution of the residuals only, we may assume that a.s. $\|\hat U_n - U_F\| \to 0$.
We have […]. Assume that for all $t$, $\hat\epsilon_n(t)$ would converge to a limit $\epsilon(t)$. The process $(\epsilon(t))$ would solve […]. With (4), $(\epsilon(t))$ would solve the stochastic linear Abel integral equation […] (10). However, there is no "classical" solution $(\epsilon(t))$ of (10). To see this, let $b(t) = b$ be constant. Recall that the density $f$ of $\tau$ is continuous and bounded. Then Eq. (10) is […]. Applying the Abel transform (see Gorenflo and Vessella 1991), we get […]. The process $(H_t)$ is a Gaussian process with $E(H_t) = 0$ and, for $s \le t$, […]. Note that Lévy's theorem on the modulus of continuity implies that the Brownian bridge $U_F$ has modulus of continuity $\sqrt{2h\log(1/h)}$, i.e. it holds […]. It follows that the modulus of continuity of $(H_t)$ is $c\,h\log(1/h)$, with $c > 0$ a constant. […] (12).

Approximate likelihood
For piecewise linear boundaries the following conditional boundary crossing probability allows the computation of an approximate conditional likelihood function. Let $b_m$ be continuous and linear on the intervals $[t_{i-1}, t_i]$, $i = 1, \ldots, m$. Let $\tau_m$ denote the corresponding first-exit time, $W_m = (W(t_i))_{i \le m}$ a discrete Brownian motion and $w_m = (w_1, \ldots, w_m) \in \mathbb{R}^m$. Wang and Pötzelberger (1997) showed that
$$P(\tau_m > T \mid W_m = w_m) = \prod_{i=1}^{m} p_i(b_m \mid w_{i-1}, w_i),$$
with $w_0 = 0$ and
$$p_i(b_m \mid w_{i-1}, w_i) = 1\{w_i < b_m(t_i)\} \left(1 - \exp\left(-\frac{2\,(b_m(t_{i-1}) - w_{i-1})(b_m(t_i) - w_i)}{t_i - t_{i-1}}\right)\right).$$
Let $f(t \mid b_m)$ denote the density of $\tau_m$, the first-exit time for the boundary $b_m$, and […].

Proposition 2 Define for given $t_d$: $t_u = t_{d+1}$ and $\Delta = t_u - t_d$. Let […]. 1. For $t_d < t < t_u$ and $w_u \ge b(t_u)$, […].

There is no crossing in $[t_d, t]$ and a crossing in $[t, t_u]$. Note that in case $W(t_u) \ge b(t_u)$ the latter conditional probability is 1. Taking the expectation w.r.t. $W(t)$ and finally the derivative w.r.t. $t$ gives (17) and (18).
Approximate likelihood inference replaces the exact likelihood function by the approximate one, i.e. the boundary $b$ is approximated by a piecewise linear boundary $b_m$. Estimates for errors, especially for $|P(\tau > t) - P(\tau_m > t)|$, are derived in Pötzelberger and Wang (2001), Borovkov and Novikov (2005), Zucca and Sacerdote (2009) and Pötzelberger (2012), among others.
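The conditional non-crossing probability for a piecewise linear boundary, as given by Wang and Pötzelberger (1997), lends itself to a simple Monte Carlo computation of $P(\tau_m > T)$: simulate the Brownian motion at the knots only and average the product of the per-interval non-crossing probabilities. The sketch below is an illustration under that reading, not the likelihood computation of this paper; the function name `survival_prob_piecewise_linear` and its parameters are hypothetical.

```python
import numpy as np


def survival_prob_piecewise_linear(t, b, n_sim=20000, seed=0):
    """Monte Carlo estimate of P(tau_m > t[-1]) for a piecewise linear boundary.

    t : knots with t[0] = 0;  b : boundary values at the knots, b[0] > 0.
    Between knots the boundary is linear, so each interval contributes
    1 - exp(-2 * gap_left * gap_right / dt) given the values at the knots.
    """
    t = np.asarray(t, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    dt = np.diff(t)
    # Discrete Brownian motion at the knots, started at W(0) = 0.
    steps = rng.normal(0.0, np.sqrt(dt), size=(n_sim, dt.size))
    W = np.hstack([np.zeros((n_sim, 1)), np.cumsum(steps, axis=1)])
    # Distance to the boundary, clipped at 0: a knot at or above the
    # boundary yields a factor 1 - exp(0) = 0, i.e. the indicator term.
    gap = np.maximum(b - W, 0.0)
    factors = 1.0 - np.exp(-2.0 * gap[:, :-1] * gap[:, 1:] / dt)
    return float(factors.prod(axis=1).mean())


# Usage: constant boundary b = 1 on [0, 1]; exact value is 1 - 2 * (1 - Phi(1)).
p = survival_prob_piecewise_linear([0.0, 0.25, 0.5, 0.75, 1.0], [1.0] * 5)
print(round(p, 3))
```

Because a constant boundary is itself piecewise linear, the estimate is unbiased in this example and differs from $1 - 2\bar\Phi(1) \approx 0.683$ only by Monte Carlo error.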

Monte Carlo experiments
Monte Carlo simulation experiments were performed to evaluate the performance of the empirical estimator for finite sample sizes. Since no theoretical result on the properties of the Bayes estimator is available, the Monte Carlo experiments can indicate whether likelihood-based methods have the potential to outperform the empirical estimator. We estimate five boundaries on $[0, T]$ with $T = 1$. For the first four, namely a constant boundary, a linear increasing boundary, a linear decreasing boundary and a Daniels boundary, the first-exit time distribution is known in closed form. The fifth boundary corresponds to exponentially distributed first-exit times (see Abundo 2015).
The results for the empirical estimator are given in Fig. 1. The mean-integrated-squared errors (MISE) reported in Tables 1 and 2 are an estimate of $E \int_0^T (\hat b_n(t) - b(t))^2\, dt$. For the empirical estimator, we generate $K = 100$ samples of first-exit times of size $n$. For each sample, […] (19) (with $\tau_{(0)} = 0$) is computed. The MISE is the mean over these $K = 100$ samples.
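The MISE computation can be sketched as follows. The per-sample integrated squared error is approximated here by a Riemann sum over the order statistics with $\tau_{(0)} = 0$; this particular sum is an assumption made for the illustration and may differ from expression (19). The function names are hypothetical.

```python
import numpy as np


def integrated_squared_error(knots, b_hat, boundary):
    """Riemann-sum approximation of the integral of (b_hat - b)^2.

    knots    : increasing sample points tau_(1) < ... < tau_(k) in (0, T]
    b_hat    : estimated boundary at the knots
    boundary : true boundary, a vectorized function of t
    Each squared error is weighted by tau_(i) - tau_(i-1), with tau_(0) = 0.
    """
    knots = np.asarray(knots, dtype=float)
    b_hat = np.asarray(b_hat, dtype=float)
    widths = np.diff(np.concatenate(([0.0], knots)))
    return float(np.sum((b_hat - boundary(knots)) ** 2 * widths))


def mise(ise_values):
    """Mean of the integrated squared errors over the K simulated samples."""
    return float(np.mean(ise_values))


# Usage: an estimate that is off by 0.1 at every knot up to t = 1.
ise = integrated_squared_error([0.2, 0.5, 1.0], [1.1, 1.1, 1.1],
                               lambda t: np.ones_like(t))
print(ise, mise([ise, ise]))
```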
The product of these conditional approximate likelihoods is denoted by $L(b, w_1, \ldots, w_n, \tau_1, \ldots, \tau_n)$. The problem can be formulated as a latent space model with a suitably chosen prior for the parameter $b$. The Bayes estimator is the posterior mean of a sample of parameters $b$ generated through a Markov Chain Monte Carlo scheme:
- Parameter $b$.
- Prior: We assume that the slopes of $b$ follow a random walk with the double gamma shrinkage prior on the process variances (see Bitto and Frühwirth-Schnatter 2019), with hyperparameters $a^\xi$ and $\kappa^2$.
- Computational details: The estimation is performed in JAGS (see Plummer 2015) and the results shown are based on one chain, burn-in 5000, 10,000 iterations with a thinning of 10 and hyperparameters $a^\xi = 0.1$, $\kappa^2 = 1$.
Results are given in Fig. 2 and Table 2. The MISE in Table 2 is defined analogously to the case of the empirical estimator, with (19) replaced by the corresponding expression for the Bayes estimator.

Remark and Conclusion
The computation of the empirical estimator is straightforward and takes negligible time. A tight upper bound for its asymptotic error is available. The Bayes estimator based on the approximate likelihood could incorporate prior knowledge and has its potential if the class of boundaries can be parametrized by a finite-dimensional parameter. In the nonparametric case, the numerical experiments revealed drawbacks, at least compared to the empirical estimator. The computation was costly in terms of time. The study was performed on 10 nodes of a cluster of workstations. Each node has two six-core Intel Xeon X5670 processors at 2.93 GHz and was used for one boundary with a given $n$. The execution times are around 7 min and 2 h 20 min for the Bayesian estimator (while they are only around 1 s and 9 s for the empirical estimator) with $n = 10^2$ and $n = 10^3$, respectively. As can be seen in Fig. 2, in all cases considered the Bayes estimator showed a strong positive bias. This bias should be a result of the data-augmentation procedure. Note that the discrete Brownian motion is always below the boundary up to the observed exit time. Then, conditional on the $n$ discrete Brownian motions, the newly sampled boundary is above all those discrete Brownian motions that have not crossed the boundary up to $t$. These findings do not depend on the chosen prior. Alternative priors, such as (discrete) Ornstein-Uhlenbeck processes, have been considered with qualitatively the same result.