Approximations for the Boundary Crossing Probabilities of Moving Sums of Random Variables

In this paper we study approximations for the boundary crossing probabilities of moving sums of i.i.d. normal random variables. We approximate a discrete time problem with a continuous time problem allowing us to apply established theory for stationary Gaussian processes. By then subsequently correcting approximations for discrete time, we show that the developed approximations are very accurate even for a small window length. Also, they have high accuracy when the original r.v. are not exactly normal and when the weights in the moving window are not all equal. We then provide accurate and simple approximations for ARL, the average run length until crossing the boundary.

In Sections 4.7 and 6 we present results of large-scale simulation studies evaluating the performance of the considered approximations (also, in the cases when the original r.v. ε j are not normal and the weights in the moving window are not equal). In Section 7, we develop an approximation for ARL and compare its accuracy to the one developed in Glaz et al. (2012).

Standardisation of the Moving Sums
The first two moments of S n,L are E S n,L = θL and var(S n,L ) = σ 2 L. Define In accordance with the terminology of Shepp (1971) and Slepian (1961) we shall call F L (T , h) 'first-passage probability'. In the following sections, we derive approximations for (2.5). These approximations will be based on approximating the sequence of r.v. {ξ 0,L , ξ 1,L , . . . , ξ M,L } by a continuous-time random process and subsequently correcting the obtained approximations for discreteness. Before doing this, we formulate the approximation which is currently the state-of-the-art.

Glaz Approximation for P L (T , h )
The approximation for the BCP P L (T , h) developed in Glaz and Johnson (1988), Glaz et al. (2012), , and  and discussed in the introduction is as follows.
Approximation 1 (Glaz approximation) For T ≥ 2, where to approximate the first-passage probabilities F L (1, h) and F L (2, h), which are L+1 and 2L + 1 dimensional integrals respectively, it is advised to use the so-called 'GenzBretz' algorithm for numerical evaluation of multivariate normal probabilities; see Genz and Bretz (2009) and Genz et al. (2018).
Unless h is large (say, h > 3), Approximation 1 is very accurate. However, its computational cost is also high, especially for large L. Moreover, the main option in the 'GenzBretz' package requires the use of Monte-Carlo simulations so that for reliable estimation of highdimensional integrals one needs to make a lot of averaging; see Sections 6.1 and 7 for more discussion on these issues.

Continuous-time (diffusion) Approximation
For the purpose of approximating the BCP P L (T , h), we replace the discrete-time process ξ 0,L , . . . , ξ M,L with a continuous process S(t), t ∈ [0, T ], where T = M/L (we will then correct the corresponding first-passage probabilities for discreteness). We do this as follows.
Set = 1/L and define t n = n ∈ [0, T ] n = 0, 1, . . . , M. Define a piece-wise linear continuous-time process S L (t), t ∈ [0, T ] : By construction, the process S L (t) is such that S L (t n ) = ξ n,L for n = 0, . . . , M. Also we have that S L (t) is a second-order stationary process in the sense that E S L (t), var(S L (t)) and the autocorrelation function is a Gaussian second-order stationary process with marginal distribution S(t) ∼ N(0, 1) for all t ∈ [0, T ] and autocorrelation function R(t, t + s) = R(s) = max{0, 1−|s|} .
This lemma is a simple consequence of (2.3).

Diffusion Approximations: Definition and Their Role in This Study
The above approximation of a discrete-time process {ξ 0,L , ξ 1,L , . . . , ξ M,L } with a continuous process S(t), t ∈ [0, T ], allows us to approximate the BCP P L (T , h) by a continuous-time analogue as follows.
By the definition of a diffusion approximation, the BCP P L (T , h) is approximated by Note that approximating the discrete process of moving sums by a continuous-time process S(t) and subsequent approximation of the BCP P L (T , h) by P(T , h) is by no means new. This has been done, in particular, in Haiman (1999). We will call (2.8) and any approximation to (2.8), which does not involve the knowledge of L, 'diffusion approximation'. These approximations can be greatly improved with the help of the methodology developed by D.Siegmund and adapted to our setup in Section 4. The importance of the discrete-time correction is illustrated by Figs. 1 and 2, where for a fixed h and T we can see a significant difference in values of the BCPs P L (T , h) for different values of L. As seen from Fig. 2, even for very large L = 1000, the discrete-time correction is still needed. Hence we are not recommending to use any approximation for P (T , h) (including rather sophisticated ones like the one developed in Haiman (1999)) as an approximation for P L (T , h). In the next section we will discuss a diffusion approximation that, after correcting for discrete time, will be a cornerstone for all approximations developed in this paper.
In what follows, it will also be convenient to use the first-passage probability

Shepp's Formulas
Define the conditional first-passage probability The result of Shepp (1971, p.949) states than if T = n is a positive integer then For non-integer T ≥ 1, the exact formula for F(T , h | x) is even more complex (the integral has the dimension 2T ) and completely impractical for computing P(T , h) with T > 2, see Shepp ((Shepp 1971), p.950). For n = 1, we obtain (3.3) For n = 2, (3.2) yields (3.4) The three-dimensional integral in (3.4) can be reduced to a one-dimensional, see (4.11) below with h L = h.

An Alternative Representation of the Shepp's Formula (3.2)
It follows from Shepp's proof of (3.2) that s 0 , s 1 , . . . , s n have the meaning of the values of the process S(t) at the times t = 0, 1, . . . , n: S(i) = s i (i = 0, 1, . . . , n). The range of the variables s i is (−∞, h). Changing the variables in (3.2), we obtain where

Joint Density for the Values {S (i )} and Associated Transition Densities
From (3.5), we obtain the following expression for the joint probability density function for the values S(0), S(1), . . . , S(n) under the condition S(t) < h for all t ∈ [0, n]: From this formula, we can derive the transition density from s 0 = x to s n conditionally S(t) < h, ∀t ∈ [0, n]: For this transition density, In the case n = 1, (3.7) gives (3.9) From this and (3.8) we get Rather than just recovering the transition density from s 0 = x to s n , we can also use (3.6) and (3.8) to obtain the transition density from x = s j to z = s n , 0 < j < n, under the condition S(t) < h for all t ∈ [0, n]: where s j = x and s n = z. For j = 1 and n = 2 we obtain the transition density from x = s 1 to z = s 2 under the condition S(t) < h for all t ∈ [0, 2]:

Rewriting (3.2) in Terms of the Brownian Motion
Let W (t) be the standard Brownian Motion process on Suppose T ≥ 1 is an integer and define the event (4.1) It follows from the proof of (3.2) that to correct (4.1) for discrete time, one must correct the following probability for discrete time Due to the conditioning on the rhs of (4.2), the processes W i (t) can be treated as independent Brownian motion processes. Therefore, the independent increments of the Brownian motion means correcting formula (3.2) for discrete time is equivalent to correcting the probability Pr(

Discrete-time Correction for the BCP of Cumulative Sums
Let X 1 , X 2 , . . . be i.i.d. N(0, 1) r.v's and set Y n = X 1 +X 2 +. . .+X n . Consider the sequence of cumulative sums {Y n } and define the stopping time τ Y,a,b = inf{n ≥ 1 : Y n ≥ a + bn} for a > 0 and b ∈ R. Consider the problem of evaluating Exact evaluation of (4.3) is difficult even if N is not very large but it was accurately approximated by D.Siegmund see e.g. Siegmund (1986, p.19). Let W (t) be the standard Brownian Motion process on [0, ∞). For a > 0 and b ∈ R, define τ W,a,b = inf{t : W (t) ≥ a + bt} so that (4.4) In Siegmund (1986), (4.4) was used to approximate (4.3) after translating the barrier a + bt by a suitable scalar ρ ≥ 0. Specifically, the following approximation has been constructed: where the constant ρ approximates the expected excess of the process {Y n } over the barrier a + bt. From Siegmund (1985, p. 225)

Discretised Brownian Motion
and consider the problem of approximating As L → ∞, the piecewise linear continuous-time process W (t), t ∈ [0, 1], defined by: converges to W (t) on [0, 1] as so we can refer to W (t n ) as discretised Brownian motion. We make the following connection between √ 2W (t n ) and the random walk Y n : Then by using (4.5), we approximate the expected excess over the boundary for the process √ 2W (t n ) by We have deliberately rounded the value √ 2ρ 0.8239... to 0.82 as for small h and small L it provides marginally better approximation (4.9).

Corrected Version of (3.2)
Set h L = h + ω L . To correct (3.2) for discrete time we substitute the barrier h with h L . From this and the relation F(

A Generic Approximation Involving Corrected Shepp's Formula
Approximation 2 For integral T ≥ 1, the discrete-time correction for the BCP (2.5) is where F L (T , h, h L ) is given in (4.8).
Whilst Approximation 2 is very accurate (see the next subsection), computation of P(T , h, h L ) requires numerical evaluation of a T + 1 dimensional integral which is impractical for large T . To overcome this, in Section 5.2 we develop approximations that can be easily used for any T > 0 (which is not necessarily integer).

Particular Cases: T = 1 and T = 2
For T = 1, evaluation of (4.8) yields (4.10) In our previous work (Noonan and Zhigljavsky 2019) we have derived approximationŝ Noonan and Zhigljavsky (2019) are also discrete-time corrections of the continuous-time probabilities P (T , h) but they are based almost exclusively on the fact that the process S(t) is conditionally Markov on the interval t ∈ [0, 1]; hence the technique of Noonan and Zhigljavsky (2019) cannot be extended for intervals t ∈ [0, T ] with T > 1. The approximation P L (1, h) of Noonan and Zhigljavsky (2019) of (4.10). It appears thatP L (1, h) is more complicated and less accurate approximation than P(1, h, h L ).
For T = 2, (4.8) can be expressed (after some manipulations) as follows: Only a one-dimensional integral has to be numerically evaluated for computing F(2, h, h L ).

Simulation Study
In this section, we assess the quality of the approximations (4.10) and (4.11) as well as the sensitivity of the BCP P L (T , h) to the value of L. In Figs. 1 and 2 Fig. 1 and value of L in Fig. 2; the y-axis denotes the probabilities of reaching the barrier. The graphs, therefore, show the empirical probabilities of reaching the barrier h (for the dashed line) and values of considered approximations for these probabilities. From these graphs we can conclude that Approximation 2 is very accurate, at least for T = 1, 2. We can also conclude that the BCP P L (T , h) is very sensitive to the value of L. From Fig. 2 we can observe a counter-intuitive fact that even for very high value L = 1000, the BCP P L (T , h) is not even close to P ∞ (T , h) = P(T , h) from (2.8). This may be explained by the fact that for any fixed T and h, the inaccuracy |P L (T , h) − P(T , h)| decreases with the rate const/ √ L as L → ∞.
Approximations 1 and 3 look similar but computing Approximation 1 is very hard and Approximation 3 is very easy (only a one-dimensional integral should be numerically computed).

Continuous Time: Approximations for F (T , h )
Let m be a positive integer, and q(x → z) be the transition density q We then replace formula (5.1) with where p(x) is an eigenfunction of the integral operator with kernel (3.9) corresponding to the maximum eigenvalue λ m (h):  Reed and Simon (1979).

F(T , h) F(m, h) · [λ m (h)] T −m . (5.4)
The approximation (5.4) can be used for any T > 0 which is not necessarily an integer. The most important particular cases of (5.4) are with m = 1 and m = 2. In these two cases, the kernel q (m−1,m) h (x → z) and hence the approximation (5.4) will be corrected for discrete time in the next section.

Correcting the Transition Kernels for Discrete Time
As explained in Section 4, to make a discrete-time correction in the Shepp's formula (3.2) we need to replace the barrier h with h L = h + ω L in all places except for the upper bound for the initial value S(0). Therefore, using the notation of Section 3.2, the joint probability density function for the values S(0), S(1), . . . , S(m) under the condition S(t) < h for all t ∈ [0, m] corrected for discrete time is: This gives us the discrete-time corrected transition density from s 0 = x to s m conditionally S(t) < h, ∀t ∈ [0, m]: which is exactly (3.7) with h L is substituted for h. In a particular case m = 1, the corrected transition density is Let us now make the discrete-time correction of the transition density q From (5.5) and (5.7), the transition density from x = s 1 to z = s 2 under the condition S(t) < h for all t ∈ [0, 2] corrected for discrete time (the corrected form of (3.11)) is given by Unlike the transition density (5.6) (and (5.7) in the particular case m = 1), which only depends on h L and not on h, the transition density q (1,2) h,L (x → z) depends on both h and h L and hence the notation. The dependence on h has appeared from integration over the s 0 ∈ (−∞, h).

Approximations for the BCP P L (T , h )
With discrete-time corrected transition densities q , we obtain the corrected versions of the approximations (5.4).
Similarly to λ m (h) from (5.3), the maximum eigenvalues λ L,1 (h) and λ L,2 (h) of the operators with kernels K(x, z) = q (0,1) h L (x → z) and K(x, z) = q (1,2) h,L (x → z) are simple and positive; the corresponding eigenfunctions p(x) can be chosen as probability densities. Both approximations can be used for any T > 0.
In numerical examples below we approximate the eigenvalues λ L,k (h) (k = 1, 2) using the methodology described in Mohamed and Delves (1985), p.154. This methodology is based on the Gauss-Legendre discretization of the interval [−c, h], with some large c > 0, into an N -point set x 1 , . . . , x N (the x i 's are the roots of the N -th Legendre polynomial on [−c, h]), and the use of the Gauss-Legendre weights w i associated with points x i ; λ L,k (h) and p(x) are then approximated by the largest eigenvalue and associated eigenvector of the matrix D 1/2 AD 1/2 , where D = diag(w i ) and A i,j = K(x i , x j ) with the respective kernel K(x, z). If N is large enough then the resulting approximation to λ L,k (h) is arbitrarily accurate. With modern software, computing Approximations 4 and 5 (as well as Approximation 3) with high accuracy takes only milliseconds on a regular laptop.
As discussed in the next section, Approximation 5 is more accurate than Approximation 4, especially for small h; the accuracies of Approximations 3 and 5 are very similar. Note also that a version of Approximation 4 has been developed in our previous work (Noonan 6 Simulation Study

Accuracy of Approximations for the BCP P L (T , h )
In this section we study the quality of Approximations 4 and 5 for the BCP P L (T , h) defined in (2.5). Approximation 3 is visually indistinguishable from Approximation 5 and is therefore not plotted (see Table 1). Without loss of generality, ε j in (1.1) are normal r.v.'s with mean 0 and variance 1. The style of Fig. 3 is exactly the same as of Fig. 1 and is described in the beginning of Section 4.7. In Fig. 3, the dashed green line corresponds to Approximation 4 and the solid red line corresponds to Approximation 5. From Fig. 3 we see that the performance of Approximations 4 and 5 is very strong even for small L. For small h, Approximation 5 is more precise than Approximation 4 in view of its better accommodation to the non-Markovian nature of the process S(t).
In Table 1, we display the values of λ L,1 (h), λ L,2 (h) and μ L (h) with L = 20 for a number of different h. From this table, we see only a small difference between λ L,2 (h) and μ L (h); this difference is too small to visually differentiate between Approximations 3 and 5 in Fig. 3.
In Tables 2, 3 and 4 we numerically compare the performance of Approximations 1 and 3 for approximating P L (T , h) across different values of L and h. Since Approximation 1 relies on Monte-Carlo methods, we present the average over 100 evaluations and denote this byx. We have also provided values for the standard deviation and maximum and minimum of the 100 runs to illustrate the randomised nature of this approximation. These are denoted by s, Max(x i ) and Min(x i ) respectively. The values of P L (T , h) presented in the tables below are the empirical probabilities of reaching the barrier h obtained by 10 6 simulations. We have not included Approximation 5 in these tables as results are identical to Approximation 3 up to four decimal places. From Tables 2, 3 and 4 we see that with this choice of T = 100, the errors of approximating F L (2, h) and F L (1, h) via the 'GenzBretz' algorithm can accumulate and lead to a fairly significant variation of Approximation 1. This demonstrates the need to average the outcomes of Approximation 1 over a significant number of runs, should one desire an accurate approximation. This may require rather high computational cost and run time, especially if L is large. On the other hand, evaluation of Approximation 3 is practically instantaneous for all L. Even for a very small choice of L = 5, Table 2 shows that Approximation 3 still remains very accurate. As L increases from 5 to 20, Table 3 shows that the accuracy of Approximation 3 increases. The averaged Approximation 1 is also very accurate but a larger L appears to produce a larger range for Max(x i ) and Min(x i ) when h is large; this is seen in Table 4. Note we have not included empirical values of P L (T , h) in Table 4 due to the large computational cost.  Max ( From Fig. 4 and associated numbers we can make the following conclusions: (a) the BCP P L (T , h) for the case where ε i in (1.1) are uniform is closer to the case where ε i are normal, than for the case where ε i have Laplace distribution; (b) as L increases, the probabilities P L (T , h) in the cases of uniform and Laplace distributions of ε i become closer to the BCP for the case of normal ε i and hence the approximations to the BCP become more precise; (c) accuracy of Approximation 5 is excellent for the case of normal ε i and remains very good in the case of uniform ε i ; it is also rather good in the case when ε i have Laplace distribution; (d) Approximation 4 is slightly less accurate than Approximation 5 (and Approximation 3) for the case of normal and uniform ε i (this is in a full agreement with discussions in Sections 5.2.2 and 6.1); however, Approximation 4 is very simple and can still be considered as rather accurate.

Approximation for the BCP in the Case of Moving Weighted Sums
We have also investigated the performance of Approximation 5 (and 3) after introducing particular weights into (1.1). We explored the following two ways of incorporating weights: (i) L random weights w 1 , w 2 , . . . , w L , with w i i.i.d. uniform on [0, 2], are associated with a position in the moving window; this results in the moving weighted sum Simulations results are shown in Fig. 5. In both cases, we have repeated simulations 1,000 times and plotted all the curves representing the BCP as functions of h in grey colour and Approximation 5 for the BCP for the non-weighted case (when all weights w j = 1) as red dashed line. We can see that for both scenarios the Approximation 5 for the BCP in the non-weighted case gives fairly accurate approximation for the weighted BCP. Similar results have been observed for other values of L and T .

Approximating Average Run Length (ARL)
In this section, we provide approximations to the probability distribution of the Note that ARL H (S) = ARL h (X). The diffusion approximation to the time moment τ h (X) is τ h (S(t)) := min{t ≥ 0 : S(t) ≥ h}, which is the time moment when the process S(t) reaches h. The distribution of τ h (S(t)) has the form: (1 − (h))δ 0 (ds) + q(s, h, S(t))ds , s ≥ 0, where δ 0 (ds) is the delta-measure concentrated at 0 and The function q(s, h, S(t))/ (h), considered as a function of s, is a probability density function on (0, ∞) since From this, the diffusion approximation for ARL H (X)/L is ARL h (S(t)) = E(τ h (S(t))) = ∞ 0 s q(s, h, S(t))ds .
( 7.3) The diffusion approximation (7.3) should be corrected for discrete time; otherwise it is poor, especially for small L. As shown in Section 6, Approximations 3 and 5 are very accurate approximations for P L (T , h) and can be used for all T > 0. We shall use Approximation 3 to formulate our approximations but note that the use of Approximation 5 would give very similar results. We define the approximationq(s, h) for the probability density function of τ h (X)/L bŷ In this paper, we define ARL in terms of the number of random variables ξ n,L rather than number of random variables ε j . This means we have to modify the approximation for ARL of Glaz et al. (2012) by subtracting L. The standard deviation approximation in Glaz et al. (2012) is not altered.  The Glaz approximations for ARL h (X) and SD(τ h (X)) are as follows: (F L (1 + j/L, h)) , (7.6) j (F L (1 + j/L, h))+E G (τ h (X))−E G (τ h (X)) 2 1/2 , (7.7) where x = F L (2, h)/F L (1, h).
In Tables 5 and 6 we assess the accuracy of the approximations (7.4) and (7.5) and also Glaz approximations (7.6) and (7.7). In these tables, the values of ARL h (X) and SD(τ h (X)) have been calculated using 100,000 simulations. Since the Glaz approximations rely on Monte Carlo methods, in the tables we have reported value 2s-confidence intervals computed from 150 evaluations.
Tables 5 and 6 show that the approximations developed in this paper perform strongly and are similar, for small or moderate h, to the Glaz approximations. For h ≥ 3, the Glaz approximation produces rather large uncertainty intervals and the uncertainty quickly deteriorates with the increase of h. This is due to the fairly large uncertainty intervals formed by Approximation 1 when approximating P L (T , h) with large h and hence small P L (T , h), as discussed in Section 6.1. The approximations developed in this paper are deterministic and are much simpler in comparison to the Glaz approximations. Moreover, they do not deteriorate for large h.