Practical Aspects of False Alarm Control for Change Point Detection: Beyond Average Run Length

A popular method for detecting changes in the probability distribution of a sequence of observations is CUSUM, which proceeds by sequentially evaluating a log-likelihood ratio test statistic and comparing it to a predefined threshold; a change point is detected as soon as the threshold is exceeded. It is desirable to choose the threshold such that the number of false alarms is kept to a specified level. Traditionally, the number of false alarms is measured by the average run length, that is, the expected stopping time until the first false alarm. However, this does not in general allow one to control the number of false alarms at every particular time instance. Thus, in this paper two stronger false alarm criteria are considered, for which approximation methods are investigated to facilitate the selection of a threshold.

The focus of this paper is on the minimax formulation, meaning that the change point is regarded as a deterministic parameter (Lorden 1971). A comprehensive treatment of the available methodology related to change point detection is provided in Tartakovsky et al. (2014).
Many minimax change point detection procedures feature a test statistic $S_t$ in the form of a random walk in discrete time with possibly dependent increments. A prominent example of such a procedure is the method of cumulative sums (CUSUM) due to Page (1954), in which case $S_t$ is the log-likelihood ratio (LLR) of the observations up to time $t$. The test statistic is computed sequentially as new observations arrive, and compared to a threshold $b$. A change point is detected as soon as $S_t > b$, which defines a stopping time $T$. One then seeks to choose a threshold that ensures that the number of false alarms is kept low with respect to some appropriate criterion.
Existing literature on minimax change point detection typically considers the average run length (ARL), the expected time until the first false alarm is raised, as a performance criterion. A threshold is then chosen such that the ARL exceeds a desired (large) constant. Because closed-form expressions for the ARL are in general not available, the ARL is often evaluated based on approximations (Basseville and Nikiforov 1993, Ch. 5.2.2). The ARL criterion is, however, not always informative: in Mei (2008), examples are provided where the ARL is infinite even though the detection delay is finite. More importantly, controlling the ARL does not necessarily ensure that the variance of the stopping time is small (see Lai 1998 and Tartakovsky et al. 2014, Section 6.3.5).
Consequently, more stringent false alarm criteria are desirable. Perhaps the best available candidate is a criterion coined maximal local false alarm probability (MLFA) in Tartakovsky et al. (2014), which seems to have first appeared in Lai (1998). The criterion requires that $\sup_{n \ge 1} P_0(T \le n + N - 1 \mid T > n - 1) \le \alpha$, where $P_0$ indicates that the probability is evaluated under the null hypothesis that no change has occurred, $\alpha$ is the desired level of false alarms, and $N \ge 1$ is a design parameter. The MLFA is, however, difficult to evaluate in closed form; Tartakovsky et al. (2014) note that even an upper bound is lacking for the general non-i.i.d. case. Similar arguments apply to the approach developed in Mei (2008) (see Tartakovsky 2008). The difficulty arises from the fact that the distribution of the stopping time is hard to evaluate in closed form, even if the distribution of the test statistic is known.
In view of the above, one wishes for further understanding of the distribution of the stopping time as well as simple but effective methods for selecting the threshold such that the probability of raising a false alarm is kept low in a stronger sense than allowed by the ARL criterion. In the current paper we consider two simpler false alarm criteria, which are derived from the MLFA. We believe that considering these simplified criteria is worthwhile from a practical point of view because approximations are available and thus the selection of the threshold is facilitated.
We now describe our contributions in more detail. As a simplification of the MLFA, Lai (1998) proposed
$\sup_{n \ge 1} P_0(n \le T < n + N) \le \alpha$   (1)
as a performance criterion. It has, however, since been shown that the distribution of the stopping time in (1) is approximately exponential for small $\alpha$, if observations are independent or weakly dependent (Pollak and Tartakovsky 2009). Therefore, by the memoryless property of the exponential distribution, we have $P_0(n \le T < n + N) \approx P_0(T \le N)$. In the first part of the paper we therefore focus on the simple criterion
$P_0(T \le N) \le \alpha$.   (2)
We check that CUSUM (with or without windows) is asymptotically optimal under this modified false alarm criterion, and investigate methods for selecting the threshold such that it is satisfied. To do this exactly, one would need closed-form expressions for the distribution of the stopping time. Since such expressions are not known in general, we first show how the distribution of the window-limited stopping time can be described in terms of recursive integral equations. For detection procedures without windows, integral equations have been derived based on renewal theory (Page 1954; Pollak and Tartakovsky 2009; Siegmund 1985); here we follow a different approach, using results on the maximum of autoregressive processes (Withers and Nadarajah 2014). We remark that the obtained recursions are not restricted to CUSUM but hold for a broad class of testing procedures including exponentially weighted moving average schemes (EWMA, see Roberts 1959). However, while we thus in principle know the exact distribution of the window-limited stopping time, in general these expressions cannot be solved for the threshold other than numerically, and the latter is only feasible for the left tail of the distribution (see Section 3.2.1 for discussion). We then provide non-asymptotic bounds for the distribution of the CUSUM stopping time when windows are used, as well as for testing without windows.
We can then apply available approximation methods to find a threshold (function) that ensures the proposed false alarm criterion is satisfied. We compare the use of central limit theorem (CLT), large deviations (LD), as well as extreme value (EV) approximations. The latter two methods allow one to obtain a threshold function rather than a constant threshold; this increased flexibility can yield an improved delay performance (see Section 4 as well as the example in Ellens et al. 2013).
In the second part of the paper, we focus on the criterion
$\sup_{n \ge 1} P_0(T = n \mid T > n - 1) \le \alpha$.   (3)
Note that this implies that the false alarm probability is limited at any given time $n$. This criterion corresponds to the MLFA with $N = 1$; it is the least conservative case of the MLFA because the MLFA increases monotonically as a function of $N$. We motivate its application and show how the aforementioned approximations can be used to select the threshold (for details see Section 4).
The remainder of this paper is organized as follows. In Section 2 we define the change point detection problem and the CUSUM method. In Section 3 we discuss the first criterion, (2). In Section 3.1 we check that asymptotic optimality of CUSUM still holds, before we turn to analyze the distribution of the stopping time with applications for threshold selection in Sections 3.2 (when testing windows of fixed size) and 3.3 (when testing the full history of observations). The alternative of limiting the false alarm probability at any given time via (3) is discussed in Section 4. We conclude in Section 5.

Problem and Procedures
We are concerned with testing a sequence of observations $(V_t)$, taking values in $\mathbb{R}^{d_v}$, against a change in the underlying probability distribution. At every point in discrete time a new observation arrives and is to be included in the test sample. That is, at time $t \in \mathbb{N}$ we want to test $H_0$, the null hypothesis of no change before time $t$, against $H_1 := \bigcup_{k=1}^{t} H_1(k)$, the alternative that there was a change point at some time $k \in \{1, \dots, t\}$. Under $H_0$, we assume that all observations $V_1^t := (V_1, \dots, V_t)$ have a common history-dependent density $p$, so that their joint density is $\prod_{i=1}^{t} p(V_i \mid V_1^{i-1})$.
In practice, it can be necessary to test data in windows of fixed size $n$, rather than keeping the whole history of observations. In this case we restrict $k$ to the set $\{t - n + 1, \dots, t\}$. Note that testing the full history of observations can equivalently be regarded as testing with expanding windows: at time $t$ the size of the window to be tested is $t$. We take this viewpoint in the remainder of the paper as it allows us to treat both cases in a more unified manner.
A testing procedure is optimal if it minimizes the detection delay, subject to a condition on the number of false alarms. The CUSUM method due to Page (1954) often turns out to be asymptotically optimal (for further details see Section 3.1), which is why we focus on CUSUM in the current paper. The method is essentially a sequential application of an LLR test. A sequence of LLRs can be regarded as a sequence of partial sums with random increments $\ell(V_t)$ given by $\ell(V_t) := \log\{ q(V_t \mid V_1^{t-1}) / p(V_t \mid V_1^{t-1}) \}$, where $q$ denotes the corresponding history-dependent density after the change. We can identify the LLRs corresponding to $H_1(1), \dots, H_1(n)$ with a Markov process $Y_m := (S_{1:n}(m), \dots, S_{n:n}(m))$, where $S_{k:n}(m) := \sum_{i=k+m-1}^{n+m-1} \ell(V_i)$, so that $m \ge 1$ corresponds to the index of the first observation within the window that is to be tested. If no windows are to be considered (i.e., in the case of expanding windows), then $m = 1$ is fixed, and the size of the window $n$ increases with time. In this case, we write $S_{k:n} := S_{k:n}(1)$ to simplify the notation. To consider windows of fixed size $n$, let $m$ increase with time instead.
The standard CUSUM testing procedure with expanding windows features the stopping time If window-limited testing (with windows of fixed size) is desired, we define the stopping time to be In Sections 3 and 4 we consider the performance criteria as briefly introduced in the introduction, within which we have to distinguish between testing with windows of fixed vs. expanding size.
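To make the expanding-window stopping rule concrete, the following is a minimal Python sketch, not the authors' implementation. It raises an alarm at the first time n at which some partial sum S_{k:n} of LLR increments exceeds the threshold. The function names, the callable `threshold(k, n)` interface, and the Gaussian mean-shift increment (with `theta` and `nu` as shift size and standard deviation) are illustrative assumptions.

```python
def llr_increment(v, theta=1.0, nu=1.0):
    """LLR increment for N(0, nu^2) against N(theta, nu^2); an illustrative model choice."""
    return (theta * v - 0.5 * theta ** 2) / nu ** 2


def cusum_stopping_time(observations, threshold, increment=llr_increment):
    """Return the first n (1-based) with S_{k:n} > threshold(k, n) for some k <= n, else None.

    `threshold` is a hypothetical callable (k, n) -> float, so that a threshold
    function can be used; pass `lambda k, n: b` for a constant threshold b.
    """
    increments = []
    for n, v in enumerate(observations, start=1):
        increments.append(increment(v))
        partial = 0.0
        # Build S_{k:n} for k = n, n-1, ..., 1 by summing backwards in time.
        for k in range(n, 0, -1):
            partial += increments[k - 1]
            if partial > threshold(k, n):
                return n
    return None


# Small deterministic example: the level shifts from 0 to 3 at time 4.
first_alarm = cusum_stopping_time([0, 0, 0, 3, 3], lambda k, n: 2.0)  # -> 4
```

The inner loop recomputes all partial sums at every step, which costs O(n) per observation; this keeps the sketch close to the definition of S_{k:n} and also allows thresholds that depend on the assumed change point k.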

False Alarm Before Time N
In this section we focus on the criterion (2), where T = ω or T = τ , depending on whether or not window sizes are fixed, and for a fixed quantity N . Note that (2) is less conservative than the MLFA, but indeed stronger than the traditional ARL criterion. The latter requires that E 0 T ≥ κ, for some given (large) constant κ. Given (2), for 1 ≤ h ≤ N we have . Note that N plays the role of the length of the time interval considered in (1). In practice, if the maximum testing period is known to be bounded, then the length of the testing period seems a natural choice for N . Otherwise one could specify κ and α as desired (that is, according to one's practical requirements on the ARL and the false alarm probability), and choose N = κ/(1 − α).
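One way to see that (2) is stronger than the ARL criterion, assuming that (2) reads $P_0(T \le N) \le \alpha$ as it is used in the later sections, is the following short calculation for an integer-valued stopping time $T \ge 1$. It is offered as a sketch and need not coincide with the display referred to above.

```latex
\begin{align*}
E_0 T \;=\; \sum_{h=1}^{\infty} P_0(T \ge h)
      \;\ge\; \sum_{h=1}^{N} P_0(T \ge h)
      \;\ge\; \sum_{h=1}^{N} P_0(T > N)
      \;\ge\; N (1 - \alpha).
\end{align*}
```

With $N = \kappa/(1-\alpha)$ the right-hand side equals $\kappa$, which is consistent with the suggested choice of $N$.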

Asymptotic Optimality of CUSUM
For independent observations (i.e., p and q are independent of the history) it is known that if the CUSUM procedure with stopping time τ satisfies E 0 τ = κ, then it is optimal with respect to certain delay criteria among all procedures that satisfy E 0 τ ≥ κ. However, in practice it is usually not possible to achieve E 0 τ = κ because E 0 τ is not known in closed form. Therefore, asymptotic optimality results are of interest which establish optimality of CUSUM with τ satisfying E 0 τ ≥ κ asymptotically for large κ. (For details see, for example, Tartakovsky et al. (2014), Ch. 8.) Similarly, we can prove asymptotic optimality of T ∈ {τ, ω} under (2). This amounts to carefully checking that the proofs given in Lai (1998) for the case of (1) still go through; we detail the steps below.
Let T be a stopping time with respect to the natural filtration F associated with the observations, that is, F t := σ (V 1 , . . . , V t ). As before we write P i and E i , i ∈ {0, 1} for the probability measure and expectation under H i . Furthermore, we define P k 1 and E k 1 to be the probability measure and expectation under H 1 (k), the alternative that there is a change point at time k.
The following delay criteria have been considered in the literature: the worst-case expected delay due to Lorden (1971), and the supremum of the conditional average delay due to Pollak (1985). We now verify that CUSUM is asymptotically optimal among all procedures satisfying (2) with respect to both delay criteria. To this end, we check that the usual asymptotic lower bound on the detection delay still holds (see Prop. 1), and that this lower bound is attained for small α in combination with large N (Prop. 2 below).

Proposition 1 Suppose that for some finite positive constant I we have
The proof is a small modification of Lai's proof (Lai 1998, Thm. 1). Details are presented in the PhD thesis of the first author (Kuhn 2017, Ch. 10).
It then follows with the same arguments as used to prove Thm. 4.(ii) in Lai (1998) that the lower bound is attained by ω (see Prop. 2). Since ω ≥ τ almost surely, this implies that the bound is also attained by τ , and thus, both are asymptotically optimal.
Proposition 2 (Lai 1998) Assume that the threshold b = b N and the window size n = n N are chosen such that P 0 (ω ≤ N) ≤ α, where α = α N → 0 as N → ∞. Further assume that for some positive constant I and m ∈ N we have lim R→∞ sup k∈{1,...,m} Then we have:
For example, if observations are i.i.d., the conditions (6) and (8) are satisfied with I the Kullback-Leibler information number, I = E 1 1 (V 1 ), assuming the latter is finite. In summary, we have the following corollary: if (6), (7), and (8) are satisfied with I > 0, then T ∈ {ω, τ } is asymptotically optimal as N → ∞ in the sense that it minimizes the detection delay among all stopping times T satisfying P 0 (T ≤ N) ≤ α.
In order to select a threshold that ensures (2), we need to be able to evaluate the distribution of the stopping time. We focus on ω in Section 3.2 and turn to τ in Section 3.3. In both sections we first provide results on the distribution of the stopping time, and then show how the threshold function can be chosen based on approximations to P 0 (T ≤ N).

Window-Limited Testing
First, we show an exact expression for the distribution of the stopping time ω in terms of iterated integrals. Since these are hard to evaluate in practice, we then propose an EV approximation that can be used to select the threshold in order to ensure (2).

Exact Expression in Terms of Iterated Integrals
We show that the test statistic of a large class of change point detection procedures (including the window-limited CUSUM procedure) can be expressed in the form of a first-order vector autoregressive process (VAR(1)). We can then obtain the distribution of the corresponding stopping time using results on the distribution of the maximum of autoregressive processes (Withers and Nadarajah 2014).
We are interested in finding an expression for where H m is the n-vector with j -th element Note that the process Y m follows the recursion where 1 denotes an n-vector of ones. To obtain the window-limited CUSUM procedure, ϑ is set equal to one, and is defined as (y) = Cy, where C = (c i,j ) i,j =1,...,n with c i,i+1 = 1 for i = 1, . . . , n − 1 and c i,j = 0 otherwise. Interestingly, other popular change point detection methods can also be expressed in this way: for example, to obtain an exponentially weighted moving average (EWMA, see Roberts 1959) procedure based on LLRs, define (y) = (1−ϑ)Cy for ϑ ∈ (0, 1). Thus, while in this paper we are focussed on the CUSUM procedure, the result in Prop. 3 below would allow one to compute the stopping time for the window-limited case more generally.
Note that (12) is a VAR(1) process, albeit with a degenerate noise process. A paper that gives exact expressions (in terms of iterated Fredholm integrals) for the distribution of H m for a VAR(1) process is Withers and Nadarajah (2014). We adapt their results to our setting.
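The shift structure behind the window-limited CUSUM case can be checked numerically. The sketch below assumes the recursion takes the form Y_{m+1} = C Y_m + ℓ(V_{n+m}) 1, which follows from the definition of S_{k:n}(m) as a sum of increments k+m-1, ..., n+m-1 when the weight ϑ equals one; the function names and the Gaussian test increments are illustrative assumptions.

```python
import random


def window_llrs(increments, n, m):
    """Return Y_m = [S_{1:n}(m), ..., S_{n:n}(m)] for the 1-based window index m."""
    window = increments[m - 1 : n + m - 1]  # increments number m, ..., n+m-1
    return [sum(window[k:]) for k in range(n)]


def recursion_step(y, new_increment):
    """Compute C y + new_increment * 1, with C the shift matrix (c_{i,i+1} = 1, else 0)."""
    shifted = y[1:] + [0.0]  # C y: entry i takes the old entry i+1, the last entry becomes 0
    return [s + new_increment for s in shifted]


def max_recursion_error(n=5, steps=20, seed=0):
    """Largest gap between the recursion and the directly computed sums over `steps` windows."""
    rng = random.Random(seed)
    increments = [rng.gauss(-0.5, 1.0) for _ in range(n + steps)]
    y = window_llrs(increments, n, 1)
    worst = 0.0
    for m in range(1, steps + 1):
        y = recursion_step(y, increments[n + m - 1])  # increment number n+m (1-based)
        direct = window_llrs(increments, n, m + 1)
        worst = max(worst, max(abs(a - b) for a, b in zip(y, direct)))
    return worst
```

Because every component receives the same new increment, the noise vector is a multiple of the all-ones vector, which is the sense in which the noise process is degenerate.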
Let, for fixed x ∈ R n , for m ≥ 0. Denote by x j the j -th entry of the vector x. Let min{x, y} be the componentwise minimum of x and y. Let F be the distribution function of (V i ).
To evaluate this in practice, at least for small m one can use approximations based on the eigenvalues of the Fredholm kernel K (see Withers and Nadarajah 2014). In order to facilitate solving for the threshold function b(·), in the following we provide closed-form approximate expressions.

Approximation for Threshold Selection
When testing is window-limited, we can apply EV theory to approximate the false alarm probability (2). This provides an easily applicable method to select b, which we outline in this section for the example of independent Gaussian observations. We remark that EV results have been applied very recently in Jirak (2015) in the context of non-parametric change point detection. Define . By application of a theorem in Amram (1985), we obtain the following corollary.
Proof It has been shown in Amram (1985) that as $m \to \infty$ the limiting distribution of the process of component-wise maxima of any standard Gaussian process coincides with that of $n$ independent Gumbel variables, provided that (i) $|\gamma_{i,j}(0)| < r$ for $i, j = 1, \dots, n$, $i \neq j$, as well as (ii) $\sum_{h=1}^{\infty} |\gamma_{i,j}(h)|^{r} < \infty$ for all $i, j = 1, \dots, n$ holds. We apply this theorem to the $n$-dimensional process $M_m$ with $i$-th component $(H_{m,i} - (n - i + 1)\mu)(\sigma \sqrt{n - i + 1})^{-1}$. Note that $(S_{k:n}(m) - (n - k + 1)\mu)(\sigma \sqrt{n - k + 1})^{-1}$ has a standard normal distribution, so that $M_m$ is indeed the process of component-wise maxima of a standard Gaussian process.
To verify the aforementioned conditions, we note that, for l, k ∈ {1, . . . , n} with l > k, which is smaller than 1 by assumption. Finally, for k, l ∈ {1, . . . , n}, h ∈ N, we have which is zero for h large enough.
Recall that we wish to choose a threshold function that yields where the change point k is written as nβ + 1, β ∈ B n := {0/n, 1/n, . . . , (n − 1)/n} (this notation will turn out to be useful particularly in Section 3.3.2). The parameter δ is a design parameter to be chosen based on simulation. Because c N → ∞, adding a constant δ (constant with respect to N ) is negligible for large N . Numerical experiments suggest that, for small n the choice δ = 0 seems to work well, however, for larger n, a negative δ should be chosen, possibly a function of the other parameters. We suggest a choice for δ in Section 3.3.2, for the case of expanding windows.

Testing With Expanding Windows
We first derive non-asymptotic bounds on the distribution of τ , which can then be used to apply the CLT, LD, and EV approximations to select the threshold. The latter two approaches yield a threshold function rather than a fixed threshold, and the achieved false alarm performance is overall closer to the desired level.

Non-asymptotic Bounds
The complication in evaluating $P_0(\tau \le N)$ arises from the fact that this involves a double maximum of a sum with random increments: $P_0(\tau \le N) = P_0\big(\max_{1 \le m \le N} \max_{1 \le k \le m} S_{k:m} > b\big)$. In this section we provide bounds that circumvent this problem. The upper bounds we provide below in (17) and (18) turn out to be very tight, particularly if the size of the change is large (see Fig. 1). We use these in Section 3.3.2. We remark that similar bounds could be obtained for $P_0(\omega \le N)$; the adaptation to this case is straightforward and therefore omitted.
Define $\tau_m := \inf\{n \ge m : S_{m:n} > b\}$. Then the CUSUM stopping time can be written as $\tau = \min_{m \ge 1} \tau_m$. Hence, we have which yields the bounds We furthermore note that the right-hand side is smaller than
Approximations to (18) are available, based on which we can devise simple yet effective procedures, see Section 3.3.2. As Fig. 1 shows, the upper bounds turn out to be very tight. The lower bounds are tighter when the size of the change is smaller. To see why this should be true, consider the following heuristic argument. Since the mean $\mu$ of the LLR increments is negative, let us suppose that all increments were negative. In this case $S_{i:n} < b$ would imply that $S_{i-1:n} < b$, and hence $\tau_i > N$ would imply that $\tau_{i-1} > N$. Thus, when $\mu$ is small compared to $\sigma^2$, then $P_0(\tau \le N) \approx P_0(\tau_N \le N) = P_0(V_N > b)$, where $V_N = S_{N:N} - S_{N-1:N}$. One would thus expect that an alarm is typically raised at the end of the current window, as is confirmed in numerical experiments (see Fig. 4).
Fig. 1 [...] (17) and (18), with N = 50, σ = 1 and threshold b = 0.5
We now discuss how the bound (18) can be used for threshold selection.

Approximations for Threshold Selection
From the upper bound (18) we obtain that a sufficient condition for P 0 (τ ≤ N) ≤ α is Below we discuss different limiting regimes that yield approximations to (19). In this section we assume that observations are independent to facilitate the comparison of the different methods. We remark, however, that the LD approximations suggested below can be extended to the case of correlated observations (as was done in Ellens et al. (2013) for observations following a Gaussian autoregressive process). We restrict ourselves to a Gaussian example only for the EV approximation; the LD and the CLT approximation apply more generally.

EV Approximation
As opposed to the approach in Section 3.2.2, we now consider the univariate process of partial sums. That is, in this case we are interested in the maximum of $S_{k:N}$ over $k \in \{1, \dots, N\}$. Therefore, to achieve (19), in the case of i.i.d. Gaussian random variables the threshold function can be chosen as where $\beta \in B_N$. For choosing $\delta$ we recall our remark from the previous section that one may expect that, at least for large changes, a change tends to be detected at the end of the window, where a single increment is considered. Thus, it seems intuitive to choose $\delta$ such that $b((N - 1)/N)$ equals the $\big(1 - [1 - (1 - \alpha)^{1/N}]\big)$-quantile of the distribution of the LLR increments. It is confirmed in numerical experiments that this choice indeed yields good performance of the resulting testing procedure; see the independent data example provided at the end of this section as well as the example provided in Kuhn (2017, Ch. 10) featuring a state space model.
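The target value for the threshold at the end of the window can be computed directly when the LLR increments are Gaussian. The sketch below assumes increments distributed as N(mu, sigma^2) under the null hypothesis; the function name is hypothetical, and the code only evaluates the quantile, not the full EV threshold function or the constant δ.

```python
from statistics import NormalDist


def end_of_window_threshold(alpha, N, mu, sigma):
    """Quantile of a single Gaussian LLR increment at level (1 - alpha)^(1/N).

    The level equals 1 - (1 - (1 - alpha)^(1/N)), so a single increment exceeds
    the returned value with probability 1 - (1 - alpha)^(1/N).
    """
    level = (1.0 - alpha) ** (1.0 / N)
    return NormalDist(mu, sigma).inv_cdf(level)


# Example call with increments of mean -0.5 and standard deviation 1 under the null.
b_end = end_of_window_threshold(alpha=0.05, N=50, mu=-0.5, sigma=1.0)
```

Given an explicit form of the EV threshold function, δ could then be obtained by equating its value at β = (N − 1)/N with `b_end`.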

LD Approximation
Since we wish the false alarm probability $\alpha$ to be small, we may regard this as a rare event scenario. Change point detection procedures based on LD approximations have been considered in Bucklew (1985) and Ellens et al. (2013) for i.i.d. and VARMA models, yielding a threshold function $b(\cdot)$ that depends on the assumed position of the change point under the alternative hypothesis. We express the change point $k$ via $N$, that is, $k = N\beta + 1$ with $\beta \in B_N$ (for details see Ellens et al. 2013, Section 2). LD theory suggests that for fixed $\beta$ the false alarm probability can be approximated by where $I$ denotes a function specified below. We remark that sharp LD asymptotics (Bahadur and Rao 1960), which include a polynomial term to achieve asymptotic equivalence, are to be preferred whenever they are available; the way of proceeding remains the same. Using the logarithmic LD approximation (21) is, however, not uncommon; see, e.g., Bucklew (1985). We focus on logarithmic asymptotics in order not to overload the paper. As in the case of the EV approximation, there is no need to pick a constant $b$; we can pick a function $b(\beta)$ instead, such that (21) holds with $b$ replaced by $b(\beta)$ for all $\beta \in B_N$. Recall that we wish the false alarm probability to be kept at a small level $\alpha$. Therefore, we propose to pick the threshold function $b(\cdot)$ such that it satisfies This choice entails that raising a false alarm is essentially equally likely irrespective of the supposed location of the change point within the window, and it is therefore optimal in terms of type II error performance; see Bucklew (1985), Ch. VI.E.
Now let us make the above discussion more rigorous. The limiting logarithmic moment-generating function $\Lambda(\lambda)$ associated with the distribution of the LLR is defined as
$\Lambda(\lambda) := \lim_{N \to \infty} N^{-1} \log E_0 e^{\lambda S_{N\beta+1:N}}$;   (23)
we assume for now that this function exists and is finite for every $\lambda \in \mathbb{R}$.
Define $I$ as the Fenchel-Legendre transform Provided that $\Lambda(\lambda)$ exists for all $\lambda \in \mathbb{R}$, noting that we can rescale as written out in (24), the Gärtner-Ellis theorem (Bucklew 1985; Dembo and Zeitouni 1998) yields In accordance with the idea expressed in (22), we choose the threshold function $b(\cdot)$ such that it satisfies for some positive $\gamma = -N^{-1} \log\big(1 - (1 - \alpha)^{1/N}\big)$, across all $\beta \in B_N$. Then asymptotically for large $N$ we have that (19) is satisfied.

CLT Approximation
As a third alternative, we consider the approximation of the false alarm probability based on CLT arguments. Applying a CLT approximation has been considered in Kuhn et al. (2015) and Pawlak and Steland (2013). Motivated by Donsker's theorem, we can approximate the probability in (19) by Siegmund (1985), Eq. (3.15), where $B_t$ is a standard Brownian motion (Wiener process). Then a fixed threshold $b$ (rather than a function as before) can be obtained numerically from setting (26) equal to $1 - (1 - \alpha)^{1/N}$.

Independent Data Example
For illustration we provide an example with independent data, see Figs. 2-3. For a more interesting example featuring a state space model, we refer to Kuhn (2017, Ch. 10). Note that when testing an independent sequence of N(0, ν) observations (with ν denoting the standard deviation, in line with the moments stated below) against a shift in mean of size θ, the LLR $S_{k:n}(m)$ corresponding to testing against $H_1(k)$ is given by $S_{k:n}(m) = \sum_{i=k+m-1}^{n+m-1} (\theta V_i - \theta^2/2)/\nu^2$. Thus, under $H_0$ the LLR increments are normally distributed with mean $\mu = -(\theta/\nu)^2/2$ and variance $\sigma^2 = (\theta/\nu)^2$. Application of the EV and CLT approximations is then straightforward. To apply the LD approximation, we need to compute the limiting log-moment-generating function $\Lambda(\lambda)$ in more explicit terms (this way we also check that it indeed exists and is finite for all $\lambda$). Because the sequence of observations is independent, with $k = N\beta + 1$, we can write the associated moment-generating function as It is interesting to compare this to the EV threshold function (20): we note that in both cases (up to scaling by $N$) the threshold function is of the form where $\zeta(\cdot)$ is some function of the parameters. This form is intuitively appealing: it makes sense to select a threshold that exceeds the expected value of $S_{N\beta+1:N}$ by some function of the standard deviation.
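For independent Gaussian increments the logarithmic LD recipe can be carried out in closed form. The sketch below is a reconstruction under stated assumptions rather than the paper's own display: with increments N(mu, sigma^2), the scaled log-moment-generating function of S_{Nβ+1:N} is taken to be (1 − β)(λ mu + λ² sigma²/2), its Fenchel-Legendre transform is (x − (1 − β) mu)² / (2 (1 − β) sigma²), and the threshold is obtained by setting this rate equal to γ = −N⁻¹ log(1 − (1 − α)^{1/N}). All function names are hypothetical.

```python
import math


def ld_rate_level(alpha, N):
    """gamma = -(1/N) * log(1 - (1 - alpha)^(1/N)); positive for alpha in (0, 1)."""
    return -math.log(1.0 - (1.0 - alpha) ** (1.0 / N)) / N


def gaussian_rate(x, beta, mu, sigma):
    """Fenchel-Legendre transform of (1 - beta) * (lam * mu + lam^2 * sigma^2 / 2) at x."""
    return (x - (1.0 - beta) * mu) ** 2 / (2.0 * (1.0 - beta) * sigma ** 2)


def ld_threshold(beta, alpha, N, mu, sigma):
    """Threshold on the scale of S_{N*beta+1:N}: mean plus a multiple of the standard deviation."""
    gamma = ld_rate_level(alpha, N)
    length = N * (1.0 - beta)  # number of increments in S_{N*beta+1:N}
    return length * mu + sigma * math.sqrt(length) * math.sqrt(2.0 * gamma * N)


# Threshold function evaluated on the grid beta in {0/N, ..., (N-1)/N}.
N = 50
thresholds = [ld_threshold(k / N, alpha=0.05, N=N, mu=-0.5, sigma=1.0) for k in range(N)]
```

The result has the "expected value plus a multiple of the standard deviation" shape described above, with multiplier sqrt(−2 log(1 − (1 − α)^{1/N})) under these assumptions.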
Using the three different thresholds, we can evaluate P 0 (τ ≤ N) by Monte Carlo simulation. We estimate the false alarm rate as the relative frequency with which a false alarm is raised. Figure 2 shows that the performance in terms of false alarms is conservative, as was to be expected because we approximate the upper bound (19) rather than P 0 (τ ≤ N) itself. Nevertheless, the false alarm rates are close to the desired level α when the EV approximation is applied, while the LD approximation is more conservative. The CLT approximation does not seem to adjust enough for different α. This may be related to the fact that we have to solve for b numerically in this case while in absolute terms 1 − (1 − α) 1/N in (19) does not change much with α. Moreover, it has also been found in Ch. III of Siegmund (1985) that the CLT approximation typically underestimates the probability of interest. An explanation for this is that in (26) it is assumed that the maximum is taken over a continuous (and thus larger) interval. Figure 3 displays the obtained delay values for various values of α. Here, the delay is evaluated as the sample average of the difference between the first detection time and the actual change point. Note the trade-off between the false alarm probability and the resulting delay for the LD and CLT approximation. Interestingly, the EV approximation yields a higher delay even though the false alarm probability is higher, suggesting that the shape of the threshold function does not match the shape of the LLRs S Nβ+1:N . (Note that this is not generally the case: for the state space model example discussed in Kuhn et al. (2015) the EV approximation achieves the better delay performance.) To further investigate this issue, we plot a graph of the threshold function as well as the LLRs, both as a function of β ∈ B N , see Fig. 4. Indeed, the distance between the EV threshold and the LLR is not uniform across β. 
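The Monte Carlo estimate of P_0(τ ≤ N) described above can be sketched as follows. This is an illustrative Python version under the assumption of independent Gaussian LLR increments with mean `mu` and standard deviation `sigma` under the null; the `threshold(k, n)` callable, the function name, and the default number of runs are assumptions and not taken from the paper.

```python
import random


def false_alarm_rate(threshold, N, mu, sigma, runs=2000, seed=1):
    """Relative frequency of runs in which S_{k:n} > threshold(k, n) for some k <= n <= N."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(runs):
        increments = []
        alarm = False
        for n in range(1, N + 1):
            increments.append(rng.gauss(mu, sigma))  # LLR increment under the null
            partial = 0.0
            # Partial sums S_{k:n} for k = n, ..., 1, built backwards in time.
            for k in range(n, 0, -1):
                partial += increments[k - 1]
                if partial > threshold(k, n):
                    alarm = True
                    break
            if alarm:
                break
        alarms += alarm
    return alarms / runs


# Example call with a constant threshold; any threshold function of (k, n) can be plugged in.
estimate = false_alarm_rate(lambda k, n: 3.0, N=20, mu=-0.5, sigma=1.0, runs=200)
```

A fixed seed makes the estimate reproducible, which is convenient when comparing several threshold functions on the same simulated paths.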
The shape of the LD threshold, however, matches the LLRs very well. The figure also suggests that, particularly when using the EV threshold, false alarms usually occur at the end of the window. One may thus wonder whether one could simply choose a constant threshold equal to the $\big(1 - [1 - (1 - \alpha)^{1/N}]\big)$-quantile of $F$, the distribution of the LLR increments. This choice, however, does not work well: the obtained false alarm rate is usually considerably higher than the desired level (in this example it is close to 1). The figure shows clearly why a threshold function is to be preferred over a constant threshold: the CLT threshold is far away from the actual LLRs, except when β is close to 1. Choosing a function is favorable particularly in view of the detection delay, provided that it closely mimics the behavior of the LLRs. Figure 5 shows a comparison of the delay for various choices of the shift size θ. As expected, the delay performance improves as the shift size increases. We remark that, reassuringly, for different choices of θ the resulting false alarm performance is highly similar to Fig. 2.

More Control over False Alarms
The criterion considered in the previous section may not always be restrictive enough, as is illustrated in Fig. 6. This figure shows the alarm rate obtained when testing a sequence of independent Gaussian observations with expanding windows. We compute the alarm rate as the relative frequency of the alarms raised. Thus, the alarm rate before the change point corresponds to the false alarm rate we discussed in Section 3.3.2, whereas the alarm rate after the change point is to be interpreted as the rate of detection. The position of the change point is indicated by the vertical line. The threshold is chosen such that (2) is achieved. It can be seen that at the beginning of the period, where only a small number of data points are tested, the false alarm rate is too high, but because it then decreases below the desired level, the criterion is still satisfied. This also confirms once more that one should choose b to be a function, rather than a constant threshold as is often assumed. With a constant threshold, as the example shows, P 0 (τ ≤ N) ≤ α can only be achieved if P 0 (τ = 1) ≤ α. This is true more generally; recall that for independent or weakly dependent observations, it has been shown, for example, in Pollak and Tartakovsky (2009) that the distribution of τ is approximately exponential when the threshold b is large but constant.
In view of the above, we propose to choose a threshold function that limits the false alarm rate for the current window to be α (a level that in general differs from, but can be related to, the α used before, as outlined below). That is, we require
$P_0(T = n \mid T > n - 1) \le \alpha$   (28)
to hold, uniformly across all $n$, where $T \in \{\tau, \omega\}$. If $T = \omega$, the above can be simplified because $P_0(\omega = 1) = P_0(\omega = n \mid \omega > n - 1)$ for any $n$. As mentioned in the introduction, in the definition of the MLFA we can choose $N = 1$ to recover (28). In view of Fig. 6 this seems a good choice as one would like to control the false alarm rate at every time instance; it is, however, the least conservative because the MLFA increases monotonically in $N$.
To relate (28) to (2), note that $P_0(T \le N) = \sum_{n=1}^{N} P_0(T = n)$, and that each $P_0(T = n)$ can be expressed as $P_0(T = n \mid T > n - 1) \prod_{t=1}^{n-1} \{1 - P_0(T = t \mid T > t - 1)\}$. Thus, in principle one can allow for α to depend on the current window size $n$ as well, and choose a sequence of $\alpha_n$ such that $P_0(\tau \le N) \le \alpha$ is achieved. For example, we can set Therefore, the condition (28) indeed allows a better control over the false alarm performance as desired.
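The link between the per-step conditional false alarm probabilities and P_0(T ≤ N) is easy to verify numerically. The sketch below uses hypothetical function names and shows one possible choice of a constant level, alpha_n = 1 − (1 − alpha)^{1/N}, which makes the overall probability equal to alpha when every conditional probability attains its level. This is an illustration of the relation, not necessarily the sequence chosen in the paper.

```python
def pmf_from_hazards(hazards):
    """P(T = n) = h_n * prod_{t < n} (1 - h_t), where h_n = P(T = n | T > n - 1)."""
    pmf, survival = [], 1.0
    for h in hazards:
        pmf.append(h * survival)
        survival *= 1.0 - h
    return pmf


def alarm_probability(hazards):
    """P(T <= N) = 1 - prod_{n <= N} (1 - h_n) for the hazards h_1, ..., h_N."""
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h
    return 1.0 - survival


def constant_level(alpha, N):
    """Constant per-step level whose N-fold compounding gives exactly alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / N)


# With N = 150 and alpha = 0.05, the compounded probability recovers alpha.
overall = alarm_probability([constant_level(0.05, 150)] * 150)
```

Any other sequence alpha_n with prod (1 − alpha_n) ≥ 1 − alpha would serve the same purpose, for instance one that spends less of the false alarm budget on early, short windows.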
Approximations for (28) are readily available. For example, we can apply EV, LD, and CLT approximations as in Section 3.3.2 with N replaced by n, and 1 − (1 − α) 1/N replaced by α. In order to ensure (28), we now need the threshold function to depend on the current window size n. Thus, if the window size is fixed, the threshold function is the same for every window. If windows are expanding, we obtain an adaptive threshold function. In the latter case, it is all the more important that evaluation of the threshold function is simple so that this can be carried out on-line as a new observation arrives.

Independent Data Example
For illustration we consider again the independent data example from Section 3.2.2, yet now false alarm rates are estimated according to (28). See Fig. 7 for an example with stopping time $\omega$, which displays the probability $P_0(\omega = 1)$ that is achieved on average, for various choices of $\alpha$. (A comparison of different shift sizes is not depicted because the false alarm behavior remains very stable, as desired.) In comparison to the example in Section 3.3.2, we note that the LD and EV approximations are closer to, but slightly above, the desired false alarm rate, whereas in Section 3.3.2 they were rather conservative. This difference may be explained by the fact that in Section 3.3.2 we approximated an upper bound to $P_0(\tau \le N)$ rather than the probability itself. When windows are expanding (so that the stopping time is $\tau$ and thresholds are adaptive), a very similar false alarm performance is obtained.
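A per-window false alarm probability of this kind can also be estimated by simulation. The sketch below is an illustration under stated assumptions, not the setup used for the figures: it assumes i.i.d. standard Gaussian observations before the change, a mean shift of size `theta`, and a window statistic equal to the maximum of the log-likelihood ratio over candidate change points within the window. The threshold is taken as a plain Monte Carlo quantile, which replaces the EV, LD, and CLT approximations; all function names are hypothetical.

```python
import numpy as np

def window_llr_stat(x, theta):
    """Max over change points k of sum_{i=k}^{n} (theta * x_i - theta**2 / 2).

    x has shape (reps, n); one statistic is returned per row.
    """
    increments = theta * x - 0.5 * theta**2
    # Reversing and accumulating gives the sums over the last 1, 2, ..., n terms.
    suffix_sums = np.cumsum(increments[:, ::-1], axis=1)
    return suffix_sums.max(axis=1)

def mc_threshold(n, theta, level, reps, rng):
    """Empirical (1 - level) quantile of the window statistic under the null."""
    null_data = rng.standard_normal((reps, n))
    return np.quantile(window_llr_stat(null_data, theta), 1.0 - level)

# Illustrative values (assumptions for this sketch).
rng = np.random.default_rng(0)
n, theta, level, reps = 50, 1.0, 0.05, 20000

b = mc_threshold(n, theta, level, reps, rng)

# Estimate the false alarm probability of the first window on fresh null data.
fresh = rng.standard_normal((reps, n))
rate = np.mean(window_llr_stat(fresh, theta) > b)
print(f"threshold b = {b:.3f}, estimated P_0(omega = 1) = {rate:.4f}")
```

Repeating the threshold computation for each window size $n$ would give an adaptive threshold function for expanding windows, at a much higher computational cost than a closed-form approximation.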
In Fig. 8 we check that when the threshold is obtained as suggested in the current section, with the sequence $\tilde{\alpha}_n$ of per-window levels defined by Eq. 29, we indeed obtain a false alarm performance similar to Fig. 2, where the threshold was chosen with the aim of achieving $P_0(\tau \le N) \le \alpha$. The performance in terms of delay is then as in Fig. 3, as a consequence of the similar false alarm performance.
Fig. 7 Comparison of probabilities $P_0(\omega = 1)$ obtained with adaptive thresholds chosen such that Eq. 28 is achieved with $n = 50$, $\theta = 1$, $\nu = 1$, for various $\alpha$ (indicated by the dotted line)
Fig. 8 … (28), where $\tilde{\alpha}_n$ is chosen according to Eq. 29 such that Eq. 2 holds, with $N = 150$, $\theta = 1$, $\nu = 1$, for various $\alpha$ (indicated by the dotted line)
In summary, Figs. 7 and 8 together confirm that (28) is a stronger false alarm criterion that allows better control over the false alarms at any given time point.
We provide more involved examples in Kuhn (2017) and Kuhn et al. (2015), which show that for Gaussian observations the procedures we proposed can also be applied without modification when the observations are not themselves independent but can be transformed into a sequence of independent random variables by applying a whitening filter.

Conclusion
In this paper we considered two false alarm criteria derived from the MLFA. Both criteria are stronger than the traditional ARL; however, the first is less stringent than the MLFA, whereas the second is a special case of it.
We then provided methods for the selection of the threshold such that the false alarm criteria under consideration hold at least approximately. Compared with numerical methods for threshold selection, these approximations are easy to apply and, moreover, allow the selection of a threshold function rather than a constant threshold. We investigated the performance of the resulting detection procedures in numerical examples. In terms of false alarm performance, the EV approximation was usually closest to the desired level. However, the LD threshold function typically mimicked the shape of the LLRs more closely, and thus yielded the best trade-off between false alarm and delay performance. We also saw that a threshold function generally seems preferable to a constant threshold (and accordingly the EV and the LD threshold functions outperformed the constant CLT threshold).
A topic for future research is the improvement of the EV approximation: we saw that a shift of the resulting threshold function yields a good false alarm performance; however, it should be determined what the optimal size of that shift is, depending on the available parameters. Furthermore, the LD approximation requires the evaluation of the limiting logarithmic moment generating function of the LLR. In this paper, we only provided these computations for the case of Gaussian observations. Similarly, for the EV approximation we assumed Gaussian observations. In future research other distributions should also be considered in more detail.