A Constant-per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models

Online changepoint detection algorithms that are based on likelihood-ratio tests have been shown to have excellent statistical properties. However, a simple online implementation is computationally infeasible as, at time $T$, it involves considering $O(T)$ possible locations for the change. Recently, the FOCuS algorithm has been introduced for detecting changes in mean in Gaussian data; it decreases the per-iteration cost to $O(\log T)$. This is possible through pruning ideas, which reduce the set of changepoint locations that need to be considered at time $T$ to approximately $\log T$. We show that if one wishes to perform the likelihood ratio test for a different one-parameter exponential family model, then exactly the same pruning rule can be used, and again one need only consider approximately $\log T$ locations at iteration $T$. Furthermore, we show how we can adaptively perform the maximisation step of the algorithm so that we need only maximise the test statistic over a small subset of these possible locations. Empirical results show that the resulting online algorithm, which can detect changes under a wide range of models, has a constant-per-iteration cost on average.


Introduction
Detecting changes in data streams is an important statistical and machine learning challenge that arises in applications as diverse as climate records (Beaulieu and Killick, 2018), financial time-series (Andreou and Ghysels, 2002), monitoring performance of virtual machines (Barrett et al., 2017) and detecting concept drift of inputs to classifiers (Sakamoto et al., 2015). In many contemporary applications there is a need to detect changes online. In such settings we sequentially monitor a data stream over time, seeking to flag that a change has occurred as soon as possible. Often online changepoint algorithms need to run under limited computational resources. For example, Ward et al. (2022) detect gamma ray bursts using the local computing resource onboard small cube satellites, and Varghese et al. (2016) work with sensor networks where computations need to be performed locally by the sensors. Alternatively, algorithms may need to be run on ultra high-frequency data (Iwata et al., 2018), or concurrently across a large number of separate data streams. These settings share a common theme of tight constraints on the computational complexity of viable algorithms.
There have been a number of procedures suggested for the online detection of changes, each involving different trade-offs between statistical efficiency and computational cost. For example, Yu et al. (2020) proposed a likelihood-ratio test with excellent statistical properties, but the natural implementation of this method has a computational cost per iteration that increases linearly with time, whereas for online applications we need the computational cost to be constant. There exist algorithms with a constant computational cost per iteration, but they require one either to test only for changes that are a pre-specified time in the past (e.g. Eichinger and Kirch, 2018; Ross et al., 2011; Ross and Adams, 2012; Chen and Tian, 2010), or to specify the distribution of the data after a change (e.g. Page, 1954; Lucas, 1985). If the choices made in implementing these algorithms are inappropriate for the actual change one wishes to detect, this can lead to a substantial loss of power.
Recently, Romano et al. (2021a) proposed a new algorithm called Functional Online Cumulative Sum (FOCuS). This algorithm is able to perform the likelihood-ratio test with a computational cost that increases only logarithmically with time. FOCuS was developed for detecting a change in mean in Gaussian data and has been extended to Poisson (Ward et al., 2022) and Binomial (Romano et al., 2023) data. FOCuS has two components: a pruning step, which discards past changepoint times that need not be considered in the future, and a maximisation step, which considers all past changepoint times that have not been pruned. Interestingly, the pruning step for Poisson and Binomial data is identical to that for Gaussian data; it is only the maximisation step that changes.
In this paper we show that this correspondence extends to other one-parameter exponential family models. Furthermore, we show how to substantially speed up FOCuS. In previous implementations the pruning step has a fixed average cost per iteration, and the computational bottleneck is the maximisation step, which, at time $T$, needs to consider on average $O(\log T)$ possible changepoint locations. We show how previous calculations can be stored so that the maximisation step need consider fewer past changepoint locations. Empirically this leads to a maximisation step whose per-iteration computational cost is $O(1)$. To our knowledge this is the first algorithm that exactly performs the likelihood-ratio test for detecting a change with an average constant-per-iteration cost.

Problem Statement
Assume we observe a univariate time series $x_1, x_2, \ldots$, and wish to analyse the data online and detect any change in the distribution of the data as quickly as possible. We will let $T$ denote the current time point.
A natural approach to this problem is to model the data as independent realisations from some parametric family with density $f(x \mid \theta)$. Let $\theta_0$ be the parameter of the density before any change. If there is a change, denote the time of the change by $\tau$ and the parameter after the change by $\theta_1$. We can then test for a change using the likelihood-ratio test statistic.
There are two scenarios for such a test. First, we can assume the pre-change distribution, and hence $\theta_0$, is known (Eichinger and Kirch, 2018). This simplifying assumption is commonly made when we have substantial training data from the pre-change distribution with which to estimate $\theta_0$. Alternatively, we can let $\theta_0$ be unknown. We will initially focus on the case where the pre-change distribution is known, and explain how to extend the ideas to the pre-change distribution unknown case in Section 4.
The log-likelihood for the data $x_{1:T} = (x_1, \ldots, x_T)$, which depends on the pre-change parameter, $\theta_0$, the post-change parameter, $\theta_1$, and the location of a change, $\tau$, is
$$\ell(x_{1:T} \mid \theta_0, \theta_1, \tau) = \sum_{t=1}^{\tau} \log f(x_t \mid \theta_0) + \sum_{t=\tau+1}^{T} \log f(x_t \mid \theta_1).$$
The log-likelihood ratio test statistic for a change prior to $T$ is thus
$$LR_T = 2\left\{ \max_{\tau, \theta_1} \ell(x_{1:T} \mid \theta_0, \theta_1, \tau) - \sum_{t=1}^{T} \log f(x_t \mid \theta_0) \right\}.$$
Naively calculating the log-likelihood ratio statistic involves maximising over a set of $T$ terms at time $T$. This makes it computationally prohibitive to calculate in an online setting when $T$ is large. There are two simple pre-existing approaches to overcome this and make the computational cost per iteration constant. First, MOSUM approaches (e.g. Chu et al., 1995; Eichinger and Kirch, 2018) fix a number, $K$ say, of changepoint times to be tested, with these being of the form $\tau = T - h_i$ for a suitable choice of $h_1, \ldots, h_K$. Alternatively, one can use Page's recursion (Page, 1954, 1955), which calculates the likelihood-ratio test statistic for a pre-specified post-change parameter; again we can use a grid of $K$ possible post-change parameters. Both approaches lose statistical power if the choice of either the changepoint locations (i.e. the $h_i$ values for MOSUM) or the post-change parameter is inappropriate for the actual change in the data we are analysing.
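For reference, Page's recursion with a single pre-specified post-change parameter is a one-line update. Below is a minimal sketch (our own, not from the paper) for Gaussian data standardised to pre-change mean 0 and variance 1, where for a post-change mean $\theta_1$ the per-observation log-likelihood-ratio increment works out to $\theta_1(x_t - \theta_1/2)$:

```python
def page_cusum(xs, theta1, threshold):
    """Page's CUSUM recursion for a pre-specified post-change mean theta1,
    applied to data standardised to pre-change mean 0 and variance 1.
    Update: S_T = max(0, S_{T-1}) + theta1 * (x_T - theta1 / 2);
    declare a change when 2 * S_T crosses the threshold.
    O(1) work and memory per observation."""
    s = 0.0
    for t, x in enumerate(xs, start=1):
        s = max(0.0, s) + theta1 * (x - theta1 / 2.0)
        if 2.0 * s >= threshold:
            return t  # detection time
    return None  # no change declared
```

The constant cost comes precisely from fixing $\theta_1$ in advance; a poor guess for $\theta_1$ is what causes the loss of power described above.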

FOCuS for Gaussian data
As an alternative to MOSUM or Page's recursion, Romano et al. (2021a) introduced the FOCuS algorithm, which can efficiently calculate the log-likelihood ratio statistic for univariate Gaussian data, where $\theta$ denotes the mean of the data.
In this setting it is simple to see that
$$LR_T = 2 \max_{\tau, \theta_1} \sum_{t=\tau+1}^{T} \log \frac{f(x_t \mid \theta_1)}{f(x_t \mid \theta_0)}.$$
We can then introduce the function
$$Q_T(\theta_1) = \max_{0 \leq \tau \leq T-1} \sum_{t=\tau+1}^{T} \log \frac{f(x_t \mid \theta_1)}{f(x_t \mid \theta_0)},$$
which is the log-likelihood ratio statistic if the post-change parameter, $\theta_1$, is known. Obviously, $LR_T = \max_{\theta_1} 2 Q_T(\theta_1)$. For Gaussian data with known mean, $\theta_0$, and variance, $\sigma^2$, we can standardise the data so that the pre-change mean is 0 and the variance is 1. In this case, each term in the sum of the log-likelihood ratio statistic simplifies to $\theta_1(x_t - \theta_1/2)$, and
$$Q_T(\theta_1) = \max_{0 \leq \tau \leq T-1} \sum_{t=\tau+1}^{T} \theta_1\left(x_t - \frac{\theta_1}{2}\right). \qquad (1)$$
This is the point-wise maximum of $T-1$ quadratics. We can thus store $Q_T(\theta_1)$ by storing the coefficients of the quadratics.
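It is instructive to first see the naive computation that FOCuS avoids: evaluating (1) by scanning all candidate changepoints at every time step. A minimal sketch (assuming standardised Gaussian data; names are ours), using the fact that for fixed $\tau$, with suffix sum $S$ over $n$ post-change points, the quadratic $\theta_1(S - n\theta_1/2)$ is maximised at $\hat{\theta}_1 = S/n$, so that $LR_T = \max_\tau S^2/n$:

```python
def naive_lr(x):
    """Naive O(T)-per-call computation of the likelihood-ratio statistic
    LR_T for a change in mean, for data standardised to pre-change mean 0
    and variance 1. For each candidate tau the quadratic
    theta1 * (S - n * theta1 / 2) is maximised at theta1 = S / n, so
    2 * Q_T reduces to the maximum of S^2 / n over suffixes."""
    best, suffix_sum = 0.0, 0.0
    # n = length of the post-change segment, i.e. tau = T - n
    for n, xt in enumerate(reversed(x[1:]), start=1):
        suffix_sum += xt
        best = max(best, suffix_sum * suffix_sum / n)
    return best
```

Running this at every new observation costs $O(T)$ per iteration, which is exactly the cost FOCuS reduces by pruning.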
The idea of FOCuS is to recursively calculate $Q_T(\theta_1)$. Whilst we have written $Q_T(\theta_1)$ as the maximum of $T-1$ quadratics in $\theta_1$, each corresponding to a different location of the putative change, in practice only $\approx \log T$ quadratics contribute to $Q_T$ (Romano et al., 2021a). This means that, if we can identify this set of quadratics, we can maximise $Q_T$, and hence calculate the test statistic, in $O(\log T)$ operations. Furthermore, Romano et al. (2021a) show that we can recursively calculate $Q_T$, and the minimal set of quadratics we need, with a cost that is $O(1)$ per iteration on average.
The FOCuS recursion is most easily described for the case where we want to detect a positive change, i.e. $\theta_1 > \theta_0$. An identical recursion can then be applied for $\theta_1 < \theta_0$ and the results combined to get $Q_T$. This approach to calculating $Q_T$ uses the recursion of Page (1954),
$$Q_T(\theta_1) = \max\left\{ Q_{T-1}(\theta_1), 0 \right\} + \theta_1\left(x_T - \frac{\theta_1}{2}\right).$$
To explain how to efficiently solve this recursion, it is helpful to introduce some notation. For $\tau_i < \tau_j$ define
$$C^{(\tau_j)}_{\tau_i}(\theta_1) = \sum_{t=\tau_i+1}^{\tau_j} \theta_1\left(x_t - \frac{\theta_1}{2}\right). \qquad (2)$$
At time $T-1$ let the quadratics that contribute to $Q_{T-1}$ be associated with the set of candidate changepoint locations $I_{T-1}$, so that $Q_{T-1}(\theta_1) = \max_{\tau \in I_{T-1}} C^{(T-1)}_{\tau}(\theta_1)$. Substituting into Page's recursion we obtain
$$Q_T(\theta_1) = \max_{\tau \in I_{T-1} \cup \{T-1\}} C^{(T)}_{\tau}(\theta_1),$$
as the zero function becomes the new quadratic $C^{(T)}_{T-1}(\theta_1) = \theta_1(x_T - \theta_1/2)$. The key step now is deciding which changepoint locations in $I_{T-1} \cup \{T-1\}$ no longer contribute to $Q_T$. To be consistent with ideas we present in Section 3 we will present the FOCuS algorithm in a slightly different way to Romano et al. (2021a). Assume that $I_{T-1} = \{\tau_1, \ldots, \tau_n\}$, with the candidate locations ordered so that $\tau_1 < \tau_2 < \ldots < \tau_n$. We can now define the difference between successive quadratics as
$$C^{(\tau_{i+1})}_{\tau_i}(\theta_1) = C^{(T)}_{\tau_i}(\theta_1) - C^{(T)}_{\tau_{i+1}}(\theta_1).$$
These differences do not change from time $T-1$ to time $T$.
For the difference between the quadratics associated with changes at $\tau_i$ and $\tau_{i+1}$, let $l_i \geq 0$ denote the largest value of $\theta_1$ such that $C^{(\tau_{i+1})}_{\tau_i}(\theta_1) \geq 0$; the quadratic associated with $\tau_{i+1}$ can thus exceed the one associated with $\tau_i$ only for $\theta_1 > l_i$. Adding a new observation may cause the largest root of $C^{(T)}_{\tau_{i+1}}(\theta_1)$ to be smaller than $l_i$. In this case we have that $C^{(T)}_{\tau_{i+1}}(\theta_1)$ does not contribute to $Q_T(\cdot)$ and thus can be pruned.
This suggests Algorithm 1. Note that this algorithm is presented differently from that in Romano et al. (2021a), as the way the quadratics are stored is different. Specifically, here we store the differences between the quadratics, rather than use summary statistics. The input is just the differences of the quadratics that contribute to $Q_{T-1}$. The main loop of the algorithm checks whether the root of $C^{(T-1)}_{\tau_j}$ is smaller than that of $C^{(\tau_j)}_{\tau_{j-1}}$, which is our condition for pruning the quadratic associated with $\tau_j$. If not, we stop any further pruning and return the set of quadratic differences plus the quadratic $C^{(T)}_{T-1}$. If it is, then the quadratic associated with $\tau_j$ is removed, and the quadratic difference associated with $\tau_{j-1}$ is updated by adding on the quadratic difference associated with $\tau_j$. We then loop to consider removing the next quadratic (if there is one).
A pictorial description of the algorithm is shown in Figure 1. It is simple to see that this algorithm has an average cost per iteration that is $O(1)$. This is because, at each iteration, the number of steps of the while loop is one more than the number of quadratics that are pruned. As only one quadratic is added at each iteration, and a quadratic can only be removed once, the overall number of steps of the while loop by time $T$ will be less than $2T$.
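The update can be sketched as follows. This is our own simplified variant (for standardised Gaussian data with $\theta_0 = 0$, searching for an up-change) that stores each quadratic difference as a (segment sum, segment length) pair between consecutive candidate changepoints; comparing segment means is equivalent to the root comparison for this model:

```python
def focus_step(segments, x_new):
    """One FOCuS iteration for an up-change (theta1 > theta0 = 0) in
    standardised Gaussian data. `segments` holds (sum, length) pairs of
    the data segments between consecutive kept candidate changepoints,
    oldest first. Returns 2*Q_T, the statistic for an up-change
    (0 if there is no positive evidence)."""
    s_new, n_new = x_new, 1
    # Pruning: merge while the preceding segment's mean is >= the newest
    # segment's mean -- the stack invariant is strictly increasing segment
    # means, mirroring the root comparison in Algorithm 1. (Old segments
    # with non-positive means are kept here for simplicity; they never
    # affect the returned statistic.)
    while segments and segments[-1][0] * n_new >= s_new * segments[-1][1]:
        s, n = segments.pop()
        s_new, n_new = s_new + s, n_new + n
    segments.append((s_new, n_new))
    # Maximisation: for a candidate, the quadratic theta*(S - n*theta/2)
    # built from the suffix sums S, n peaks at S^2/(2n), so 2*Q_T is the
    # maximum of S^2/n over suffixes with S > 0.
    stat = s_acc = n_acc = 0.0
    for s, n in reversed(segments):
        s_acc, n_acc = s_acc + s, n_acc + n
        if s_acc > 0:
            stat = max(stat, s_acc * s_acc / n_acc)
    return stat
```

The while loop is the amortised-$O(1)$ pruning argued above; the maximisation loop still scans all kept segments, which is the cost Section 5 shows how to avoid.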

FOCuS for Exponential Family Models
Different parametric families have different likelihoods, and hence different likelihood ratio statistics. However, the idea behind FOCuS can still be applied in these cases, provided we are considering a change in a univariate parameter, with different forms for the curves (described in Equation 2) and hence different values for the roots of the curves. Whilst one would guess that the different values of the roots would lead to different pruning of curves when implementing Algorithm 1, Ward et al. (2022) and Romano et al. (2023) noted that the pruning, i.e. the set of changepoints associated with the functions that contribute to $Q_T$, is the same for a Poisson model or a Binomial model as for the Gaussian model; it is only the shape of the functions that changes. Here we show that this is a general property for many one-parameter exponential family models.
A one-parameter exponential family distribution can be written as
$$f(x \mid \theta) = \exp\left\{ \alpha(\theta)\gamma(x) - \beta(\theta) + \delta(x) \right\},$$
for some functions $\alpha(\theta)$, $\beta(\theta)$, $\gamma(x)$, $\delta(x)$ which depend on the specific distribution. Examples of one-parameter exponential family distributions given in Table 1 include Gaussian change in mean, Gaussian change in variance, Poisson, Gamma change in scale, and Binomial distributions, for which $\alpha(\theta)$ and $\beta(\theta)$ are increasing functions. $\gamma(x)$ is the sufficient statistic for the model, and is often the identity function. We do not need to consider $\delta(x)$ as it cancels out in all likelihood ratios.

Algorithm 1: FOCuS update at time $T$ for $\theta_1 > \theta_0$ and $\theta_0 = 0$, based on storing quadratic differences.
There are various simple transformations that can shift data from one assumed exponential family form to another before applying change detection methods: for example, binning exponentially distributed event times to give rise to Poisson data; approximating Binomial$(n, \theta)$ data as Poisson$(n\theta)$ for large $n$ and small $\theta$; or using the fact that if $x \sim N(0, 1)$ then $x^2 \sim \text{Gamma}(1/2, 1/2)$ to turn a Gaussian change in variance problem into a Gamma change in parameter problem (see Section 6 for an illustration). Nevertheless, the ability to work flexibly in all possible exponential family settings without requiring data pre-processing can be helpful.
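Two of these pre-processing transformations can be sketched as follows (variable names are ours):

```python
import random

random.seed(1)

# Gaussian change-in-variance -> Gamma change-in-scale: if x ~ N(0, 1)
# then x**2 is chi-squared with 1 degree of freedom, i.e. Gamma(1/2, 1/2)
# (shape 1/2, rate 1/2), so squaring converts the problem.
gaussian_stream = [random.gauss(0.0, 1.0) for _ in range(1000)]
gamma_stream = [x * x for x in gaussian_stream]

# Exponential inter-event times -> Poisson counts: bin event times into
# unit-width windows; the count in each window is Poisson(rate).
arrival_times, t = [], 0.0
while t < 1000.0:
    t += random.expovariate(2.0)  # rate of 2 events per unit time
    arrival_times.append(t)
poisson_stream = [0] * 1000
for t in arrival_times:
    if t < 1000.0:
        poisson_stream[int(t)] += 1
```

After either transformation, the appropriate FOCuS variant can be run on the transformed stream.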
The ideas from Section 2.2 can be applied to detecting a change in the parameter of a one-parameter exponential family. The main change is to the form of the log-likelihood. For Algorithm 1 we need to store the differences $C^{(\tau_j)}_{\tau_i}(\theta_1)$ in the log-likelihood for different choices of the changepoint location. This becomes
$$C^{(\tau_j)}_{\tau_i}(\theta_1) = \left\{\alpha(\theta_1) - \alpha(\theta_0)\right\} \sum_{t=\tau_i+1}^{\tau_j} \gamma(x_t) - (\tau_j - \tau_i)\left\{\beta(\theta_1) - \beta(\theta_0)\right\}.$$
These curves can be summarised in terms of the coefficients of $\alpha(\theta_1) - \alpha(\theta_0)$ and $\beta(\theta_1) - \beta(\theta_0)$, that is $\sum_{t=\tau_i+1}^{\tau_j} \gamma(x_t)$ and $\tau_j - \tau_i$.

Table 1: Examples of one-parameter exponential families and the corresponding forms of $\alpha(\theta)$, $\beta(\theta)$ and $\gamma(x)$. The Gaussian change in mean model is for a variance of 1; the Gaussian change in variance model is for a mean of 0; the Binomial model assumes the number of trials is $n$; and the Gamma model is for a change in scale parameter with shape parameter $k$.

The pruning of Algorithm 1 is based on comparing roots of curves. One challenge with implementing the algorithm for general exponential family models is that the roots are often not available analytically, unlike for the Gaussian model, and thus require numerical root finders. However, pruning just depends on the ordering of the roots. The following proposition shows that we can often determine which of two curves has the larger root without having to calculate the value of the root. Define
$$\bar{\gamma}_{\tau_i:\tau_j} = \frac{1}{\tau_j - \tau_i} \sum_{t=\tau_i+1}^{\tau_j} \gamma(x_t)$$
to be the average value of $\gamma(x_t)$ for $t = \tau_i + 1, \ldots, \tau_j$, and define $\theta^{\tau}_1 (\neq \theta_0)$ to be the root of $C^{(T)}_{\tau}(\theta_1)$. Then the following proposition shows that the ordering of the roots is determined by the ordering of the $\bar{\gamma}$ values.
Proposition 1 Suppose that for our choice of $\theta_0$ the function
$$\frac{\beta(\theta) - \beta(\theta_0)}{\alpha(\theta) - \alpha(\theta_0)}$$
is strictly increasing. Then the sign of $\bar{\gamma}_{\tau_i:\tau_j} - \bar{\gamma}_{\tau_j:T}$ is the same as the sign of $\theta^{\tau_i}_1 - \theta^{\tau_j}_1$.
Proof: See Supplementary Material.

In other words, $\theta^{\tau_i}_1 > \theta^{\tau_j}_1$ if and only if $\bar{\gamma}_{\tau_i:\tau_j} > \bar{\gamma}_{\tau_j:T}$. Thus we can replace the condition in Algorithm 1 that compares the roots of two curves with a condition that compares their $\bar{\gamma}$ values. Equivalently, we can implement Algorithm 1 but with $l_i = \bar{\gamma}_{\tau_i:\tau_{i+1}}$ rather than the root of $C^{(\tau_{i+1})}_{\tau_i}(\theta_1) = 0$. An immediate consequence of this result is that one-parameter exponential family models that satisfy the condition of Proposition 1 and that have the same value for $\gamma(x)$ will prune exactly the same set of curves. This leads to the following corollary, based on a set of exponential family models with $\gamma(x) = x$, the same as for the Gaussian change in mean model of the original FOCuS algorithm.
Corollary 2 The Gaussian (change in mean), Poisson, Binomial, and Gamma variations of the FOCuS algorithm have the same pruning.
A graphical example of this corollary is shown in Figure 2. More generally we have the following.

Corollary 3 Any two one-parameter exponential family models that satisfy the condition of Proposition 1 have the same pruning when run on data for which the values of the sufficient statistics $\gamma(x_t)$ are the same.

So, for example, the pruning for the Gaussian change in variance model will be the same as for the Gaussian change in mean model run on data $x_1^2, x_2^2, \ldots$. One consequence of this corollary is that the strong guarantees on the number of curves that are kept at time $T$ for the original FOCuS algorithm (Romano et al., 2021a) apply to these equivalent exponential family models. The results on the expected number of curves kept by FOCuS make minimal assumptions on the data, namely that the observations are exchangeable. These results imply that, on average, the number of curves kept at iteration $T$ is $O(\log T)$.
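The model-agnostic pruning implied by Proposition 1 can be sketched as follows (our own naming): candidates are maintained using only running averages of $\gamma(x_t)$, so the identical code serves every model in Corollary 2 when fed $\gamma(x) = x$, and the Gaussian change-in-variance model when fed $\gamma(x) = x^2$:

```python
def prune_step(segments, gamma_x):
    """Update the candidate-changepoint segments given gamma(x_T).
    `segments` holds (sum of gamma values, length) pairs between
    consecutive kept candidates, oldest first. A candidate is pruned
    when the average gamma over the segment before it is >= the average
    over the segment after it, mirroring the root comparison of
    Algorithm 1 via Proposition 1 (up-change version)."""
    s, n = gamma_x, 1
    while segments and segments[-1][0] * n >= s * segments[-1][1]:
        s_old, n_old = segments.pop()
        s, n = s + s_old, n + n_old
    segments.append((s, n))
    return segments
```

Because the update never evaluates $\alpha$ or $\beta$, no numerical root finding is needed; the specific model only enters when the test statistic itself is evaluated.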

Unknown Pre-change Parameter
We next turn to consider how to extend the methodology to the case where both the pre-change and post-change parameters are unknown. When $\theta_0$ is unknown, the log likelihood-ratio statistic, $LR_T$, satisfies
$$LR_T = 2\left\{ \max_{\tau, \theta_0, \theta_1} \ell(x_{1:T} \mid \theta_0, \theta_1, \tau) - \max_{\theta_0} \sum_{t=1}^{T} \log f(x_t \mid \theta_0) \right\}.$$
The challenge with calculating this is the first term. Define
$$Q^*_T(\theta_0, \theta_1) = \max_{0 \leq \tau \leq T-1} \sum_{t=\tau+1}^{T} \log \frac{f(x_t \mid \theta_1)}{f(x_t \mid \theta_0)}.$$
If we can calculate this function of $\theta_0$ and $\theta_1$, it will be straightforward to calculate the likelihood-ratio statistic. If we fix $\theta_0$ and consider $Q^*_T$ as a function of only $\theta_1$, then this is just the function $Q_T(\theta_1)$ we considered in the known pre-change parameter case.
As before, we can write $Q^*_T(\theta_0, \theta_1)$ as the maximum of a set of curves, now of two variables $\theta_0$ and $\theta_1$, with each curve relating to a specific value of $\tau$. As before, if we can easily determine which values of $\tau$ give curves that contribute to the maximum, we can remove the other curves and greatly speed up the calculation of $Q^*_T$. To do this, consider $Q^*_T(\theta_0, \theta_1)$ as a function of $\theta_1$ only, and write this as $Q_{T,\theta_0}(\theta_1)$. Algorithm 1 gives us the curves that contribute to this function for $\theta_1 > \theta_0$. This set of curves is determined by the ordering of the roots of the curves, i.e. the $l_i$ for $i \geq 1$ in Algorithm 1. If we now change $\theta_0$, the roots of the curves will change, but by Proposition 1 the orderings will not. The only difference will be in the definition of $l_0$. That is, as we reduce $\theta_0$ we may have additional curves that contribute to the maximum, due to allowing a larger range of values for $\theta_1$, but as we increase $\theta_0$ we can only ever remove curves; we never swap the curves that need to be kept. Thus if we run Algorithm 1 with $\theta_0 = -\infty$, then the set of curves we keep will be the set of curves that contribute to $Q^*_T(\theta_0, \theta_1)$ for $\theta_1 > \theta_0$.
In practice, this means that to implement the pruning of FOCuS with the pre-change parameter unknown, we proceed as in Algorithm 1 but set $l_0 = -\infty$ when considering changes $\theta_1 > \theta_0$, and $l_0 = \infty$ when considering changes $\theta_1 < \theta_0$. The equivalence of Algorithm 1 across different exponential family models, as demonstrated with Corollary 3, also immediately follows.

Adaptive Maxima Checking
The main computational cost of the FOCuS algorithm comes from maximising the curves at each iteration. This is particularly the case for non-Gaussian models, as maximising a curve requires evaluating $\max_{\theta_0,\theta_1} \ell(x_{1:T} \mid \theta_0, \theta_1, \tau)$, which involves computing at least one logarithm (as in the cases of Poisson, Binomial or Gamma data). As the number of curves kept by time $T$ is of order $\log T$, calculating all maxima represents a (slowly) scaling cost. However, we can reduce this cost by using information from previous iterations so that we need only maximise over fewer curves in order to detect whether $Q_T$ is above or below our threshold. This is possible by obtaining an upper bound on $Q_T$ that is easy to evaluate: if this upper bound is less than our threshold, we need not calculate $Q_T$.
The following proposition gives such an upper bound on the maximum of all, or a subset, of the curves. First, for $\tau_i < \tau_j$, we define the likelihood ratio statistic for a change at $\tau_i$ with the signal ending at $\tau_j$ as
$$m_{\tau_i, \tau_j} = \max_{\theta_0 \in H_0,\, \theta_1} \sum_{t=\tau_i+1}^{\tau_j} \log \frac{f(x_t \mid \theta_1)}{f(x_t \mid \theta_0)},$$
where $H_0$ denotes the set of possible values of $\theta_0$. $H_0$ will contain a single value in the pre-change parameter known case, or be $\mathbb{R}$ in the pre-change parameter unknown case.
Proposition 4 For any $\tau_1 < \tau_2 < \ldots < \tau_n < T$, we have
$$\max_{i=1,\ldots,n} m_{\tau_i, T} \leq \sum_{i=1}^{n-1} m_{\tau_i, \tau_{i+1}} + m_{\tau_n, T}.$$
Proof: See Supplementary Material. A pictorial explanation of the result is also shown in Figure 3.

Figure 3: Example of the bound of Proposition 4 for the pre-change mean known case. The left-hand plot shows the differences between the three curves that contribute to $Q_T(\theta_1)$; the $m_{\tau_i,\tau_j}$ values correspond to the maxima of these curves (vertical lines). The right-hand plot shows $Q_T(\theta)$, the three curves that define it, and the maximum differences between the curves (vertical bars). The bound is the sum of the maximum differences (right-most stacked line).
We can use this result as follows. The sum $M_{\tau_k} := \sum_{i=1}^{k-1} m_{\tau_i, \tau_{i+1}}$ can be stored as part of the likelihood curve for $\tau_k$, and the maxima-checking step can proceed as in Algorithm 2, whose input is the set of $n$ likelihood curves and the associated $(\tau_k, M_{\tau_k})$ values. The idea is that we can bound $Q_T$ above by $m_{\tau_k, T} + M_{\tau_k}$. So, starting with the curve with the largest $\tau_k$ value, we check if $m_{\tau_k, T} + M_{\tau_k}$ is below the threshold. If it is, we know $Q_T$ is below the threshold and we can output that no change is detected without considering any further curves. If not, we see if $m_{\tau_k, T}$, the likelihood-ratio test statistic for a change at $\tau_k$, is above the threshold. If it is, we output that a change has been detected. If not, we proceed to the curve with the next largest $\tau_k$ value and repeat.
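This check can be sketched as follows (function and variable names are ours; `lr_stat(tau_k)` stands for whatever routine evaluates $m_{\tau_k, T}$ for the model at hand):

```python
def check_maxima(curves, lr_stat, threshold):
    """Adaptive maxima check (a sketch of Algorithm 2).
    `curves` is a list of (tau_k, M_k) pairs in increasing tau order,
    where M_k is the stored sum of m_{tau_i, tau_{i+1}} for i < k.
    `lr_stat(tau_k)` evaluates m_{tau_k, T}, the likelihood-ratio
    statistic for a change at tau_k using all data up to now.
    Returns True iff a change is declared, i.e. Q_T >= threshold."""
    # Start from the most recent candidate: its bound is cheapest to rule
    # out, and under the null it usually settles the question on its own.
    for tau_k, M_k in reversed(curves):
        m_k = lr_stat(tau_k)
        if m_k >= threshold:
            return True   # change detected at candidate tau_k
        if m_k + M_k < threshold:
            return False  # upper bound on Q_T is below threshold: stop
    return False
```

Under the null the first bound typically already falls below the threshold, so on most iterations only one curve is maximised, which is the source of the empirical $O(1)$ cost reported in Section 6.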
Empirical results suggest that, for $\tau_1, \ldots, \tau_n \in I_T$ when searching only for an up-change (or analogously only for a down-change), the upper bound in Proposition 4 is quite tight under the scenario of no change in the data, because most of the $m_{\tau_i, \tau_{i+1}}$ are very small. Furthermore, as we show in Section 6, at the majority of time-steps only one curve needs to be checked before we know that $Q_T$ is less than our threshold.

Numerical Examples
We run some examples to empirically evaluate the computational complexity of the FOCuS procedure, comparing the various implementations presented in this paper with those already present in the literature.
In Figure 4 we show the number of floating point operations as a function of time. The figure was obtained by averaging results from 50 different sequences of length $1 \times 10^6$. Results were obtained under the Bernoulli likelihood. Under this likelihood the cost of an update is negligible, given that it involves integer operations alone, and this allows for a better comparison of the costs of pruning and checking the maxima. We compare three different FOCuS implementations: (i) FOCuS with pruning based on the ordered roots $l_1, \ldots, l_n$, where the roots are found numerically via the Newton-Raphson procedure; (ii) FOCuS with the average value pruning of Section 3; and lastly (iii) FOCuS with the average value pruning and the adaptive maxima checking of Section 5.
We note that avoiding explicit calculation of the roots leads to a lower computational overhead when compared to Newton-Raphson. The best performance is, however, achieved with the addition of the adaptive maxima checking procedure, where we find a constant per-iteration computational cost under the null, centered around 15 flops per iteration. Without the adaptive maxima checking, the maximisation step is the most computationally demanding step of the FOCuS procedure, as we need to evaluate $O(\log T)$ curves per iteration.
In Figure 5 we place a change at time $1 \times 10^5$ and focus on the number of curves stored by FOCuS, and the number of curves that need to be evaluated with the adaptive maxima checking. Furthermore, for comparison, we add a line for the naive cost of direct computation of the CUSUM likelihood-ratio test. We can see how, before we encounter a change, with the adaptive maxima checking routine we only need to maximise on average 1 curve per iteration, compared to about 7.4 for the standard FOCuS implementation. After we encounter a change, the number of curves that need evaluation increases, as the likelihood ratio statistic increases and is more likely to meet the condition of Proposition 4. As can be seen from the short spike after the change, this occurs only for the short period of time preceding a detection. This empirically shows that FOCuS has $O(1)$ computational complexity per iteration, while being $O(\log T)$ in memory, as we still need to store on average $O(\log T)$ curves.

To illustrate the advantages of running FOCuS for the correct exponential family model, we consider detecting a change in variance in Gaussian data with known mean. We will assume that we have standardised the data so it has mean zero. A common approach to detecting a change in variance is to detect a change in mean in the square of the data (Inclan and Tiao, 1994), so we will compare FOCuS for a Gaussian change in mean applied to the square of the data against FOCuS for the Gaussian change in variance model (as in Table 1).
For a process distributed under the null as a normal centered on 0 with variance $\theta_0 = 1$, we present 5 simulation scenarios for $\theta_1 = 0.75, 1.25, 1.5, 1.75$ and 2. Each experiment consists of 100 replicates. Thresholds were tuned via a Monte Carlo approach to achieve an average run length of $1 \times 10^5$ under the null, in the same fashion as Chen et al. (2022, Section 4.1). We then introduce a change at time 1000 and measure performance in terms of detection delay (the difference between the detection time and the real change).
In Figure 6 we illustrate the scenarios and present results in terms of the proportion of detections within $t$ observations following the change. For a large enough positive change, e.g. for $\theta_1 = 2$, there is only a small advantage in employing the Gaussian change-in-variance model over the Gaussian change-in-mean model applied to the square of the data. However, as we lower the signal-to-noise ratio and shift towards more subtle changes, we can see how using the correct model gives an increasing advantage in terms of reducing the detection delay.
Note the similarity of the first and third terms, which allows telescopic cancellations when summing the $m_{\tau_i, \tau_{i+1}}$. Setting $\tau_{n+1} := T$ for convenience, we have that for any $1 \leq k \leq n$,
$$m_{\tau_k, T} = \max_{\theta_0 \in H_0,\, \theta_1} \sum_{t=\tau_k+1}^{T} \log \frac{f(x_t \mid \theta_1)}{f(x_t \mid \theta_0)} \leq \sum_{i=k}^{n} \max_{\theta_0 \in H_0,\, \theta_1} \sum_{t=\tau_i+1}^{\tau_{i+1}} \log \frac{f(x_t \mid \theta_1)}{f(x_t \mid \theta_0)} = \sum_{i=k}^{n-1} m_{\tau_i, \tau_{i+1}} + m_{\tau_n, T},$$
where the inequality follows because maximising each segment separately maximises the same likelihood over an expansion of the hypothesis set, with equality in the final term. Since each $m_{\tau_i, \tau_{i+1}} \geq 0$, taking the maximum over $k$ proves the result. The construction $\sum_{i=1}^{n-1} m_{\tau_i, \tau_{i+1}} + m_{\tau_n, T}$ is essentially fitting changepoints at every single one of the $\tau_i$. This compares against the construction $\max_{i=1,\ldots,n} m_{\tau_i, T}$, which fits only one changepoint at the most promising $\tau_i$.
When $\{\tau_1, \ldots, \tau_n\} \subseteq I_T$, and they are therefore ordered with increasing (for up-changes) or decreasing (for down-changes) values of $\bar{\gamma}_{\tau_i:\tau_{i+1}}$, little is gained by fitting all of the $\tau_i$ as changepoints rather than just the best one. In the scenario of no change in the underlying data, the earlier $m_{\tau_i, \tau_{i+1}}$ will be very small, and it is $m_{\tau_n, T}$ that will contribute the most, as it captures the fluctuations of recent observations in the signal.

Figure 1: Example of one iteration of FOCuS. The top row plots the quadratics $C^{(T-1)}_{\tau_1}$ (red), $C^{(T-1)}_{\tau_2}$, $C^{(T-1)}_{\tau_3}$ (blue) and $C^{(T-1)}_{\tau_4}$ (cyan) that contribute to $Q_T(\theta_1)$, together with the intervals where each is optimal (demarked by grey vertical lines). To prune, we first add the zero line (dotted black), then prune $C^{(T-1)}_{\tau_4}$, as it is no longer optimal for any $\theta_1$; we then add $\theta_1(x_T - \theta_1/2)$ to all quadratics. The bottom-left plot shows the storage of the quadratic differences used in Algorithm 1; the roots of these quadratic differences are shown by grey vertical lines. The roots of the first three quadratic differences demark the intervals where the quadratics are optimal, while the root of $C^{(T-1)}_{\tau_4}$ shows the region where that curve is above the zero line. The algorithm considers pruning $\tau_4$ based on whether the root of $C^{(T-1)}_{\tau_4}$ is smaller than the root of $C^{(\tau_4)}_{\tau_3}$. The pruning of $\tau_4$ combines cyan with blue into the quadratic difference $C^{(T-1)}_{\tau_3}$ (bottom-middle, blue line). We then add $C^{(T)}_{T-1}$ (black) as its own quadratic difference (bottom-right). We require no iteration over the full quadratic list.

Figure 2: Comparison of three different cost functions computed from the same realizations $y_1, \ldots, y_{500} \sim \text{Poi}(1)$. The leftmost, center, and rightmost figures show the cost function $Q_n(\theta)$ should we assume, respectively, a Gaussian, Poisson, or Gamma loss. The floating number refers to the timestep at which each curve was introduced. In gray, the curves that are no longer optimal and hence were pruned.

Figure 4: Flops per iteration as a function of time for three FOCuS implementations. In green, the flops for FOCuS with pruning based on calculating the roots $l_1, \ldots, l_n$ numerically. In light blue, FOCuS with the average value pruning. In blue, finally, FOCuS with the average value pruning and the adaptive maxima checking. Log-scale on both axes.

Figure 5: Number of curves to store and evaluations per iteration as a function of time. The grey dotted line is the naive cost of computing the CUSUM likelihood ratio test. The dashed lines are the numbers of curves stored by FOCuS over some Gaussian (light-green), Poisson (dark-green), Bernoulli (light-blue) and Gamma (dark-blue) realizations. The solid lines are the numbers of curves that need to be evaluated at each iteration with the adaptive maxima checking. Log-scale on both axes.