On optimal multiple changepoint algorithms for large data
Abstract
Many common approaches to detecting changepoints, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criteria. The standard implementations of these dynamic programming methods have a computational cost that scales at least quadratically in the length of the time series. Recently pruning ideas have been suggested that can speed up the dynamic programming algorithms, whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods, and introduce two new algorithms for segmenting data: FPOP and SNIP. Empirical results show that FPOP is substantially faster than existing dynamic programming methods, and unlike the existing methods its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is even competitive with that of binary segmentation, but can give much more accurate segmentations.
Keywords
Breakpoints, Dynamic Programming, FPOP, SNIP, Optimal Partitioning, pDPA, PELT, Segment Neighbourhood
1 Introduction
Time series data often experience multiple abrupt changes in structure, which need to be taken into account if the data are to be modelled effectively. These changes, known as changepoints, or breakpoints, cause the data to be split into segments which can then be modelled separately. Detecting changepoints, both accurately and efficiently, is required in a number of applications including bioinformatics (Picard et al. 2011), financial data (Fryzlewicz 2012), climate data (Killick et al. 2012; Reeves et al. 2007), EEG data (Lavielle 2005), oceanography (Killick et al. 2010) and the analysis of speech signals (Davis et al. 2006).
As increasingly large datasets are obtained in modern applications, there is a need for statistical methods for detecting changepoints that are not only accurate but also computationally efficient. A motivating application area where computational efficiency is important is in detecting copy number variation (Olshen et al. 2004; Zhang et al. 2010). For example, in Sect. 7 we look at detecting changes in DNA copy number in tumour microarray data. Accurate detection of regions in which this copy number is amplified or reduced from a baseline level is crucial as these regions can relate to tumorous cells and their detection is important for classifying tumour progression and type. The data analysis in Sect. 7 involves detecting changepoints in thousands of time series, many of which have hundreds of thousands of data points. Other applications of detecting copy number variation can involve analysing data sets which are orders of magnitude larger still.
There is a wide range of approaches to detecting changepoints, see for example Frick et al. (2014) and Aue and Horváth (2013) and the references therein. We focus on one important class of approaches (e.g. Braun et al. 2000; Davis et al. 2006; Zhang and Siegmund 2007) that can be formulated in terms of defining a cost function for a segmentation. They then either minimise a penalised version of this cost (e.g. Yao 1988; Lee 1995), which we call the penalised minimisation problem; or minimise the cost under a constraint on the number of changepoints (e.g. Yao and Au 1989; Braun and Müller 1998), which we call the constrained minimisation problem. If the cost function depends on the data through a sum of segment-specific costs then the minimisation can be done exactly using dynamic programming (Auger and Lawrence 1989; Jackson et al. 2005). However these dynamic programming methods have a cost that increases at least quadratically with the amount of data, and is prohibitive for large-data applications.
Alternatively, much faster algorithms exist that provide approximate solutions to the minimisation problem. The most widely used of these approximate techniques is Binary Segmentation (Scott and Knott 1974). This takes a recursive approach, adding changepoints one at a time, with each new changepoint added in the position that would lead to the largest reduction in cost given the location of previous changepoints. Due to its simplicity, Binary Segmentation is computationally efficient, being roughly linear in the amount of data; however, it only provides an approximate solution and can lead to poor estimation of the number and position of changepoints (Killick et al. 2012). Variations of Binary Segmentation, such as Circular Binary Segmentation (Olshen et al. 2004) and Wild Binary Segmentation (Fryzlewicz 2012), can offer more accurate solutions for slight decreases in the computational efficiency.
An alternative approach is to look at ways of speeding up the dynamic programming algorithms. Recent work has shown this is possible via pruning of the solution space. Killick et al. (2012) present a technique for doing this which we shall refer to as inequality-based pruning. This forms the basis of their method PELT, which can be used to solve the penalised minimisation problem. Rigaill (2010) develops a different pruning technique, functional pruning, which is used in the pDPA method to solve the constrained minimisation problem. Both PELT and pDPA are optimal algorithms, in the sense that they find the true optimum of the minimisation problem they are trying to solve. However the pruning approaches they take are very different, and work well in different scenarios. PELT is most efficient in applications where the number of changepoints is large, and pDPA when there are few changepoints.
The focus of this paper is on these pruning techniques, with the aim of trying to combine ideas from PELT and pDPA. This leads to two new algorithms, Functional Pruning Optimal Partitioning (FPOP) and Segment Neighbourhood with Inequality Pruning (SNIP). SNIP uses inequality based pruning to solve the constrained minimisation problem providing an alternative to pDPA which offers greater versatility, especially in the case of multivariate data. FPOP uses functional pruning to solve the penalised minimisation problem efficiently. We show that FPOP always prunes more than PELT. Empirical results suggest that FPOP is efficient for large data sets regardless of the number of changepoints, and we observe that FPOP has a computational cost that is, in some scenarios, even competitive with Binary Segmentation.
The structure of the paper is as follows. We introduce the constrained and penalised optimisation problems for segmenting data in the next section. We then review the existing dynamic programming methods and pruning approaches for solving the penalised optimisation problem in Sect. 3 and for solving the constrained optimisation problem in Sect. 4. The new algorithms, FPOP and SNIP, are developed in Sect. 5, and compared empirically and theoretically with existing pruning methods in Sect. 6. We then evaluate FPOP empirically on both simulated and CNV data in Sect. 7. The paper ends with a discussion.
2 Model definition
Assume we have data ordered by time, though the same ideas extend trivially to data ordered by any other attribute such as position along a chromosome. Denote the data by \(\mathbf {y}=(y_1,\ldots ,y_n)\). We will use the notation that, for \(t\ge s\), the set of observations from time s to time t is \(\mathbf {y}_{s:t}=(y_{s},\ldots ,y_t)\). If we assume that there are k changepoints in the data, this will correspond to the data being split into \(k+1\) distinct segments. We let the location of the jth changepoint be \(\tau _j\) for \(j=1,\ldots ,k\), and set \(\tau _0=0\) and \(\tau _{k+1}=n\). The jth segment will consist of data points \(y_{\tau _{j-1}+1},\ldots ,y_{\tau _j}\). We let \(\mathbf {\tau }=(\tau _0,\ldots ,\tau _{k+1})\) be the set of changepoints.
The statistical problem we are considering is how to infer both the number of changepoints and their locations. The specific details of any approach will depend on the type of change, such as change in mean, variance or distribution, that we wish to detect. However a general framework that encompasses many changepoint detection methods is to introduce a cost function for each segment. The cost of a segmentation can then be defined in terms of the sum of the costs across the segments, and we can infer segmentations through minimising the segmentation cost.
2.1 Segmenting data using penalised and constrained optimisation
If k is not known, then a common approach is to calculate \(C_{k,n}\) and the corresponding segmentations for a range of values, \(k=0,1,\ldots ,K\), where K is some chosen maximum number. We can then estimate the number of changepoints by minimising \(C_{k,n}+f(k,n)\) over k for some suitable penalty function f(k, n).
Choosing a good value for f(k, n) is still very much an open problem. The most common choices of f(k, n), for example SIC (Schwarz 1978) and AIC (Akaike 1974) are linear in k, however these are only consistent in specific cases and rely on assumptions made about the data generating process which in practice is generally unknown. Recent work in Haynes et al. (2014) looks at picking penalty functions in greater detail, offering ranges of penalties that give good solutions.
In both the constrained and penalised cases we need to solve a minimisation problem to find the optimal segmentation under our criteria. There are dynamic programming algorithms for solving each of these minimisation problems. For the constrained case this is achieved using the Segment Neighbourhood Search algorithm (see Sect. 4.1), whilst for the penalised case this can be achieved using the Optimal Partitioning algorithm (see Sect. 3.1).
Solving the constrained case offers a way to get segmentations for \(k=0,1,\ldots ,K\) changepoints, and thus gives insight into how the segmentation varies with the number of segments. However, a big advantage of the penalised case is that it incorporates model selection into the problem itself, and therefore it is often computationally more efficient when dealing with an unknown value of k. In the following we will use the terminology optimal segmentation to define segmentations that are the solution to either the penalised or constrained minimisation problem, with the context making it clear as to which minimisation problem it relates to.
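As a compact reminder of how the two problems relate, they can be written in the notation of Sect. 2 as follows. This is a sketch under the assumption that the penalty is linear in the number of changepoints, \(f(k,n)=\beta k\), where the symbol \(\beta\) for the penalty constant is chosen to match its use in Sect. 7; the additive constant in the penalised criterion may differ from other presentations without changing the minimiser.

\[
C_{k,n}=\min _{\tau _1<\cdots <\tau _k}\ \sum _{j=0}^{k}\mathcal {C}(\mathbf {y}_{\tau _j+1:\tau _{j+1}}) \qquad \text {(constrained, } k \text { fixed)},
\]

\[
\min _{k}\left[ C_{k,n}+\beta k\right] =\min _{k,\mathbf {\tau }}\ \left[ \sum _{j=0}^{k}\mathcal {C}(\mathbf {y}_{\tau _j+1:\tau _{j+1}})+\beta k\right] \qquad \text {(penalised)}.
\]

Written this way, the penalised problem performs the minimisation over k and over the changepoint locations in a single step, whereas the constrained route first tabulates \(C_{k,n}\) for \(k=0,1,\ldots ,K\) and then selects k.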
2.2 Conditions for pruning
The focus of this paper is on methods for speeding up these dynamic programming algorithms using pruning methods. The pruning methods can be applied under one of two conditions on the segment costs:
Note that C1 is a stronger condition than C2. If C1 holds then C2 also holds with \(\kappa =0\) and this is true for many practical cost functions. For example it is easily seen that for the negative log-likelihood (2) C1 holds with \(\gamma (y_i,\mu )=-\log (p(y_i\mid \mu ))\) and C2 holds with \(\kappa =0\). By comparison, segment costs that are the sum of (2) and a term that depends non-linearly on the length of the segment will obey C2 but not C1.
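The implication from C1 to C2 can be checked in one line. The sketch below assumes the forms in which the two conditions are usually stated: C1 says that the segment cost is \(\mathcal {C}(\mathbf {y}_{s+1:t})=\min _{\mu }\sum _{i=s+1}^{t}\gamma (y_i,\mu )\) (the form used for functional pruning by Rigaill 2010), and C2 says that there is a constant \(\kappa \) with \(\mathcal {C}(\mathbf {y}_{s+1:t})+\mathcal {C}(\mathbf {y}_{t+1:T})+\kappa \le \mathcal {C}(\mathbf {y}_{s+1:T})\) for all \(s<t<T\) (the form used for PELT by Killick et al. 2012). Under these assumptions,

\[
\mathcal {C}(\mathbf {y}_{s+1:T})=\min _{\mu }\left[ \sum _{i=s+1}^{t}\gamma (y_i,\mu )+\sum _{i=t+1}^{T}\gamma (y_i,\mu )\right] \ge \min _{\mu }\sum _{i=s+1}^{t}\gamma (y_i,\mu )+\min _{\mu }\sum _{i=t+1}^{T}\gamma (y_i,\mu )=\mathcal {C}(\mathbf {y}_{s+1:t})+\mathcal {C}(\mathbf {y}_{t+1:T}),
\]

since minimising the two sums separately can only lower the total. This is C2 with \(\kappa =0\): splitting a segment never increases the cost.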
3 Solving the penalised optimisation problem
We first consider solving the penalised optimisation problem (5) using a dynamic programming approach. The initial algorithm, Optimal Partitioning (Jackson et al. 2005), will be discussed first before mentioning how pruning can be used to reduce the computational cost.
3.1 Optimal Partitioning
3.2 PELT
Since at each time step in the PELT algorithm the minimisation is being run over fewer values it is expected that this method will be more efficient than the basic Optimal Partitioning algorithm. In Killick et al. (2012) it is shown to be at least as efficient as Optimal Partitioning, with PELT’s computational cost being bounded above by \(\mathcal {O}(n^2)\). Under certain conditions the expected computational cost can be shown to be bounded by Ln for some constant \(L<\infty \). These conditions are given fully in Killick et al. (2012), the most important of which is that the expected number of changepoints in the data increases linearly with the length of the data, n.
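To make the difference between the two methods concrete, the following Python sketch implements the Optimal Partitioning recursion for a change in mean with the quadratic loss, with inequality-based pruning as an option. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the convention \(F(0)=-\beta \), the choice \(\kappa =0\) and the small numerical tolerance are all choices made for this example.

```python
import itertools
import math
import random


def make_cost(y):
    """Return cost(s, t): residual sum of squares of y[s:t] around its mean."""
    s1 = [0.0] + list(itertools.accumulate(y))
    s2 = [0.0] + list(itertools.accumulate(v * v for v in y))

    def cost(s, t):
        return (s2[t] - s2[s]) - (s1[t] - s1[s]) ** 2 / (t - s)

    return cost


def optimal_partitioning(y, beta, prune=False):
    """Minimise the sum of segment costs plus beta per changepoint.

    With prune=True, candidates are discarded by the inequality rule
    (assumed here with kappa = 0). Returns the optimal value, the
    changepoints, and the number of candidates examined at each step.
    """
    n = len(y)
    cost = make_cost(y)
    F = [-beta] + [0.0] * n      # F[0] = -beta: the first segment is not penalised
    last = [0] * (n + 1)         # last[t]: most recent changepoint before t
    cands = [0]
    examined = []
    for t in range(1, n + 1):
        vals = [F[tau] + cost(tau, t) + beta for tau in cands]
        i = min(range(len(vals)), key=vals.__getitem__)
        F[t], last[t] = vals[i], cands[i]
        examined.append(len(cands))
        if prune:
            # keep tau only if F[tau] + cost(tau, t) <= F[t] (up to rounding)
            cands = [tau for tau, v in zip(cands, vals) if v - beta <= F[t] + 1e-9]
        cands.append(t)
    cps, t = [], n
    while last[t] > 0:           # trace the changepoints back from n
        t = last[t]
        cps.append(t)
    return F[n], cps[::-1], examined


if __name__ == "__main__":
    random.seed(1)
    y = [random.gauss(mu, 1.0) for mu in (0, 3, -2, 1) for _ in range(50)]
    beta = 2 * math.log(len(y))
    for prune in (False, True):
        value, cps, examined = optimal_partitioning(y, beta, prune)
        print(prune, round(value, 3), cps, sum(examined))
```

Both settings return the same segmentation; only the total number of candidates examined differs, which is the quantity that pruning is designed to reduce.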
4 Solving the constrained optimisation problem
We now consider applications of dynamic programming to solve the constrained optimisation problem (4). These methods assume a maximum number of changepoints that are to be considered, K, and then solve the constrained optimisation problem for all values of \(k=1,2,\ldots ,K\). We first describe the initial algorithm, Segment Neighbourhood Search (Auger and Lawrence 1989), and then an approach that uses pruning.
4.1 Segment Neighbourhood Search
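The recursion behind Segment Neighbourhood Search can be sketched in a few lines of Python. This is a minimal illustration for a change in mean with the quadratic loss, assuming the usual form of the recursion, \(C_{k,t}=\min _{\tau }[C_{k-1,\tau }+\mathcal {C}(\mathbf {y}_{\tau +1:t})]\) with \(C_{0,t}=\mathcal {C}(\mathbf {y}_{1:t})\); the function names and the linear penalty used for choosing k in the demo are assumptions of this example.

```python
import itertools
import math


def make_cost(y):
    """Return cost(s, t): residual sum of squares of y[s:t] around its mean."""
    s1 = [0.0] + list(itertools.accumulate(y))
    s2 = [0.0] + list(itertools.accumulate(v * v for v in y))

    def cost(s, t):
        return (s2[t] - s2[s]) - (s1[t] - s1[s]) ** 2 / (t - s)

    return cost


def segment_neighbourhood(y, K):
    """Return C[k][t], the optimal cost of y[0:t] with k changepoints,
    and the back-pointers last[k][t], for k = 0..K."""
    n = len(y)
    cost = make_cost(y)
    C = [[math.inf] * (n + 1) for _ in range(K + 1)]
    last = [[0] * (n + 1) for _ in range(K + 1)]
    for t in range(1, n + 1):
        C[0][t] = cost(0, t)
    for k in range(1, K + 1):
        for t in range(k + 1, n + 1):
            # the last changepoint tau must leave room for k - 1 earlier ones
            C[k][t], last[k][t] = min(
                (C[k - 1][tau] + cost(tau, t), tau) for tau in range(k, t)
            )
    return C, last


def changepoints(last, k, n):
    """Trace back the k changepoints of the optimal segmentation of y[0:n]."""
    cps, t = [], n
    for j in range(k, 0, -1):
        t = last[j][t]
        cps.append(t)
    return cps[::-1]


if __name__ == "__main__":
    y = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 1.0, 1.0, 1.0]
    K, beta = 3, 1.0
    C, last = segment_neighbourhood(y, K)
    n = len(y)
    # choose k by minimising C_{k,n} + f(k, n), here with f(k, n) = beta * k
    k_hat = min(range(K + 1), key=lambda k: C[k][n] + beta * k)
    print(k_hat, changepoints(last, k_hat, n))
```

The two nested loops over t and \(\tau \), repeated for each k, are what give the quadratic dependence on the amount of data.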
4.2 Pruned Segment Neighbourhood Search
Rigaill (2010) has developed techniques to increase the efficiency of Segment Neighbourhood Search using functional pruning. These form the basis of a method called pruned Dynamic Programming Algorithm (pDPA). A more generic implementation of this method is presented in Cleynen et al. (2012). Here we describe how this algorithm can be used to calculate the \(C_{k,t}\) values. Once these are calculated, the exact segmentation can be extracted as in Segment Neighbourhood Search.
An example of the pDPA recursion is given in Fig. 2 for a change in mean using the negative normal log-likelihood cost function (3). The left-hand plot shows \(Cost^*_{k,t}(\mu )\). In this example there are 5 intervals of \(\mu \) corresponding to 4 different values of \(\tau \) for which \(Cost^*_{k,t}(\mu )=Cost^\tau _{k,t}(\mu )\). When we analyse the next data point, we update each of these four \(Cost^\tau _{k,t}(\mu )\) functions, using \(Cost^\tau _{k,t+1}(\mu )=Cost^\tau _{k,t}(\mu )+\gamma (y_{t+1},\mu )\), and introduce a new curve corresponding to a changepoint at time \(t+1\), \(Cost^{t+1}_{k,t+1}(\mu )=C_{k-1,t+1}\) (see middle plot). We can then prune the functions which are no longer optimal for any \(\mu \) values, and in this case we remove one such function (see right-hand plot).
pDPA can be shown to be bounded in time by \(\mathcal {O}(Kn^2)\). Rigaill (2010) further analyses the time complexity of pDPA and shows it empirically to be \(\mathcal {O}(Kn\log n)\); further evidence for this is presented in Sect. 7. However pDPA has a computational overhead relative to Segment Neighbourhood Search, as it requires calculating and storing the \(Cost^\tau _{k,t}(\mu )\) functions and the corresponding sets \(Set_{k,t}^\tau \). Currently, implementations of pDPA have only been possible for models with scalar segment parameters \(\mu \), due to the difficulty of calculating the sets in higher dimensions. Being able to efficiently store and update the \(Cost^\tau _{k,t}(\mu )\) has also restricted applications primarily to models where \(\gamma (y,\mu )\) corresponds to the log-likelihood of an exponential family. However this still includes a wide range of changepoint applications, including that of detecting CNVs that we consider in Sect. 7. The cost of updating the sets depends heavily on whether the updates (13) can be calculated analytically, or whether they require the use of numerical methods.
5 New changepoint algorithms
Two natural ways of extending the two methods introduced above will be examined in this section. These are, respectively, to apply functional pruning (Sect. 4.2) to Optimal Partitioning, and to apply inequality based pruning (Sect. 3.2) to Segment Neighbourhood Search. These lead to two new algorithms, which we call Functional Pruning Optimal Partitioning (FPOP) and Segment Neighbourhood with Inequality Pruning (SNIP).
5.1 Functional Pruning Optimal Partitioning
Functional Pruning Optimal partitioning (FPOP) provides a version of Optimal Partitioning (Jackson et al. 2005) which utilises functional pruning to increase the efficiency. As will be discussed in Sect. 6 and shown in Sect. 7, FPOP provides an alternative to PELT which is more efficient in certain scenarios. The approach used by FPOP is similar to the approach for pDPA in Sect. 4.2, however the theory is slightly simpler here as there is no longer the need to condition on the number of changepoints.
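The idea can be illustrated with a compact Python sketch for a change in mean with the quadratic loss, where each candidate \(\tau \) carries a quadratic cost curve in \(\mu \) together with the set of \(\mu \) values on which it may still be optimal. This is a simplified illustration and not the C++ implementation used in Sect. 7: the function names, the padded parameter domain, the convention \(F(0)=-\beta \) and the representation of sets as lists of intervals are all assumptions of this example.

```python
import math
import random


def _subtract(intervals, lo, hi):
    """Remove [lo, hi] from a list of disjoint (l, h) intervals."""
    out = []
    for l, h in intervals:
        if l < min(h, lo):
            out.append((l, min(h, lo)))
        if max(l, hi) < h:
            out.append((max(l, hi), h))
    return out


def fpop_mean(y, beta):
    """Penalised change-in-mean segmentation with functional pruning.

    Returns the optimal value, the changepoints, and the number of
    candidates stored after each time step.
    """
    domain = (min(y) - 1.0, max(y) + 1.0)    # padded range of plausible means
    F, last = [-beta], [0]
    # tau -> [a, b, c, set]: cost curve a*mu^2 + b*mu + c for a last
    # changepoint at tau, and the intervals of mu where tau may be optimal
    cands = {0: [0.0, 0.0, 0.0, [domain]]}
    stored = []
    for t, yt in enumerate(y, start=1):
        best, arg = math.inf, 0
        for tau, q in cands.items():
            q[0] += 1.0                      # add gamma(y_t, mu) = (y_t - mu)^2
            q[1] -= 2.0 * yt
            q[2] += yt * yt
            val = q[2] - q[1] ** 2 / (4.0 * q[0])   # minimum of the curve over mu
            if val < best:
                best, arg = val, tau
        F.append(best)
        last.append(arg)
        level = best + beta                  # cost curve of the new candidate t
        new_set = [domain]
        for tau in list(cands):
            a, b, c, ivs = cands[tau]
            disc = b * b - 4.0 * a * (c - level)
            if disc < 0.0:                   # curve lies above the level everywhere
                del cands[tau]
                continue
            mid, half = -b / (2.0 * a), math.sqrt(disc) / (2.0 * a)
            lo, hi = mid - half, mid + half  # where the curve is below the level
            new_set = _subtract(new_set, lo, hi)
            ivs = [(max(l, lo), min(h, hi)) for l, h in ivs if max(l, lo) < min(h, hi)]
            if ivs:
                cands[tau][3] = ivs
            else:                            # empty set: tau can never be optimal again
                del cands[tau]
        if new_set:
            cands[t] = [0.0, 0.0, level, new_set]
        stored.append(len(cands))
    cps, t = [], len(y)
    while last[t] > 0:                       # trace the changepoints back from n
        t = last[t]
        cps.append(t)
    return F[-1], cps[::-1], stored


if __name__ == "__main__":
    random.seed(1)
    y = [random.gauss(mu, 1.0) for mu in (0, 3, -2, 1) for _ in range(50)]
    value, cps, stored = fpop_mean(y, 2 * math.log(len(y)))
    print(round(value, 3), cps, max(stored))
```

For the quadratic loss every set update reduces to solving a quadratic equation, which is why the intervals can be computed analytically in this case.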
5.2 Segment Neighbourhood with Inequality Pruning
In a similar vein to Sect. 5.1, Segment Neighbourhood Search can also benefit from using pruning methods. In Sect. 4.2 the method pDPA was discussed as a fast pruned version of Segment Neighbourhood Search. In this section a new method, Segment Neighbourhood with Inequality Pruning (SNIP), will be introduced. This takes the Segment Neighbourhood Search algorithm and uses inequality based pruning to increase the speed.
Under condition (C2) the following result can be proved for Segment Neighbourhood Search and this will enable points to be pruned from the candidate changepoint set.
Theorem 1
Proof
The idea of the proof is to show that a segmentation of \(y_{1:T}\) into k segments with the last changepoint at t will be better than one with the last changepoint at s for all \(T>t\).
6 Comparisons between pruning methods
Functional and inequality based pruning both offer increases in the efficiency in solving both the penalised and constrained problems, however their use depends on the assumptions which can be made on the cost function. Inequality based pruning is dependent on the assumption C2, while functional pruning requires the slightly stronger condition C1.
Functional pruning also requires a larger computational overhead than inequality based pruning. This arises due to the potential difficulties in calculating \(Set_t^{\tau }\) for all \(\tau \) at a given timepoint t. If this calculation can be done efficiently (i.e. for a univariate parameter from a model in the exponential family, where the intervals can be calculated analytically) then the algorithm (such as FPOP or pDPA) will be efficient too. In particular, functional pruning is infeasible (at least using current approaches) for multidimensional parameters, as in this case the intervals \(Set_t^{\tau }\) are also multidimensional.
If we consider models for which both pruning methods can be implemented, we can compare the extent to which the methods prune. This will give some insight into when the different pruning methods would be expected to work well.
As Fig. 4 illustrates, PELT prunes very rarely; only when evidence of a change is particularly high. In contrast, FPOP prunes more frequently keeping the candidate set small throughout. Figure 5 shows similar results for the constrained problem. While pDPA constantly prunes, SNIP only prunes sporadically. In addition SNIP fails to prune much at all for low values of k.
Figures 4 and 5 give strong empirical evidence that functional pruning prunes more points than the inequality based method. In fact it can be shown that any point pruned by inequality based pruning will also be pruned at the same time step by functional pruning. This result holds for both the penalised and constrained case and is stated formally in Theorem 2.
Theorem 2
Let \(\mathcal {C}(\cdot )\) be a cost function that satisfies condition C1, and consider solving either the constrained or penalised optimisation problem using dynamic programming and either inequality or functional pruning.
Any point pruned by inequality based pruning at time t will also have been pruned by functional pruning at the same time.
Proof
We prove this for pruning of optimal partitioning, with the ideas extending directly to the pruning of the Segment Neighbourhood algorithm.
7 Empirical evaluation of FPOP
As explained in Sect. 6 functional pruning leads to a better pruning in the following sense: any point pruned by inequality based pruning will also be pruned by functional pruning. However, functional pruning is computationally more demanding than inequality based pruning. We thus decided to empirically compare the performance of FPOP to PELT (Killick et al. 2012), pDPA (Rigaill 2010), Binary Segmentation (BinSeg), Wild Binary Segmentation (WBS) (Fryzlewicz 2012) and SMUCE (Frick et al. 2014).
PELT and pDPA have been discussed in Sects. 3.2 and 4.2 respectively. Binary Segmentation (Scott and Knott 1974) involves the entire data being scanned for a single changepoint and then splitting into two segments around this change. The process is then repeated on these two segments. This recursion is repeated until a certain criterion is satisfied. Wild Binary Segmentation (Fryzlewicz 2012) takes this method further, taking a randomly drawn number of subsamples from the data and searching these subsamples for a changepoint. As before the data is then split around the changepoint and the process repeated on the two created segments. Lastly SMUCE (Simultaneous Multiscale Changepoint Inference) (Frick et al. 2014) uses a multiscale test at level \(\alpha \) and estimates a step function that minimises the number of changepoints whilst lying in the acceptance region of this test.
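For reference, the Binary Segmentation procedure described above can be sketched as follows for a change in mean with the quadratic loss. This is an illustrative Python sketch, not the implementation that was benchmarked: the function names are hypothetical, and the stopping criterion (split only if the reduction in cost exceeds a threshold beta) is one common choice assumed for this example.

```python
import itertools


def make_cost(y):
    """Return cost(s, t): residual sum of squares of y[s:t] around its mean."""
    s1 = [0.0] + list(itertools.accumulate(y))
    s2 = [0.0] + list(itertools.accumulate(v * v for v in y))

    def cost(s, t):
        return (s2[t] - s2[s]) - (s1[t] - s1[s]) ** 2 / (t - s)

    return cost


def binary_segmentation(y, beta):
    """Recursively split segments while the best split lowers the cost by more than beta."""
    cost = make_cost(y)
    cps = []
    stack = [(0, len(y))]                    # segments still to be scanned
    while stack:
        s, e = stack.pop()
        if e - s < 2:
            continue
        whole = cost(s, e)
        # scan the whole segment for the single best changepoint
        gain, tau = max((whole - cost(s, m) - cost(m, e), m) for m in range(s + 1, e))
        if gain > beta:
            cps.append(tau)
            stack += [(s, tau), (tau, e)]    # repeat on the two new segments
    return sorted(cps)


if __name__ == "__main__":
    y = [0.0] * 20 + [5.0] * 20 + [0.0] * 20
    print(binary_segmentation(y, beta=1.0))
```

Each scan is linear in the segment length thanks to the cumulative sums, but every split is chosen conditionally on the earlier ones, which is why the result is only an approximate solution of the minimisation problem.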
To do the analysis, we implemented FPOP for the quadratic loss (3) in C++. The code can be found in the opfp project repository on RForge.
7.1 Speed benchmark: 4467 chromosomes from tumour microarrays
Hocking et al. (2014) proposed to benchmark the speed of segmentation algorithms on a database of 4467 problems of size varying from \(n=25\) to 153662 data points. These data come from different microarrays data sets (Affymetrix, Nimblegen, BAC/PAC) and different tumour types (leukaemia, lymphoma, neuroblastoma, medulloblastoma).
We compared FPOP to several other segmentation algorithms: pDPA (Rigaill 2010), PELT (Killick et al. 2012), Binary Segmentation (BinSeg), Wild Binary Segmentation (WBS; Fryzlewicz 2012), and SMUCE (Frick et al. 2014). We ran pDPA and BinSeg with a maximum number of changes \(K=52\), WBS and SMUCE with default settings, and PELT and FPOP with the SIC penalty.
We used the R microbenchmark package to measure the execution time on each of the 4467 segmentation problems. The R source code for these timings is in benchmark/systemtime.arrays.R in the opfp project repository on RForge: https://rforge.rproject.org/R/?group_id=1851.
Figure 6 shows that the speed of FPOP is comparable to BinSeg, and faster than the other algorithms. As expected, it is clear that the asymptotic behaviour of FPOP is similar to pDPA for a large number of data points to segment. Note that for analysing a single data set, WBS could be more easily implemented in a parallelised computing environment than the other methods. Doing so would lead to some reduction in its computational cost per data set. For analysing multiple data sets, as here, all methods are trivially parallelisable through analysing each data set on a different CPU.
7.2 Speed benchmark: simulated data with different number of changes
For PELT the expected time complexity is not as clear, but pruning should be more efficient if there are many changepoints. Hence for a signal of fixed size n, we expect the runtime of PELT to decrease with the underlying number of changes.
Based on Sect. 6, we expect FPOP to be faster than PELT and pDPA. Thus it seems reasonable to expect FPOP to be faster for the whole range of K. This is what we empirically check in this section.
To do that we simulated a Gaussian signal with \(n=2\times 10^5\) data points, and varied the number of changes K. We then repeated the same experiment for signals with \(n=10^7\) and timed FPOP and BinSeg only. The R source code for these timings is in benchmark/systemtime.simulation.R in the opfp project repository on RForge: https://rforge.rproject.org/R/?group_id=1851.
It can be seen in Fig. 7 that FPOP is always faster than pDPA, PELT, WBS, and SMUCE. Interestingly for both \(n=2\times 10^5\) and \(n=10^7\), FPOP is faster than BinSeg for a true number of changepoints larger than \(K=500\).
7.3 Accuracy benchmark: the neuroblastoma data set
Hocking et al. (2013) proposed the neuroblastoma tumor microarray data set for benchmarking changepoint detection accuracy of segmentation models. These data consist of annotated region labels defined by expert doctors when they visually inspected scatterplots of the data. There are 2845 negative labels where there should be no changes (a false positive occurs if an algorithm predicts a change), and 573 positive labels where there should be at least one change (a false negative occurs if an algorithm predicts no changes). There are 575 copy number microarrays, and a total of 3418 labeled chromosomes (separate segmentation problems).
Let m be the number of segmentation problems in the train set, let \(n_1, \dots , n_m\) be the number of data points to segment in each problem, and let \(\mathbf y^{1}\in \mathbb {R}^{n_1}, \dots , \mathbf y^{m}\in \mathbb {R}^{n_m}\) be the vectors of noisy data to segment. Both PELT and pDPA have been applied to this benchmark by first defining a penalty value of \(\beta = \lambda n_i\) in (5) for all problems \(i\in \{1, \dots , m\}\), and then choosing the constant \(\lambda \in \{10^{-8}, \dots , 10^1\}\) that minimises the number of incorrect labels in the train set. To apply this model selection criterion to WBS and SMUCE, we first computed a sequence of models with up to \(K=20\) segments (for WBS we used the changepoints.sbs function, and for SMUCE we varied the q parameter).
First, we computed train error ROC curves by considering the entire database as a train set, and computing false positive and true positive rates for each penalty \(\lambda \) parameter (Fig. 8, left). The ROC curves suggest that FPOP, PELT, pDPA, and BinSeg have the best detection accuracy, followed by SMUCE, and then WBS.
Second, we performed cross-validation to estimate the test error of each algorithm. We divided the labeled segmentation problems into six folds. For each fold we designate it as a test set, and use the other five folds as a train set. For each algorithm we used grid search to choose the penalty \(\lambda \) parameter which had the minimum number of incorrect labels in the train set. We then count the number of incorrect labels on the test set. In agreement with the ROC curves, FPOP/pDPA/PELT/BinSeg had the smallest test error (2.2 %), followed by SMUCE (2.43 %), and then WBS (3.87 %). Using a paired one-sided \(t_5\)-test, FPOP had significantly less test error than WBS (\(p=0.005\)) but not SMUCE (\(p=0.061\)).
7.4 Accuracy on the WBS simulation benchmark
We assessed the performance of FPOP using the simulation benchmark proposed in the WBS paper (Fryzlewicz 2012) page 29. In that paper 5 scenarios are considered. We considered an additional scenario from a further paper on SMUCE (Futschik et al. 2014) corresponding to Scenario 2 of WBS with a standard deviation of 0.2 rather than 0.3. We call this Scenario 2’. We first compared FPOP with \(\beta =2 \log (n)\), WBS with the sSIC and SMUCE with \(\alpha =0.45\) (used in Futschik et al. (2014) for Scenario 2’) in terms of mean squared error (MSE). For FPOP we first standardised the signal using the MAD (Median Absolute Deviation) estimate as was done for PELT in Fryzlewicz (2012).

\(\mathbf{{H_0}}\): the average MSE difference between WBS and FPOP is less than or equal to 0.
\(\mathbf{{H_1}}\): the average MSE difference between WBS and FPOP is greater than 0.
We performed a similar analysis on our speed benchmark (Fig. 7, left) and found that FPOP is competitive with or better than WBS and SMUCE in terms of MSE, BkpEr, exact TP and \(\hat{K}\). Results are shown in the supplementary file. The R code is also available on RForge.
8 Discussion
We have introduced two new algorithms for detecting changepoints, FPOP and SNIP. A natural question is which of these, and the existing algorithms, pDPA and PELT, should be used in which applications. There are two stages to answering this question. The first is whether to detect changepoints through solving the constrained or the penalised optimisation problem, and the second is whether to use functional or inequality based pruning.
The advantage of solving the constrained optimisation problem is that this gives exact segmentations for a range of numbers of changepoints. The disadvantage is that solving it is slower than solving the penalised optimisation problem, particularly if there are many changepoints. In interactive situations where you wish to explore segmentations of the data, solving the constrained problem is to be preferred (Hocking et al. 2014). However in non-interactive scenarios when the penalty parameter is known in advance, it will be faster to solve the penalised problem to recover the single segmentation of interest. Further, recent work in Haynes et al. (2014) explores a way of outputting multiple segmentations (corresponding to various penalty values) for the penalised problem.
The decision as to which pruning method to use is purely one of computational efficiency. We have shown that functional pruning always prunes more than inequality based pruning, and empirically have seen that this difference can be large, particularly if there are few changepoints. However functional pruning can be applied less widely. Not only does it require a stronger condition on the cost functions, but currently its implementation has been restricted to detecting changes in a univariate parameter from a model in the exponential family. Even for situations where functional pruning can be applied, its computational overhead per non-pruned candidate is higher.
Our experience suggests that you should prefer functional pruning in the situations where it can be applied. For example FPOP was always faster than PELT for detecting a change in mean in the empirical studies we conducted, and the difference in speed is particularly large in situations where there are few changepoints. Furthermore we observed FPOP’s computational speed was robust to changes in the number of changepoints to be detected, and was even competitive with, and sometimes faster than, Binary Segmentation.
Software C++ implementation (within an R wrapper) for the FPOP algorithm can be found in the opfp project repository on RForge: https://rforge.rproject.org/R/?group_id=1851.
Reproducibility The subversion repository of the opfp project on RForge contains all the code necessary to make the figures in this manuscript.
Notes
Acknowledgments
The authors would like to thank Adam Letchford for helpful comments and discussions, and the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme Inference for Change-Point and Related Processes where work on this paper was undertaken. This study was funded by EPSRC (Grant Number EP/K014463/1 and through the STOR-i Doctoral Training Centre).
Compliance with ethical standards
Conflicts of interest
The authors declare that they have no conflict of interest.
Research involving human and animal rights
Research involving human participants and/or animals: This article does not contain any studies with human participants or animals performed by any of the authors.
Supplementary material
References
Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)
Aue, A., Horváth, L.: Structural breaks in time series. J. Time Ser. Anal. 34(1), 1–16 (2013)
Auger, I.E., Lawrence, C.E.: Algorithms for the optimal identification of segment neighborhoods. Bull. Math. Biol. 51, 39–54 (1989)
Braun, J.V., Braun, R.K., Müller, H.G.: Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika 87, 301–314 (2000)
Braun, J.V., Müller, H.G.: Statistical methods for DNA sequence segmentation. Stat. Sci. 13(2), 142–162 (1998)
Cleynen, A., Koskas, M., Rigaill, G.: A generic implementation of the pruned dynamic programing algorithm. ArXiv e-prints (2012)
Davis, R.A., Lee, T.C.M., Rodriguez-Yam, G.A.: Structural break estimation for nonstationary time series models. J. Am. Stat. Assoc. 101, 223–239 (2006)
Frick, K., Munk, A., Sieling, H.: Multiscale change point inference. J. R. Stat. Soc. Ser. B Stat. Methodol. 76(3), 495–580 (2014)
Fryzlewicz, P.: Wild binary segmentation for multiple changepoint detection. Ann. Stat. (2012) (to appear)
Futschik, A., Hotz, T., Munk, A., Sieling, H.: Multiscale DNA partitioning: statistical evidence for segments. Bioinformatics 30(16), 2255–2262 (2014)
Haynes, K., Eckley, I.A., Fearnhead, P.: Efficient penalty search for multiple changepoint problems. ArXiv e-prints (2014)
Hocking, T.D., Boeva, V., Rigaill, G., Schleiermacher, G., Janoueix-Lerosey, I., Delattre, O., Richer, W., Bourdeaut, F., Suguro, M., Seto, M., Bach, F., Vert, J.P.: SegAnnDB: interactive web-based genomic segmentation. Bioinformatics 30, 1539–1546 (2014)
Hocking, T.D., Schleiermacher, G., Janoueix-Lerosey, I., Boeva, V., Cappo, J., Delattre, O., Bach, F., Vert, J.P.: Learning smoothing models of copy number profiles using breakpoint annotations. BMC Bioinform. 14, 164 (2013)
Jackson, B., Scargle, J.D., Barnes, D., Arabhi, S., Alt, A., Gioumousis, P., Gwin, E., Sangtrakulcharoen, P., Tan, L., Tsai, T.T.: An algorithm for optimal partitioning of data on an interval. IEEE Signal Process. Lett. 12, 105–108 (2005)
Killick, R., Eckley, I.A., Ewans, K., Jonathan, P.: Detection of changes in variance of oceanographic timeseries using changepoint analysis. Ocean Eng. 37(13), 1120–1126 (2010)
Killick, R., Fearnhead, P., Eckley, I.A.: Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 107, 1590–1598 (2012)
Lavielle, M.: Using penalized contrasts for the changepoint problem. Signal Process. 85, 1501–1510 (2005)
Lee, C.B.: Estimating the number of change points in a sequence of independent normal random variables. Stat. Prob. Lett. 25(3), 241–248 (1995)
Olshen, A.B., Venkatraman, E.S., Lucito, R., Wigler, M.: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 5, 557–572 (2004)
Picard, F., Lebarbier, E., Hoebeke, M., Rigaill, G., Thiam, B., Robin, S.: Joint segmentation, calling, and normalization of multiple CGH profiles. Biostatistics 12, 413–428 (2011)
Reeves, J., Chen, J., Wang, X.L., Lund, R., Lu, Q.Q.: A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteorol. Climatol. 46, 900–915 (2007)
Rigaill, G.: Pruned dynamic programming for optimal multiple changepoint detection. ArXiv e-prints (2010)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Scott, A.J., Knott, M.: A cluster analysis method for grouping means in the analysis of variance. Biometrics 30, 507–512 (1974)
Yao, Y.C.: Estimating the number of changepoints via Schwarz’ criterion. Stat. Prob. Lett. 6(2), 181–189 (1988)
Yao, Y.C., Au, S.T.: Least-squares estimation of a step function. Indian J. Stat. 51(3), 370–381 (1989)
Zhang, N.R., Siegmund, D.O.: A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63, 22–32 (2007)
Zhang, N.R., Siegmund, D.O., Ji, H., Li, J.Z.: Detecting simultaneous changepoints in multiple sequences. Biometrika 97(3), 631–645 (2010)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.