BayesProject: Fast computation of a projection direction for multivariate changepoint detection

This article focuses on the challenging problem of efficiently detecting changes in mean within multivariate data sequences. Multivariate changepoints can be detected by projecting a multivariate series to a univariate one using a suitable projection direction that preserves a maximal proportion of signal information. However, for some existing approaches the computation of such a projection direction can scale unfavourably with the number of series and might rely on additional assumptions on the data sequences, thus limiting their generality. We introduce BayesProject, a computationally inexpensive Bayesian approach to compute a projection direction in such a setting. The proposed approach allows the incorporation of prior knowledge of the changepoint scenario, when such information is available, which can help to increase the accuracy of the method. A simulation study shows that BayesProject is robust, yields projections close to the oracle projection direction and, moreover, that its accuracy in detecting changepoints is comparable to, or better than, existing algorithms while scaling linearly with the number of series.


Introduction
Changepoint detection is an area of research with immediate practical applications in the monitoring of financial data (Bai and Perron 1998; Frick et al. 2014), network traffic data (Lévy-Leduc and Roueff 2009; Lung-Yut-Fong et al. 2012), as well as bioinformatics (Maidstone et al. 2017; Guédon 2013), environmental (Nam et al. 2015) and signal or speech processing applications (Desobry et al. 2005; Haynes et al. 2017). For instance, a sudden change in (mean) activity of one or more data streams in a network could hint at an intruder sending data to an unknown host from several infected computers. In general, a user is faced with the task of monitoring several data streams simultaneously, ideally in real time, with the aim of detecting a change occurring in at least one series as soon as it occurs.

Electronic supplementary material
The online version of this article (https://doi.org/10.1007/s11222-020-09966-2) contains supplementary material, which is available to authorized users.

Starting with the work of Page (1954), considerable contributions have been made to the literature on changepoint detection in the areas of both methodology and applications. Of particular interest for the work we present are the developments in multivariate changepoint detection. Other aspects of changepoint detection address nonparametric asymptotic tests (Aue et al. 2009), scan and segmentation algorithms based on Chi-squared statistics (Zhang et al. 2010), detection of mean changes in intervals for biomedical applications (Siegmund et al. 2011), consistent nonparametric estimation of the number and locations of changepoints (Matteson and James 2012) and detection of multiple structural breaks in the autocovariance of a multivariate piecewise stationary process (Preuß et al. 2015). See Horváth and Rice (2014) and Truong et al. (2018) for a comprehensive overview of the pertinent literature on changepoint analysis.
The Inspect algorithm of Wang and Samworth (2017) is of particular interest, introducing a new approach for the detection of changes in a multivariate data sequence. Detection is achieved by projecting the multivariate data to a univariate series in which a changepoint is detected via a cusum approach, thus reducing the problem to the known univariate case. However, estimating the projection direction scales unfavourably with the number of series in Inspect due to the fact that the projection vector is computed as the leading left singular vector of a matrix (calculated via a convex optimisation problem or a cusum transform). This SVD computation leads to an algorithmic runtime of higher order in the number of series.
Our article presents an alternative Bayesian framework to compute an estimate of the projection direction. One of the main features of the proposed framework is that it permits the incorporation of prior knowledge of the changepoint scenario under investigation through the choice of a suitable prior on the change in each series, when such information is available. Its computational effort is linear in the number of series and superlinear in the series length.
The problem of detecting changes in multivariate data has received increasing attention over recent years. We compare the performance of our proposed algorithm not only to Inspect, but also to state-of-the-art algorithms from the following papers. Cho and Fryzlewicz (2015) propose to only consider the cusum statistics of those series which pass a certain threshold, thereby reducing the impact of noise. Cho (2016) proposes a double cusum statistic to aggregate cusum statistics in high dimensions and proves consistency in single and multiple changepoint detection. Enikeeva and Harchaoui (2019) propose a two-component test to detect a change in mean in a sequence of high-dimensional Gaussian vectors which yields optimal rates of convergence for a wide range of changepoint regimes. Yu and Chen (2020) introduce a Gaussian multiplier bootstrap to calibrate the critical values for the cusum statistic in high dimensions, as well as an estimator for the changepoint location which achieves a near-optimal convergence rate. Grundy et al. (2020b) propose to project a multivariate dataset to two dimensions instead of one, allowing the detection of a change in mean and variance by applying univariate changepoint detection methods to the two projected series.
The article is structured as follows. Section 2 formulates the changepoint setting under consideration (Sect. 2.1) and revisits the Inspect algorithm (Sect. 2.2) before introducing the alternative Bayesian approach for computing the projection direction (Sect. 2.3). Section 3 considers practical implementation details, including threshold calibration (Sect. 3.1), the computation of a grid for projecting the data (Sect. 3.2) and pseudo-code of the proposed algorithm (Sect. 3.3). A simulation study, presented in Sect. 4, evaluates the original Inspect algorithm, the proposed Bayesian projection approach and two classical methods (sum-cusum and max-cusum) with respect to the accuracy of the estimated projection direction, the accuracy of the resulting changepoint and their runtime. The simulation scenarios investigate the dependence on a range of features: the number of series and the series length, the proportion of series with a change, the size of the change and the location of the changepoint. Results for multiple changepoint detection on simulated data are presented in Sect. 5. In the article, ‖·‖_q denotes the L_q norm of a vector and |·| denotes the absolute value of a scalar or the size of a set.

Mathematical formulation of the problem
Consider the problem of estimating the location of a single changepoint in a high-dimensional dataset, X ∈ R^{p×n}, composed of independent p-dimensional normal random column vectors X^(t) ∼ N_p(μ^(t), κ² I_p) for t = 1, …, n, where 0 ≤ d ≤ p out of the p series exhibit a changepoint at ζ, meaning that μ_A := μ^(1) = · · · = μ^(ζ) and μ^(ζ+1) = · · · = μ^(n) =: μ_B, where μ_A and μ_B differ in exactly d coordinates, μ_A (μ_B) is the mean before (after) the change, and I_p denotes the p × p identity matrix. The goal is to estimate the location of the changepoint, ζ.
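For concreteness, data from this single-changepoint model can be simulated in a few lines. The following is a minimal numpy sketch; the function name, the default dimensions and the convention of placing the change in the first d series are illustrative choices, not from the article:

```python
import numpy as np

def simulate_single_change(n=500, p=20, d=5, zeta=250, size=1.0, kappa=1.0, seed=0):
    """Simulate X in R^{p x n}: independent N_p(mu^(t), kappa^2 I_p) columns,
    where d of the p series shift their mean by `size` after time zeta."""
    rng = np.random.default_rng(seed)
    X = kappa * rng.standard_normal((p, n))
    X[:d, zeta:] += size   # mu_B - mu_A = size in the first d series, 0 elsewhere
    return X
```

A sparse scenario corresponds to d ≪ p, a dense one to d close to p.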
One particularly simple way to attempt to find the time point, ζ, is by computing a univariate cusum transform for each of the p series (Page 1954). The cusum transform of the univariate series X_{i,·} is a scaled difference of means before and after each assumed changepoint location t ∈ {1, …, n}. Let T(·) denote the row-wise cusum transform of the p series, given by

T(X)_{i,t} = √(t(n − t)/n) ( X̄_{i,1:t} − X̄_{i,(t+1):n} )   (1)

for i = 1, …, p and t = 1, …, n − 1, where X̄_{i,1:t} and X̄_{i,(t+1):n} denote the sample means of series i before and after t. A changepoint is declared in each series i, individually, if at any location the cusum transform |T(X)_{i,t}| exceeds some prespecified threshold τ > 0, with the changepoint location in series i being arg max_t |T(X)_{i,t}|. To find a multivariate changepoint, some form of aggregation of the cusum information from all p series is required, thus effectively resulting in a reduction of dimensionality. Two simple aggregation approaches include sum-cusum and max-cusum. In sum-cusum, the p cusum transforms are summed (and averaged) for each assumed location t ∈ {1, …, n}, and a changepoint is declared if the average exceeds a prespecified threshold. The sum-cusum approach works well if most series change; however, it can lose power in the situation where the changes are sparse. In max-cusum, the cusum information is aggregated by computing the maximum across all p series for each t ∈ {1, …, n}, max_{i∈{1,…,p}} T(X)_{i,t}, and by declaring a changepoint once the maximum exceeds τ. This approach works well if changes are sparse but pronounced. The computational complexity of both approaches is O(np).
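The cusum transform and the two aggregation rules above can be written in vectorised numpy. This is a minimal sketch with the scaling √(t(n−t)/n); the function names are ours:

```python
import numpy as np

def cusum_transform(X):
    """Row-wise cusum transform T(X)_{i,t} for t = 1..n-1: the scaled
    difference of sample means before and after each candidate location t."""
    p, n = X.shape
    t = np.arange(1, n)                                # candidate locations
    csum = np.cumsum(X, axis=1)
    left = csum[:, :-1] / t                            # mean of X[i, :t]
    right = (csum[:, -1:] - csum[:, :-1]) / (n - t)    # mean of X[i, t:]
    return np.sqrt(t * (n - t) / n) * (left - right)

def sum_cusum(X):
    """Aggregate by averaging the absolute cusum statistics over all series."""
    return np.abs(cusum_transform(X)).mean(axis=0)

def max_cusum(X):
    """Aggregate by taking the maximum absolute cusum over all series."""
    return np.abs(cusum_transform(X)).max(axis=0)
```

Both aggregations run in O(np) time; a changepoint is declared where the aggregated statistic exceeds the calibrated threshold.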
Alternatively, the problem of multivariate changepoint detection can be reduced to the known univariate case by projecting a multivariate series to one dimension. Wang and Samworth (2017) show that there is an ideal projection direction, given by the oracle direction θ = μ_B − μ_A, which, if used for projecting the data, will involve no loss of information about the changepoint location. In particular, projecting the data X along v := θ/‖θ‖_2 maximises the signal-to-noise ratio at ζ: the projected observations satisfy v^⊤ X^(t) ∼ N(v^⊤ μ^(t), κ²), and the change in mean of the projected series, |v^⊤(μ_B − μ_A)| = ‖θ‖_2, is maximal among all unit vectors. Since μ_A and μ_B (and thus v) are unknown, the goal in this setting is to estimate v.
Assuming a changepoint occurs at t, a natural estimate v̂ of v is given by v̂_i ∝ X̄_{i,1:t} − X̄_{i,(t+1):n} (where we assume v̂ to be normalised to have Euclidean norm one), the difference of sample means before and after t, where X̄_{i,1:t} is the mean of the first t observations of series i and X̄_{i,(t+1):n} is the mean of series i from time t + 1 to n. Projecting the data using v̂ leads to the univariate series v̂^⊤ X, for which the square of the cusum statistic is proportional to the sum of the squares of the cusum statistics of each individual series. This is just the test statistic one obtains from a likelihood-ratio test for a change in mean in all series for normal data (James et al. 1985; Srivastava and Worsley 1986); and like sum-cusum this works well if all series change, but loses power when only a small proportion of series change (Enikeeva and Harchaoui 2019). For these sparse settings, the estimate, v̂, of the projection direction can be improved if additional assumptions on the type of change are made. This is the basis of the approaches described in the next sections. Specifically, in Sect. 2.2, we briefly summarise the Inspect algorithm proposed by Wang and Samworth (2017), while in Sect. 2.3 we introduce a new, Bayesian approach to estimating v.

The Inspect algorithm
The Inspect algorithm, proposed by Wang and Samworth (2017), estimates the projection direction by maximising the signal after projection,

v̂_{max,k} ∈ arg max_{v ∈ S^{p−1}(k)} ‖v^⊤ T(X)‖_2,   (2)

where T(·) is the row-wise cusum transform given in (1) and S^{p−1}(k) is the unit ball in p dimensions with at most k nonzero components. However, optimising over all (exponentially many) k-sparse vectors in S^{p−1}(k) is computationally very expensive.
To mitigate this, Wang and Samworth (2017) reformulate (2) as the convex relaxation max_{M ∈ M} ⟨M, T(X)⟩, where M := {M ∈ R^{p×(n−1)} : ‖M‖_* = 1, rank(M) = 1, nz(M) ≤ k}, nz(M) is the number of nonzero rows of M, and ‖·‖_* is the nuclear norm of a matrix. The definition of M as a subset of R^{p×(n−1)} stems from the fact that the Euclidean norm in (2) can be rewritten as ‖v^⊤ T(X)‖_2 = max_{w ∈ S^{n−2}} ⟨v w^⊤, T(X)⟩ with v ∈ S^{p−1} [Wang and Samworth 2017, Eq. (12)]. This can be relaxed further to

M̂ ∈ arg max_{M ∈ S_1} { ⟨M, T(X)⟩ − λ‖M‖_1 },   (3)

where S_1 := {M ∈ R^{p×(n−1)} : ‖M‖_* ≤ 1}. Here, λ > 0 is a tuning parameter controlling the sparseness of the solution. The problem in (3) can be solved with the alternating direction method of multipliers (ADMM), see Gabay and Mercier (1976) and Boyd et al. (2011). Furthermore, Wang and Samworth (2017) establish that (3) can be further relaxed by replacing the set S_1 by S_2 := {M ∈ R^{p×(n−1)} : ‖M‖_2 ≤ 1}. In particular, they show that this allows for the closed-form solution

M̃ = soft(T(X), λ) / ‖soft(T(X), λ)‖_2,   (4)

where (soft(A, λ))_{ij} = sgn(A_{ij}) max{|A_{ij}| − λ, 0} is a soft-thresholding operation for a matrix A = (A_{ij}) and a threshold λ ≥ 0. Finally, the estimate of the projection direction, v̂, is obtained as the leading left singular vector of M̂M̂^⊤ or M̃M̃^⊤, respectively, which is of dimension p × p. This relies on an SVD computation with effort O(p³) (Golub and van Loan 2012). Using v̂, X is projected to a univariate series v̂^⊤ X, to which the cusum transform T(·) is applied. For a given threshold τ > 0 and t_0 = arg max_t |T(v̂^⊤ X)_t|, a changepoint is declared at t_0 if max_t |T(v̂^⊤ X)_t| > τ. Section 3.1 discusses the choice of τ. The aforementioned approach can be combined with wild binary segmentation of Fryzlewicz (2014) to detect multiple changepoints.
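The soft-thresholding variant of Inspect can be sketched in a few lines of numpy. This is a simplified illustration only (the reference implementation is the authors' InspectChangepoint R package; the function names here are ours), operating on a precomputed cusum matrix T(X):

```python
import numpy as np

def soft(A, lam):
    """Entrywise soft-thresholding: sgn(A_ij) * max(|A_ij| - lam, 0)."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def inspect_direction(T, lam):
    """Estimate the projection direction by soft-thresholding the cusum
    matrix and taking the leading left singular vector of the result."""
    M = soft(T, lam)
    nrm = np.linalg.norm(M)
    if nrm == 0:
        raise ValueError("lambda too large: all entries thresholded to zero")
    U, s, Vt = np.linalg.svd(M / nrm, full_matrices=False)
    return U[:, 0]              # unit vector, defined up to sign
```

The SVD of the p × (n−1) matrix dominates the cost for large p, which is the bottleneck the Bayesian approach of Sect. 2.3 avoids.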
Computing an estimate of M via soft-thresholding is considerably faster than computing (3) via ADMM, and thus Inspect will be employed with soft-thresholding in the remainder of the article. The soft-thresholding operation, as well as the cusum transform, requires an effort of O(pn). The SVD computation for M̂M̂^⊤ or M̃M̃^⊤ (each matrix multiplication takes O(p²n) time) has effort O(p³), thus leading to an overall linear runtime dependence of Inspect on n and a cubic dependence on p. As an alternative to the standard SVD employed in the implementation of Wang and Samworth (2016), the leading left singular vector can also be efficiently computed via partial SVD algorithms (Lanczos iterations). We explore this idea in the supplementary material and show that empirically, Inspect with partial SVD has a slightly improved asymptotic runtime in p.

A Bayesian approach
While the work of Wang and Samworth (2017) is very powerful, in practice it is still computationally intensive. In particular, while soft-thresholding helps reduce the computational cost, computing an estimate of the projection direction in the Inspect algorithm still relies on the computation of an SVD. A faster way of estimating a (good) projection direction is therefore desirable.
We begin by recalling that Inspect is based on certain sparsity assumptions (Wang and Samworth 2017, Sect. 3) which lead to relaxation (3) to detect sparse changes. Other alternative assumptions on the change in mean are also possible. For example, one possible framework is to adopt a Bayesian setting in which a prior is specified for the change in mean of each series. After calculating the posterior, the projection direction is estimated proportionally to the posterior change in mean of each series. This framework permits the straightforward incorporation of different assumptions for the changepoint scenario via the selection of an appropriate prior.
A Normal prior on the magnitude of the change in mean in each series is equivalent to a Chi-squared test statistic (for the changepoint location) and results in a simple estimate which is proportional to the sample change in mean of each series. However, to incorporate the fact that a change often only occurs in a subset of the p series, we adopt a mixture prior, the so-called spike-and-slab prior, on the change in each individual series, consisting of a Normal prior and a point mass at zero for those series without a change. In particular, the mixture prior on each θ_i = μ_{A,i} − μ_{B,i} proposed in this article is given by θ_i ∼ (1 − π)δ_0 + π N(0, ω²), where π ∈ [0, 1] is the proportion of series which change, ω² is the variance of the assumed size of a change, and δ_0 is the point mass at zero.
Under the assumption that a changepoint occurs at time t ∈ {1, …, n}, for series i, let D^(t)_i = X̄_{i,1:t} − X̄_{i,(t+1):n} be the difference of the means before and after the changepoint, so that D^(t)_i ∼ N(θ_i, σ_t²) with σ_t² = κ²(1/t + 1/(n − t)). Under the spike-and-slab prior, the posterior mean of θ_i yields the estimate

v̂^(t)_i ∝ D^(t)_i / ( 1 + K^{−1} exp( −(D^(t)_i)² / (2σ_t²(1 + σ_t²/ω²)) ) ).   (5)

Here, K ∈ R is a tuning parameter, and we assume that the data have been standardised so that the variance κ² = 1. A derivation of (5) is provided in Appendix A. The exponential term in (5) depends on the ratio of the prior variance of the size of the change in mean, ω², to the variance of the cusum statistic, σ_t². If prior knowledge on ω is available, (5) can be used directly in several ways. For instance, ω can be estimated from a set of changepoints detected in the individual series, or expert knowledge of the changepoint scenario can be used. However, we show in the supplementary material that (5) seems to be robust towards an under- or over-specification of the value σ_t²/ω². In the detectable scenario in which the size of the change is relatively large compared to the noise in the cusum statistic, σ_t²/ω² ≈ 0, and (5) simplifies to

v̂^(t)_i ∝ D^(t)_i / ( 1 + K^{−1} exp( −(D^(t)_i)² / (2σ_t²) ) ).   (6)

In (6), the exponential term shrinks small values of D^(t)_i more towards zero than larger values. The estimate of the projection direction returned by our approach is v̂^(t_0) for the timepoint t_0 ∈ {1, …, n} which maximises the cusum statistic at t_0 after projection.
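The shrinkage estimate is straightforward to compute. Below is a minimal numpy sketch assuming standardised data (κ = 1) and the shrinkage form D^(t)_i / (1 + K^{−1} exp(−(D^(t)_i)²/(2σ_t²))); the function name is illustrative:

```python
import numpy as np

def bayes_direction(X, t, K=1.0):
    """Shrinkage estimate of the projection direction at timepoint t:
    the difference of sample means D_i^(t) is shrunk towards zero, with
    small values of D_i^(t) shrunk more strongly than large ones.
    Larger K corresponds to less shrinkage."""
    p, n = X.shape
    D = X[:, :t].mean(axis=1) - X[:, t:].mean(axis=1)
    sigma2_t = 1.0 / t + 1.0 / (n - t)        # variance of D_i^(t) when kappa = 1
    v = D / (1.0 + np.exp(-D**2 / (2.0 * sigma2_t)) / K)
    return v / np.linalg.norm(v)              # normalise to unit length
```

In sparse scenarios the returned direction concentrates its mass on the series that actually change, which is exactly what makes the subsequent projection powerful.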
As derived in Appendix A, Eq. (9), the tuning parameter K in (6) depends on two quantities: 1. K is proportional to the fraction π/(1 − π), where π is the proportion of the p series affected by the change. This could be set using expert information on the changepoint scenario or estimated by applying individual changepoint detection to each series. If it is known that changes are sparse, one obtains K ∝ π. 2. K is roughly inversely proportional to ω/σ_t, the ratio of the size of a change to the noise in the cusum statistic. This ratio is assumed to be relatively large, as already used to simplify (5) into (6).
Section 4 investigates the robustness of the Bayesian approach to the choice of K.

Implementation details
This section considers some details required for a practical implementation of the algorithm of Sect. 2.3. In particular, we look at the tuning of the threshold used to declare a changepoint, the number and positions of the timepoints at which the projection direction of (6) is computed, and present our proposed approach as pseudo-code.

Choice of the threshold
We standardise each series of the input data X ∈ R^{p×n} before applying either Inspect, our Bayesian approach, or sum- and max-cusum. For this, we replace each series with its differenced series, which we normalise using the median absolute deviation (Rousseeuw and Croux 1993). The standardisation of the variance allows us to tune the threshold of the four aforementioned changepoint methods independently of the input data by applying each method to simulated standard Normal data of equal dimensions p × n, but without a changepoint. The returned projection direction is employed to project the data, and the maximum cusum value of the univariate dataset after projection is recorded. This process is repeated for r ∈ N generated datasets. To minimise Monte Carlo effects, τ is chosen as the 95% quantile among the r maximum cusum statistics. In the simulations (Sect. 4), the choice r = 100 is used. Our approach differs slightly from the one used in Inspect (Wang and Samworth 2016), where the threshold is chosen as the maximal cusum statistic among r repetitions.

Choice of the projection timepoints
Naïvely, one would compute projection direction (6) at every possible changepoint location t, project the data and find a changepoint in the projected series. However, computing (6) for every t ∈ {1, . . . , n} is computationally expensive and not necessary. This is due to the fact that projection direction (6) does not change appreciably between two nearby timepoints. Therefore, in order to test for a change at a certain timepoint t, the projection direction from a nearby timepoint can be used.
One possible way of choosing the timepoints t in the grid, which we outline below, ensures that for any location of the true changepoint ζ (Sect. 2.1) and any series i, the expectation of D^(t)_i for some t in the grid is within a factor γ ∈ [0, 1] of the true change in mean. As in Sect. 2.3, for a fixed timepoint t, let D^(t)_i = X̄_{i,1:t} − X̄_{i,(t+1):n}. Assume we project at a timepoint before the true changepoint, i.e. t ≤ ζ. In this case, the expectation of the observations up to t equals the true mean of that segment, but the expectation of the observations after t does not, thus decreasing the value of D^(t)_i in expectation: E[D^(t)_i] = ((n − ζ)/(n − t)) θ_i for t ≤ ζ and, analogously, E[D^(t)_i] = (ζ/t) θ_i for t ≥ ζ. This shows that the expected difference in means differs by a factor (n − ζ)/(n − t) or ζ/t from the true difference in means θ_i, respectively. We aim to bound these two quantities from below. This leads to the following strategy to choose a grid of timepoints for computing the projection direction.
Fix γ ∈ [0, 1]. Choose a set of timepoints T_1 ⊆ {1, …, n} such that for any ζ ∈ {1, …, n − 1}, there is a t ∈ T_1 satisfying t ≥ ζ and ζ/t ≥ γ. Similarly, choose T_2 ⊆ {1, …, n} such that for any ζ ∈ {1, …, n − 1}, there is a t ∈ T_2 satisfying t ≤ ζ and (n − ζ)/(n − t) ≥ γ. The grid T = T_1 ∪ T_2 is used for computing (6). This construction leads to the first timepoint in T_1 being placed roughly at 1/γ, the next one at roughly 1/γ², etc. (analogously for T_2). In total, T will contain O(log_{1/γ} n) timepoints. With this construction, the grid T becomes denser towards the beginning and the end of the time series. Towards the middle of the time series, fewer gridpoints are allocated. This is in line with the intuition that in order to detect a change in the middle of a sequence, where the error in the cusum statistic is smaller than at the borders, the projection direction need not be estimated as accurately as at the sides. Naturally, the larger the choice of γ, the more timepoints the grid will contain. The choice of the parameter γ is investigated in the supplementary material.
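One possible implementation of this grid construction is sketched below. It is a sketch under two assumptions of ours: gridpoints grow geometrically by the factor 1/γ, and n − 1 is always added to T_1 so that changepoints adjacent to the right boundary remain covered:

```python
import numpy as np

def projection_grid(n, gamma):
    """Grid T = T1 ∪ T2 of projection timepoints: T1 covers every candidate
    changepoint zeta from above (some t >= zeta with zeta/t >= gamma); T2 is
    the mirror image covering from below ((n - zeta)/(n - t) >= gamma)."""
    T1, t = [], 1
    while t <= n - 1:
        T1.append(t)
        t = max(t + 1, int((t + 1) / gamma))   # next point at roughly t/gamma
    if T1[-1] != n - 1:
        T1.append(n - 1)                       # cover changepoints near n
    T2 = [n - s for s in T1]                   # mirrored grid for the right end
    return sorted(set(T1) | set(T2))
```

The grid is dense near both ends and sparse in the middle, matching the intuition described above.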

Pseudo-code of the algorithm
Algorithm 1: BayesProject
input: X ∈ R^{p×n}, γ > 0
1  c ← 0 ∈ R^n
2  Normalise X and calibrate τ (Sect. 3.1)
3  Choose K (Sect. 2.3) and T for the choice of γ (Sect. 3.2)
4  for t ∈ T do
5    Compute v̂^(t) according to (6)
6    v̂^(t) ← v̂^(t) / ‖v̂^(t)‖_2
7    x ← (v̂^(t))^⊤ X
8    c_t ← |x̄_{1:t} − x̄_{(t+1):n}| / √(1/t + 1/(n − t))
9  end
10 t_0 ← arg max_{t ∈ T} c_t
11 return v̂^(t_0)

Pseudo-code of our approach (called BayesProject) is given in Algorithm 1. The algorithm works on a multivariate input X ∈ R^{p×n}. First, X is normalised row-wise using the median absolute deviation (MAD) estimator of the variance, and the threshold τ is calibrated (Sect. 3.1). The set T of projection timepoints is computed as in Sect. 3.2 for a prespecified parameter γ chosen by the user. The parameter K is chosen as in Sect. 2.3.
For any t ∈ T, an estimate, v̂^(t), is computed using (6). The resulting vector is normalised, and the data are projected to a univariate series (v̂^(t))^⊤ X, for which the standard cusum statistic is then computed at t and stored in c_t. BayesProject returns the projection direction v̂^(t_0) for the timepoint t_0 ∈ T leading to the largest absolute cusum statistic at the point of projection.
Although the projection direction that BayesProject returns is selected from a finite set of timepoints T, BayesProject is not restricted to detecting a changepoint at the locations in T only. Instead, the projection direction v̂^(t_0) is used to project the data onto a univariate time series in which a changepoint is declared anywhere within {1, …, n} if the maximum of the cusum transform applied to the projected series satisfies max_t |T((v̂^(t_0))^⊤ X)_t| > τ. The changepoint location is accordingly given by arg max_t |T((v̂^(t_0))^⊤ X)_t|. This approach is readily extendable to the multiple changepoint setting: we simply combine BayesProject with wild binary segmentation proposed by Fryzlewicz (2014) to detect several changepoints.
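Pulling the pieces together, a single-changepoint version of BayesProject can be sketched in numpy. This is a simplified illustration under our own conventions (normalisation of X and calibration of τ are assumed to have been done beforehand; the grid and shrinkage formulas follow Sects. 2.3 and 3.2):

```python
import numpy as np

def bayes_project(X, gamma=0.6, K=1.0, tau=0.0):
    """Single-changepoint sketch of BayesProject: estimate a projection
    direction on a sparse grid of timepoints, pick the best one, project,
    and scan the projected series with the cusum transform."""
    p, n = X.shape
    # geometric grid of projection timepoints, dense towards both ends
    T, t = [], 1
    while t <= n - 1:
        T.append(t)
        t = max(t + 1, int((t + 1) / gamma))
    T = sorted(set(T) | {n - s for s in T})
    best, v_best = -np.inf, None
    for t in T:
        D = X[:, :t].mean(axis=1) - X[:, t:].mean(axis=1)
        s2 = 1.0 / t + 1.0 / (n - t)
        v = D / (1.0 + np.exp(-D**2 / (2.0 * s2)) / K)    # shrinkage step
        v /= max(np.linalg.norm(v), 1e-12)
        c = abs(v @ D) / np.sqrt(s2)     # cusum statistic at t after projection
        if c > best:
            best, v_best = c, v
    # project with the best direction and scan all candidate locations
    y = v_best @ X
    tt = np.arange(1, n)
    csum = np.cumsum(y)
    stat = np.abs(np.sqrt(tt * (n - tt) / n)
                  * (csum[:-1] / tt - (csum[-1] - csum[:-1]) / (n - tt)))
    t0 = int(np.argmax(stat)) + 1
    return (t0, v_best) if stat.max() > tau else (None, v_best)
```

For multiple changepoints, this routine would be wrapped in wild binary segmentation as described above.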
For any t ∈ T, computing the projection direction v̂^(t) according to (6), projecting the data and evaluating the cusum statistic requires O(pn) operations. Since T contains O(log n) timepoints for fixed γ (Sect. 3.2), the overall effort of BayesProject is O(pn log n), that is, linear in the number of series p and superlinear in the series length n.

Simulation studies
The article presents five simulation scenarios to assess the dependence of the four algorithms under consideration (Inspect with soft-thresholding of Sect. 2.2, BayesProject of Sect. 3.3, as well as sum-cusum and max-cusum of Sect. 2.1) on the following: 1. the series length n, 2. the number of series p, 3. the proportion d/p of series exhibiting a change, where d < p is the number of series which change, 4. the size s of the change, 5. the location of the changepoint.
The simulation scenarios are chosen to better understand the scaling behaviour of all four algorithms both as the series length and the number of series increase; moreover, we are interested in their ability to detect a change in challenging situations, for instance if only a small proportion of series has a change, the change size is small, or the change occurs at the beginning or end of the series. The supplementary material additionally presents experiments on univariate changepoint detection and the null case in which no changepoint is present.
In this section, all experiments use simulated standard Normal data with a single changepoint. Apart from the investigation of the dependence on the location of the changepoint, the changepoints are always drawn uniformly among {1, . . . , n} in each simulation. Each experiment is considered in three settings: in a sparse (d/ p = 0.05), moderate (d/ p = 0.2) and dense (d/ p = 0.5) scenario. The size of the change was set to s = 0.04. All results are averages over 1000 repetitions.
The quality of the estimated changepoint is evaluated with two measures: first, results show the proportion among all repetitions in which a changepoint was found (denoted as the proportion of estimated changepoints in the figures). Second, as a measure of accuracy of the position of the estimated changepoint, the average distance to the true changepoint location is reported. Additionally, the methods are evaluated with respect to the accuracy of the estimated projection direction, measured as the L_2 difference between the (normalised) estimated and ideal projection directions. Lastly, all algorithms are compared with respect to their runtimes in seconds. Empirical runtimes are computed with the aim to verify the theoretical runtimes in practice.

Other practical considerations. The threshold for each method is computed as described in Sect. 3.1 for every new pair of parameters (n, p). All simulation results for BayesProject in this and the following sections are computed with the fixed choice K = 1 (corresponding to a scenario in which half of all series are assumed to exhibit a change and the size of each change equals the variance of the data, see Sect. 2.3). The grid of projection timepoints for BayesProject is computed as in Sect. 3.2 with γ = 0.6. This is an arbitrary choice. Simulations in the supplementary material indicate that the estimated projection direction (and thus the changepoint found by BayesProject) does not change appreciably as long as the grid of projection timepoints is not too sparse (corresponding to low values of γ). That is, empirically, for roughly the range γ ∈ [0.5, 1], the accuracy of BayesProject does not change considerably, though the method becomes slower as γ increases (due to the increasing number of computed projection directions as well as the resulting projected series scanned for a changepoint).

Dependence on p.
Figure 1 shows the dependence of the four methods (BayesProject, Inspect, sum-cusum and max-cusum) on the number of series p, while n = 10,000 is fixed. The rows display the proportion of estimated changes (first row), the accuracy of the estimated changepoint location measured via the average distance to the true changepoint location (second row), the L_2 difference between the ideal and estimated projection directions (third row) as well as the runtime (fourth row). Columns show a sparse (left), moderate (middle) and dense (right) scenario. Figure 1 shows that BayesProject and sum-cusum perform similarly in terms of the proportion of estimated changes and the average distance to the true changepoint location, followed by Inspect. Max-cusum is noticeably worse. Except for the sparse case, Inspect, BayesProject and sum-cusum do not show a strong dependence of the accuracy of the estimated changepoint on p. Both BayesProject and sum-cusum estimate the projection direction most accurately.
The empirical runtime analysis shows a superlinear runtime dependence on p for Inspect, while BayesProject as well as sum-cusum and max-cusum show a linear runtime dependence. Additional experiments for n = 100 and p > n included in the supplementary material further investigate the asymptotic runtime in p.

Fig. 1 Dependence of BayesProject (black crosses), Inspect (red circles), sum-cusum (green squares) and max-cusum (blue triangles) on the number of series p while n = 10,000. Rows show proportion of estimated changes (first row), average distance to the true changepoint location (second row), L_2 distance between the estimated and ideal projection directions (third row) and runtime (fourth row). Columns show sparse (left), moderate (middle) and dense (right) scenarios. Log-log plots to assess runtime scalings in the fourth row. Slope estimates for the moderate scenario (middle column) are 1.23 (Inspect), 1.00 (BayesProject), 1.03 (sum-cusum) and 1.03 (max-cusum). (Color figure online)
Qualitatively similar results are seen as we vary the series length n, the proportion d/p of series with a change, the size s of the change and the changepoint location; see the supplementary material for details. These simulations show max-cusum consistently performing worse than the other methods. While the overall performance of the other methods is similar, Inspect is most accurate for sparse scenarios, whereas sum-cusum and BayesProject are more accurate for the moderate and denser scenarios.
Overall, we find that BayesProject yields projections close to the oracle projection direction and detects changepoints with a consistently high accuracy across all scenarios considered while exhibiting a linear runtime in p.

Choice of K. The qualitative dependence on the tuning parameter K is investigated in a detailed simulation study in the supplementary material, where parts of the simulation study reported in this section are repeated for the optimal choice of K (using oracle knowledge of the changepoint scenario), as well as an underestimate K/2 and an overestimate 2K. The supplementary material shows that the performance of BayesProject is largely identical for the three choices of K. For instance, Fig. 2 shows the dependence of BayesProject on the number of series p for the three choices of K in the same scenario as the one used for Fig. 1. As can be seen, the projection direction estimate is slightly better for K/2 or 2K in some scenarios, however at the expense of performing worse in other scenarios. The supplementary material shows that the estimation accuracy can be slightly improved if prior knowledge on the proportion of series with a change, or on the size of the change, is available.

Multiple changepoint detection in simulated data
We now turn to consider the performance of various approaches in the multiple changepoint detection scenario. We replicate the simulation setting of (Wang and Samworth 2017, Sect. 5.3), which considers Normally distributed data of dimensions n = 2000 and p = 200, with changepoints at the locations ζ = (500, 1000, 1500) and κ² = 1. At the three changepoints, the signal changes in k = 40 series by varying amounts, but with an aggregated signal size for the three changes of (ϑ^(1), ϑ^(2), ϑ^(3)) = (ϑ, 2ϑ, 3ϑ), where ϑ ∈ {0.4, 0.6}. Three scenarios are considered: (i) the complete overlap case, where the same k series change at each of the three changepoints; (ii) the half overlap case, where changes occur in the series (i − 1)k/2 + 1, …, (i + 1)k/2 for i ∈ {1, 2, 3}; and (iii) the no overlap case, where changes occur at disjoint sets of indices. As in Wang and Samworth (2017), all results provided in this section are based on 100 repetitions.
Additionally, we include the five algorithms described in Sect. 1 in our study. Those are the Double Cusum algorithm (Cho 2016), Sparsified Binary Segmentation (SBS) of Cho and Fryzlewicz (2015), as well as the approaches of Enikeeva and Harchaoui (2019), Yu and Chen (2020) and Grundy et al. (2020b). For SBS, the parameter for detecting a change was set to log( p/0.05). The method of Enikeeva and Harchaoui (2019) was run with significance level 0.05. The method of Yu and Chen (2020) used an exclusion boundary of s = n 0.1 as proposed by the authors, 100 bootstrap repetitions, and the threshold was computed as the 0.95 quantile of the bootstrap samples. The method of Grundy et al. (2020b) is provided in the R-package changepoint.geo on CRAN (Grundy et al. 2020a).
All algorithms are combined with binary segmentation in order to find multiple changepoints, apart from that of Grundy et al. (2020b), since the changepoint.geo package is already able to detect multiple changes. Before running BayesProject, its threshold is calibrated as described in Sect. 3.1. The tuning parameter K is set to K = 0.1. The grid of timepoints is chosen as in Sect. 3.2 with γ = 0.6. Table 1 shows simulation results. To compare the algorithms with respect to their accuracy in multiple changepoint detection, the v-measure is employed (Rosenberg and Hirschberg 2007), a widely used measure in changepoint analysis (Eriksson and Olofsson 2019; Ludkin et al. 2018; Li and Munk 2016; Frick et al. 2014). The measure takes values in the interval [0, 1], with zero being the worst score and one indicating a perfect match. Within Table 1, the complete, half and no overlap scenarios are given as the first, second and third block of the table, with nine rows each for the two choices of ϑ determining the size of the change at each changepoint. The table shows frequency counts for the number of estimated changepoints (between zero and five or more), the average v-measure score, and the numbers of true positives and false positives (an estimated changepoint counting as a true positive if it lies within distance 15 of a true changepoint).
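The v-measure can be computed by labelling each timepoint with the index of the segment it falls into under the true and the estimated segmentations, and then taking the harmonic mean of homogeneity and completeness. The following self-contained sketch follows the definition of Rosenberg and Hirschberg (2007); the segment-labelling convention (a changepoint at location c starts a new segment at time c) is an illustrative choice.

```python
import math
from collections import Counter

def segment_labels(changepoints, n):
    """Label each timepoint 0..n-1 with the index of its segment."""
    labels, seg, cps = [], 0, sorted(changepoints)
    for t in range(n):
        while seg < len(cps) and t >= cps[seg]:
            seg += 1
        labels.append(seg)
    return labels

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(a | b) for two equal-length label sequences."""
    n = len(a)
    joint, b_counts = Counter(zip(a, b)), Counter(b)
    return -sum((c / n) * math.log(c / b_counts[kb])
                for (ka, kb), c in joint.items())

def v_measure(true_cps, est_cps, n):
    """Harmonic mean of homogeneity and completeness of the segmentations."""
    t, e = segment_labels(true_cps, n), segment_labels(est_cps, n)
    h_t, h_e = entropy(t), entropy(e)
    hom = 1.0 if h_t == 0 else 1.0 - conditional_entropy(t, e) / h_t
    com = 1.0 if h_e == 0 else 1.0 - conditional_entropy(e, t) / h_e
    return 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
```

A perfect recovery of all changepoints scores one; estimating no changepoints at all scores zero, since the estimated segmentation is then completely homogeneous but carries no information about the true one.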
As shown in Table 1, BayesProject performs slightly worse than Inspect. Of the other approaches, the method of Enikeeva and Harchaoui (2019) performs best across all scenarios when assessed with the v-measure. The Double CUSUM and SBS algorithms perform comparably on the v-measure, though SBS has a higher false-positive rate. The approaches of Yu and Chen (2020) and Grundy et al. (2020b) are competitive for ϑ = 0.6 but lose power for ϑ = 0.4. Since the method of Grundy et al. (2020b) detects changes in both mean and variance, a loss in power compared to the other algorithms is expected.
We expect BayesProject to perform well in scenarios with uneven segment lengths, since the projections computed along the gridpoints introduced in Sect. 3.2 give BayesProject comparatively less power for detecting changes in the middle of a sequence, but an advantage for detecting changes near the start and end. Table 2 repeats the previous experiment with changepoints ζ = (200, 400, 1600) and (ϑ^(1), ϑ^(2), ϑ^(3)) = (2.4, 1.8, 1.2). We observe that BayesProject performs very competitively and mildly outperforms both Inspect and the method of Enikeeva and Harchaoui (2019) when assessed with the v-measure. However, Inspect and Enikeeva and Harchaoui (2019) incur lower false-positive rates in the no overlap scenario. A scenario with two changepoints close to the boundaries (n = 1000, p = 100 with changes in k = 30 series, changepoints ζ = (100, 800) with magnitudes (ϑ^(1), ϑ^(2)) = (1.5, 1.0)) is presented in Table 3. Here, BayesProject performs at least as well as all other algorithms throughout all three scenarios when assessed with the v-measure, while having a competitive true-positive rate and a low false-positive rate.
We also assess accuracy by means of density plots of the estimated changepoint locations. For improved readability of the density plots, we select the five best methods based on the preceding analyses: BayesProject, Inspect, and the methods of Cho (2016), Enikeeva and Harchaoui (2019) and Yu and Chen (2020). Since the empirical densities have sharp peaks, we employ the (smoothed) log-concave maximum likelihood estimator of Rufibach and Dümbgen (2009), which is free of tuning parameters and thus does not require the selection of a bandwidth that could bias the visualisation. An implementation is provided in the R package logcondens on CRAN (Rufibach and Dümbgen 2016). Figure 3 shows density plots for the complete overlap scenario of Table 2. We observe that BayesProject detects the first changepoint with the highest accuracy, while the other methods are less accurate. BayesProject and Inspect are equally competitive for the second changepoint, while Inspect has the highest accuracy in detecting the third changepoint, followed by BayesProject and Enikeeva and Harchaoui (2019). Figure 4 shows similar density plots for the detection of the two changepoints of Table 3 in the no overlap scenario. We observe that BayesProject achieves a high accuracy for both changepoints.
Density plots for all the scenarios displayed in Tables 1, 2 and 3 are included in the supplementary material and show a competitive performance of BayesProject in comparison with the other algorithms.

Discussion
This article introduces BayesProject, a Bayesian approach to estimating the projection direction for multivariate changepoint detection. The proposed approach addresses changepoint detection in scenarios with a large number of series p, with a runtime that is linear in the number of series and superlinear in the series length. Simulations indicate that BayesProject is robust, yields projections close to the oracle projection direction and, moreover, detects changepoints with an accuracy comparable to that of existing algorithms. While we have focused solely on changes in mean in Gaussian data, with one specific form of prior for the change in mean, the idea can be applied more widely to other models and prior distributions. However, conjugate priors would be necessary for a computationally efficient method in which the projection can be calculated analytically.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.