Generalized multiple change-point detection in the structure of multivariate, possibly high-dimensional, data sequences

The extensive emergence of big data techniques has led to an increasing interest in the development of change-point detection algorithms that can perform well in a multivariate, possibly high-dimensional, setting. In the current paper, we propose a new method for the consistent estimation of the number and location of multiple generalized change-points in multivariate, possibly high-dimensional, noisy data sequences. The number of change-points is allowed to increase with the sample size and the dimensionality of the given data sequence. Since the unknown multivariate signal consists of a number of univariate component signals, our algorithm can deal with general structural changes; we focus on changes in the mean vector of a multivariate piecewise-constant signal, as well as changes in the linear trend of any of the univariate component signals. Our proposed algorithm, labeled Multivariate Isolate-Detect (MID), allows for consistent change-point detection in the presence of frequent changes of possibly small magnitudes in a computationally fast way.


Introduction
Change-point detection algorithms have been actively developed and investigated by the scientific community. Their ability to segment data into smaller, homogeneous parts has helped researchers and practitioners to develop flexible statistical models that can adapt to non-stationary environments. Due to the natural data heterogeneity in many real problems, such algorithms have been applied in a wide range of application areas, such as bioinformatics (Picard et al. (2011); Hocking et al. (2013)), cyber security (Siris and Papagalou (2004)), and finance (Lavielle and Teyssiere (2007); Schröder and Fryzlewicz (2013)). The advantages of detecting changes in the behaviour of the data fall into two main categories: interpretation and forecasting. Interpretation comes naturally, since the detected changes are usually connected with real-life events that took place near the estimated time. Associating the changes with such phenomena can lead to a better understanding and quantification of the effect that these events had on the behaviour of the stochastic process. With respect to forecasting, the role of the final segment is important because it allows for a more accurate prediction of the future values of the data sequence at hand.
Based on whether we have full knowledge of the data to be analysed, change-point detection is split into two main categories: offline detection, where the data have already been obtained, and online detection, in which the observations arrive sequentially. With respect to the dimensionality of the data, change-point detection can be further separated into algorithms that act only on univariate data and those that are suitable for change-point detection in multivariate data sequences. In this paper, we focus on multivariate, possibly high-dimensional, offline settings; our aim is to estimate the number and locations of certain structural changes in the behaviour of given multivariate data. The model is
X_t = f_t + Σ ε_t, t = 1, . . ., T, (1.1)
where X_t ∈ R^{d×1} are the observed data and f_t ∈ R^{d×1} is the d-dimensional deterministic signal with structural changes at certain points. The signals treated in the current manuscript are those in which changes appear in the mean structure or in the vector of first-order derivatives.
The diagonal matrix Σ ∈ R^{d×d} has diagonal elements denoted by σ_1, . . ., σ_d, while for any t ∈ {1, . . ., T}, the noise terms ε_t ∈ R^{d×1} are random vectors with mean the zero vector and covariance the identity matrix. The elements σ_j, j = 1, . . ., d, of the diagonal matrix Σ might be unknown. In such cases, σ_j are estimated using the Median Absolute Deviation (MAD) method explained in Hampel (1974). The true number, N, of the change-points, as well as their locations r_1, . . ., r_N, are unknown and our aim is to estimate them; N is free to grow with the sample size T and the dimensionality d.
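For illustration, the per-component MAD estimation of σ_j might be sketched as follows (a minimal Python sketch; we assume, as is common in change-point settings, that the MAD is applied to first differences so that a piecewise-constant signal does not inflate the estimate, and the function name is ours):

```python
import numpy as np

def mad_sigma(x):
    """Robust estimate of the noise level of one component data sequence.

    Applies Hampel's (1974) Median Absolute Deviation to the first
    differences of x; differencing removes piecewise-constant segments,
    and the factor 1.4826 rescales the MAD to the Gaussian sd.
    """
    d = np.diff(np.asarray(x, dtype=float)) / np.sqrt(2.0)
    return 1.4826 * np.median(np.abs(d - np.median(d)))
```

Each column of the data matrix would then be divided by its estimated σ_j before the contrast functions are computed.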
The initial purpose of change-point detection algorithms was to detect a single change in the mean structure of a univariate signal under the setting of Gaussian noise, but much progress has since been made. Researchers have heavily focused on the detection of multiple change-points in the mean structure of a univariate data sequence. To this end, optimization-based methods have been developed, in which the estimated signal is chosen based on its fit to the data, penalized by a complexity rule. To solve the implied penalization problem, dynamic programming approaches, such as the Segment Neighborhood (SN) and Optimal Partitioning (OP) methods of Auger and Lawrence (1989) and Jackson et al. (2005), have been developed. Killick et al. (2012) and Rigaill (2015) introduce improvements over the classical OP and SN algorithms, respectively. In the context of regression problems, Frick et al. (2014) introduced the simultaneous multiscale change-point estimator (SMUCE) for change-point detection in exponential family regression. Apart from optimization-based algorithms, a popular method in the literature is binary segmentation, where changes are detected one at a time through an iterative binary splitting of the data. Recent variants of binary segmentation with improved performance are the Wild Binary Segmentation (WBS) of Fryzlewicz (2014) and its recently developed second version in Fryzlewicz (2020), the Narrowest-Over-Threshold (NOT) method of Baranowski et al. (2019), and the Seeded Binary Segmentation of Kovács et al.
(2020). The Isolate-Detect (ID) algorithm was developed in Anastasiou and Fryzlewicz (2022) to detect, one by one, structural changes in a data sequence. It is based on an isolation technique, which leads to very good accuracy for the estimated number and locations of the change-points, particularly in scenarios with many frequent change-points. Our proposed method is partly based on ID; therefore, we elaborate on its important parts later. For a more thorough review of the literature on the detection of multiple change-points in the mean of univariate data sequences, see Cho and Kirch (2020) and Yu (2020). Apart from changes in the mean of a univariate data sequence, research has also been carried out on the detection of change-points under more complex scenarios, such as changes in the slope for piecewise-linear models (Anastasiou and Fryzlewicz (2022); Baranowski et al. (2019); Fearnhead et al. (2019); Maeng and Fryzlewicz (2019); Tibshirani (2014)), changes in the variance (Inclán and Tiao (1994)), as well as distributional changes under a non-parametric setting (Matteson and James (2014); Zou et al. (2014)).
Even though there is an extensive literature on change-point detection for univariate data sequences, the multivariate, possibly high-dimensional setting, which is the focus of this paper, has not been investigated to the same degree. Working under the model in (1.1), Vert and Bleakley (2010) propose a method for approximating the signal f_t as the solution of a convex optimization problem. In order to achieve this, the problem is first reformulated as a group LASSO one, and then the group least-angle regression (LARS) procedure explained in Yuan and Lin (2006) is employed. Another interesting approach with very good behaviour was introduced in Wang and Samworth (2018). The algorithm, called inspect, estimates the number and locations of the change-points in the mean structure of f_t as in (1.1). Firstly, inspect applies a cumulative sum (CUSUM) transformation to the original data matrix. Secondly, a projection direction of the transformed matrix is computed as its leading left singular vector, and, finally, a univariate change-point detection algorithm is applied to the projected series. It is among the many methods that employ CUSUM-type statistics for change-point detection in the multivariate setting. In general, methods in this category mainly use CUSUM aggregations of the d component data sequences, either to test the obtained values against a threshold or to construct alternative test statistics. For instance, Groen et al.
(2013) examine the asymptotic behaviour of the maximum absolute and average CUSUM and give finite-sample performance results. Focusing on testing for the existence of a change-point in the mean structure of the multivariate signal, Enikeeva and Harchaoui (2019) and Horváth and Hušková (2012) propose tests based on the ℓ2 aggregation of the CUSUM statistics for each univariate component, while Jirak (2015) employs the ℓ∞ aggregation of the aforementioned values. In Cho (2016), the Double CUSUM (DC) operator is introduced, which takes as input the ordered absolute CUSUM values of each individual component and performs a weighted ℓ1 aggregation to construct the DC statistic, which is then compared against a test criterion. Departing from the detection of changes in the mean structure of a multivariate signal, Cho and Fryzlewicz (2015) propose the Sparsified Binary Segmentation algorithm (SBS) for the detection of multiple change-points in the second-order structure of a multivariate data sequence. SBS is based on a first, "sparsifying" step which is used to exclude individual component data sequences from an ℓ1 aggregation; a pre-defined threshold is used for the exclusion. The recent work of Anastasiou et al. (2022) introduces Cross-covariance Isolate Detect (CCID), which, motivated by the necessity of estimating changes in time-varying functional connectivity networks, detects multiple change-points in the second-order (cross-covariance or network) structure of multivariate, possibly high-dimensional time series. Ombao et al. (2005) investigate the application of smooth localized complex exponentials (SLEX) waveforms to the detection of changes in spectral characteristics of EEG data. Lavielle and Teyssiere (2006) detect changes in the covariance structure of i.i.d. multivariate time series based on the minimization of a penalized Gaussian log-likelihood, while Bücher et al.
(2014) use a test statistic based on sequential empirical copula processes to detect changes in the cross-covariance structure. For a survey of various offline change-point detection algorithms for multivariate time series, see Truong et al. (2020).
In this paper, we propose a method called Multivariate Isolate-Detect (MID) for the consistent estimation of multiple change-points under the multivariate, possibly high-dimensional, structure of the model in (1.1). Our method builds on the foundations of the ID algorithm developed in Anastasiou and Fryzlewicz (2022); that is, we first isolate each true change-point within subintervals of the domain [1, . . ., T] and then we proceed to detect them. Isolation enhances detection power, especially in frameworks with frequently occurring change-points. MID is explained in detail in Section 2; here, we only give a brief description of its important steps. The main idea is that for the observed data sequences x_{t,j}, t = 1, . . ., T, j = 1, . . ., d, and with λ_T a positive constant playing the role of an expansion step as in Anastasiou and Fryzlewicz (2022), our method first creates two ordered sets of K = ⌈T/λ_T⌉ right- and left-expanding intervals. For i = 1, . . ., K, the i-th right-expanding interval is R_i = [1, iλ_T], while the i-th left-expanding interval is L_i = [T − iλ_T + 1, T]; these intervals are collected, interchangeably, in the ordered set S_RL = {R_1, L_1, R_2, L_2, . . ., R_K, L_K}. The algorithm first acts on the interval R_1 = [1, λ_T] by calculating, for every univariate component data sequence, the contrast function value for the Q possible candidates in this interval (details are given in Section 2). This process returns Q vectors y_j, j = 1, . . ., Q, of length d each; for example, the elements of y_1 ∈ R^d will be the contrast function values related to the first change-point candidate in R_1, for each of the d component data sequences, the elements of y_2 ∈ R^d will be the relevant values for the second candidate in R_1, and so on. The next step is to apply to each y_j a mean-dominant norm L : R^d → R.
The definition of such norms can be found in Section 2 of Carlstein (1988); examples include
L_1(y_j) = (1/d) Σ_{i=1}^{d} y_{j,i}, L_2(y_j) = ((1/d) Σ_{i=1}^{d} y_{j,i}^2)^{1/2}, L_∞(y_j) = max_{i=1,...,d} y_{j,i}, (1.2)
where y_{j,i} ≥ 0, ∀ i, j. Applying L(·) to each y_j returns a vector v of length Q. We identify b_{R_1} := argmax_j {v_j}. If v_{b_{R_1}} exceeds a certain threshold, denoted by ζ_{T,d}, then b_{R_1} is taken as a change-point. If not, then the process tests the next interval in S_RL. Upon detection, the algorithm makes a new start from the end-point (respectively, start-point) of the right- (respectively, left-) expanding interval where the detection occurred. Upon correct choice of ζ_{T,d}, MID ensures that we work on intervals with at most one change-point.
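The interval construction and the aggregation-and-test step described above can be sketched as follows (a Python sketch for the initial domain [1, T]; all names are ours):

```python
import numpy as np

def expanding_intervals(T, lam):
    """Ordered set S_RL of interleaved right- and left-expanding intervals
    on [1, T]: R_i = [1, i*lam] and L_i = [T - i*lam + 1, T]."""
    K = -(-T // lam)                       # K = ceil(T / lam)
    s_rl = []
    for i in range(1, K + 1):
        s_rl.append((1, min(i * lam, T)))          # right-expanding R_i
        s_rl.append((max(T - i * lam + 1, 1), T))  # left-expanding  L_i
    return s_rl

def aggregate_and_test(B, threshold, norm="Linf"):
    """Aggregate a Q x d matrix of per-component contrast values with a
    mean-dominant norm as in (1.2); return (argmax, value, detected?)."""
    if norm == "Linf":
        v = B.max(axis=1)
    elif norm == "L2":
        v = np.sqrt((B ** 2).mean(axis=1))
    else:                                  # L1
        v = B.mean(axis=1)
    b = int(np.argmax(v))
    return b, float(v[b]), bool(v[b] > threshold)
```

With T = 200 and λ_T = 10, the first three entries of S_RL are [1, 10], [191, 200], and [1, 20], matching the scanning order used later in the toy example.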
The rest of the paper is structured as follows. Section 2 describes our proposed methodology and its associated theory. Useful variants of our algorithm and the choice of important parameter values are explained in Section 3. In Section 4, we perform a thorough simulation study to compare our algorithm with state-of-the-art methods, while Section 5 illustrates the behaviour of MID on two examples of real data: the monthly percentage changes in the UK house price index over a period of twenty-two years in twenty London Boroughs, and the daily number of new COVID-19 cases in the four constituent countries of the United Kingdom: England, Northern Ireland, Scotland, and Wales. We conclude the paper in Section 6 with some general remarks and reflections on our proposed algorithm. The proofs of Theorem 2.1 and Theorem 2.2 are given in the Appendix.

Methodology
We work under the model given in (1.1). The objective is to estimate both the number, N, and the locations r_1, . . ., r_N where the multivariate deterministic signal f_t exhibits structural changes. We note that N can possibly grow with the sample size T and with the dimensionality d. In addition, a change-point does not necessarily appear in all univariate component signals. Before providing a full, step-by-step explanation of our algorithm, two simple examples are given to assist understanding. In Figure 1, we graphically provide an example of a three-dimensional data sequence of length T = 200 with three change-points in the mean at locations r_1 = 27, r_2 = 73, and r_3 = 165. We take σ_1 = 3, σ_2 = 1, and σ_3 = 2, while the random variables ε_{t,i} follow the standard Gaussian distribution for i = 1, 2, 3. The component data sequences X_{t,1} and X_{t,2} share a common change-point at t = 165, while they also have their own change-points at t = 27 and t = 73, respectively. There are no change-points in X_{t,3}. Let us denote r_0 = 0 and r_{N+1} = T. In this toy example, we take the expansion parameter λ_T = 10, while ζ_{T,d} is a well-chosen predefined threshold. The choice of the aforementioned parameters is discussed in detail in Section 3.2; special attention will be given to the dimensionality of the data sequence in order to make a robust threshold choice. Figure 2 shows the steps of MID until all change-points are detected. We will be referring to Phases 1, 2, and 3, involving five, six, and five intervals, respectively; these are clearly indicated in Figure 2. These phases are only related to this specific example of three change-points; in cases with a different number of change-points we would have a different number of such phases. At the beginning of the detection process, we have that s = 1 and e = T = 200. As already mentioned in Section 1, the proposed algorithm acts both sequentially and interchangeably on subintervals of the full domain; this being [1, . . .
, 200] for this example. For a well-chosen threshold ζ_{T,d}, r_1 = 27 is the first change-point to be detected; this occurs in the interval [1, 30], as shown in Phase 1 of Figure 2. We now briefly explain how the detection of r_1 occurred. Let A be the 30 × 3 matrix, with each column being the first 30 observations (since we are working in the interval [1, 30]) of each of the three univariate data sequences. The next step is to compute the contrast function values (in this specific case of piecewise-constant signals, the contrast function is the absolute value of the widely used CUSUM statistic, as given in (2.4)) for each candidate point and for all three component data sequences. We end up with a matrix B ∈ R^{29×3}, with B_{i,j} being the value of the contrast function for the i-th data point of the j-th data sequence when we work in the interval [1, 30]; the last point of the interval is not among the change-point candidates. Applying a mean-dominant norm to each row of B gives us a vector of length 29.
Figure 3 (Detection 1) illustrates these values when we employed the L_2 and the L_∞ mean-dominant norms as defined in (1.2). In Figure 3, we see that, for both employed norms, t = 27 has the highest value, which exceeds the predefined threshold obtained as in Section 3.2. Therefore, r̂_1 = 27 is assigned as the estimated location for r_1. After the detection, s is updated as the end-point of the (right-expanding) interval where the detection occurred; therefore, s = 30 and MID is, in Phase 2, applied in the interval [30, 200]. Then, r_3 = 165 gets detected at the sixth step of Phase 2 in the interval [161, 200]. After this second detection, MID proceeds to Phase 3, where it is applied in the interval [30, 161] and r_2 gets isolated (for the first time) and detected in the interval [30, 80], as shown in Figures 2 and 3. In the end, MID is applied in the interval [80, 161], where there will be no expanding interval that contains a point with an aggregated CUSUM value that surpasses the threshold ζ_{T,d}; therefore, the process will terminate after scanning all the data.
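The CUSUM computation used in this walk-through might be sketched as follows (a Python transcription of the standard statistic referenced as (2.4); indices are 1-based and inclusive as in the paper, and the function name is ours):

```python
import numpy as np

def cusum(x, s, e, b):
    """Absolute CUSUM contrast of one component on [s, e] at candidate b,
    with 1-based inclusive indices and s <= b < e."""
    xs = np.asarray(x, dtype=float)[s - 1:e]
    n = e - s + 1
    m = b - s + 1                          # number of points in [s, b]
    left = np.sqrt((n - m) / (n * m)) * xs[:m].sum()
    right = np.sqrt(m / (n * (n - m))) * xs[m:].sum()
    return abs(left - right)
```

In the toy example, the matrix B would have entries B[i, j] = cusum(X[:, j], 1, 30, i) for i = 1, . . ., 29, before the row-wise mean-dominant norm is applied.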
In Figure 4, we graphically provide an example of a three-dimensional data sequence of length T = 200 with three change-points in the slope at locations r_1 = 53, r_2 = 100, and r_3 = 124. We take σ_1 = σ_2 = σ_3 = 7, while ε_{t,i} follow the standard Gaussian distribution for i = 1, 2, 3. The first two component data sequences have two change-points each; for X_{t,1} at locations t = 53 and t = 124, while for X_{t,2} at locations t = 100 and t = 124. There are no change-points in X_{t,3}.
After giving two examples of structures to which MID can be applied, our method can now be described in a general framework. Our proposed algorithm is based on the same isolation technique as the univariate change-point detection method ID; therefore, extensive details of how this isolation is achieved are avoided and can be found in Section 3.1 of Anastasiou and Fryzlewicz (2022). As shown in Figure 2 for the specific example in Figure 1, our method looks for change-points interchangeably in right- and left-expanding intervals. In each of them, MID acts in the same way. Let I be one of these intervals, with cardinality |I|, and let A^I ∈ R^{|I|×d} be such that A^I_{i,j} = X_{i,j} for i ∈ I and j ∈ {1, . . ., d}. The contrast function is then separately applied to the elements of each of the columns of A^I; this results in the matrix B^I ∈ R^{J×d}, whose element at position (i, j) corresponds to the contrast function value of the i-th data point in I for the j-th data sequence. The number J of rows of the matrix B^I (with the contrast function values) depends on the type of changes that our algorithm aims to find; for example, it has already been discussed in the literature that in the case of changes in the mean structure J = |I| − 1, while if we are looking for changes in the slope of a piecewise-linear signal, then J = |I| − 2. The next step is to apply to each row of B^I a mean-dominant norm, as those in (1.2), which leads to a vector v ∈ R^J. We identify b_I = argmax_t {v_t}, and v_{b_I} is tested against the threshold ζ_{T,d}.

Figure 3: CUSUM values in the relevant detection intervals explained in Figure 2. The vertical dashed line indicates the time point with the highest value in the corresponding interval, while the horizontal dashed line is the optimal threshold value. The left column shows the results when the L_2 aggregation method was used, while in the right column the L_∞-based aggregation approach was employed. The algorithm stops when there are no more intervals to be tested.
(S2) Changes in the first derivative: For k = 1, . . ., N + 1, f_t = µ_{1,k} + µ_{2,k} t for t = r_{k−1} + 1, . . ., r_k, where µ_{1,k} and µ_{2,k} are vectors in R^d. In addition, we require that for j = 1, . . ., N, the equality µ_{1,j} + µ_{2,j} r_j = µ_{1,j+1} + µ_{2,j+1} r_j is satisfied. Under this framework, the change-points, r_k, satisfy that f_{r_k−1} + f_{r_k+1} ≠ 2f_{r_k}. Therefore, the univariate component signals f_{t,j}, for j = 1, . . ., d, are continuous and piecewise-linear. The aforementioned scenarios are only two specific illustration cases in which the proposed MID algorithm can be applied. Due to its change-point isolation step prior to detection, our algorithm can be applied in more complicated scenarios where each univariate signal could be, for example, piecewise-polynomial or piecewise-exponential. In Sections 2.2.1 and 2.2.2, we provide the main theorems for the consistency of our method in accurately estimating the true number and the locations of the change-points in Scenarios (S1) and (S2), respectively. The theoretical results presented in this section are for the L_∞ norm being employed for the aggregation of the information from the component data sequences; in practice, our algorithm exhibits very good behaviour for other mean-dominant norms as well, something which is shown in Sections 4 and 5.
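A continuous, piecewise-linear signal of the (S2) form can be generated, and its second-difference characterisation at the change-points verified, as follows (a Python sketch; the kink locations match the example of Figure 4, while the slope increments are ours for illustration):

```python
import numpy as np

def pw_linear(T, kinks, slope0, slope_changes, intercept=0.0):
    """Continuous piecewise-linear signal on t = 1, ..., T whose slope
    changes by slope_changes[k] at time kinks[k]."""
    t = np.arange(1, T + 1, dtype=float)
    f = intercept + slope0 * t
    for r, ds in zip(kinks, slope_changes):
        f += ds * np.maximum(t - r, 0.0)   # hinge adds a kink of size ds at r
    return f

f = pw_linear(200, [53, 100, 124], 0.5, [-1.0, 1.2, -0.7])
# Second differences f_{t-1} + f_{t+1} - 2 f_t, for t = 2, ..., T-1:
d2 = f[:-2] + f[2:] - 2.0 * f[1:-1]
kink_times = np.nonzero(np.abs(d2) > 1e-9)[0] + 2
```

The second difference is non-zero exactly at the three change-points, which is the property that the (S2) contrast function exploits.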

Scenario (S1)
As already discussed, the first step of the detection process depends on an appropriately chosen contrast function, which, for every component data sequence, is applied to each change-point candidate. In Scenario (S1), the contrast function applied to the component data sequences X_{t,j}, ∀ j ∈ {1, . . ., d}, is the absolute value of the widely used CUSUM statistic, with the latter being defined as
X̃^{b,j}_{s,e} = ((e − b)/(n(b − s + 1)))^{1/2} Σ_{t=s}^{b} X_{t,j} − ((b − s + 1)/(n(e − b)))^{1/2} Σ_{t=b+1}^{e} X_{t,j}, (2.4)
where 1 ≤ s ≤ b < e ≤ T and n = e − s + 1. Before proceeding with the main theoretical result for the consistency of our method, allow us to introduce some more notation:
δ_T := min_{j=1,...,N+1} (r_j − r_{j−1}), ∆^k_j := |f_{r_j+1,k} − f_{r_j,k}|, f_T := min_{j=1,...,N} max_{k=1,...,d} ∆^k_j. (2.5)
For the consistency result with respect to the estimated number of change-points and their estimated locations obtained by MID, we work under the assumption (A1) as follows:
(A1) The minimum distance, δ_T, between two change-points and the minimum magnitude of jumps, f_T, as in (2.5), are connected by √δ_T f_T ≥ C* √(log(T d^{1/4})), for a large enough constant C*.
The number of change-points, N, is allowed to grow with the sample size T and the dimensionality d. Theorem 2.1 provides the main theoretical result for Scenario (S1). The proof is given in Appendix A.
Theorem 2.1. Let {X_t}_{t=1,...,T} follow model (1.1) with f_t as in Scenario (S1), and let ε_t be i.i.d. random vectors from the d-variate standard normal distribution. Let N and r_j, j = 1, . . ., N, be the number and locations of the change-points, while N̂ and r̂_j, j = 1, . . ., N̂, are their estimates sorted in increasing order. In addition, for [s_j, e_j] being the interval where r̂_j is obtained, we denote q_j := argmax_{k=1,...,d} |X̃^{r̂_j,k}_{s_j,e_j}|. Assuming that (A1) holds, then there exist positive constants C_1, C_2, C_3, and C_4, which do not depend on T or d, such that for C_1 √(log(T d^{1/4})) ≤ ζ_{T,d} ≤ C_2 √δ_T f_T,
P( N̂ = N, max_{j=1,...,N} { |r̂_j − r_j| (∆^{q_j}_j)^2 } ≤ C_3 log(T d^{1/4}) ) ≥ 1 − C_4 T^{-1}. (2.6)
The lower bound for the probability in (2.6) does not depend on the dimensionality d, and its order is 1 − O(T^{-1}). Furthermore, the rate of convergence of the estimated change-point locations does not depend on the minimum distance between two change-points, δ_T; the jump magnitude, ∆^{q_j}_j, is the only quantity that affects the rate. We notice, though, that to be able to match the estimated change-point locations with the true ones, δ_T should be larger than the distance between the estimated and the true change-point locations. Therefore, based on (2.6), we deduce that δ_T must be at least O(log(T d^{1/4})). This, combined with Assumption (A1), which requires δ_T (f_T)^2 to be of order at least O(log(T d^{1/4})), means that f_T could decrease with T in cases where δ_T is of order higher than O(log(T d^{1/4})). With respect to the threshold parameter, ζ_{T,d}, the rate of its lower bound is O(√(log(T d^{1/4}))); this will also be used in practice as the default rate. Therefore, we have that ζ_{T,d} = C √(log(T d^{1/4})), where C is a positive constant. More details on the choice of C are given in Section 3.2.

Scenario (S2)
We are under the scenario where, for any j ∈ {1, . . ., d}, the underlying signal f_{t,j}, t = 1, . . ., T, has a continuous and piecewise-linear structure, as in Figure 4. In this case, the contrast function applied to the component data sequences X_{t,j}, ∀ j ∈ {1, . . ., d}, is
C^b_{s,e}(X_j) := | Σ_{t=s}^{e} φ^b_{s,e}(t) X_{t,j} |, (2.7)
where n = e − s + 1 and the contrast vector, φ^b_{s,e}(t), given explicitly in (2.8), is, up to sign, the normalised residual of the hinge vector ((t − b)_+)_{t=s,...,e} after the constant and linear trends on [s, e] have been projected out. For more details on how this vector is constructed for (S2), please see Section B.2 in the online supplementary material of Baranowski et al. (2019). Proceeding now with the consistency result of MID applied to Scenario (S2), we make the following assumption.
(A2) The minimum distance, δ_T, between two change-points and the minimum magnitude of changes, f_T := min_{j=1,...,N} max_{k=1,...,d} |f_{r_j−1,k} + f_{r_j+1,k} − 2f_{r_j,k}|, are connected by δ_T^{3/2} f_T ≥ C* √(log(T d^{1/4})), where C* is a large enough constant.
Theorem 2.2 provides the main theoretical result for Scenario (S2). The proof can be found in Appendix B.
Theorem 2.2. Let {X_t}_{t=1,...,T} follow model (1.1) with f_t as in Scenario (S2), and let ε_t be i.i.d. random vectors from the d-variate standard normal distribution. Let N and r_j, j = 1, . . ., N, be the number and locations of the change-points, while N̂ and r̂_j, j = 1, . . ., N̂, are their estimates sorted in increasing order. In addition, for [s_j, e_j] denoting the interval where r̂_j is obtained during MID, we denote q_j := argmax_{k=1,...,d} C^{r̂_j}_{s_j,e_j}(X_k). With ∆^{q_j}_j := |f_{r_j−1,q_j} + f_{r_j+1,q_j} − 2f_{r_j,q_j}| and assuming that (A2) holds, then there exist positive constants C_1, C_2, C_3, and C_4, which do not depend on T or d, such that for C_1 √(log(T d^{1/4})) ≤ ζ_{T,d} ≤ C_2 δ_T^{3/2} f_T,
P( N̂ = N, max_{j=1,...,N} { |r̂_j − r_j| (∆^{q_j}_j)^{2/3} } ≤ C_3 (log(T d^{1/4}))^{1/3} ) ≥ 1 − C_4 T^{-1}. (2.9)
Similarly to (S1), the lower bound for the probability in (2.9) does not depend on the dimensionality of the data sequence X_t; its order is 1 − O(T^{-1}). Furthermore, the rate of convergence of the estimated change-point locations depends only on the change magnitude, ∆^{q_j}_j, as defined in the statement of Theorem 2.2. The lower bound for the threshold, ζ_{T,d}, is of order O(√(log(T d^{1/4}))); this is the default rate that is used in practice in Sections 4 and 5. Therefore,
ζ_{T,d} = C* √(log(T d^{1/4})), (2.10)
where C* > 0. More details on the choice of the values for C* are given in Section 3.2.
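An equivalent way to compute the (S2) contrast of (2.7), up to sign, is to take the inner product of the data with the normalised residual of the hinge vector (t − b)_+ after the constant and linear trends on [s, e] have been projected out; a Python sketch under this characterisation (all names are ours):

```python
import numpy as np

def slope_contrast(x, s, e, b):
    """Change-in-slope contrast on [s, e] at candidate b (1-based,
    inclusive; s < b < e keeps the hinge non-degenerate)."""
    t = np.arange(s, e + 1, dtype=float)
    hinge = np.maximum(t - b, 0.0)
    # Orthonormal basis of span{1, t} on [s, e], via a thin QR factorisation
    Q, _ = np.linalg.qr(np.column_stack([np.ones_like(t), t]))
    resid = hinge - Q @ (Q.T @ hinge)      # remove the linear part
    phi = resid / np.linalg.norm(resid)    # the contrast vector, up to sign
    return abs(phi @ np.asarray(x, dtype=float)[s - 1:e])
```

On a noiseless continuous piecewise-linear signal, the contrast is maximised at the true kink, which is the behaviour that Theorem 2.2 quantifies in the noisy case.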
Practicalities and variants

Mean-dominant norms
The proposed MID methodology is based on an aggregation step, using mean-dominant norms as defined in Section 2 of Carlstein (1988), of the contrast function values obtained from each component data sequence. Even though Theorems 2.1 and 2.2 cover the theoretical behaviour of MID under the L_∞ norm given in (1.2), similar results could be obtained when other norms are employed. In the remaining sections, the discussion on the choice of the parameter values, as well as the results on the practical performance of MID, will focus on the cases where our method is combined with the L_∞ or the L_2 mean-dominant norm, as defined in (1.2), for the aggregation step of the contrast function values.

Choice of parameter values
In order to choose the constants C in (2.6) and C* in (2.10), we ran a large-scale simulation study involving data sequences {X_t}_{t=1,...,T}, for T = 700, 1400, following model (1.1), where f_t = 0 and the ε_t are i.i.d. from the d-variate standard normal distribution. Specifically, for each d ∈ {1, . . ., 50}, we generated 500 replicates and applied MID under Scenarios (S1) and (S2) to each one of those replicates, using various threshold constant values C and C*, respectively, in order to estimate the number of change-points. For each dimension, in order to control the Type I error rate, α, of falsely detecting change-points, we chose the default constant to be the one for which the number of replicates with no falsely detected change-points was closest to (1 − α) × 500. For d > 50, we keep the threshold that gave the best results for the simulated 50-dimensional data sequence. Table 1 presents the results for α ∈ {0.05, 0.1} under Scenarios (S1) and (S2). From now on, the obtained values for C and C* will be referred to as the default constants.
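The calibration above might be sketched as follows; for brevity, the detection step here is a single full-interval L_∞-aggregated CUSUM scan rather than a complete MID run, so this is an illustrative stand-in for, not a reproduction of, the procedure that produced Table 1 (all names are ours):

```python
import numpy as np

def calibrate_constant(T, d, candidates, n_rep=200, alpha=0.05, seed=1):
    """Pick the threshold constant whose false-detection rate under pure
    noise is closest to alpha, using the default rate sqrt(log(T d^{1/4}))."""
    rng = np.random.default_rng(seed)
    rate = np.sqrt(np.log(T * d ** 0.25))
    t = np.arange(1, T)
    w = np.sqrt(T / (t * (T - t)))                 # full-interval CUSUM weights
    stats = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.normal(size=(T, d))
        S = np.cumsum(X, axis=0)
        vals = np.abs(w[:, None] * (S[:-1] - (t / T)[:, None] * S[-1]))
        stats[r] = vals.max()                      # Linf over components and b
    err = [abs(np.mean(stats > c * rate) - alpha) for c in candidates]
    return candidates[int(np.argmin(err))]
```

The same loop, with the scan replaced by a full MID pass, would reproduce the Monte Carlo choice of the default constants.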

Decision on the aggregation method
In Cho and Fryzlewicz (2015), Cho (2016), and Anastasiou et al. (2022), it has been explained that the L ∞ -based aggregation of the contrast functions for each component data sequence tends to exhibit a better behaviour (compared to the L 1 and L 2 -based aggregations) in scenarios where the true change-points appear only in a small number of the component data sequences.
In contrast, due to spuriously large contrast function values, the L_∞ approach could suffer in situations where the change-points are not sparse across the panel of the data sequences. This difference in the behaviour between the L_2 and the L_∞ norms examined in our paper is what has motivated us to introduce an introductory step in MID, where the sparsity in a given d-dimensional data sequence, X_t, is first estimated. For r_1, . . ., r_N being the N true change-points, allow us, for any j ∈ {1, . . ., N}, to define by A_j ⊆ {1, . . ., d} the set of indices of the univariate component data sequences that contain the change-point r_j. Then, for X_t, the sparsity is given by
sp := (1/N) Σ_{j=1}^{N} |A_j|/d, (3.11)
where |A_j| is the cardinality of the set A_j. It is straightforward that sp ∈ [0, 1]. Our proposed hybrid approach aims to data-adaptively decide on the aggregation method to be used and eliminates the necessity of choosing between the L_2 and L_∞ norms. The steps to achieve our purpose are as follows.
Step 1: We apply MID paired with the L_∞ aggregation rule in order to obtain r̂_1, . . ., r̂_M. It has already been explained that the L_∞ norm is preferable and provides very good results in cases with sparse change-points, while it tends to overestimate the number of change-points (due to the contrast function taking spuriously large values) under the scenario of dense change-points. Therefore, M ≥ N.
Step 2: We now estimate the sparsity in the given data as defined in (3.11). With r̂_0 = 0 and r̂_{M+1} = T, we first collect the triplets (r̂_{m−1} + 1, r̂_m, r̂_{m+1}), ∀ m ∈ {1, . . ., M}. After this, ∀ i ∈ {1, . . ., d}, we calculate CS^{(i)}(r̂_m) := C^{r̂_m}_{r̂_{m−1}+1, r̂_{m+1}}(X_i), with C^b_{s,e}(X_i) being the relevant contrast function value (based on whether we are under Scenario (S1) of Section 2.2.1 or Scenario (S2) of Section 2.2.2) for the point b when we are in the interval [s, e], for the univariate component data sequence X_i. The d contrast function values for each r̂_m are collected in the sets
S_m = {CS^{(1)}(r̂_m), . . ., CS^{(d)}(r̂_m)}, m = 1, . . ., M. (3.12)

Step 3: For each m = 1, . . ., M, all the elements of S_m are tested against the relevant threshold value, ζ_T, for univariate change-point detection, which is of order O(√(log T)) in both scenarios of piecewise-constant and continuous piecewise-linear signals (representing Scenarios (S1) and (S2), respectively, covered in this paper). In terms of the threshold constants, we employ those of Anastasiou and Fryzlewicz (2022). We denote by ŝp_m := #{i : CS^{(i)}(r̂_m) > ζ_T}/d the proportion of components in which r̂_m is detectable, and estimate the overall sparsity by ŝp := M^{-1} Σ_{m=1}^{M} ŝp_m.

Step 4: For cases where ŝp ≤ 0.4, we accept the result from Step 1, where MID has been paired with the L_∞ norm, whereas if ŝp ≥ 0.6, then MID is paired with the L_2 norm. Extensive simulations have shown that there is no significant difference in MID's practical performance with respect to accuracy (in both the estimated number of change-points and the estimated locations) when ŝp ∈ (0.4, 0.6). Therefore, MID could be paired with any of the aforementioned two norms and give very good results; for computational cost reasons, in such cases we accept the result of Step 1. From now on, we denote by MID_opt the data-adaptive, sparsity-based MID version explained in this section, where an aggregation method is chosen based on the estimated sparsity of the change-points in the given data.
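Steps 2 and 3 can be sketched for the piecewise-constant case (S1) as follows (a Python sketch; the univariate threshold constant C_univ is ours for illustration, not the value of Anastasiou and Fryzlewicz (2022)):

```python
import numpy as np

def estimate_sparsity(X, cps, C_univ=1.3):
    """For each estimated change-point, compute the fraction of the d
    components whose |CUSUM| over the surrounding triplet interval clears
    a univariate O(sqrt(log T)) threshold; return the average fraction."""
    T, d = X.shape
    zeta = C_univ * np.sqrt(np.log(T))
    pts = [0] + sorted(cps) + [T]
    fracs = []
    for m in range(1, len(pts) - 1):
        s, b, e = pts[m - 1] + 1, pts[m], pts[m + 1]
        n, k = e - s + 1, b - s + 1
        seg = X[s - 1:e]
        left = np.sqrt((n - k) / (n * k)) * seg[:k].sum(axis=0)
        right = np.sqrt(k / (n * (n - k))) * seg[k:].sum(axis=0)
        fracs.append(np.mean(np.abs(left - right) > zeta))
    return float(np.mean(fracs))
```

A value of the returned ŝp above 0.6 would then trigger the L_2 aggregation, as in Step 4.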

A permutation-based approach
MID is a threshold-based algorithm: at each step, the largest aggregated contrast function value is tested against a predefined threshold in order to decide whether there is a change-point at the corresponding location. In Section 3.2, we explained how the threshold constant is carefully chosen to control the Type I error, taking into account the dimensionality of the data. However, misspecification of the threshold can possibly lead to misestimation of the number of change-points. To mitigate such issues, we propose a variant of MID based on permutation.
The idea of a "data-adaptive threshold" through permutations or bootstrapping is not new. In Cho (2016), a bootstrap procedure is proposed, motivated by the representation theory developed for the Generalised Dynamic Factor Model, in order to approximate the quantiles of the developed double CUSUM test statistic under the null hypothesis of no change-points; the obtained quantiles are used as a test criterion for detection. In Cabrieto et al. (2018), a permutation-based approach is used to test the presence of correlation changes in multivariate time series. Under a univariate framework, Antoch and Hušková (2001) propose a permutation scheme for deriving critical values for test statistics based on functionals of the partial sums S_k = ∑_{i=1}^{k} (X_i − X̄_n), k = 1, ..., n, where X_i are the observed data and X̄_n their mean value. The proposed scheme consists of three steps: a) compute the test statistic using the original data, b) construct the permutation distribution by computing the relevant test statistic on permuted versions of the data, and c) reject the null hypothesis of no change-points if the test statistic lies in the tails of the permutation distribution.
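A max-type functional of the partial sums described above can be sketched as follows. The function name and the unstandardised form of the statistic are illustrative assumptions; Antoch and Hušková (2001) work with standardised versions.

```python
import numpy as np

def max_partial_sum_stat(x):
    """Max-type statistic based on the partial sums
    S_k = sum_{i<=k} (X_i - mean(X)), k = 1, ..., n-1.

    A sketch of the kind of functional the permutation scheme is
    applied to, not the authors' exact standardisation.
    """
    x = np.asarray(x, dtype=float)
    s = np.cumsum(x - x.mean())   # partial sums of centred observations
    return float(np.abs(s[:-1]).max())
```

The permutation scheme then recomputes this statistic on shuffled copies of `x` to obtain critical values.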
We propose a variant that combines the isolation technique of MID with an extension of the permutation procedure of Antoch and Hušková (2001) to the multivariate framework. Although this permutation procedure tends to be computationally expensive, its implementation is straightforward. As generally holds for permutation procedures, the test statistic obtained from the original data is compared to those obtained when applying the same steps to several permuted versions of the data. In our proposed permutation scheme, all steps remain the same as in MID, the only difference being the way the algorithm chooses to accept or reject a change-point within each interval. To be more precise, suppose that, for a given data sequence {X_t}_{t=1,...,T}, the MID algorithm is at a step where it looks for a change-point in the interval I = [s*, e*], where 1 ≤ s* < e* ≤ T. As described in Section 2.1, MID returns a vector v ∈ R^J, where J is the number of change-point candidates. The elements of v correspond to the aggregated contrast function values for each candidate point in I. The next step is to store T_I^max = max_{i∈{1,...,J}} v_i and repeat the following procedure a prespecified number of times, denoted by K:
1. Generate a random permutation of (s*, ..., e*).
2. Reorder the data according to the permutation.
3. Calculate and store the maximum aggregated contrast function value for each permutation.
The empirical distribution of these maximum values is used to construct our test. More precisely, we identify b̂_I = argmax_t v_t as a change-point if, for a given α ∈ (0, 1), T_I^max > q_{1−α}, where q_{1−α} is the 100(1 − α)% quantile of the empirical distribution.
The parameter α controls the probability of false detections. Small values of α make it harder for the algorithm to reject the null hypothesis of no change-points, whereas large values reduce the probability of a Type II error. For the simulations in Section 4, we take α = 0.01. Regarding the parameter K, the simulation results provided in Antoch and Hušková (2001), although for a univariate framework, suggest that the empirical quantiles stabilise quickly. Therefore, for our simulations, we take K = 1000.
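The permutation decision rule described above can be sketched as follows. Here `contrast` is a hypothetical callable standing in for MID's aggregated contrast computation on an interval; the interface and names are our own assumptions, not the paper's code.

```python
import numpy as np

def permutation_decision(segment, contrast, alpha=0.01, K=1000, rng=None):
    """Sketch of the permutation step used inside one MID interval.

    segment: (n, d) array holding X_{s*}, ..., X_{e*}.
    contrast: callable returning the vector v of aggregated contrast
        values for all candidate points of a segment (assumed interface).
    Returns (detected, b_hat): whether T_max exceeds the empirical
    (1 - alpha)-quantile of the permutation distribution, and the argmax.
    """
    rng = np.random.default_rng(rng)
    v = contrast(segment)
    T_max, b_hat = v.max(), int(v.argmax())
    perm_max = np.empty(K)
    for k in range(K):
        perm = rng.permutation(segment.shape[0])      # reorder time points
        perm_max[k] = contrast(segment[perm]).max()   # max contrast on permuted data
    q = np.quantile(perm_max, 1 - alpha)
    return bool(T_max > q), b_hat
```

With a clear mean shift in the segment, the original maximum typically lies far in the right tail of the permutation distribution, so the change-point is accepted.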

Simulations
In this section, we investigate the performance of our method in Scenarios (S1) and (S2), covered in Sections 2.2.1 and 2.2.2, respectively. Furthermore, in (S1) MID is compared with state-of-the-art methods through a comprehensive simulation study. The competitors are the Double CUSUM (DC) method of Cho (2016), the Sparsified Binary Segmentation (SBS) algorithm of Cho and Fryzlewicz (2015), and the INSPECT algorithm of Wang and Samworth (2018). DC and SBS are implemented in the hdbinseg R package, while for INSPECT we have used the InspectChangepoint R package. For DC, the parameters used were the ones that gave the best results in the simulation study carried out in the relevant paper (Cho (2016)), while SBS and INSPECT were called with their default arguments. Regarding our algorithm, we give results for the data-adaptive, sparsity-based MID version of Section 3.3, denoted by MID_opt, as well as for the permutation-based variants of Section 3.4; these are denoted by MIDPER_{L_2} and MIDPER_{L_∞} for the L_2 and L_∞ norms, respectively.
Simulation setup: We considered settings where T = 1500, d ∈ {30, 100}, and N ∈ {3, 20, 50}. With the definition of sparsity as in (3.11), we took sp ∈ {0.2, 0.5, 0.8}. In total, we tested the methods in 18 different setups covering a wide range of multivariate sequences. The change magnitudes at the component data sequences (either in the mean or in the slope, depending on whether we are under (S1) or (S2), respectively) follow the uniform distribution U(1, 2). Standard Gaussian noise was added to the signals. We ran 100 replications for each setup. The frequency distribution of N̂ − N is provided, while the accuracy of the estimated locations is evaluated through the Adjusted Rand Index (ARI) of the estimated segmentation against the true one (Hubert and Arabie (1985)), and the scaled Hausdorff distance, where n_s is the length of the largest segment. The average computational times, in seconds, are also provided. The results for MID and the competing methods under Scenario (S1) are given in Tables 2-4. The method with the highest empirical frequency of N̂ − N being equal to zero (or close to zero, depending on the example), and those within 5% of the highest, are given in bold. For Scenario (S2), the results are presented in Table 5. As the tables show, MID performs extremely well in all setups in both (S1) and (S2). More specifically, for (S1) our method is, in 17 out of the 18 setups, either the best method overall or within 5% of the best method with respect to the estimated number of change-points. In all cases, MID attains very high values for the ARI and quite small values (in most cases the smallest) for the scaled Hausdorff distance; such results show that, apart from being extremely accurate in estimating the correct number of change-points, MID is also very accurate regarding the estimated change-point locations. Regarding the permutation-based variants of MID, both MIDPER_{L_2} and MIDPER_{L_∞} behave very well in all scenarios in terms of accuracy with respect to both the estimated number and the estimated locations of the change-points. Although these permutation-based variants do not require specification of the threshold, their computational cost is quite large (see Tables 2-4). With regards to the competing methods, INSPECT behaves very accurately in all scenarios. DC's performance is very good only in the cases with three change-points, and it seems to heavily underestimate in the cases with more, regularly occurring change-points. SBS does not perform well in the scenarios tested; it underestimates the number of change-points.
With respect to Scenario (S2) of Section 2.2.2, the results in Table 5 exhibit MID's very good performance. In conclusion, taking into account its low computational time, we can deduce that our proposed method is reliable and quick in accurately detecting change-points under various different multivariate settings (with respect to the number of change-points, their magnitude, the sparsity, the dimensionality of the data, and the structure of the changes). R code and instructions to replicate the results in this section are available at https://github.com/apapan08/Simulations-MID.
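For concreteness, the scaled Hausdorff distance used above to evaluate the estimated locations can be sketched as follows. This is a common formulation; the assumption here is that the paper's version scales the usual Hausdorff distance between the two change-point sets by n_s, the length of the largest true segment.

```python
import numpy as np

def scaled_hausdorff(true_cpts, est_cpts, T):
    """Hausdorff distance between true and estimated change-point sets,
    scaled by n_s, the length of the largest true segment (assumed scaling)."""
    bounds = np.array([0] + sorted(true_cpts) + [T])
    n_s = np.diff(bounds).max()            # largest true segment length
    t = sorted(true_cpts)
    e = sorted(est_cpts)
    if len(e) == 0:
        return 1.0                          # convention when nothing is detected
    # Largest distance from a true point to the estimated set, and vice versa.
    d1 = max(min(abs(ti - ej) for ej in e) for ti in t)
    d2 = max(min(abs(ej - ti) for ti in t) for ej in e)
    return max(d1, d2) / n_s
```

Smaller values indicate estimated locations closer to the truth; a perfect segmentation gives 0.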

Real data examples 5.1 UK House Price Index
In this section, the performance of our method is studied on monthly percentage changes in the UK house price index for all property types from January 2000 to January 2022 in twenty London boroughs. The data are available from http://landregistry.data.gov.uk/app/ukhpi and were last accessed in May 2022. We have a multivariate data sequence X_t = (X_{t,1}, ..., X_{t,20}), with t = 1, ..., 265. The boroughs used are: Barnet (Ba), Bexley (Be), Bromley (Br), Camden (Ca), Croydon (Cr), Ealing (Ea), Enfield (En), Greenwich (Gr), Hackney (Ha), Hammersmith and Fulham (HF), Harrow (Har), Islington (Is), Kensington and Chelsea (KC), Lambeth (La), Merton (Me), Newham (Ne), Richmond upon Thames (RuT), Sutton (Su), Tower Hamlets (TH), and Wandsworth (Wa). Similar data have been investigated in Baranowski et al. (2019), Fryzlewicz (2014), Fryzlewicz (2020), and Anastasiou and Fryzlewicz (2022), though under a univariate setting. Figure 5 shows the results when MID_opt of Section 3.3 was employed for the detection of changes under Scenario (S1) as explained in Section 2.2; with respect to the threshold constant, we used the optimal values as in Table 1 for α = 0.05. Our method identifies 27 change-points in the mean structure of the multivariate data sequence at hand. We have analysed the same data using the competing methods of Section 4. INSPECT detects 26 change-points at locations very similar to those detected by our method, DC detects one change-point at location 41, while SBS does not detect any change-points. We highlight, though, that in INSPECT multiple change-points are estimated using a wild binary segmentation scheme which, due to the randomness involved, does not necessarily detect the same change-points when employed more than once on the same data set.

The COVID-19 outbreak
The performance of our method is also investigated on data from the COVID-19 pandemic. In this case, we focus on the detection of changes under Scenario (S2) as described in Section 2.2. The data under consideration consist of the daily number of new lab-confirmed COVID-19 cases in the four constituent countries of the United Kingdom: England, Northern Ireland, Scotland, and Wales. The period under investigation is from 01/04/2020 until 01/04/2022. The data are available from https://coronavirus.data.gov.uk and were last accessed on the 30th of April 2022. Based on this description, we have a multivariate data sequence X_t = (X_{t,1}, ..., X_{t,4}), with t = 1, ..., 731. Since the data are counts, we apply the Anscombe transform, a : N → R, with a(x) = 2√(x + 3/8), to each X_{t,j}; this transform brings the distribution of the component data sequences closer to the Gaussian with constant variance.
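The variance-stabilising preprocessing described above is a one-line transform; a minimal sketch (function name is ours) applied elementwise to each count series:

```python
import numpy as np

def anscombe(x):
    """Anscombe transform a(x) = 2*sqrt(x + 3/8), applied elementwise.

    Brings Poisson-like count data closer to a Gaussian distribution
    with (approximately) constant variance before change-point analysis.
    """
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)
```

Each component series X_{t,j} is transformed before MID_opt is applied.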
Our method has detected 29 change-points in the vector of the first partial derivatives (changes in the slope for the component univariate data sequences) which seem to capture the important movements in the data.

Conclusion
In this paper, the MID methodology has been proposed for multiple generalized change-point detection in multivariate, possibly high-dimensional, data sequences. The algorithm partly constitutes a generalization of the Isolate-Detect algorithm of Anastasiou and Fryzlewicz (2022). In the permutation-based variant, the main steps of MID remain the same; the difference lies in the way the method chooses to accept or reject a change-point within the interval under consideration, denoted by I = [s*, e*], where 1 ≤ s* < e* ≤ T. In Section 2.1, it has been explained in detail that MID first returns a vector v ∈ R^J, where J is the number of change-point candidates. The elements of v correspond to the aggregated contrast function values for each candidate point in I. The next step is to store the maximum value of v as obtained from the original data and compare it to the appropriate quantiles of the empirical distribution of the values obtained when applying the same steps to several permuted versions of the data; see the exact steps in Section 3.4. The proposed permutation-based variant of MID, though computationally expensive, performs very well in terms of accuracy with respect to the estimated number and locations of the change-points; see the results in Section 4.
The choice of the mean-dominant norm to be employed in the aggregation step of MID has already been discussed in this paper, specifically in Section 3.3. Our aim has been to provide a method that in practice requires minimal parameter choices from the user; towards this purpose, a data-adaptive variant of the method, named MID_opt, has been constructed. We first estimate the sparsity level in the given multivariate data; the steps are explained in detail in Section 3.3. Depending on the value of the estimated sparsity, MID_opt estimates the change-points employing either the L_∞ or the L_2 mean-dominant norm as defined in (1.2).
Through the simulated and real-life data examples presented in Sections 4 and 5, respectively, it has been shown that MID performs very well in terms of accuracy and speed. Specifically, MID lies within 5% of the best methods (in terms of the accurate estimation of the number and the locations of the change-points) when compared to the state-of-the-art competitors. In addition, MID_opt is a very quick detection method which, in a few seconds, can analyse signals with length in the range of thousands and dimensionality in the range of hundreds; this is carried out automatically, with minimal decision making from the user on the aggregation method to be employed.

It is straightforward to see that pr(A_T) ≥ pr(A*_T) and, therefore, it suffices to show the corresponding bound for pr(A*_T). From the definition of our model in (1.1), we have that, for any j ∈ {1, ..., d}, X̃^{b,j}_{s,e} − f̃^{b,j}_{s,e} = ε̃^{b,j}_{s,e}, and simple calculations lead to ε̃^{t,j}_{s,e} ∼ N(0, 1). Therefore, ..., where φ(x), x ∈ R, is the probability density function of the standard normal distribution. We conclude that pr(...), for n = e − s + 1. With f^j = (f_{1,j}, ..., f_{T,j}) and for [s, e] being any interval that contains only one true change-point, namely r_j, allow us, for ease of presentation, to introduce the notation in (A.15), where k ∈ {1, ..., d} and ‖·‖_2 is the Euclidean norm. Since ε_t, t = 1, ..., T, follow the d-variate standard normal distribution, straightforward calculations lead to (A.16); we will show that pr(B_T) ≥ 1 − 1/(12√π T). Using (A.16) and the Bonferroni inequality, then, for Z ∼ N(0, 1), similar steps as in (A.14) apply.
Step 3: This is the main part of our proof, where we explain in detail how to obtain the result in (2.6). For ease of understanding, we split this step into two smaller parts. From now on, we assume that A*_T (and, therefore, also, A_T) and B_T both hold. The constants we use are ..., where C is as in condition (A1).
Step 3.1: For ease of presentation, we take λ_T ≤ δ_T/3; for more information on the general case of λ_T ≤ δ_T/m, for an m > 1, see Remark 1 in the online supplementary material of Anastasiou and Fryzlewicz (2022). Allow us now, ∀j ∈ {1, ..., N}, to define the intervals I^R_j and I^L_j. It is apparent that, in order for I^R_j and I^L_j to contain at least one point, we implicitly require that δ_T > 3, which is the case for sufficiently large T. Since the length of the intervals in (A.19) is equal to δ_T/3 and λ_T ≤ δ_T/3, MID ensures that, for K = ⌈T/λ_T⌉ and k, m ∈ {1, ..., K}, there exists at least one c^r_k = kλ_T and at least one c^l_m = T − mλ_T + 1 that lie in I^R_j and I^L_j, respectively, ∀j = 1, ..., N. At the beginning of our algorithm, s = 1, e = T (see, for example, Figure 2) and, depending on whether r_1 ≤ T − r_N, either r_1 or r_N will get isolated in a right- or left-expanding interval, respectively. W.l.o.g., assume that r_1 ≤ T − r_N; as already mentioned, our algorithm (which can be seen as an extension of the ID methodology in Anastasiou and Fryzlewicz (2022)) naturally ensures that ∃k ∈ {1, ..., K} such that c^r_k ∈ I^R_1. We highlight that there is no other change-point in [1, c^r_k] apart from r_1. With C^b_{s,e} and D^b_{s,e} as in (A.13), we will show that, for b ..., with ∆^{q_j}_j defined in (2.5). For δ_T as in (2.5), we notice that, using assumption (A1), ... Therefore, there will be an interval of the form [1, c^r_k], with c^r_k > r_1, such that [1, c^r_k] contains only r_1 and max_{1≤b<c^r_k} ... In addition, for ease of presentation, allow us to denote by X^{(q_1)} the univariate component data sequence where the contrast function value is maximised at location b̂_1. We will find γ_T > 0 such that, for any b, ... Proving (A.22) and using the definition of b̂_1, we can conclude that |b̂_1 − r_1|(∆^{q_1}_1)^2 ≤ γ_T. Now, since in our model X_{t,j} = f_{t,j} + ε_{t,j}, (A.22) can be expressed as (A.23). W.l.o.g. assume that b* ≥ r_1; a similar approach holds when b* < r_1. Using Lemma 4 in the online supplementary material of Baranowski et al. (2019) gives, for the left-hand side of the inequality in (A.23), that ... For the terms on the right-hand side of (A.23), since we are working under A*_T as in (A.13), we obtain that ..., while from (A.17) and Lemma 4 of the online supplementary material of Baranowski et al. (2019), ...
From (A.24), and since ..., we deduce that (A.22) is implied by (A.26); this is because, if we assume that min ..., we reach a contradiction to (A.22). Therefore, necessarily, ... So far, for λ_T ≤ δ_T/3, we have proven that, working under the assumption that A*_T (which implies that A_T also holds) and B_T hold, there will be an interval ... For M.1: We split the explanation into two cases with respect to the location of b̂_1.
Case 1: b̂_1 < r_1 < c^r_{k*}. Using (A.27) and with ..., where ... Following similar steps as in (A.21), we have that ... We will now show that ..., which is not true by the definition of b̂_2. Having said this, we conclude that |b̂_2 − r_2|(∆^{q_2}_2)^2 ≤ C_3 log(T d^{1/4}). Having detected r_2, our algorithm will proceed in the interval [s, e] = [c^r_{k*_2}, T], and all the change-points will get detected one by one, since Step 3.2 will be applicable as long as there are undetected change-points in [s, e]. Denoting by r̂_j the estimation of r_j, as in the statement of the theorem, and for [s*, e*] being the interval where the isolation and detection of r_j occurs (in the way explained in Steps 3.1 and 3.2), allow us to denote q̂_j := argmax_{m=1,...,d} X̃^{r̂_j,m}_{s*,e*}. Then, Steps 3.1 and 3.2 have shown that MID will detect all the change-points one by one and ... = 8√(log(T d^{1/4})) ... After not detecting anything in all intervals of the above form, the algorithm concludes that there are no more change-points to be detected, and stops.

B Proof of Theorem 2.2
Before proving the result, we introduce some useful notation ... Proof. We will prove the more specific result pr(N̂ = N, max_{j=1,...,N} ...), which implies the result in (2.9) of the main paper.
Steps 1 and 2: Since pr(Ã_T) ≥ pr(Ã*_T), arguments such as those used in Steps 1 and 2 of the proof of Theorem 2.1 can be employed here as well in order to show that pr(Ã_T) ≥ pr(Ã*_T) ...
Step 3: This is the main part of our proof, where we explain in detail how to obtain the result in (B.33). From now on, we assume that Ã*_T (and, therefore, Ã_T as well) and B̃_T both hold. The constants we use are ..., where C* is as in assumption (A2) of the main paper.
Step 3.1: First, ∀j ∈ {1, ..., N}, we define I^R_j and I^L_j as in (A.19). At the beginning of our algorithm, s = 1, e = T and, depending on whether r_1 ≤ T − r_N, either r_1 or r_N will get isolated first in a right- or left-expanding interval, respectively. W.l.o.g., assume that r_1 ≤ T − r_N. Because of the structure of the ID methodology developed in Anastasiou and Fryzlewicz (2022) and partly employed in our paper, it is ensured that ∃ ..., where the third inequality is due to the result of Lemma 5 in the online supplement of Baranowski et al. (2019). Now, ... The result in (B.35), Assumption (A2) in Section 2.2.2 of the main paper, and (B.36) yield (B.37). The result in (B.37) proves that there will be an interval of the form ... Note that b̂_1 cannot be an estimation of any other change-point, as [1, c^r_{k*}] includes only r_1. Allow us, for ease of presentation, to denote q̂_1 := argmax_{j=1,...,d} C^{b̂_1}_{1,c^r_{k*}}(X^{(j)}); q̂_1 is the location index of the univariate component data sequence where the contrast function value is maximised at the point b̂_1. Apparently, it holds that ... Proving (B.38) and using the definition of b̂_1, we can then conclude that |b̂_1 − r_1|(∆^{q_1}_1)^{2/3} ≤ γ̃_T. The inequality in (B.38) can be expressed as (B.39). W.l.o.g. assume that b* ≥ r_1; a similar approach holds when b* < r_1. We denote by ..., and, for the terms on the right-hand side of (B.39), we get that ..., while from (B.31) and Lemma 7 in the online supplement of Baranowski et al. (2019), ...; then we obtain that ... For (M.1): The approach is the same as the one in Step 3.2 of the proof of Theorem 2.1 and will not be repeated here.
Similarly to the approach in Step 3.1, our method, applied now to [c^r_{k*}, T], will first detect r_2 or r_N, depending on whether r_2 − c^r_{k*} is smaller or larger than T − r_N. If T − r_N < r_2 − c^r_{k*}, then r_N will get isolated first and the procedure to show its detection is exactly the same as in Step 3.1, where we explained the detection of r_1. Therefore, w.l.o.g., and also for the sake of showing (M.2), let us assume that r_2 − c^r_{k*} ≤ T − r_N. For (M.2): Due to the isolation aspect of MID, it is certain that there exists a right-expanding point c^r_{k_2} ∈ I^R_2, with I^R_j defined in (A.19) of the main paper. We will show that r_2 gets detected in ... We will then show that |b̂_2 − r_2|(∆^{q_2}_2)^{2/3} ≤ C̃_3 (log T)^{1/3}. Following the same process as in Step 3.1, the method continues in [..., T], and all the change-points will get detected one by one, since Step 3.2 will be applicable as long as there are previously undetected change-points in [s, e]. Denoting by r̂_j the estimation of r_j, as in the statement of the theorem, and for [s*, e*] being the interval where the isolation and detection of r_j occurs (in the way explained in Steps 3.1 and 3.2), allow us to denote ..., ∀j ∈ {1, ..., N}.
Step 4: The arguments given in Steps 1-3 hold in Ã*_T ∩ B̃_T. At the beginning of the algorithm, s = 1, e = T and, for N ≥ 1, there exists k_1 ∈ {1, ..., K} such that s_{k_1} = s, e_{k_1} ∈ I^R_1 ..., with (∆^{q_1}_1)^{2/3} ≤ (C̃_3 log(T d^{1/4}))^{1/3}. After this, the algorithm continues in [c^r_{k*}, T] and keeps detecting all the change-points as explained in Step 3. It is important to note that there will not be any double-detection issues because, naturally, at each step of the algorithm, the new interval [s, e] does not include any previously detected change-points. After not detecting anything in all intervals of the above form, the algorithm concludes that there are no change-points left in [s, e], and stops.

Figure 1: An example of a three-dimensional data sequence that undergoes three changes in its mean structure, at locations r_1 = 27, r_2 = 73 and r_3 = 165.

Figure 2: An example with three change-points in the mean structure; r_1 = 27, r_2 = 73 and r_3 = 165. The dashed lines indicate the interval in which the detection took place in each phase.

Figure 4: An example of a three-dimensional data sequence with piecewise-linear structure that undergoes three changes in its first derivative, at locations r_1 = 53, r_2 = 100 and r_3 = 124.

Figure 5: The monthly percentage changes in the UK house price index for the twenty London boroughs under consideration. The estimated change-point locations when MID_opt was employed are shown as red vertical lines.

Figure 6: The daily number of COVID-19 cases for England, Northern Ireland, Scotland, and Wales. The estimated change-point locations in the trend of each component data sequence, when our proposed MID_opt method was employed, are shown as red vertical lines.
Vert, J.-P. and K. Bleakley (2010). Fast detection of multiple change-points shared by many signals using group LARS. In NIPS, pp. 2343-2351.
Wang, T. and R. J. Samworth (2018). High-dimensional changepoint estimation via sparse projection. Journal of the Royal Statistical Society: Series B 80, 57-83.
Yu, Y. (2020). A review on minimax rates in change point detection and localisation. arXiv preprint arXiv:2011.01857.
Yuan, M. and Y. Lin (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68(1), 49-67.
Zou, C., G. Yin, L. Feng, and Z. Wang (2014). Nonparametric maximum likelihood approach to multiple change-point problems. The Annals of Statistics 42, 970-1002.

A Proof of Theorem 2.1
With X̃^{b,j}_{s,e} as in (2.4), we denote, for s ≤ b < e,
f̃^{b,j}_{s,e} = √((e−b)/(n(b−s+1))) ∑_{t=s}^{b} f_{t,j} − √((b−s+1)/(n(e−b))) ∑_{t=b+1}^{e} f_{t,j},
the value of the CUSUM statistic for the j-th underlying component signal. Furthermore,

Step 2:
Let us denote by ψ^b_{s,e} = (ψ^b_{s,e}(1), ..., ψ^b_{s,e}(T)) the contrast vector, where ψ^b_{s,e}(t) = √((e−b)/(n(b−s+1))) for t = s, ..., b, ψ^b_{s,e}(t) = −√((b−s+1)/(n(e−b))) for t = b+1, ..., e, and ψ^b_{s,e}(t) = 0 otherwise. Let us, for k* ∈ {1, ..., K}, denote by c^r_{k*} ≤ c^r_k the first right-expanding point where this happens, and let b̂_1 be an estimation of r_1 that satisfies (A.27). Step 3.2: After detecting the first change-point, MID follows the same process as in Step 3.1, but now in the set [c^r_{k*}, T], which contains the change-points r_2, ..., r_N. We do not check for possible change-points in the interval [b̂_1 + 1, c^r_{k*}). However, this does not create any issues because: M.1 There is no change-point in [b̂_1 + 1, c^r_{k*}), apart from maybe the already detected r_1; M.2 c^r_{k*} is at a location which allows for the detection of r_2.
then there is no other change-point in [b̂_1 + 1, c^r_{k*}) apart from r_1. Case 2: r_1 ≤ b̂_1 < c^r_{k*}. By construction, c^r_{k*} − r_1 < 2δ_T/3, which means that, apart from r_1, there is no other change-point in [r_1, c^r_{k*}). With r_1 ≤ b̂_1, then [b̂_1 + 1, c^r_{k*}) does not contain any change-point. Cases 1 and 2 above show that, no matter the location of b̂_1, there is no change-point in [b̂_1 + 1, c^r_{k*}) other than possibly the previously detected r_1. Similarly to the approach in Step 3.1, our method, applied now to [s, e] = [c^r_{k*}, T], will first isolate, and then detect, r_2 or r_N, depending on whether r_2 − c^r_{k*} is smaller or larger than T − r_N. If T − r_N < r_2 − c^r_{k*}, then r_N will get isolated first in a left-expanding interval and the procedure to show its detection is exactly the same as for the detection of r_1 in Step 3.1. Therefore, for the sake of showing M.2, let us assume that r_2 − c^r_{k*} ≤ T − r_N. For M.2: By construction, there exists a right-expanding point c^r_{k_2} ∈ I^R_2, with I^R_j defined in (A.19). We will show that r_2 gets detected in [c^r_{k*}, c^r_{k*_2}], for k*_2 ≤ k_2. Following exactly the same process as in Step 3.1, and assuming now w.l.o.g. that b̂_2 < r_2, we have that, for b* ∈ {c^r_{k*}, ...}, min{|b* − r_2|, ...} > C_3 log T / (∆^{f}_2)^2 ... In the same way as in Step 3.1, and by contradiction, we can show that min{..., r_2 − c^r_{k*} + 1} > C_3 log T / (∆^{f}_2)^2.

Step 4:
The arguments given in Steps 1-3 hold in A*_T ∩ B_T. At the beginning of the algorithm, s = 1, e = T and, for N ≥ 1, there exist k_1 ∈ {1, ..., K} such that s_{k_1} = s, e_{k_1} ∈ I^R_1, and k_2 ∈ {1, ..., K} such that s_{k_2} ∈ I^L_N, e_{k_2} = e. As in our previous steps, w.l.o.g. assume that r_1 ≤ T − r_N, so that r_1 gets isolated and detected first in an interval [s, c^r_{k*}], where c^r_{k*} is less than or equal to e_{k_1}. Then, r̂_1 = argmax_{s≤t<c^r_{k*}} C^t_{s,c^r_{k*}} is the estimated location for r_1 and |r_1 − r̂_1|(∆^{q_1}_1)^2 ≤ C_3 log(T d^{1/4}). After this, the method continues in [c^r_{k*}, T] and keeps detecting all the change-points as explained in Step 3. There will not be any double-detection issues because, naturally, at each step of the algorithm, the new interval [s, e] does not include any previously detected change-points. Once all the change-points have been detected one by one, [s, e] will contain no other change-points. Our method will, of course, keep checking for possible change-points in right- and left-expanding intervals, denoted by [s*, e*]. However, MID will not detect anything in [s*, e*] because, ∀b ∈ [s*, e*), C^b_{s*,e*} ≤ D^b_{s*,e*} + 8√(log(T d^{1/4})) ..., where (·) is as in (2.7) of the main paper. For ease of presentation, we define the sets ∩_{1<s≤r_j} ∩_{r_j<e≤r_{j+1}} ∩_{s≤b<e} {Ã^b_{s,e}(k, r_j) ≤ 8√(log(T d^{1/4}))} ... Let us, for k* ∈ {1, ..., K}, denote by c^r_{k*} ≤ c^r_k the first right-expanding point where this happens, and let b̂_1 ..., because, if we assume that min{r_1 − 1, c^r_{k*} − r_1} ≤ 2^{1/3} C̃_3 (log(T d^{1/4}))^{1/3},

... ≤ C_1 √(log(T d^{1/4})) ≤ ζ_{T,d}, (B.43) where the third inequality is a direct application of Lemma 5 in the online supplement of Baranowski et al. (2019). The result in (B.43) comes to a contradiction to G^{b̂_1}_{1,c^r_{k*}} > ζ_{T,d}, which has already been proven in (B.37). Therefore, (B.42) holds and, for T d^{1/4} sufficiently large, min{r_1 − 1, c^r_{k*} − r_1} > 2^{1/3} C̃_3 ..., which implies (B.38). Therefore, necessarily, |b̂_1 − r_1|(∆^{q_1}_1)^{2/3} ≤ C̃_3 (log(T d^{1/4}))^{1/3}. So far, we have proven that, working under the sets Ã*_T and B̃_T, there will be an interval of the form [1, c^r_{k*}], with G^{b̂_1}_{1,c^r_{k*}} > ζ_{T,d}, where b̂_1 = argmax_{1≤t<c^r_{k*}} G^t_{1,c^r_{k*}} is an estimation of r_1 that satisfies (B.45). Step 3.2: After detecting the first change-point, MID follows the same process as in Step 3.1 in the set [c^r_{k*}, T], which contains r_2, ..., r_N. This means that we do not check for possible change-points in the interval [b̂_1 + 1, c^r_{k*}), and we need to prove that: (M.1) There is no other change-point in [b̂_1 + 1, c^r_{k*}), apart from possibly the already detected r_1; (M.2) c^r_{k*} is at a location which allows for the detection of r_2.

and k_2 ∈ {1, ..., K} such that s_{k_2} ∈ I^L_N, e_{k_2} = e, where I^R_j and I^L_j are defined in (A.19) of the main paper. As in our previous steps, w.l.o.g. assume that r_1 ≤ T − r_N + 1, meaning that r_1 gets isolated and detected first in an interval [s, c^r_{k*}], where c^r_{k*} ≤ e_{k_1}. Then, r̂_1 = argmax_{s≤t<c^r_{k*}} G^t_{s,c^r_{k*}} is the estimated location for r_1 and |r_1 − r̂_1|(∆^{q_1}_1)^{2/3} ≤ ... The new interval [s, e] does not include any previously detected change-points. Once all the change-points have been detected one by one, [s, e] will have no other change-points in it. Our method will keep interchangeably checking for possible change-points in right- and left-expanding intervals, denoted by [s*, e*]. MID will not detect anything in [s*, e*] since, ∀b ∈ [s*, e*), using (B.31) we have that G^b_{s*,e*} ≤ H^b_{s*,e*} + 8√(log(T d^{1/4})) = 8√(log(T d^{1/4})) < C_1 √(log(T d^{1/4})) ≤ ζ_{T,d}.

Table 1: The optimal values for the threshold constants, C and C*, which control the Type I error rate α under Scenarios (S1) and (S2), respectively, for d = 1, ..., 50. Results are presented for the L_2 and the L_∞ norms.

Table 2: Distribution of N̂ − N over 100 simulated multivariate data sequences with 3 change-points under Scenario (S1) of Section 2.2.1. The average ARI, d_H, and computational times are also given.

Table 3: Distribution of N̂ − N over 100 simulated multivariate data sequences with 20 change-points under Scenario (S1) of Section 2.2.1. The average ARI, d_H, and computational times are also given.

Table 4: Distribution of N̂ − N over 100 simulated multivariate data sequences with 50 change-points under Scenario (S1) of Section 2.2.1. The average ARI, d_H, and computational times are also given.

Table 5: Distribution of N̂ − N over 100 simulated multivariate data sequences under Scenario (S2) of Section 2.2.2. We present the results of MID for different levels of sparsity and dimension of the data sequence. The number of change-points is equal to 3, 20, or 50. The average ARI, d_H, and computational times are also given.

ζ_{T,d}, which has already been proven in (A.21). Therefore, (A.26) holds and (A.25) is restricted to |b̂ ...; this is because, in the case of the continuous piecewise-linear signals of Scenario (S2), we necessarily have that δ_T ≥ 2, as otherwise change-point identifiability would not be possible. In addition, since c^r_k ...