In this section, we set up the high-dimensional changepoint problem for our scenario. We define our new method, GeomCP, and discuss how changes in high-dimensional time series manifest themselves in the mapped time series. We then suggest an appropriate univariate changepoint detection method for detecting changes in the mapped time series—although practically others could be used.
Before proceeding, we define some notation used throughout the paper. We define the \(\mathbb {1}_p\) vector as a p-dimensional vector where each entry is 1 and the number of dimensions, p, is inferred from context. For a vector, \(\varvec{y}=(y_1,\ldots ,y_p)^T\), we define the \(l_q\)-norm as \(\left\Vert \varvec{y}\right\Vert _q:=\left( \sum \nolimits _{j=1}^p|y_j|^q\right) ^\frac{1}{q}\) for \(q\in [1,\infty )\). We define \(\left\langle \cdot ,\cdot \right\rangle \) as the standard scalar product such that for vectors \(\varvec{x}\) and \(\varvec{y}\) we have \(\left\langle \varvec{x},\varvec{y}\right\rangle =\sum \nolimits _{j=1}^px_jy_j\). Finally, the terms variables, series and dimensions shall be used interchangeably to indicate the multivariate nature of the problem.
Problem setup
We study the time series model where \(\varvec{Y_1},\ldots ,\varvec{Y_n}\) are independent, p-dimensional time vectors that follow a multivariate Normal distribution where
$$\begin{aligned} \varvec{Y_i}\sim N_p(\varvec{\mu }_i,\varvec{\sigma }^2_i\varvec{I}_p),\;\;\;\;\;1\le i\le n\;. \end{aligned}$$
We assume there is an unknown number of changepoints, m, which occur at locations \(\tau _{1:m}=\left( \tau _1,\ldots ,\tau _m\right) \). These changepoints split the data into \(m+1\) segments, indexed by k, each with constant mean and variance vectors, \(\varvec{\mu }_k\) and \(\varvec{\sigma }_k^2\). Note that we assume a diagonal covariance matrix, so the covariance matrix can be described by the variance vector and the identity matrix. We define \(\tau _0=0\) and \(\tau _{m+1}=n\) and assume the changepoints are ordered so that \(\tau _0=0<\tau _1<\ldots<\tau _m<\tau _{m+1}=n\).
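To make the setup concrete, the following is a minimal sketch of how data from this model could be simulated. The function name `simulate_piecewise_normal`, its arguments and the chosen segment parameters are illustrative assumptions and are not part of the method.

```python
import numpy as np

def simulate_piecewise_normal(n, p, taus, mus, sigmas, seed=0):
    """Draw Y_1..Y_n with Y_i ~ N_p(mu_k, diag(sigma_k^2)) on segment k.

    taus   : ordered changepoint locations tau_1 < ... < tau_m
    mus    : (m+1, p) array of segment mean vectors
    sigmas : (m+1, p) array of segment standard-deviation vectors
    """
    rng = np.random.default_rng(seed)
    bounds = [0, *taus, n]  # tau_0 = 0 and tau_{m+1} = n
    y = np.empty((n, p))
    for k in range(len(bounds) - 1):
        start, end = bounds[k], bounds[k + 1]
        # Independent entries: the covariance is diagonal within each segment.
        y[start:end] = rng.normal(mus[k], sigmas[k], size=(end - start, p))
    return y

# Illustrative example: a mean change at 100 and a variance change at 200.
y = simulate_piecewise_normal(
    n=300, p=50, taus=[100, 200],
    mus=np.array([np.zeros(50), np.ones(50), np.ones(50)]),
    sigmas=np.array([np.ones(50), np.ones(50), 2 * np.ones(50)]),
)
```

Row `i - 1` of the returned array holds the time vector \(\varvec{Y_i}\), so the first segment occupies rows 0 to 99.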
The following section introduces the geometric intuition and mappings used within GeomCP. These mappings reduce the dimension of the problem to make the problem computationally feasible as n and p grow large.
Geometric mapping
When analyzing multivariate time series from a geometric viewpoint, we seek to exploit relevant geometric structures defined in the multi-dimensional space. Here, we aim to detect changepoints in the mean and variance vectors of multivariate Normal random variables; therefore, we wish to utilize geometric properties that capture these changes.
A change in the mean vector of our data generating process will cause a location shift of the data points in the multi-dimensional space. Consider the distance between each data point and some fixed reference point: if the data points are shifted in the multi-dimensional space, then their distances to the reference point would be expected to change. Hence, we can detect when the mean vector of the data generating process changes by observing a change in the distances. For a change in distance not to occur after an underlying mean change, the new mean vector must remain exactly on the same \((p-1)\)-sphere (centered at the reference point) on which the old mean vector lay. Given that the computation of the mean vector is a linear operator on the multivariate time series, the requirement to lie on the same sphere (a quadric in \({\mathbb {R}}^p\)) is highly non-generic from a geometric perspective. As a result, these scenarios are rare, especially in high dimensions.
A change in the covariance of our data generating process will cause a change in the shape of the data points. More specifically in our setup, a change in the variance would cause the shape of the data points to expand or contract. Consider the angle between each data vector and a reference vector; as the shape of the data points expands (contracts), the angles will become more (less) varied. Hence, we can detect changes in the variance of the data generating process by detecting changes in the angles.
By using distances and angles, we can map a p-dimensional time series to two dimensions. To calculate these mappings, we need a pre-specified reference vector from which to calculate a distance and an angle. Naturally, one may think to use the mean of the data points. However, this requires a rolling window to estimate the mean of the data points prior to the point being mapped. Not only does this introduce tuning parameters, such as the size of the rolling window, but it also results in spikes in the distance and angle measures at changepoints. To detect changepoints, we would need a threshold for these spikes, and calculating such a threshold is a non-trivial task; hence, we seek an alternative.
We propose setting the reference vector to be a fixed vector, \(\varvec{y_0}\). We then translate all the points based upon this fixed reference vector,
$$\begin{aligned} y'_{i,j}=y_{i,j}-\left( \min \limits _iy_{i,j}-y_{0,j}\right) ,\quad i\in [1,\ldots ,n]\;,\;j\in [1,\ldots ,p]\;. \end{aligned}\tag{2.1}$$
This results in a data-driven reference vector. We choose to set \(\varvec{y_0}=\mathbb {1}\) as this bounds the angle measure between 0 and \(\pi /4\), meaning we do not get vectors close to the origin facing in opposite directions causing non-standard behavior within a segment. Moreover, having a nonzero element in every entry of \(\varvec{y_0}\) ensures changes in the individual series will manifest in the angle measure. Note due to the translation in (2.1), the choice of \(\varvec{y_0}\) does not affect the distance measure. Throughout we assume the reference vector is set as \(\varvec{y_0}=\mathbb {1}\).
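As an illustration, the translation in (2.1) can be sketched in a few lines. The function name `translate` and the toy data are assumptions made for this example only.

```python
import numpy as np

def translate(y, y0=None):
    """Apply (2.1): shift each series so that its minimum over time equals y0_j."""
    y = np.asarray(y, dtype=float)
    if y0 is None:
        y0 = np.ones(y.shape[1])  # reference vector set to the vector of ones
    # y.min(axis=0) is the per-series minimum over the n time points.
    return y - (y.min(axis=0) - y0)

# Toy example with n = 3 time points and p = 2 series.
y = np.array([[0.5, -2.0], [1.5, 0.0], [-1.0, 3.0]])
y_prime = translate(y)
```

After the translation, every series has minimum equal to the corresponding entry of the reference vector, so all translated points lie in the orthant with entries of at least 1.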
For data points in the same segment, we would expect their distances and angles to the reference vector to have the same distribution. When a mean (variance) change occurs in the data, this leads to a shift (spread) in the data; hence, the distances (angles) will change. Therefore, by detecting changes in the distances and angles, using an appropriate univariate changepoint method, we recover changepoints in the p-dimensional series.
We define our distance and angle measures based upon the standard scalar product. To obtain our distance measure, \(d_i\), we perform a mapping, \(\delta :{\mathbb {R}}^p\rightarrow {\mathbb {R}}^1_{>0}\),
$$\begin{aligned} d_i=\delta (\varvec{y_i})=\sqrt{\left\langle (\varvec{y'_i}-\mathbb {1}),(\varvec{y'_i}-\mathbb {1})\right\rangle }\;, \end{aligned}\tag{2.2}$$
which is equivalent to \(\left\Vert \varvec{y'_i}-\mathbb {1}_p\right\Vert _2\).
To obtain our angle measure, \(a_i\), we perform a mapping \(\alpha :{\mathbb {R}}^p\rightarrow [0,\frac{\pi }{4}]\),
$$\begin{aligned} a_i=\alpha (\varvec{y_i})=\cos ^{-1}\left( \frac{\left\langle \varvec{y'_i},\mathbb {1}\right\rangle }{\sqrt{\left\langle \varvec{y'_i},\varvec{y'_i}\right\rangle }\sqrt{\left\langle \mathbb {1},\mathbb {1}\right\rangle }}\right) \;, \end{aligned}\tag{2.3}$$
which is the principal angle between \(\varvec{y'_i}\) and \(\mathbb {1}\).
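The two mappings in (2.2) and (2.3) can be sketched as follows, assuming the input has already been translated as in (2.1). The function name `distance_angle` is hypothetical, and the clipping of the cosine is a numerical safeguard we add here rather than part of the definition.

```python
import numpy as np

def distance_angle(y_prime):
    """Map each translated time vector to (d_i, a_i) as in (2.2) and (2.3)."""
    y_prime = np.asarray(y_prime, dtype=float)
    ones = np.ones(y_prime.shape[1])
    diff = y_prime - ones
    # (2.2): Euclidean distance between y'_i and the vector of ones.
    d = np.sqrt(np.sum(diff * diff, axis=1))
    # (2.3): angle between y'_i and the vector of ones.
    cos = (y_prime @ ones) / (np.linalg.norm(y_prime, axis=1) * np.linalg.norm(ones))
    a = np.arccos(np.clip(cos, -1.0, 1.0))  # clip guards against rounding error
    return d, a

# Toy translated data with n = 3 time points and p = 2 series.
y_prime = np.array([[2.0, 2.0], [4.0, 5.0], [1.0, 2.0]])
d, a = distance_angle(y_prime)
```

Both measures cost \({\mathcal {O}}(np)\) to compute, since each requires a single pass over the entries of the data.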
By using the standard scalar product, we are incorporating information from each series in the distance and angle measures. As such, we would expect GeomCP to perform well in scenarios where a dense set of the series change at each changepoint. This idea will be explored further and verified in Sect. 3.
Analyzing mapped time series
Understanding the distributional form of the distance and angle mappings will aid in the choice of univariate changepoint methods. Under our problem setup, Theorem 1 shows that the distance measure, asymptotically in p, follows a Normal distribution.
Theorem 1
Suppose we have independent random variables, \(Y_i\sim N(\mu _i,\sigma _i^2)\). Let \(X=\sqrt{\sum \nolimits _{i=1}^pY_i^2}\), then as \(p\rightarrow \infty \),
$$\begin{aligned} \frac{X-\sqrt{\sum \nolimits _{i=1}^p(\mu _i^2+\sigma _i^2)}}{\sqrt{\frac{2\sum \nolimits _{i=1}^p\left( \mu _i\sigma _i\right) ^2+\sum \nolimits _{i=1}^p\sigma _i^4+2\rho \sqrt{2\sum \nolimits _{i=1}^p\sum \nolimits _{j=1}^p\mu _i^2\sigma _i^2\sigma ^4_j}}{2\sum \nolimits _{i=1}^p(\mu _i^2+\sigma _i^2)}}}\xrightarrow {{\mathcal {D}}}N(0,1)\;, \end{aligned}$$
where \(\rho \) is an unknown correlation parameter (see proof).
Proof
See the Supplementary Material. \(\square \)
Theorem 1 shows that, asymptotically in p, the distance between each time vector and a pre-specified fixed vector follows a Normal distribution. Hence, for piecewise constant time vectors, the resulting distance measure will follow a piecewise constant Normal distribution. It is common in the literature to assume that angles also follow a Normal distribution, as in Fearnhead et al. (2018). We found by simulation, for large enough p, the angle measure defined in (2.3) is well approximated by a Normal distribution with piecewise constant mean and variance.
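A small simulation can be used as a sanity check of the centring and scaling constants in Theorem 1. The sketch below is illustrative only: the function name `standardise_norm` is hypothetical, and we take \(\mu _i=0\) in the example so that the term involving the unknown \(\rho \) vanishes and no value for it needs to be assumed.

```python
import numpy as np

def standardise_norm(x, mu, sigma, rho=0.0):
    """Centre and scale X = ||Y||_2 using the constants stated in Theorem 1."""
    mu, sigma = np.asarray(mu, float), np.asarray(sigma, float)
    second_moment = np.sum(mu**2 + sigma**2)
    # The double sum over i and j factorises into a product of two single sums.
    cross = np.sum(mu**2 * sigma**2) * np.sum(sigma**4)
    numerator = (2 * np.sum((mu * sigma) ** 2) + np.sum(sigma**4)
                 + 2 * rho * np.sqrt(2 * cross))
    scale = np.sqrt(numerator / (2 * second_moment))
    return (x - np.sqrt(second_moment)) / scale

# Example with mu = 0, so the rho term is identically zero.
p, reps = 400, 5000
mu, sigma = np.zeros(p), 1.5 * np.ones(p)
rng = np.random.default_rng(0)
x = np.linalg.norm(rng.normal(mu, sigma, size=(reps, p)), axis=1)
z = standardise_norm(x, mu, sigma)
# For large p, z should have sample mean near 0 and sample variance near 1.
```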
While any theoretically valid univariate method could be used to detect changepoints in the mapped series, we use the PELT algorithm of Killick et al. (2012) as this is an exact and computationally efficient search. For \(n\rightarrow \infty \), PELT is consistent in detecting the number and location of changes in mean and variance (Tickle et al. 2019; Fisch et al. 2018); hence, using Theorem 1, we gain consistency of our distance measure as \(p\rightarrow \infty \) also. When the Normal approximation of the distance and angle measures holds, we use the Normal likelihood as our test statistic within PELT and allow for changes in mean and variance. If p is small, we may not want to make the Normal assumption. In this case, we recommend using a nonparametric test statistic, such as the empirical distribution from Zou et al. (2014) (where consistency has also been shown) as embedded within PELT in Haynes et al. (2017b).
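To illustrate the univariate step, the following is a minimal, self-contained sketch of a PELT-style search with a Normal likelihood cost allowing changes in mean and variance. It is not the reference implementation of Killick et al. (2012); the function name `pelt_normal`, the minimum segment length, the variance floor and the penalty \(3\log n\) are all assumptions made for this example.

```python
import numpy as np

def pelt_normal(x, penalty, min_seg=5):
    """Penalised changepoint search with pruning for changes in mean and variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    cs = np.concatenate(([0.0], np.cumsum(x)))
    cs2 = np.concatenate(([0.0], np.cumsum(x**2)))

    def cost(s, t):
        # Twice the negative maximised Normal log-likelihood of x[s:t].
        m = t - s
        mean = (cs[t] - cs[s]) / m
        var = max((cs2[t] - cs2[s]) / m - mean**2, 1e-8)  # floor avoids log(0)
        return m * (np.log(2 * np.pi * var) + 1.0)

    best = np.full(n + 1, np.inf)   # best[t]: optimal penalised cost of x[:t]
    best[0] = -penalty
    prev = np.zeros(n + 1, dtype=int)
    candidates = [0]                # admissible last-changepoint locations
    for t in range(min_seg, n + 1):
        totals = np.array([best[s] + cost(s, t) for s in candidates])
        k = int(np.argmin(totals))
        best[t] = totals[k] + penalty
        prev[t] = candidates[k]
        # Pruning: drop candidates that can never be optimal at a later time.
        candidates = [s for s, c in zip(candidates, totals) if c <= best[t]]
        newest = t - min_seg + 1
        if newest >= min_seg:
            candidates.append(newest)

    cps, t = [], n                  # backtrack through the optimal segmentation
    while prev[t] > 0:
        t = int(prev[t])
        cps.append(t)
    return sorted(cps)

# Illustrative mapped series with a single mean change after time point 200.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
cps = pelt_normal(x, penalty=3 * np.log(len(x)))
```

In GeomCP, such a search would be applied separately to the distance series and to the angle series, giving two sets of estimated changepoints.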
GeomCP algorithm
Algorithm 1 details the pseudo-code for GeomCP. As changepoints can manifest in both the distance and angle measures, we post-process the two sets of changepoints to obtain the final set of changes. We introduce a threshold, \(\xi \), and say that a changepoint in the distance measure, \({\hat{\tau }}^{(d)}\), and a changepoint in the angle measure, \({\hat{\tau }}^{(a)}\), are deemed the same if \(\left| {\hat{\tau }}^{(d)}-{\hat{\tau }}^{(a)}\right| \le \xi \). If we determine two changepoints to be the same, we set the changepoint location to be the one given by the angle measure since, as Sect. 3.2 demonstrates, this results in more accurate changepoint locations. The value of \(\xi \) should be chosen based upon the minimum distance expected between changepoints. Alternatively, \(\xi \) could be set to zero, in which case an alternative post-processing step would be required to determine whether similar changepoint estimates correspond to the same change.
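The reconciliation step described above can be sketched as follows. This is one possible reading of the post-processing rule, not a transcription of Algorithm 1; the function name `merge_changepoints` and the example values are hypothetical.

```python
def merge_changepoints(cp_dist, cp_angle, xi):
    """Combine changepoints from the distance and angle series.

    A distance changepoint within xi of some angle changepoint is treated as
    the same change, and the angle location is the one that is kept.
    """
    merged = set(cp_angle)
    for d in cp_dist:
        if not any(abs(d - a) <= xi for a in cp_angle):
            merged.add(d)  # no nearby angle changepoint: keep the distance one
    return sorted(merged)

# Example: 100 and 102 are deemed the same change when xi = 5.
final = merge_changepoints(cp_dist=[100, 250], cp_angle=[102, 400], xi=5)
# final == [102, 250, 400]
```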
One of the major drawbacks of many multivariate changepoint methods is that they are computationally infeasible for large n and p. Within GeomCP, the computational cost to calculate both the distance and angle measures in (2.2) and (2.3) is \({\mathcal {O}}(np)\). If we implement the PELT algorithm for our univariate changepoint detection, this has expected computational cost \({\mathcal {O}}(n)\) under certain conditions. The main condition requires the number of changepoints to increase linearly with the number of time points; further details are given in Killick et al. (2012). If these conditions are not satisfied, PELT has a worst-case computational cost of \({\mathcal {O}}(n^2)\). Hence, the expected computational cost of GeomCP is \({\mathcal {O}}(np+n)={\mathcal {O}}(np)\) (under the conditions in Killick et al. (2012)), and its worst-case computational cost is \({\mathcal {O}}\left( np+n^2\right) ={\mathcal {O}}\left( n(p+n)\right) \).
Non-Normal and dependent data
The current problem setup assumes multivariate Normal distributed data with a diagonal covariance matrix. These assumptions are made to facilitate our theoretical analysis and result in the Normality of the mapped series. If these assumptions are broken, the geometric intuition described in Sect. 2.2 still holds, but we can say less about the theoretical properties of the mapped series.
Firstly, if we allow for an arbitrary covariance matrix, this describes the shape and spread of the data points. Suppose our data undergoes a change from \(X_{\text {pre}}\sim N(\varvec{0},\varSigma )\) to \(X_{\text {post}}\sim N(\varvec{0},\varvec{\sigma }\varSigma )\); this will cause the data points to spread out in the directions of the principal components. Hence, we would still expect the angles between the time vectors and the reference vector to change, revealing the change in covariance. We investigate this further in Sect. 3.5. In fact, a Normally distributed data set with a known covariance matrix could be transformed into a Normally distributed data set with a diagonal covariance matrix (satisfying our initial problem setup) by an orthogonal transformation that aligns the axes with the principal components. Such a transformation would preserve the distances and angles by definition but requires knowledge of the true covariance structure.
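The orthogonal transformation mentioned above can be illustrated with an eigendecomposition of a known covariance matrix. The sketch below is a toy example under that assumption; the function name `decorrelate` and the chosen covariance matrix are hypothetical. Note that angles to a reference vector are preserved only if the reference vector is rotated in the same way.

```python
import numpy as np

def decorrelate(y, cov):
    """Rotate the rows of y onto the principal axes of a known covariance."""
    # cov = q @ diag(eigvals) @ q.T with q orthogonal.
    eigvals, q = np.linalg.eigh(cov)
    return y @ q, q, eigvals

cov = np.array([[2.0, 0.8], [0.8, 1.0]])
rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(2), cov, size=1000)
z, q, eigvals = decorrelate(y, cov)

# The rotated data have diagonal covariance q.T @ cov @ q = diag(eigvals),
# while norms and pairwise inner products are unchanged by the rotation.
rotated_cov = q.T @ cov @ q
```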
Alternatively, we could consider other inner products in our distance and angle mappings defined in (2.2) and (2.3); here the geometric motivation of the method would remain valid. In this case, for an underlying mean change to occur without the distance measure changing, the new mean vector must remain exactly on the more general \((p-1)\)-quadric in \({\mathbb {R}}^p\). This is still a highly non-generic requirement from a geometric perspective. In particular, we could use scalar products directly derived from the covariance matrix, such as the one underlying the Mahalanobis distance (Mahalanobis 1936). In such cases, the direct relation between angles and the correlation coefficients is well known (Wickens 1995). However, such inner products require an estimate of the covariance in each segment, which is non-trivial and therefore left as future work.
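As a sketch of what such a generalisation could look like, the mappings in (2.2) and (2.3) can be written with the inner product \(\left\langle \varvec{x},\varvec{y}\right\rangle _{\varSigma }=\varvec{x}^T\varSigma ^{-1}\varvec{y}\). The code below assumes the inverse covariance is known and supplied by the user, which sidesteps the estimation problem noted above; the function name `mahalanobis_distance_angle` is hypothetical.

```python
import numpy as np

def mahalanobis_distance_angle(y_prime, cov_inv, ref=None):
    """Distance and angle mappings under the inner product <u, v> = u^T cov_inv v."""
    y_prime = np.asarray(y_prime, dtype=float)
    if ref is None:
        ref = np.ones(y_prime.shape[1])

    def inner(u, v):
        # Row-wise u^T cov_inv v; v may be a single vector or a matching array.
        return np.sum((u @ cov_inv) * v, axis=-1)

    diff = y_prime - ref
    d = np.sqrt(inner(diff, diff))
    cos = inner(y_prime, ref) / np.sqrt(inner(y_prime, y_prime) * inner(ref, ref))
    return d, np.arccos(np.clip(cos, -1.0, 1.0))

y_prime = np.array([[4.0, 5.0], [1.0, 2.0]])
# With the identity matrix, the standard mappings (2.2) and (2.3) are recovered.
d_id, a_id = mahalanobis_distance_angle(y_prime, np.eye(2))
# Scaling the covariance by 4 halves the distances and leaves the angles unchanged.
d_sc, a_sc = mahalanobis_distance_angle(y_prime, np.eye(2) / 4.0)
```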
If we allow the data to follow a non-Normal distribution, then we would still expect changes in the first and second moments of this distribution to manifest in the distance and angle mappings. However, understanding the distribution of the mapped series would be more challenging. In practice, the empirical cost function could be used within PELT (Haynes et al. 2017b), yet this would lead to less power in the detection of changes in the univariate series.
Finally, if we allowed temporal dependence between the time points, this would lead to temporal dependence in the mapped series, and an appropriate cost function for PELT could be used. Understanding how the temporal dependence in the multivariate series manifests in the mapped series is non-trivial and is left as further work.
In the next section, we provide an extensive simulation study exploring the effectiveness of GeomCP at detecting multivariate changes in mean and variance and demonstrate an improved detection rate over current state-of-the-art multivariate changepoint methods. Furthermore, we illustrate the improved computational speed of GeomCP over current methods, especially as n and p grow large.