A communication-efficient, online changepoint detection method for monitoring distributed sensor networks

We consider the challenge of efficiently detecting changes within a network of sensors, where we also need to minimise communication between the sensors and the cloud. We propose an online, communication-efficient method to detect such changes. The procedure performs likelihood ratio tests at each time point and uses two thresholds: a local threshold filters out unimportant test statistics, and a global threshold is used to make decisions based on the aggregated test statistics. We provide asymptotic theory concerning consistency and the asymptotic distribution when there are no changes. Simulation results suggest that our method can achieve similar performance to the idealised setting, where there are no constraints on communication between the sensors and the cloud, while substantially reducing transmission costs.


Introduction
During the last decade, there has been a significant focus on the important challenge of efficient and accurate detection of changes in both univariate and multivariate data sequences (Cho and Fryzlewicz, 2014; Fisch et al., 2022; Kovács et al., 2023; Truong et al., 2020; Tveten et al., 2022; Wang and Samworth, 2018). More recently, focus has turned to translating the efficiency of such approaches to the online setting, typically motivated by an applied challenge such as how to deal with limited computational power (e.g. Ward et al., 2024). Recent major contributions to the online setting include Adams and MacKay (2007), Tartakovsky et al. (2014), Yu et al. (2023), Chen et al. (2022), and Romano et al. (2023). In this paper we consider a less studied scenario: monitoring edge behaviour within distributed sensor networks, which are common architectures within the Internet of Things (IoT) framework. The importance of detecting changes at the edge efficiently, whilst minimising communication between the sensors and the cloud, is perhaps best appreciated by considering two key applications: detecting cyber-attacks on smart cities (Alrashdi et al., 2019) and optimising the performance of base stations (Wu et al., 2019).
Consider, by way of example, Figure 1, which shows a schematic representation of real-time monitoring within a distributed network. Here we assume that d data streams are monitored, each by its own sensor. Communication between the sensors and the centre is possible, as shown by the dashed lines. An unusual event happens at time τ, and we want to detect this event as quickly as possible. In modern sensor networks that deploy IoT devices, the computational resources of the sensors themselves can be substantial. However, communication between the sensors and the cloud can be problematic due to the heavy energy usage involved in transmitting data (Varghese et al., 2016; Pinto and Castor, 2017). As such, we need algorithms that can identify when it is important for information to be shared with the cloud. More specifically, in this article we seek to develop a new method to detect changes within such a network in real time, with high statistical power and as little communication and computation as possible.
Changepoint methods for the fully centralised problem, in which the data from the sensors are processed and transmitted to the centre (cloud) at every time step, are well studied. Approaches typically calculate the maximum or the sum of all the test statistics (see, e.g., Mei, 2010; Xie and Siegmund, 2013; Chan, 2017; Chen et al., 2022; Gösmann et al., 2022). The rationale behind these methods is to raise an alarm if the aggregated test statistic across multiple streams exceeds a pre-defined threshold. Numerical experiments (Mei, 2010) indicate that taking the maximum is optimal when only a few data streams are affected, which we term a sparse change. Conversely, taking the sum is optimal when most data streams are affected, also known as a dense change.

Figure 1: Schematic representation of a sensor network made up of d sensors, where $S_i$ is the index for sensor i, $X_{i,t}$ is the data observed at sensor i, and $M_{i,t}$ is the message transmitted from sensor i to the centre at time t.
Contributions to this distributed problem include Rago et al. (1996), Veeravalli (2001), Appadwedula et al. (2005), Mei (2005, 2011), Tartakovsky and Kim (2006) and Banerjee and Veeravalli (2015). Among them, two recent papers of particular interest develop communication-efficient schemes for monitoring a large number of data streams (Zhang and Mei, 2018; Liu et al., 2019). The key idea is that each sensor computes a local monitoring statistic and then applies a thresholding step, sending the statistic to the centre only if there is some evidence of a change. The information from multiple sensors is then combined at the centre. This approach reduces unnecessary transmission by ignoring streams with little evidence of a change and focusing only on the data streams that show signs of change.
Although computationally feasible, existing works assume that the pre- and post-change means are known. In practice, the pre-change mean can be estimated from historical data. However, assuming a known post-change mean is typically unrealistic, and an incorrect value can lead to a failure to detect the change or to poor detection power. Liu et al. (2019) approximate the post-change mean recursively but, as a consequence, sacrifice some of the statistical power of the algorithm.
Our approach builds on recent work developing the moving sum (MOSUM) as a window-based changepoint method (see, e.g., Aue et al., 2012; Kirch and Kamgaing, 2015; Kirch and Weber, 2018). Specifically, we propose an online, communication-efficient changepoint detection algorithm (distributed MOSUM) to detect changes in real time within the distributed network setting. A local threshold is chosen to filter out unimportant information, so that only statistically important test statistics are transmitted to the centre. A change is declared when the aggregated test statistic at the central cloud exceeds a pre-defined global threshold. The low time complexity and communication efficiency of the proposed method make it suitable for online monitoring. We also show that, for large changes, the proposed method can achieve statistical power similar to that of the idealised setting with no communication constraint, whilst substantially reducing the transmission cost.
Moreover, we show how to bring the detection performance of distributed MOSUM close to that of the idealised setting by increasing the window size, at the cost of additional storage and a small increase in transmission cost.
The key differences between our work and previous distributed changepoint detection contributions (e.g., Liu et al., 2019) are as follows. Firstly, a moving window-based test statistic, MOSUM, is chosen to avoid the need to know the post-change mean. Secondly, earlier works are based on a framework that controls the average run length (ARL), the average time until a change is incorrectly detected. However, this metric gives a somewhat limited amount of information, since the distribution of the run length is usually unknown. For instance, if many replications stop quickly while a few stop much later, the ARL can be the same as if all the replications had terminated at around the same time. In contrast, in this work we present methods that control the error rate under the null at a specific level, with asymptotic power 1 under alternatives. Furthermore, our ideas generalise trivially to methods controlling the average run length.
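To make the ARL limitation concrete, the sketch below uses two hypothetical sets of run lengths (invented for illustration, not results from this paper). Both have an ARL of 1000, yet the empirical probability of a false alarm within the first 100 steps is 0 for one and 0.9 for the other.

```python
from statistics import mean

# Hypothetical run lengths (time to a false alarm) from 10 replications each.
run_lengths_a = [1000] * 10            # every replication stops at t = 1000
run_lengths_b = [10] * 9 + [9910]      # most stop almost at once, one very late


def average_run_length(run_lengths):
    """ARL: the mean time until a (false) alarm."""
    return mean(run_lengths)


def prob_alarm_before(run_lengths, horizon):
    """Empirical probability of a false alarm strictly before `horizon`."""
    return sum(r < horizon for r in run_lengths) / len(run_lengths)


print(average_run_length(run_lengths_a), average_run_length(run_lengths_b))  # 1000 1000
print(prob_alarm_before(run_lengths_a, 100), prob_alarm_before(run_lengths_b, 100))  # 0.0 0.9
```

A false alarm probability over a fixed monitoring window, which is the quantity controlled in this paper, separates the two cases, whereas the ARL does not.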
The structure of this paper is as follows. In Section 2, the problem setting is outlined, before introducing the distributed MOSUM methodology in Section 3. Several theoretical results for this new approach are given in Section 4. Simulation studies are carried out in Section 5, before ending with some concluding remarks (Section 6).

Problem setting
We begin by assuming that we have d sensors, each of which is observed at every time point $t \in \mathbb{N}$, giving $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{d,t})$. Here $X_t$ could be raw data or the residuals after pre-processing the data. These observations are assumed to be identically distributed and independent across series. Such assumptions are common in the problem of detecting changes within a distributed system (Tartakovsky and Veeravalli, 2002; Mei, 2010; Xie and Siegmund, 2013; Liu et al., 2019). We do not strictly assume independence over time here, but our method is optimal when this assumption holds. The impact of temporal dependence is studied numerically in Section 5.3.
We further assume that at some unknown time τ, the distribution of an unknown subset of the d sensors changes. For simplicity, we only consider a change in mean, but the ideas below are easily extended to other changepoint settings. In this illustrative change-in-mean setting, the model for the data is

$$X_{i,t} = \mu_i + \delta_i \mathbb{1}\{t > \tau\} + \epsilon_{i,t}, \qquad i = 1, \ldots, d, \; t \in \mathbb{N}, \qquad (2.1)$$

where $\mu_i$ is the pre-change mean (estimated from historic data in practice), $\delta_i$ is the mean shift, and $\{\epsilon_{i,t} : t \in \mathbb{N}\}$ are strictly stationary error sequences. After time τ, the mean of the ith data stream changes immediately from $\mu_i$ to $\mu_i + \delta_i$. Our setting also permits $\delta_i = 0$ for some i, which means that only a subset of the data streams is affected by the change. Without loss of generality, we assume $\mu_i = 0$. Under the null hypothesis, the model for the data can then be written as

$$H_0: \; X_{i,t} = \epsilon_{i,t}, \qquad i = 1, \ldots, d, \; t \in \mathbb{N}, \qquad (2.2)$$

and under the alternative hypothesis as

$$H_A: \; X_{i,t} = \begin{cases} \epsilon_{i,t}, & t \le \tau, \\ \delta_i + \epsilon_{i,t}, & t > \tau, \end{cases} \qquad \text{with } \delta_i \ne 0 \text{ for at least one } i. \qquad (2.3)$$

Our aim is to monitor such a system and raise the alarm as soon as possible after the event at time τ. One way of achieving this is to perform hypothesis tests sequentially, i.e., to test the null hypothesis of no change in mean at each time point $t \in \mathbb{N}$. The algorithm stops and declares a change as soon as the null hypothesis is rejected.
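For simulation studies it helps to be able to draw data from model (2.1). The helper below is a minimal sketch under stated assumptions: $\mu_i = 0$, i.i.d. N(0, 1) errors, and the same shift δ for the first p streams. The name `simulate_streams` is hypothetical and not part of the paper.

```python
import random


def simulate_streams(d, p, delta, tau, n, seed=0):
    """Draw n observations for each of d streams from model (2.1) with mu_i = 0.

    Illustrative assumptions: i.i.d. N(0, 1) errors and a common shift `delta`
    for the first p streams (delta_i = 0 for the rest). Time runs t = 1, ..., n
    and the shift applies for t > tau.
    """
    rng = random.Random(seed)
    data = []
    for i in range(d):
        delta_i = delta if i < p else 0.0
        data.append([delta_i * (t > tau) + rng.gauss(0.0, 1.0) for t in range(1, n + 1)])
    return data


# H_A with a dense change (p = 50 of d = 100 streams); H_0 corresponds to p = 0.
x = simulate_streams(d=100, p=50, delta=0.5, tau=500, n=1000)
```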
In the classical sequential changepoint detection problem, the performance of an algorithm is evaluated subject to a constraint on its false alarm rate. First, consider an open-ended stopping rule, where we have an infinite time window of measurements and the algorithm does not halt until it detects a change. The false alarm rate can be evaluated in two ways. Assume there is no change, and let $\hat{\tau}$ be the time at which we detect a change, with the convention that $\hat{\tau} = \infty$ if no change is detected. One approach is to control the average run length, $E_\infty(\hat{\tau})$, the expected time to a false alarm. This makes sense for procedures with a constant detection threshold, which are certain to detect a change under the null if monitoring continues for an infinite time period. Alternatively, one can control the false alarm rate, $P_\infty(\hat{\tau} < \infty)$, the probability of a false alarm. Controlling this over an infinite time horizon requires the detection threshold to increase over time. Equivalently, the test statistic can be multiplied by a weight function $w(\cdot) < 1$. See Leisch et al. (2000); Zeileis et al. (2005); Horváth et al. (2008); Aue et al. (2012); Kirch and Kamgaing (2015); Weber (2017); Yau et al. (2017); Kirch and Weber (2018); Kengne and Ngongo (2022) for examples of how to choose an appropriate weight function.
In our paper, we focus on controlling the false alarm rate. However, Aue et al. (2012) note that "applying open-ended procedures built from the asymptotic critical values have a tendency to be too conservative in finite samples". Therefore, our paper considers a closed-end stopping rule, in which the algorithm stops either upon detecting a change or upon reaching the predefined monitoring time T. We thus control the false alarm rate over a time window of length T. However, the ideas we present can easily be adapted to the open-ended setting, and also to methods that control the average run length.
In the distributed changepoint detection problem, we additionally evaluate the average transmission cost $\hat{\Delta}$, defined as the average number of transmissions per time step across the d sensors. This should be smaller than a pre-specified transmission budget $\Delta$.
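As a concrete reading of $\hat{\Delta}$, the hypothetical helper below counts messages from a record of which sensors transmitted at each step; the list-of-booleans input format is an assumption made for illustration.

```python
def average_transmission_cost(sent):
    """Average number of transmissions per time step, hat(Delta).

    `sent` has one entry per time step; each entry lists d booleans that are
    True when the corresponding sensor transmitted a message at that step.
    """
    if not sent:
        return 0.0
    return sum(sum(step) for step in sent) / len(sent)


# Three sensors over four time steps: three messages in total.
sent = [[True, False, False],
        [False, False, False],
        [True, True, False],
        [False, False, False]]
cost = average_transmission_cost(sent)
print(cost)              # 0.75
print(cost <= 1.0)       # True: within a hypothetical budget Delta = 1
```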
Before introducing our proposed method, we first review relevant work. At time t, a local monitoring statistic $T_i(t)$ is calculated for the ith stream. The local statistics are then combined into a global monitoring statistic at the fusion centre. There are two common ways of combining messages when monitoring for changes within a distributed system. The first, the SUM scheme (Mei, 2010), declares a change when the sum of all the local monitoring statistics exceeds a pre-defined threshold, that is,

$$\hat{\tau}_{\mathrm{SUM}} = \inf\Big\{t : \sum_{i=1}^{d} T_i(t) \ge c_{\mathrm{Global}}\Big\},$$

where $c_{\mathrm{Global}}$ is the global threshold. This way of combining statistics across streams is known to perform well if the series are independent and the changes are dense. However, implementing this method on a distributed system requires sending every $T_i(t)$ to the fusion centre, which is expensive. The sum-shrinkage method (Liu et al., 2019) reduces the communication cost by shrinking the local statistics towards zero before summing them, so that only streams showing some evidence of a change contribute:

$$\hat{\tau}_{\mathrm{shrink}} = \inf\Big\{t : \sum_{i=1}^{d} s\big(T_i(t)\big) \ge c_{\mathrm{Global}}\Big\},$$

where $s(\cdot)$ is a shrinkage function that sets statistics below a local threshold to zero. Empirically, the sum-shrinkage method achieves similar performance to the SUM scheme in the dense case and, surprisingly, performs better in the sparse case.
When the change is sparse, it has been shown both theoretically and empirically (Mei, 2010; Liu et al., 2019; Chen et al., 2022) that monitoring the maximum of the test statistics across series is best. In such a setting, the MAX procedure (Tartakovsky and Veeravalli, 2002) raises an alarm when the maximum of the local test statistics exceeds a threshold c, that is,

$$\hat{\tau}_{\mathrm{MAX}} = \inf\Big\{t : \max_{1 \le i \le d} T_i(t) \ge c\Big\}.$$

The best choice of scheme depends on the sparsity of the change, that is, on the number p of affected data streams. This can be made precise by considering an asymptotic setting in which the number of streams d → ∞ (Enikeeva and Harchaoui, 2019): a change is called sparse if the number of affected streams is $p = o(\sqrt{d})$, and dense otherwise.
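The combination rules above can be written in a few lines. This is an illustrative sketch: the function names are hypothetical, and hard thresholding is used only as one example of a shrinkage rule, so it may differ from the exact transformation in Liu et al. (2019).

```python
def sum_statistic(local_stats):
    """SUM scheme: add up all d local statistics (every sensor must transmit)."""
    return sum(local_stats)


def hard_threshold_sum(local_stats, c_local):
    """Sum after hard thresholding: statistics below c_local contribute zero.
    Hard thresholding is used purely as an example of a shrinkage rule."""
    return sum(s for s in local_stats if s >= c_local)


def max_statistic(local_stats):
    """MAX scheme: the largest local statistic."""
    return max(local_stats)


def raises_alarm(global_stat, threshold):
    """Declare a change when the combined statistic reaches the threshold."""
    return global_stat >= threshold


stats = [0.5, 2.5, 0.2, 3.0]            # local statistics at one time point
print(sum_statistic(stats))              # about 6.2
print(hard_threshold_sum(stats, 1.0))    # 5.5
print(max_statistic(stats))              # 3.0
```

Only the shrinkage and MAX rules can be evaluated without every sensor transmitting at every step, which is the motivation for the distributed scheme developed below.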
A recent paper (Chen et al., 2022) combines the SUM and MAX procedures to achieve good performance regardless of the sparsity. In the context of distributed monitoring, the MAX procedure is trivially implemented without any communication: each sensor applies the threshold for the max-statistic and flags a change if its local statistic exceeds this threshold. Therefore, in this paper we only focus on developing a communication-efficient version of the SUM scheme. Our aim is a method that performs well for dense changes while limiting the communication cost. We use the SUM scheme, which has no restrictions on communication, as the ideal method to compare against.

Distributed change point detection method
Our proposed methodology is summarized in Algorithm 1, and described in detail below.
The method essentially comprises three steps. The first step is the parallel local monitoring of each data stream by the sensors. As the monitoring unfolds, messages are occasionally sent from the sensors to the centre to indicate the presence of a potential change.
Finally, at the centre these messages are aggregated to find changes that occur across a number of data streams.
Algorithm 1: Centralized and distributed MOSUM
input: historic data $x_{i,t}$ for $i = 1, 2, \ldots, d$ and $1 \le t \le m$
Estimating the baseline parameters   // can be done offline
  for i = 1 to d do
    estimate $\hat{\mu}_i$ and $\hat{\sigma}_i$
  end
Data: $x_{i,t}$ for $i = 1, 2, \ldots, d$ at time t
while no change is detected and the maximum monitoring time T has not been reached do
  Local monitoring   // parallel computing
  for i = 1 to d do

Estimating the baseline parameters
Our sequential testing approach requires a historic data set of length m to estimate the baseline parameters. Theoretical results are obtained later in the paper as m → ∞. The parameters of interest are the mean $\mu_i$ of each data stream and the variance $\sigma_i^2$ of its errors. For the ith data stream these are estimated by

$$\hat{\mu}_i = \frac{1}{m}\sum_{t=1}^{m} X_{i,t}, \qquad \hat{\sigma}_i^2 = \frac{1}{m}\sum_{t=1}^{m} \big(X_{i,t} - \hat{\mu}_i\big)^2.$$

If the errors cannot be assumed to be independent, we can instead estimate the long-run variance.
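This requires specifying a kernel function $K(\cdot)$ and a bandwidth $b_m$:

$$\hat{\sigma}_i^2 = \hat{\gamma}_i(0) + 2\sum_{j=1}^{m-1} K\Big(\frac{j}{b_m}\Big)\hat{\gamma}_i(j), \qquad \hat{\gamma}_i(j) = \frac{1}{m}\sum_{t=1}^{m-j} \big(X_{i,t} - \hat{\mu}_i\big)\big(X_{i,t+j} - \hat{\mu}_i\big).$$

In this setting, the kernel function can be seen as a weighting function for the sample autocovariances $\hat{\gamma}_i(j)$. The kernel function must be symmetric and satisfy K(0) = 1. Various kernel functions have been proposed, including the truncated (White and Domowitz, 1984), Bartlett (Newey and West, 1986) and Parzen (Gallant, 2009) kernels, amongst others. Among them, the Bartlett kernel is frequently used in econometrics. This kernel takes the form

$$K(x) = \begin{cases} 1 - |x|, & |x| \le 1, \\ 0, & \text{otherwise}. \end{cases}$$

For more details, see Horváth and Hušková (2012); Kiefer and Vogelsang (2002a); Kiefer and Vogelsang (2002b).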
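A minimal sketch of these estimators for a single stream, assuming the 1/m normalisation used above and a user-chosen bandwidth (no bandwidth selection rule is implied); the function names are illustrative.

```python
import math


def baseline_estimates(history):
    """Sample mean and standard deviation (1/m normalisation) of one stream's historic data."""
    m = len(history)
    mu_hat = sum(history) / m
    var_hat = sum((x - mu_hat) ** 2 for x in history) / m
    return mu_hat, math.sqrt(var_hat)


def bartlett_kernel(x):
    """Bartlett kernel: K(x) = 1 - |x| for |x| <= 1, and 0 otherwise."""
    return 1.0 - abs(x) if abs(x) <= 1.0 else 0.0


def long_run_variance(history, bandwidth):
    """Kernel estimate gamma(0) + 2 * sum_j K(j / bandwidth) * gamma(j),
    where gamma(j) are sample autocovariances normalised by m."""
    m = len(history)
    mu_hat = sum(history) / m
    centred = [x - mu_hat for x in history]

    def autocov(j):
        return sum(centred[t] * centred[t + j] for t in range(m - j)) / m

    lrv = autocov(0)
    for j in range(1, m):
        weight = bartlett_kernel(j / bandwidth)
        if weight == 0.0:        # Bartlett weights vanish for all j >= bandwidth
            break
        lrv += 2.0 * weight * autocov(j)
    return lrv


history = [1.0, 2.0, 3.0, 4.0]
print(baseline_estimates(history))        # (2.5, 1.118...)
print(long_run_variance(history, 2))      # 1.5625
```

With bandwidth 1 only the lag-0 term survives, so the long-run variance estimate reduces to the ordinary variance estimate.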

Starting local monitoring
Once the baseline parameters have been estimated, the data $X_{i,m+1}, X_{i,m+2}, \ldots$ are observed sequentially from time m + 1 onwards and monitored for a change. This is achieved using a MOSUM statistic which, at monitoring time k, uses a window containing the most recent h observations:

$$T_i(m,k,h) = \frac{1}{\hat{\sigma}_i\sqrt{h}}\left|\sum_{t=m+k-h+1}^{m+k}\big(X_{i,t} - \hat{\mu}_i\big)\right|.$$

Intuitively, if there is no change the weighted MOSUM remains small, but it becomes large after a change. Figure 2 shows the behaviour of the weighted MOSUM statistic under the null and the alternative for one data stream.
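The statistic can be updated in constant time per observation by keeping a running window sum, which is what makes it cheap enough to run on a sensor. The sketch below computes the unweighted $T_i(m,k,h)$ for one stream; multiplying by the weight $w(k,h)$ is left out because its exact form is not reproduced here.

```python
import math


def local_mosum(stream, m, h, mu_hat, sigma_hat):
    """Unweighted local MOSUM T_i(m, k, h) for k = 1, ..., len(stream) - m.

    `stream` holds the m historic observations followed by the monitoring data.
    The window at monitoring time k covers observations m+k-h+1, ..., m+k
    (1-based), so for k < h it still overlaps the historic period.
    A running window sum gives O(1) work per new observation.
    """
    if not 1 <= h <= m:
        raise ValueError("need 1 <= h <= m")
    if len(stream) <= m:
        return []
    centred = [x - mu_hat for x in stream]
    scale = sigma_hat * math.sqrt(h)
    window_sum = sum(centred[m - h + 1:m + 1])   # k = 1: 0-based indices m-h+1, ..., m
    stats = [abs(window_sum) / scale]
    for k in range(2, len(stream) - m + 1):
        # slide the window: add observation m+k (1-based), drop observation m+k-h
        window_sum += centred[m + k - 1] - centred[m + k - 1 - h]
        stats.append(abs(window_sum) / scale)
    return stats


# m = 4 historic zeros, then a shift to 1; window size h = 2.
print(local_mosum([0.0] * 4 + [1.0] * 4, m=4, h=2, mu_hat=0.0, sigma_hat=1.0))
```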

Message passing
The local monitoring described in the previous section is applied to each sensor independently.In order to make global decisions about the state of the system, messages from the sensors must be passed to the central hub (see Figure 1).However, since there are constraints on communication in the system, the message passing process must be carefully designed.
At time t = m + k, where m is the length of the historic period and k is the monitoring time, each sensor decides whether or not to transmit a message to the centre.
This message vector is denoted by $M_t = (M_{1,t}, M_{2,t}, \ldots, M_{d,t})$. We consider two different messaging regimes:
• Centralized messaging regime:
$$M_{i,t} = w(k,h)\,T_i(m,k,h); \qquad (3.5)$$
• Distributed messaging regime:
$$M_{i,t} = \begin{cases} w(k,h)\,T_i(m,k,h), & \text{if } w(k,h)\,T_i(m,k,h) > c_{\mathrm{Local}}, \\ \mathrm{NULL}, & \text{otherwise}. \end{cases} \qquad (3.6)$$
The centralized messaging regime is one where there is no constraint on communication between the sensors and the centre, so every sensor sends a message to the centre at each time step. This is similar to the SUM changepoint detection scheme proposed by Mei (2010). However, when communication is expensive, a distributed messaging regime can be used, in which each sensor only sends local monitoring statistics that exceed a chosen threshold; NULL means that no message is sent. The threshold $c_{\mathrm{Local}}$ can be chosen to control the fraction of transmitting sensors when there is no change. Note that when $c_{\mathrm{Local}} = 0$, the distributed messaging regime is equivalent to the centralized messaging regime.
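In code, the two messaging regimes differ only in whether the local threshold is applied. This sketch takes the already weighted statistics $w(k,h)T_i(m,k,h)$ as input and uses `None` for NULL; the helper names are hypothetical.

```python
def local_message(weighted_stat, c_local, distributed=True):
    """Message M_{i,t} sent by one sensor, following (3.5) and (3.6).

    Centralized regime: always send the weighted statistic.
    Distributed regime: send it only if it exceeds c_local; None stands for NULL.
    """
    if not distributed or weighted_stat > c_local:
        return weighted_stat
    return None


def messages_at_time(weighted_stats, c_local, distributed=True):
    """Message vector M_t = (M_{1,t}, ..., M_{d,t}) for one time step."""
    return [local_message(s, c_local, distributed) for s in weighted_stats]


stats = [0.5, 4.0, 3.5]
print(messages_at_time(stats, c_local=3.44))                     # [None, 4.0, 3.5]
print(messages_at_time(stats, c_local=3.44, distributed=False))  # [0.5, 4.0, 3.5]
```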

Global monitoring
In this paper, we assume that there is no communication delay between the sensors and the central hub, so messages are received by the centre immediately at time t. Based on the messages received, the centre decides whether or not to flag a change.

Combining messages
Depending on the messaging regime, the global MOSUM statistic is constructed as follows:
• Centralized global MOSUM statistic (3.7), which aggregates the messages (3.5) from all d sensors. This is similar to the SUM scheme described in Section 2. Since every sensor transmits at every time step, (3.7) is the idealised scheme for dense changes.
• Distributed global MOSUM statistic (3.8), in which the NULL values in (3.6) are taken to be zero in the sum. The form of (3.8) is taken from the multivariate MOSUM (Kirch and Kamgaing, 2015; Weber, 2017; Kirch and Weber, 2018).
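At the centre, NULL messages simply contribute nothing. The sketch below assumes the global statistic is the plain sum of the received weighted statistics; this is an illustrative simplification, and the exact form and normalisation of (3.7) and (3.8) may differ.

```python
def global_statistic(messages):
    """Aggregate one time step's messages at the centre; NULL (None) counts as zero.

    Assumes a plain sum of the received weighted statistics, as in the SUM scheme;
    the exact normalisation of (3.7) and (3.8) is not reproduced here.
    """
    return sum(msg for msg in messages if msg is not None)


print(global_statistic([None, 4.0, 3.5]))    # 7.5
print(global_statistic([None, None, None]))  # 0: no communication at this step
```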

Declaring the change
As in the local monitoring procedure, a change is declared as soon as the weighted global MOSUM exceeds a threshold. A closed-end stopping rule can be used when the aim is to monitor for changes within a fixed time. This can be formalised as

$$\hat{\tau}_{m,T} = \min\big\{1 \le k \le \lfloor mT \rfloor : \text{the weighted global MOSUM at monitoring time } k \text{ exceeds } c_{\mathrm{Global}}\big\}, \qquad (3.9)$$

where $\min\{\emptyset\} = \infty$, so that the monitoring period has length $\lfloor mT \rfloor$. If no change is detected by this stopping rule by time $\lfloor mT \rfloor$, the monitoring procedure is terminated. The parameter T > 0, governing the length of the monitoring period, is chosen in advance (Horváth et al., 2008; Aue et al., 2012).
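The closed-end rule scans the weighted global statistic over at most $\lfloor mT \rfloor$ monitoring steps. A minimal sketch, assuming the global statistics are already available as a list indexed by monitoring time; in an online implementation the same comparison is made as each new value arrives.

```python
import math


def closed_end_stopping_time(global_stats, c_global, horizon):
    """Closed-end rule (3.9): first monitoring time k in 1..horizon at which the
    weighted global statistic exceeds c_global, or math.inf if there is none
    (the convention that the minimum of the empty set is infinity).

    `global_stats[k - 1]` is the weighted global MOSUM at monitoring time k and
    `horizon` plays the role of floor(m * T).
    """
    for k, g in enumerate(global_stats[:horizon], start=1):
        if g > c_global:
            return k
    return math.inf


m, T = 200, 10
horizon = math.floor(m * T)       # monitor for at most 2000 steps
print(closed_end_stopping_time([1.0, 2.0, 5.0, 7.0], c_global=4.0, horizon=horizon))  # 3
```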
Figure 3 shows the weighted global MOSUM statistic for the distributed and centralized messaging regimes on the same dataset. Whenever the weighted global MOSUM of the distributed regime is zero, there is no communication between the sensors and the centre at that time. In the next section, we present the theoretical properties of the proposed method under $H_0$ and $H_A$.
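The toy simulation below puts the pieces together on a single simulated dataset, in the spirit of Figure 3, and compares the centralized and distributed regimes. It is a sketch under loud assumptions: N(0, 1) errors, weight $w(k,h) = 1$, a global statistic equal to the plain sum of received messages, and thresholds chosen by hand rather than calibrated. None of the numbers it prints are results from the paper.

```python
import math
import random


def run_monitoring(distributed, d=20, p=10, delta=1.0, m=100, h=50, n_monitor=400,
                   change_at=200, c_local=3.0, c_global=40.0, seed=1):
    """Toy end-to-end run of centralized or distributed MOSUM on simulated data.

    Illustrative assumptions: N(0, 1) errors, weight w(k, h) = 1, a global
    statistic equal to the plain sum of received messages, and uncalibrated
    thresholds. The first p streams shift by delta after monitoring time
    change_at. Returns (stopping time k or math.inf, messages per time step).
    """
    rng = random.Random(seed)   # same seed -> same data for both regimes
    streams = []
    for i in range(d):
        shift = delta if i < p else 0.0
        history = [rng.gauss(0.0, 1.0) for _ in range(m)]
        monitoring = [rng.gauss(0.0, 1.0) + (shift if k > change_at else 0.0)
                      for k in range(1, n_monitor + 1)]
        streams.append(history + monitoring)

    # Baseline estimates from the historic period (1/m normalisation).
    mu = [sum(s[:m]) / m for s in streams]
    sd = [math.sqrt(sum((x - mu[i]) ** 2 for x in s[:m]) / m) for i, s in enumerate(streams)]

    # Window sums of centred data for k = 1 (0-based indices m-h+1, ..., m).
    sums = [sum(x - mu[i] for x in s[m - h + 1:m + 1]) for i, s in enumerate(streams)]

    sent = 0
    for k in range(1, n_monitor + 1):
        if k > 1:
            for i, s in enumerate(streams):
                sums[i] += s[m + k - 1] - s[m + k - 1 - h]   # the mu[i] terms cancel
        global_stat = 0.0
        for i in range(d):
            stat = abs(sums[i]) / (sd[i] * math.sqrt(h))
            if not distributed or stat > c_local:   # message sent; NULL otherwise
                sent += 1
                global_stat += stat
        if global_stat > c_global:
            return k, sent / k
    return math.inf, sent / n_monitor


k_cen, cost_cen = run_monitoring(distributed=False)
k_dis, cost_dis = run_monitoring(distributed=True)
print("centralized:", k_cen, cost_cen)
print("distributed:", k_dis, cost_dis)
```

In this sketch the distributed global statistic can never exceed the centralized one at any time step, because it sums a subset of the same non-negative terms, so on the same data the distributed regime never stops earlier; what it gains is far fewer messages per time step.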

Theoretical properties for distributed MOSUM
This section considers the theoretical properties of the closed-end stopping rule $\hat{\tau}_{m,T}$ defined in Equation (3.9) as m → ∞. Firstly, in Section 4.1 we find the limiting distribution under the null hypothesis for the different procedures. Then, using these results, appropriate choices of the thresholds $c_{\mathrm{Local}}$ and $c_{\mathrm{Global}}$ are given in Section 4.2. Finally, in Section 4.3 we prove that the detection procedures we have studied are consistent under alternatives.
Three key assumptions are made in order to derive asymptotic results; they are the same as those in Horváth et al. (2008), Aue et al. (2012) and Weber (2017).
Assumption 1 (Clean historic data). The historic data contain no change, that is, the changepoint lies in the monitoring period, τ > m.
This assumption guarantees that good estimators can be obtained from the training dataset, and it can easily be satisfied in real applications.
Assumption 2 (Asymptotic regime). h = h(m) → ∞ as m → ∞, and
$$\lim_{m \to \infty} \frac{h}{m} = \beta \in (0, 1].$$
This assumption specifies how the window size h := h(m) grows relative to the length m of the historic period.
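Assumption 3 (Functional central limit theorem). For each i, the partial sums $S_i(x) = \sum_{t=1}^{\lfloor x \rfloor} \epsilon_{i,t}$ satisfy a functional central limit theorem with limit $\sigma_i W_i(\cdot)$ as h → ∞, where $\sigma_i > 0$ and $\{W_i(t), 0 \le t < \infty\}$ is a standard Brownian motion. Moreover, $\sigma_i$ can be estimated by $\hat{\sigma}_i$, which satisfies $\hat{\sigma}_i \to \sigma_i$ in probability.
This assumption is a functional central limit theorem on the errors ϵ in the model for the data (2.1).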

Asymptotics under the null
In this section, we give asymptotic results for our proposed method, which help guide the choice of thresholds.
The local monitoring process of our method within each sensor is the same as the univariate MOSUM detection process. Thus, Theorem 1 and Corollary 1.1 for the local MOSUM follow directly from Horváth et al. (2008), Aue et al. (2012) and Weber (2017). For simplicity, we denote by $\{Z_i(t), 0 \le t \le T/\beta\}$ the limiting Gaussian processes, which are defined in terms of independent standard Brownian motions $\{W_i(t), 0 \le t < \infty\}$.
Theorem 1 (Local MOSUM). If Assumptions 1-3 and model (2.2) hold, then under $H_0$, let
Corollary 1.1 (Local MOSUM, asymptotic type-I error). Under $H_0$, for any T > 0 and the ith data stream,
Thus, the false alarm rate for one data stream is asymptotically equal to a pre-specified type-I error α ∈ (0, 1).
Similar results for the global MOSUM follow readily from those for the local MOSUM. They can be used to choose thresholds given a pre-defined type-I error. Below we obtain two limiting distributions, for the centralized and distributed regimes of Section 3, respectively.
Thus, their limiting distributions are functionals of Gaussian processes. Using Theorem 2, the following may be obtained:
Corollary 2.1 (Global MOSUM, asymptotic type-I error). Under $H_0$, for any T > 0,
This result allows us to find local and global thresholds that achieve the pre-defined type-I error.

Obtaining critical values
Using the results of the previous section, critical values can be found such that the asymptotic type-I error is controlled for the different procedures. To achieve this, the stochastic processes $\{Z_i(t), 0 \le t \le T/\beta, 1 \le i \le d\}$ need to be approximated on a fine grid. This is done in the same way as in Aue et al. (2012), by simulating the component standard Brownian motions using ten thousand i.i.d. standard normal random variables. The parameters used were β = 1/2 and T = 10. Tables 1 and 2 report the resulting critical values for the centralized and distributed procedures, respectively. Here Z denotes a standard normal random variable.
Therefore, the local threshold can be chosen according to the constraint on the transmission cost. Combined with the pre-defined type-I error, the global threshold is then given by Theorem 2.

Asymptotics under the alternative
Under the alternative, it is assumed that there is a changepoint at monitoring time $k^*$ and that a subset S of the data streams has an altered mean:
$$X_{i,t} = \begin{cases} \epsilon_{i,t}, & t \le m + k^*, \\ \delta_{i,t} + \epsilon_{i,t}, & t > m + k^*, \end{cases} \qquad i \in S,$$
while $X_{i,t} = \epsilon_{i,t}$ for $i \notin S$. Deriving sharp asymptotic results on the detection delay of the proposed method is challenging, so we focus only on consistency results. A procedure is consistent if it stops in finite time with probability approaching one as m → ∞; this holds if the test statistic diverges to infinity after the change as m → ∞. In the asymptotic regime of interest, we additionally assume that the changepoint $k^*$ grows at the same rate as h, that is, $k^*/h \to \gamma \ge 0$, and that the size of the change satisfies $\sqrt{h}\,|\delta_{i,t}| \to \infty$ as m → ∞ and h → ∞. These assumptions are the same as those in Aue et al. (2012).
Then, as m → ∞ and h → ∞
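A short heuristic for why a single affected stream drives consistency, under simplifying assumptions and not as the paper's proof: $\mu_i = 0$, a constant shift $\delta_{i,t} = \delta_i$ for $t > m + k^*$, and monitoring time $k = k^* + h$, so that the whole window lies after the change. By the reverse triangle inequality,

```latex
\sum_{t=m+k-h+1}^{m+k} \big(X_{i,t} - \hat{\mu}_i\big)
  = h\,\delta_i + \sum_{t=m+k-h+1}^{m+k} \epsilon_{i,t} - h\,\hat{\mu}_i ,
\qquad\text{so}\qquad
T_i(m,k,h) \;\ge\; \frac{\sqrt{h}\,|\delta_i|}{\hat{\sigma}_i}
  \;-\; \frac{1}{\hat{\sigma}_i\sqrt{h}}\Big|\sum_{t=m+k-h+1}^{m+k} \epsilon_{i,t}\Big|
  \;-\; \frac{\sqrt{h}\,|\hat{\mu}_i|}{\hat{\sigma}_i}.
```

The middle term is $O_P(1)$ by Assumption 3, and since the historic data contain no change, $\hat{\mu}_i = O_P(m^{-1/2})$, so the last term is $O_P(\sqrt{h/m}) = O_P(1)$ by Assumption 2. With $\hat{\sigma}_i \to \sigma_i$, it follows that $T_i(m,k,h) \to \infty$ in probability whenever $\sqrt{h}|\delta_i| \to \infty$. The time $k^* + h$ lies within the monitoring horizon $\lfloor mT \rfloor$ for large m provided $(\gamma + 1)\beta < T$. If, in addition, the weight is assumed to be bounded away from zero, the weighted local statistic eventually exceeds any fixed $c_{\mathrm{Local}}$ and is therefore transmitted.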

The numerical dependency on local thresholds
Our proposed method requires specifying two thresholds. Typically, $c_{\mathrm{Global}}$ can be obtained from Corollary 2.1 once α and $c_{\mathrm{Local}}$ are fixed. It is therefore crucial to pick an appropriate local threshold. This section gives numerical results for different values of the local threshold, which may provide some guidance in choosing it.
Figure 4 shows the average detection delay and transmission cost for different values of the local threshold. There is a trade-off between communication savings and detection performance when choosing the local threshold. Larger local thresholds reduce the transmission cost but also lead to longer delays, especially when the change is small. However, as the mean shift increases, the detection performance with larger thresholds approaches that with small thresholds.
The centralized framework can be seen as the idealised setting; it is equivalent to the distributed setting with $c_{\mathrm{Local}} = 0$. Compared with the idealised setting, distributed MOSUM achieves similar performance when the change is not small, while greatly reducing transmission costs. However, it loses power in detecting small changes. Below we show that distributed MOSUM can approximate the overall performance of the idealised setting by increasing the window size.

The numerical dependency on parameters
One advantage of using MOSUM statistics is that we do not need to specify the post-change mean. Instead, our proposed method requires specifying the window size h and the training size m. In this section, we investigate the impact of the window size (bandwidth) and the training size.

The impact of bandwidth
As shown in Figure 5a, increasing the window size increases the power to detect small changes, while slightly delaying the detection of large changes. Although increasing the window size increases the storage cost, it does not significantly increase the transmission cost, as shown in Figure 5b. This leads us to ask whether the ability of distributed MOSUM with a large threshold to detect small changes can be improved by increasing the window size. Ideally, distributed MOSUM with an increased window size would achieve performance similar to the idealised setting.

Figure 5 (panel b): ∆ versus h, for h = 20, 50 and 100.

Recovering detectability
For simplicity, we denote the default window size for centralized MOSUM by $h_0$, and let $h^*$ be the smallest window size that allows distributed MOSUM to perform similarly to the idealised setting. It is difficult to derive a neat theoretical relationship between $h^*$ and $h_0$, but $h^*$ can be found approximately by simulation under alternatives. Our idea can be summarised as follows, and Figure 6 gives a graphical explanation:
• The average detection delay (ADD) $\bar{D}$ decreases sharply when the mean shift lies within a certain range (gray area). We therefore take the median or mean of this range as a representative mean shift, denoted by $\delta_0$, and calculate the corresponding ADD $\bar{D}_0$ of centralized MOSUM with window size $h_0$.
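• The optimal window size is $h^* = \arg\min_h \big|\bar{D}_{c_{\mathrm{Local}}}(h) - \bar{D}_0(h_0)\big|$, where $\bar{D}_{c_{\mathrm{Local}}}(h)$ is the ADD at $\delta_0$ of distributed MOSUM with local threshold $c_{\mathrm{Local}}$ and window size h. In Figure 6, the blue arrow ($h^*$) is shorter than the yellow arrow (h).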
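The search for $h^*$ reduces to a one-dimensional grid search once the ADD can be estimated for each candidate window size. The sketch below assumes a hypothetical callable that returns the simulated ADD of distributed MOSUM at $\delta_0$; the toy numbers are made up for illustration.

```python
def find_h_star(add_distributed, target_add, candidate_hs):
    """Approximate h* = argmin_h |ADD_distributed(h) - ADD_0(h_0)| over a grid.

    add_distributed: hypothetical callable mapping a window size h to the
        (simulated) average detection delay of distributed MOSUM at delta_0.
    target_add: ADD of centralized MOSUM with the default window h_0 at delta_0.
    Ties go to the smallest window size, to limit storage cost.
    """
    return min(sorted(candidate_hs), key=lambda h: abs(add_distributed(h) - target_add))


# Toy ADD values standing in for simulation output (not results from the paper).
toy_add = {20: 30.0, 50: 18.0, 100: 11.0, 150: 12.5}
print(find_h_star(toy_add.get, target_add=10.5, candidate_hs=toy_add))  # 100
```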

The impact of the training dataset
With the window size h fixed, we investigate the impact of the size of the training dataset (see Table 4).

The violation of the independence assumption
So far, we have assumed that the observations are temporally independent. However, this may not hold in real applications. This section investigates the performance of our method when this assumption is violated. Here we evaluate our algorithm with an AR(1) noise process, that is, the data follow model (2.1) with errors
$$\epsilon_{i,t} = \phi\,\epsilon_{i,t-1} + v_t, \qquad v_t \sim N(0,1),$$
where $|\phi| < 1$ controls the strength of the autocorrelation.
Autocorrelation inflates the variance of the data and, for positive ϕ, inflates the variance of the window sums on which the MOSUM statistic is based even further. There are two possible ways to handle this problem. The first is to estimate the long-run variance, as shown in Section 3.2; the second is to inflate the thresholds. For both solutions, with a fixed type-I error, we measure the false positive rate, the average detection delay and the number of transmitted messages under different scenarios. For comparison, we also show the results of MOSUM without any adjustment, which indicate to what extent our method fails to detect the change if the autocorrelation is ignored.
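To see why unadjusted MOSUM raises false alarms under AR(1) noise, compare the marginal variance $1/(1-\phi^2)$, which the naive estimator targets, with the long-run variance $1/(1-\phi)^2$, which governs the window sums. The sketch below assumes unit-variance innovations and a hand-picked Bartlett bandwidth of 40, and uses ϕ = 0.5 purely for illustration: the ratio of the two variances is 3, so the null MOSUM is inflated by a factor of about $\sqrt{3} \approx 1.73$ unless the long-run variance is used.

```python
import math
import random


def simulate_ar1(n, phi, seed=0):
    """AR(1) errors eps_t = phi * eps_{t-1} + v_t with v_t ~ N(0, 1), started in stationarity."""
    rng = random.Random(seed)
    prev = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))
    eps = []
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps


def ar1_variances(phi):
    """Marginal variance 1/(1 - phi^2) and long-run variance 1/(1 - phi)^2."""
    return 1.0 / (1.0 - phi ** 2), 1.0 / (1.0 - phi) ** 2


def bartlett_lrv(x, bandwidth):
    """Bartlett-kernel long-run variance estimate (autocovariances normalised by n)."""
    n = len(x)
    centre = sum(x) / n
    c = [v - centre for v in x]
    lrv = sum(v * v for v in c) / n
    for j in range(1, min(n, math.ceil(bandwidth))):   # weights 1 - j/bandwidth > 0
        gamma_j = sum(c[t] * c[t + j] for t in range(n - j)) / n
        lrv += 2.0 * (1.0 - j / bandwidth) * gamma_j
    return lrv


phi = 0.5
marginal, lrv_true = ar1_variances(phi)          # 4/3 and 4.0
eps = simulate_ar1(50_000, phi, seed=1)
naive = sum(e * e for e in eps) / len(eps)       # targets the marginal variance
lrv_hat = bartlett_lrv(eps, bandwidth=40)        # targets the long-run variance
print(marginal, lrv_true, naive, lrv_hat)
```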
As Table 5 shows, without any adjustment our method breaks down when autocorrelation is introduced: it always raises a false alarm and therefore fails to detect the change. MOSUM with inflated thresholds generally performs better than MOSUM with the LRV, since the former detects the change in most scenarios. However, in the scenarios where MOSUM with the LRV does detect the change (usually when δ is not small and ϕ is not large), it always has the lowest transmission cost and reasonable detection power. For example, when p = 100, δ = 1 and ϕ = 0.25, both solutions have similar false positive rates and average detection delays, while MOSUM with the LRV has a lower transmission cost. Surprisingly, estimating the LRV gives the lowest false positive rates and average detection delays when ϕ = 0 and p = 100 or p = 50. This may be because it underestimates the variance.
However, when the autocorrelation is strong, it is not appropriate to apply our method to the raw data. Instead, it is more reasonable to apply our method after pre-processing the data, for example to the residuals of a fitted AR model.

Conclusion
In this paper, we proposed an online, communication-efficient, distributed changepoint detection method that can achieve performance similar to the idealised setting while substantially reducing transmission costs. Numerically, we showed that the local threshold and the window size affect the performance of our algorithm, and that there is a trade-off in choosing them. In applications, we recommend choosing a large local threshold in general. However, when the change is extremely small, the choice of local threshold depends on the communication and storage budgets. If the communication budget is much more limited, choosing a large threshold with a large window size is sensible. If storage is much more expensive, choosing a small threshold with a small window size will approximately achieve the idealised performance.
Violation of the independence assumption negatively affects the power of our proposed method. We tried to address this problem by inflating the thresholds or estimating the long-run variance. Both approaches can, to some extent, improve our algorithm when the autocorrelation is not severe. However, both fail to detect changes in highly autocorrelated data. Therefore, one future research direction is how to detect changes within highly autocorrelated data in real time.
If data stream i is affected by the change, so that i ∈ S and $\delta_i \ne 0$, then
For the global sparse procedure, the local MOSUMs $w(k,h)T_i(m,k,h)$ are hard thresholded.
Since these individually diverge to infinity, the same argument as for the dense procedure applies.
Following Aue et al. (2012), the MOSUM statistic declares a change at time k when the weighted local MOSUM statistic $w(k,h)T_i(m,k,h)$ exceeds a pre-defined threshold. A weight function $w(\cdot,\cdot)$ is introduced to control the asymptotic size of the detection procedure. Typically, $w(\cdot,\cdot)$ depends on the monitoring time k and the window size h through a function $\rho(\cdot)$. The choice of weight function controls the sensitivity of the test. A wide range of weight functions can be used, as long as $\rho$ is continuous and satisfies $\inf_{0 \le t \le T} \rho(t) > 0$. In this paper, we use the weight function proposed in Leisch et al. (2000) and Zeileis et al. (2005):

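To make the monitoring rule concrete, here is a minimal single-stream sketch. It assumes the standardised MOSUM form $T(m,k,h) = \big|\sum_{j=k-h+1}^{k} (x_{m+j} - \hat\mu)\big| / (\hat\sigma\sqrt{h})$, with $\hat\mu$ and $\hat\sigma$ estimated from the $m$ historic observations. This form and the names `local_mosum` and `monitor` are assumptions, and because the weight function of Leisch et al. (2000) and Zeileis et al. (2005) is not reproduced in this section, `weight` is a hypothetical placeholder that defaults to 1.

```python
import math
import statistics


def local_mosum(x, m, k, h, mu_hat, sigma_hat):
    """Assumed standardised MOSUM statistic T(m, k, h) for one stream.

    x[:m] is the historic period; monitoring time j corresponds to x[m + j - 1],
    so the window j = k-h+1, ..., k is the slice x[m + k - h : m + k].
    """
    if k < h:
        raise ValueError("need k >= h so that the window lies in the monitoring period")
    window = x[m + k - h : m + k]
    return abs(sum(window) - h * mu_hat) / (sigma_hat * math.sqrt(h))


def monitor(x, m, h, threshold, weight=lambda k, h: 1.0):
    """First monitoring time k with weight(k, h) * T(m, k, h) > threshold, else None.

    `weight` is a hypothetical stand-in for w(k, h).
    """
    mu_hat = statistics.fmean(x[:m])
    sigma_hat = statistics.stdev(x[:m])
    for k in range(h, len(x) - m + 1):
        if weight(k, h) * local_mosum(x, m, k, h, mu_hat, sigma_hat) > threshold:
            return k
    return None
```

With a constant weight this is a plain MOSUM rule with a fixed boundary; passing a different `weight` changes how the effective boundary evolves with $k$.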
Figure 2: Example time series with no change (a) and a single change (b) in the top row. The bottom row shows the weighted MOSUM statistic with a historic period of length m = 100 and a window size of h = 50.

Figure 3: Example of the weighted global MOSUM statistic for the distributed (red dashed line) and centralized (black line) regimes. The result is obtained with T = 1000, d = 100, m = 100, h = 50, δ = 0.5 and p = 50 affected sensors. A value of $c_{\text{Local}} = 3.44$ was used in the distributed regime.
Panel captions: transmission cost for d = 100 data streams; average detection delay versus η.

Figure 4: The average number of messages transmitted to the centre (top) and the average detection delay across varying mean shifts (bottom). Results are obtained with m = 200, h = 100, T = 10000, τ = 5000 and α = 0.05. Each line corresponds to a different local threshold, labelled at the top right; the colour changes from orange to blue as the local threshold increases from 0 to 5.2. When the local threshold is 5.2, the global threshold is 0, so all possible combinations of thresholds are covered.
Average detection delay for distributed MOSUM (fixed h = 50, gray line) and distributed MOSUM ($h = h^*$, colored line). Different lines correspond to different values of $c_{\text{Local}}$.

Figure 7 shows that, in the distributed regime, we can recover the detectability of the centralized statistic by inflating h.
Results are obtained over 1000 replications with T = 10000, m = 200, h = 100, τ = 5000, d = 100, $c_{\text{Local}} = 3.44$ and α = 0.05 for all three methods. Blue shading marks settings in which both the false positive rate and the average detection delay are small.

Table 2: Critical values for the distributed procedure with different values of $c_{\text{Local}}$, results averaged over five thousand replications.
Although the critical values obtained above are valid asymptotically (in m), an important question is how they perform in finite samples. Numerical results for the empirical size in finite samples are shown in Table 3. These indicate that the procedure can be conservative in finite samples, as also observed by Aue et al. (2012). Nevertheless, the type-I error is approximately controlled at the nominal level for both global procedures in finite samples.
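Empirical size is the proportion of change-free replications in which the procedure raises a false alarm. The minimal Monte Carlo sketch below, for a single stream, shows how it can be measured. It assumes the same standardised MOSUM form with a constant weight and calibrates the critical value by simulation rather than from the asymptotic distribution used in the paper, so by construction its size should sit near α rather than reproduce the conservativeness reported in Table 3. All function names are hypothetical.

```python
import math
import random
import statistics


def max_mosum_null(rng, m, T, h):
    """Max over k = h..T of the assumed standardised MOSUM for one N(0, 1) stream."""
    x = [rng.gauss(0.0, 1.0) for _ in range(m + T)]
    mu_hat = statistics.fmean(x[:m])
    sigma_hat = statistics.stdev(x[:m])
    window_sum = sum(x[m : m + h])  # window for monitoring time k = h
    best = abs(window_sum - h * mu_hat)
    for k in range(h + 1, T + 1):
        # Slide the window by one: add time k, drop time k - h.
        window_sum += x[m + k - 1] - x[m + k - h - 1]
        best = max(best, abs(window_sum - h * mu_hat))
    return best / (sigma_hat * math.sqrt(h))


def simulate_critical_value(alpha, m, T, h, reps, seed=0):
    """Empirical (1 - alpha) quantile of the null maximum."""
    rng = random.Random(seed)
    maxima = sorted(max_mosum_null(rng, m, T, h) for _ in range(reps))
    idx = min(reps - 1, math.ceil((1 - alpha) * reps) - 1)
    return maxima[idx]


def empirical_size(crit, m, T, h, reps, seed=1):
    """Fraction of fresh change-free replications whose maximum exceeds crit."""
    rng = random.Random(seed)
    return sum(max_mosum_null(rng, m, T, h) > crit for _ in range(reps)) / reps
```

Using independent seeds for calibration and evaluation keeps the size estimate from being computed on the same replications that fixed the critical value.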

Table 3: Empirical size, results averaged over one thousand replications with α = 0.05, T = 10 and β = 1/2.

4.2.1 The choice of local threshold $c_{\text{Local}}$

The values of $c_{\text{Local}}$ used in Table 2 are somewhat arbitrary. The main role of the local threshold is to control the proportion of messages that the system passes, on average, per iteration. For $d$ streams, the expected number of sensors transmitting a message at each time step is given by the following result.

Corollary 2.2 (Transmission cost). For any $t > 0$ and $k = ht$, the expected number of transmitting sensors at each time step is
$$\Delta_t = d\,P\big(\rho(t)\,|Z| > c_{\text{Local}}\big).$$
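Corollary 2.2 translates directly into a budget calculation. The sketch below assumes, as the notation suggests, that $Z$ is standard normal and that $\rho(t) > 0$, so that $P(\rho(t)|Z| > c_{\text{Local}}) = \operatorname{erfc}\big(c_{\text{Local}} / (\rho(t)\sqrt{2})\big)$. It also inverts the formula to find the local threshold that yields a target expected number of messages per step. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def expected_transmissions(d, c_local, rho_t=1.0):
    """Delta_t = d * P(rho(t)|Z| > c_local), assuming Z ~ N(0, 1) and rho(t) > 0."""
    if rho_t <= 0:
        raise ValueError("rho(t) must be positive")
    # For standard normal Z, P(|Z| > x) = erfc(x / sqrt(2)).
    return d * math.erfc((c_local / rho_t) / math.sqrt(2))


def local_threshold_for_budget(d, budget, rho_t=1.0):
    """Invert the formula: c_local giving `budget` expected messages per step."""
    if not 0 < budget <= d:
        raise ValueError("budget must lie in (0, d]")
    # d * 2 * (1 - Phi(c / rho)) = budget  =>  c = rho * Phi^{-1}(1 - budget / (2d)).
    return rho_t * NormalDist().inv_cdf(1 - budget / (2 * d))
```

With d = 100 and ρ(t) = 1, the value $c_{\text{Local}} = 3.44$ used in the simulations gives roughly 0.06 expected messages per step, whereas $c_{\text{Local}} = 0$ has every sensor transmitting.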

Table 4: Empirical size and MSE of the estimated mean and standard deviation, results averaged over one thousand replications with $c_{\text{Local}} = 3.44$, h = 50, T = 6000 and α = 0.05.

Table 4 gives the thresholds, empirical size and mean squared errors (MSE) of the estimated baseline parameters in our simulation. As expected, the larger the training set, the more accurate the estimators. Figure 8 indicates that, overall, the detection power is similar across the four training-set sizes. A larger training set slightly increases the detection power for small changes, which we attribute to the more accurate estimators. Thus, in real applications, it is beneficial to use a large training dataset, since the estimation is done offline and is therefore inexpensive.
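The effect of the training size can be illustrated with a minimal Monte Carlo sketch, assuming the baseline parameters are the sample mean and sample standard deviation of m i.i.d. Gaussian training observations; the simulation design behind Table 4 is not reproduced here, and `baseline_mse` is a hypothetical helper. For the mean, the MSE is σ²/m, so quadrupling the training size quarters the error.

```python
import random
import statistics


def baseline_mse(m, reps=1000, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo MSE of the baseline (mean, sd) estimates for training size m."""
    rng = random.Random(seed)
    se_mu = se_sd = 0.0
    for _ in range(reps):
        train = [rng.gauss(mu, sigma) for _ in range(m)]
        se_mu += (statistics.fmean(train) - mu) ** 2
        se_sd += (statistics.stdev(train) - sigma) ** 2
    return se_mu / reps, se_sd / reps
```

Because this estimation runs once, offline, on historic data, its cost does not affect the online communication budget.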