# Online detection of continuous changes in stochastic processes

## Abstract

We are concerned with detecting continuous changes in stochastic processes. In conventional studies on non-stationary stochastic processes, it is often assumed that changes occur abruptly. By contrast, we assume that they take place continuously. The proposed scheme consists of an efficient algorithm and rigorous theoretical analysis under the assumption of continuity. The contribution of this paper is as follows: We first propose a novel characterization of processes for continuous changes. We also present a time- and space-efficient online estimator of the characteristics. Then, employing the proposed estimate, we propose a method for detecting changes together with a criterion for tuning its hyper-parameter. Finally, the proposed methods are shown to be effective through experimentation involving real-life data from markets, servers, and industrial machines.

## Keywords

Continuous change, Local linear regression, Online change detection

## 1 Introduction

### 1.1 Motivation for and purposes of this paper

This paper addresses the issue of detecting changes in non-stationary stochastic processes. Specifically, we focus on the online setting. That is, when given a time series sequentially, we are concerned with detecting change points in a sequential manner. In conventional studies on change detection, researchers have sought to detect time points when the statistical models of data suddenly change [2, 9]. In real situations, however, changes may not occur abruptly, but rather incrementally over some successive periods of time. We call such changes *continuous changes*. Actually, there exist many phenomena that may be characterized by continuous changes (e.g., seismic motion before and after an earthquake, and stock prices in markets).

In this paper, we consider the problem of detecting time points when continuous changes start. This problem is worth studying from a practical point of view, because the starting point of continuous changes can be considered a symptom of big changes that will occur in the future. Therefore, detecting them in early stages can lead to predictions of important future events. Despite the importance of continuous changes, how to detect them has not yet been explored. It is true that the detection of continuous changes has been covered in some previous studies in the context of concept drift [3, 7]. Nevertheless, such changes have been treated there as a succession of relatively small, ‘abrupt’ changes.

There are three purposes of this paper, and they are summarized as follows: The first is to introduce a framework for detecting continuous changes. In it, we define the magnitude of continuous changes and formalize the problem of measuring them. The second purpose is to propose an efficient algorithm for detecting continuous changes. The third purpose is to empirically demonstrate the effectiveness of our proposed algorithm in comparison with existing algorithms.

### 1.2 Novelty of this paper

- (1) A *novel framework for continuous change detection*: We first define a measure of continuous changes for parametric models. It is designed on the basis of Kullback–Leibler divergence between the model before and after a change point, which has been known as a typical measure of change. In our framework, we assume that the parameter value changes smoothly over time. We employ a weighted linear regression to model it. We thereby derive a novel measure of change by plugging the localized maximum likelihood estimates of the parameter and its rate of change into the approximation of the Kullback–Leibler divergence. We justify this measure theoretically by proving that it is invariant with respect to parameterization. We show several examples of calculations of this measure for parametric models, such as the independent exponential family and auto-regression model.
- (2) A *novel efficient algorithm for online detection of continuous changes*: Real-time change point detection is more favorable than batch detection for a variety of applications. We develop an efficient algorithm for detecting continuous changes in an online fashion. It is designed on the basis of the following two key ideas: (i) to efficiently calculate change scores by utilizing a recurrence relation for the weighted linear regression and (ii) to calculate a threshold for scores dynamically. We establish an alarm when the score exceeds the threshold, where the threshold may change over time. Combining (i) with (ii) above yields an online algorithm for detecting continuous changes in the computation time *O*(*N*) for the total data size *N*.
- (3) A *novel criterion for choosing hyper-parameters*: We also present a criterion that measures the fitness of the proposed model without knowing whether there are any changes in data. This enables us to automatically tune the hyper-parameters of the proposed method, such that one is no longer worried about the choice of hyper-parameters.
- (4) *Empirical demonstration of the effectiveness of our method*: We used synthetic and real datasets to compare our method to existing online change point detection methods in terms of how accurately and how early they can detect continuous changes. Specifically, we applied our method to malware detection, economic event detection, and industrial incident detection. We therefore show that our proposed method, together with the hyper-parameter-choosing criterion, is able to detect symptoms of important events significantly earlier than other methods.

### 1.3 Related work

Many methods have been proposed for detecting changes that happen abruptly in stochastic processes [2, 5, 8, 10, 15]. Online methods for detecting them have also been developed in [1, 5, 13, 18, 19, 20, 21]. These methods test, in one way or another, whether two sample sets taken from two neighboring sliding windows are generated from an identical distribution. They assume that changes occur at discrete time points and that the generating distribution is piecewise stationary. Consequently, their performance can degrade when detecting continuous changes, which take place over some period of time rather than at a discrete point.

Change detection is related to the topic of *concept drift* (see, for example, [6, 7, 14, 22]). Changes that occur gradually over time are called *incremental changes* in the context of concept drift [7, 22], but to the best of our knowledge there are no studies on online detection algorithms tailored for incremental changes. Recently, changes in the rate of change have been studied in the scenario of volatility shift change detection [12]. This implicitly assumes that changes can be continuous. Our work differs in that it deals with the rate of change under continuously changing smooth models, whereas [12] deals with it under a piecewise stationary model.

The remainder of this paper is organized as follows: Sect. 2 introduces a measure for continuous changes. Section 3 gives a time- and space-efficient algorithm for computing the proposed estimates, as well as a number of examples for some statistical models. In Sect. 4, we present a method for detecting continuous changes employing them. Section 5 provides us with a criterion for choosing hyper-parameters of the proposed method. Section 6 shows experimental results on synthetic and real datasets. Section 7 gives concluding remarks.

## 2 Estimating the magnitude of continuous changes

### 2.1 Measures of magnitude of changes

We say that a time point *t* is a *change point in the process* \(\mathcal P\) if and only if \(\theta _t \ne \theta _{t-1}\). Then, it is reasonable to estimate whether *t* is a change point by estimating some divergence measure \(d(\theta _t, \theta _{t-1})\) and by comparing it to a threshold \(\beta \). The Kullback–Leibler (KL) divergence is one of the most common divergence measures for this purpose. Let \(y_t\) be the KL divergence between the probability densities specified by \(\theta _t\) and \(\theta _{t-1}\). We call \(y_t\) the *magnitude of change* at time *t* and consider how to estimate \(y_t\).

A change is called *discrete* if no change occurs before and after the corresponding point. The magnitude of a discrete change can be estimated in the following manner: Assuming that the data distribution is stationary before and after the change point *t*, one can estimate \(\theta _t\) and \(\theta _{t-1}\), respectively, with, for example, the technique of maximum likelihood estimation. Plugging those estimates into \(y_t\) gives a simple estimator \(\hat{y}_{t}\) of the magnitude of the change. Change points are then detected as the points *t* such that \(\hat{y}_{t}\) is sufficiently large. A substantial number of previous methods for detecting change points can be viewed as special cases of the above scheme.
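The plug-in scheme for discrete changes can be sketched as follows. This is a minimal illustration, assuming a univariate Gaussian model with known unit variance, so that the maximum likelihood estimate on each side of a candidate point is the sample mean and the KL divergence reduces to half the squared difference of the means. The function name `plugin_magnitude` and the window length `w` are hypothetical and do not come from the paper.

```python
from statistics import fmean


def plugin_magnitude(x, t, w):
    """Plug-in estimate of the KL magnitude at candidate point t.

    Assumes a unit-variance Gaussian model that is stationary on each
    side of t. Uses w samples before t and w samples from t onward
    (the window length w is an illustrative choice).
    """
    if t - w < 0 or t + w > len(x):
        raise ValueError("windows must fit inside the sequence")
    mean_before = fmean(x[t - w:t])   # MLE of the mean before the change
    mean_after = fmean(x[t:t + w])    # MLE of the mean after the change
    # KL divergence between N(m1, 1) and N(m2, 1) is (m1 - m2)^2 / 2.
    return 0.5 * (mean_after - mean_before) ** 2


# A noiseless step from 0 to 2 at index 5.
series = [0.0] * 5 + [2.0] * 5
print(plugin_magnitude(series, 5, 3))  # windows aligned with the step
print(plugin_magnitude(series, 3, 3))  # windows straddling the step
```

The estimate is largest when the two windows are aligned with the step and shrinks when a window straddles it, which is why this scheme relies on stationarity on both sides of the candidate point.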

A change is called *continuous* if it occurs over some successive period of time. In this case, direct estimation of \(\theta _t\) and \(\theta _{t-1}\) as described above does not make sense, since every step of the change is surrounded by other small changes. In order to deal with continuous changes, we consider another characterization of the measure \(y_t\), rather than the parameters themselves. In this paper, we estimate \(y_t\) through a proxy measure \(z_t\), defined in terms of the parameter \(\theta _t\) and its rate of change \(\delta _t\), which signals that *t* is a change point when it is nonzero (i.e., \(z_t\ne 0\)). Note that the plug-in estimation of \(z_t\) requires estimating \(\delta _t\), which requires the smoothness of the parameter sequence \(\theta _0^{n-1}\) in exchange for its piecewise stationarity. Hence, it is better suited to detecting continuous changes.
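The following sketch illustrates why a quadratic form in the rate of change can stand in for the KL divergence. It assumes a univariate Gaussian model parameterized by \((\mu , \sigma ^2)\) and relies on the standard second-order expansion of the KL divergence, \(\tfrac{1}{2}\delta ^\top I(\theta )\delta \), where the quadratic form has the same shape as the estimate \(\hat{\delta }_t^\top I(\hat{\theta }_t)\hat{\delta }_t\) used in Sect. 3. The function names are illustrative only.

```python
import math


def kl_gauss(mu1, var1, mu2, var2):
    """KL divergence from N(mu1, var1) to N(mu2, var2)."""
    return 0.5 * (math.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)


def fisher_quadratic(var, d_mu, d_var):
    """delta^T I(theta) delta for theta = (mu, var).

    The Fisher information of N(mu, var) in this parameterization is
    diag(1 / var, 1 / (2 * var**2)).
    """
    return d_mu ** 2 / var + d_var ** 2 / (2.0 * var ** 2)


mu, var = 1.0, 2.0
for step in (0.1, 0.01, 0.001):
    d_mu, d_var = step, -step
    exact = kl_gauss(mu, var, mu + d_mu, var + d_var)
    approx = 0.5 * fisher_quadratic(var, d_mu, d_var)
    print(step, exact, approx)
```

As the step shrinks, the exact divergence and half the quadratic form agree to more and more digits, so a small per-step parameter drift is measured consistently by either quantity.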

### 2.2 Estimating linearly changing parameters

### Proposition 1

\(\sum _{k=0}^{n-1} L_k(\theta +(k-t)\delta )\) is strictly convex with respect to \(\theta \) and \(\delta \). Therefore, it has a unique minimizer.

### Proof

It immediately follows from the fact that the objective function is a positive-weighted summation of strongly convex functions \(L_k(\theta +(k-t)\delta )\) with respect to \(\theta \) and \(\delta \). \(\square \)

### 2.3 Estimating continuously changing parameters

We define the *localized maximum likelihood estimator* as the minimizer of the weighted sum of the losses \(L_k(\theta +(k-t)\delta )\) with weights \(w_{k-t}\) placed around time point *t*. In order to fit the linear model locally, we reduce \(w_i\) when \(|i|\) becomes sufficiently large. In addition, to balance both sides of time point *t*, we calibrate the first-order moment of the weights, so that *t* can be seen as the center of the weights.

Suppose that a discrete change occurs at time point *t*, that is, there exist two parameters \(\theta _-\) and \(\theta _+\) \((\theta _-\ne \theta _+)\) such that \(\theta _k=\theta _-\) if \(k<t\) and \(\theta _k=\theta _+\) otherwise. Then, as stated in the following proposition, the resulting estimate of the magnitude \(\hat{z}_t\) is bounded away from zero in probability in the limit of large \(|\varLambda |\). This implies that the proposed estimate works well even if changes are discrete.

### Proposition 2

Assume that the weight sequence \(\left\{ w_i \right\} _{i\in \mathbb {R}}\) uniformly converges to a Riemann integrable function \(\left\{ \bar{w}_i \right\} _{i\in \mathbb {R}}\), where \(0<\int _0^n \bar{w}_{k-t}dk<\infty \). Then, under the preceding assumption of discrete changes, \(\hat{\delta }_t\) given in (7) is bounded away from zero, except with a small probability, for large \(|\varLambda |\). Therefore, the estimated magnitude \(\hat{z}_t\) is also asymptotically bounded away from zero in probability.

### Proof

See “Appendix 1.”
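The behavior described in Proposition 2 can be illustrated with a small sketch. It assumes a squared-error loss (a unit-variance Gaussian model), so that the localized fit becomes an ordinary weighted least-squares problem with a closed-form solution, and it uses symmetric weights \((1-r)^{|i|}\) as an illustrative choice. The function name and the weight shape are assumptions, not the paper's definitions.

```python
def local_linear_fit(x, t, r):
    """Weighted least-squares fit of x[k] ~ theta + (k - t) * delta.

    Weights w_i = (1 - r) ** abs(i), with i = k - t, are an illustrative
    choice that decays as |i| grows.
    """
    s0 = s1 = s2 = b0 = b1 = 0.0
    for k, xk in enumerate(x):
        i = k - t
        w = (1.0 - r) ** abs(i)
        s0 += w
        s1 += w * i
        s2 += w * i * i
        b0 += w * xk
        b1 += w * i * xk
    # Solve the 2x2 normal equations [[s0, s1], [s1, s2]] [theta, delta] = [b0, b1].
    det = s0 * s2 - s1 * s1
    theta = (s2 * b0 - s1 * b1) / det
    delta = (s0 * b1 - s1 * b0) / det
    return theta, delta


step = [0.0] * 50 + [1.0] * 50      # abrupt change at index 50
flat = [1.0] * 100                  # no change at all
print(local_linear_fit(step, 50, 0.1))
print(local_linear_fit(flat, 50, 0.1))
```

On the step sequence the fitted slope is clearly positive, whereas on the flat sequence it vanishes, so a score built from the slope still reacts to an abrupt change.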

## 3 Approximating estimates

Although the idea behind the proposed estimate is very simple, there is a difficulty specific to the problem of detecting changes: Since the solution of (8) is often analytically intractable, it becomes computationally intractable as the number of observations \(|\varLambda |\) grows. Even if it is calculated, there is another undesirable property: the resulting value of the estimated measure, \(\hat{\delta }_t^\top I(\hat{\theta }_t)\hat{\delta }_t\), depends on the parameterization of model space \(\varTheta \).

### Proposition 3

Let \((\hat{\theta }_t, \hat{\delta }_t)\) be the approximated estimate solving (11) and let \(\hat{z}_t=\hat{\delta }_t^\top I(\hat{\theta }_t)\hat{\delta }_t\). Then, \(\hat{z}_{t}\) is invariant with respect to the parameterization of model space \(\varTheta \) if condition (1) holds.

### Proof

See “Appendix 2.”
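The mechanism behind this kind of invariance can be checked numerically in a simple case. The sketch below assumes an exponential distribution, described either by its rate \(\lambda \) or by its mean \(\mu =1/\lambda \), and evaluates the quadratic form \(\delta ^\top I(\theta )\delta \) in both parameterizations, with the rate of change transformed by the chain rule. It illustrates the exact quadratic form only and is not a substitute for the proof in the appendix. The function names are hypothetical.

```python
def z_rate(lam, d_lam):
    """delta^2 * I(lambda), with Fisher information I(lambda) = 1 / lambda^2."""
    return d_lam ** 2 / lam ** 2


def z_mean(mu, d_mu):
    """delta^2 * I(mu), with Fisher information I(mu) = 1 / mu^2."""
    return d_mu ** 2 / mu ** 2


lam, d_lam = 2.0, 0.3
mu = 1.0 / lam                # mean parameterization mu = 1 / lambda
d_mu = -d_lam / lam ** 2      # chain rule: d(mu) = -(1 / lambda^2) d(lambda)
print(z_rate(lam, d_lam), z_mean(mu, d_mu))
```

Both calls print the same value: the Jacobian factor picked up by the rate of change is cancelled by the inverse factor in the Fisher information.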

So far we have presented a general scheme for estimating the magnitude of continuous changes. In the rest of this section, we give two examples of how to calculate estimates for concrete statistical models: the independent exponential family and Gaussian autoregressive (AR) models.

### 3.1 Independent exponential family

*x* and \(xx^\top \).

### 3.2 Gaussian autoregressive models

*p*. The conditional density function is given by

*U*(*n*) is invertible if \(\gamma _0,\gamma _1>0\) since it is positive definite,

## 4 Algorithm for detecting changes

We employ exponentially discounting weights, where \(r\) \((0<r<1)\) denotes the *discounting rate*. The rate *r* can be seen as the *resolution* parameter that controls the time constant \(c=-1/\log (1-r)\approx r^{-1}\), over which the weight decays by a factor of \(1/e\). This means that each observation remains influential on the value of \(\hat{z}(n)\) for a period of length proportional to *c*.
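As a quick numerical check of the relation between the discounting rate and the time constant, the following sketch computes \(c=-1/\log (1-r)\) and compares it with \(r^{-1}\). It assumes geometric weights of the form \((1-r)^k\), and the function name is illustrative.

```python
import math


def time_constant(r):
    """Number of steps after which a weight (1 - r)**k has decayed to 1/e."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie in (0, 1)")
    return -1.0 / math.log(1.0 - r)


for r in (0.5, 0.05, 0.005):
    c = time_constant(r)
    # Columns: r, exact time constant, approximation 1/r, weight after c steps.
    print(r, c, 1.0 / r, (1.0 - r) ** c)
```

The last column stays at \(1/e\) for every rate, and the approximation \(c\approx r^{-1}\) tightens as *r* becomes small.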

We regard the estimate \(\hat{z}_t\) as a change score of the stream given observations until point *n*, where *t* is automatically given by (10). To make the dependence of \(\hat{z}_t\) on *n* explicit, we write \(\hat{z}(n)\) for \(\hat{z}_t\) from now on. We also define *t*(*n*), \(\hat{\theta }(n)\), and \(\hat{\delta }(n)\) in the same vein.

In the remainder of this section, we first give an efficient method for solving (11) with exponentially discounting weights. We then give a procedure for generating alarms to indicate changes.

### 4.1 Computing weighted summations

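Solving (11) requires weighted summations of a statistic *f* over the observations. Let \(\varDelta \) denote the lag in detection, \(\varDelta =n-t(n)\). These weighted summations satisfy a recurrence relation, with which each update of the weighted summations of *f* can be done within a constant time.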
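A constant-time update of this kind can be sketched as follows. The sketch is not the paper's recurrence verbatim: it tracks discounted sums of powers of the *age* of each observation (the distance from the newest index) rather than powers of \(k-t\), which differ only by a constant shift related to the lag. The weights \((1-r)^{\text{age}}\) and all names are assumptions made for illustration.

```python
def update(sums, fx, r):
    """One O(1) update of discounted sums after observing the value fx.

    sums = (a0, a1, a2), where a_j is the sum over past k of
    (1 - r)**age * age**j * f(x_k), with age = (newest index) - k.
    """
    a0, a1, a2 = sums
    q = 1.0 - r
    # Every past observation gets one step older: age -> age + 1.
    a2 = q * (a2 + 2.0 * a1 + a0)
    a1 = q * (a1 + a0)
    a0 = q * a0 + fx          # the new observation has age 0 and weight 1
    return a0, a1, a2


def brute_force(values, r):
    """Direct O(n) evaluation of the same sums, used only as a check."""
    n = len(values)
    q = 1.0 - r
    out = [0.0, 0.0, 0.0]
    for k, fx in enumerate(values):
        age = n - 1 - k
        for j in range(3):
            out[j] += q ** age * age ** j * fx
    return tuple(out)


values = [1.0, 4.0, 2.0, 8.0, 5.0]
state = (0.0, 0.0, 0.0)
for v in values:
    state = update(state, v, 0.1)
print(state)
print(brute_force(values, 0.1))
```

The two printed tuples agree up to rounding, while the recursive version touches only three numbers per observation regardless of how long the stream is.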

### 4.2 Making alarms with threshold

We now connect the estimate \(\hat{z}(n)\) with an algorithm for detecting continuous changes. We activate an alarm when \(\hat{z}(n)\) exceeds a threshold, where the desirable value of the threshold changes over time. This is because \(\hat{z}(n)\) is biased in the positive direction even if no change has occurred, and the bias can vary as *n* increases.

where *d* is the dimensionality of the parameter space \(\varTheta \), and \(V_n^j\overset{\text {def}}{=}\sum _{k=0}^{n-1}(k-t)^jw_{k-t}^2\). Here, \(V_n^j\) is also computed employing a recurrence formula similar to (19). Note that this works well when each statistic \(T(X_i)\) is uncorrelated with the others. If they are strongly correlated, one has to consider a further correction to \(s_n\). If the correlation is time-invariant, however, the correction can be offset by the threshold \(\beta \).

## 5 Choosing hyper-parameters

Because change detection is a task of unsupervised learning, we cannot “train” hyper-parameters in an explicit manner. Therefore, we propose criteria for choosing those parameters. The criteria are to be minimized over a “training period” of the data.

The proposed method has two hyper-parameters, *r* and \(\beta \). We focus on choosing the optimal discounting rate *r* that induces the best estimate of \((\theta _n, \delta _n)\). In contrast, we believe that there is no *best* value of the threshold \(\beta \), since it controls the trade-off relation between the false-positive and false-negative rate of the alarm \(\left\{ a_n \right\} \). The optimal balance of these false-alarm rates should be determined at a higher level (e.g., by users).

The discounting rate *r* controls the trade-off between the accuracy and delay (i.e., the variance and bias) of the alarms generated by the proposed method. The lag in the alarm in the proposed method, \(n-t(n)\), coincides approximately with \(r^{-1}\) according to (10), for large *n* and small *r*. Thus, a small *r* biases the estimate \((\hat{\theta }(n), \hat{\delta }(n))\) and delays the detection. On the other hand, a small *r* also reduces the variance of the estimate. For example, the variance of the estimate can be evaluated explicitly for a model of the independent exponential family. Hence, the choice of *r* has a direct effect on the performance of the score \(s_n\).

We now present a criterion for choosing *r*. In the first place, the weights are designed such that the resulting estimate \((\hat{\theta }(n), \hat{\delta }(n))\) approximates the current parameter and derivative \((\theta _n, \delta _n)\) utilizing the past observations \(x_0^{n-1}\). Hence, it is natural to evaluate the trade-off in terms of a predictive error on an unseen sample \(x_n\). We define the sequential predictive error and choose *r* to minimize it over *R*, where *R* is a set of relevant values for the discounting rate.

This criterion can be used not only to choose *r* but also to select the statistical model \(\mathcal M\) itself.
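The selection procedure can be sketched as follows. This is a minimal illustration, assuming a Gaussian model so that the squared one-step-ahead error stands in for the sequential predictive error, and assuming one-sided weights \((1-r)^{\text{age}}\) maintained with constant-time updates. The function names and the candidate set are hypothetical.

```python
def sequential_error(x, r):
    """Mean squared one-step-ahead error of a discounted local linear fit.

    Squared error stands in for the predictive loss (a Gaussian assumption).
    """
    q = 1.0 - r
    w0 = w1 = w2 = b0 = b1 = 0.0   # discounted sums of 1, age, age^2, x, age*x
    total, count = 0.0, 0
    for xn in x:
        det = w0 * w2 - w1 * w1
        if det > 1e-12:                          # need two distinct time points
            level = (w2 * b0 - w1 * b1) / det    # fitted value at age 0
            slope = (w0 * b1 - w1 * b0) / det    # change per unit of age
            prediction = level - slope           # the unseen sample has age -1
            total += (xn - prediction) ** 2
            count += 1
        # Age every stored term by one step, then add the new sample at age 0.
        w2 = q * (w2 + 2.0 * w1 + w0)
        w1 = q * (w1 + w0)
        w0 = q * w0 + 1.0
        b1 = q * (b1 + b0)
        b0 = q * b0 + xn
    return total / count if count else float("inf")


def choose_rate(x, candidates):
    """Return the candidate discounting rate with the smallest error."""
    return min(candidates, key=lambda r: sequential_error(x, r))


alternating = [(-1.0) ** k for k in range(200)]   # pure fluctuation, no trend
print(choose_rate(alternating, [0.05, 0.9]))
```

On a sequence that only fluctuates around a constant level, the small rate is preferred because a large rate extrapolates the latest fluctuation, which matches the variance side of the trade-off described above.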

## 6 Experiments

In this section, we show experimental results comparing the proposed method to conventional ones. First, quantitative results on synthetic experiments are presented. Next, qualitative results on experiments with three real-life datasets are presented.

### 6.1 Synthetic datasets

We generated synthetic sequences containing changes of length *h* starting from \(n=1000i\) \((i=1,2,\ldots ,9)\). Each sequence \(x_n\ (n=0,1,\ldots ,9999)\) was independently drawn from the univariate Gaussian distribution with mean \(\mu _n\) and variance 1, where \(\mu _n\) is defined through *S*(*t*), a *slope function* with slope length *h*.

Next, we introduce the online change detection methods employed in this experiment. For the proposed method, we employed as the statistical model the univariate Gaussian distributions with unknown mean and unknown variance. We refer to this as *local linear regression* (LLR). In addition to LLR, we employed three other algorithms for comparison, all of which are designed to detect abrupt changes in an online fashion: (1) the *Page–Hinkley test* (PHT) [17], which is one of the most widely used methods of change monitoring, (2) *change finder* (CF) [18, 19, 21], as a state-of-the-art method for abrupt change detection, and (3) the Bayesian method proposed by [1], which we refer to here as *Bayesian online change point detection* (BOCPD). To compare the performance of PHT with ours, we calculated the scores of change as the reciprocal of the estimated run length given by PHT. Similarly, we computed the change scores of BOCPD as the posterior variance of parameters \(\theta _t\) utilizing the posterior probability \(P_n(l)\) of run length.

For LLR, we need to specify *r*, \(\gamma _0\), and \(\gamma _1\). We chose \(r=0.05\) to minimize predictive errors over training data (see Fig. 1). We also set \(\gamma _0=\gamma _1=0\) (i.e., no regularization) because the univariate Gaussian model has only two parameters \((\mu , \sigma ^2)\), so over-fitting is not a concern.

Let *T* be the maximum tolerable delay of the change detection. When a change occurred at point \(t^*\), we define the *benefit* of an alarm at time *t* with respect to \(t^*\), and the total benefit *B* is accumulated over *S*, the set of all the change points. The number of *false alarms*, *N*, is calculated with \(\mathbb {I}(t)\), the binary function that takes 1 if and only if proposition *t* is true. Finally, we visualized the performance by plotting the true-positive rate (TPR), \(B/\sup _\beta B\), against the false-positive rate (FPR), \(N/\sup _\beta N\), with a varying threshold parameter \(\beta \). Through these performance metrics, we regard the alarms raised within *T* steps after true changes as correct detections and the others as false detections. Specifically, for \(T=0\), we can evaluate the usefulness of the change score \(s_n\) for detecting changes as they are occurring. This scheme of evaluation is adopted since early detection of ongoing changes and subsequent countermeasures are important, especially in scenarios where continuous changes are expected.

We evaluated the methods under varying *h* and *T*. Noticing that the performance of LLR and CF is robust with respect to the change in tolerance *T* and that the performance of PHT and BOCPD degrades drastically when \(T=0\) and *h* is small, one can see that LLR and CF are good at early detection of changes. Specifically, it is remarkable that LLR, whose hyper-parameters are selected automatically and independently of the ROC-AUC metric, outperformed the other methods, whose hyper-parameters are tuned in order to maximize ROC-AUC. Also note that LLR outperformed the others even in the discrete case \((h=1)\). This is because the former merely detects whether there is a trend in the change, whereas the latter detect individual changes and ignore their trends.

### 6.2 Real datasets

With three distinct real-world datasets, we qualitatively compared change detection methods, including LLR. We used (1) malware attack data, (2) economic time series data, and (3) industrial boiler data.

#### 6.2.1 Malware detection

First, we used eighteen days of transaction records logged on a server system when a backdoor was planted on it. This dataset was provided by LAC Corporation (http://laccorp.com/). It is known that some types of malware, such as backdoors, reveal symptoms (e.g., scanning) in the transactions before the attack actually starts. Such symptoms can be discovered by detecting the starting point of continuous changes in the transaction data.

For each second, we counted the maximum number of transactions from an identical IP address to an identical URL. The total length of the data was 1,551,498, and it was very sparse. We refer to this statistic as MNT and employed it as the input sequence for this experiment. Meanwhile, we counted the number of transactions in which the server returned the message 500ServerError. We refer to this statistic as 500SE. 500ServerError is known as a sign of an attack through backdoors. We applied Kleinberg’s burst-detection algorithm [15] [henceforth, *burst detection* (BD)] with base 2 and transition cost 1 to 500SE in order to detect bursts of 500ServerError messages. The detected bursts can be thought of as the time points for the emergence of the attack. Thus, we utilized them to validate change scores. In summary, we attempted to find the appearance of attacks by observing only MNT, without 500SE, and compared the resulting change score with 500SE.

*r* is comparable to that of BD with manually chosen hyper-parameters.

#### 6.2.2 Dow Jones returns

Second, we used a dataset that was originally used in [11] and in [1]. It is a sequence of daily *returns* of the Dow Jones Industrial Average from July 5, 1972 to June 3, 1975 (top of Fig. 8). The returns are calculated as \(R_n=p_n/p_{n-1}-1\), where \(p_n\) denotes the closing price of day *n*. In the sequence, the variance of daily returns tended to change continuously or suddenly in association with various world events.
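The return formula above translates directly into code. The following sketch uses made-up closing prices purely for illustration; it does not reproduce any value from the Dow Jones dataset, and the function name is hypothetical.

```python
def daily_returns(prices):
    """Return R_n = p_n / p_{n-1} - 1 for n = 1, ..., len(prices) - 1."""
    return [p / prev - 1.0 for prev, p in zip(prices, prices[1:])]


closing = [100.0, 102.0, 99.96]   # made-up closing prices, not Dow Jones data
print(daily_returns(closing))     # a rise of about 2% followed by a fall of about 2%
```

A sequence of *n* prices yields *n* − 1 returns, since the first day has no predecessor.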

We applied LLR, BOCPD, and CF to this sequence and investigated how their scores were related to the three events: (1) the conviction of G. Gordon Liddy and James W. McCord, Jr. in the Watergate incident; (2) the declaration of the oil embargo by the Organization of Petroleum Exporting Countries (OPEC) against the USA; and (3) the resignation of then President Richard Nixon. Here we took \(b_0=1\). The rest of the regularization parameters of LLR, \(\gamma _0\) and \(\gamma _1\), are tuned to minimize the mean predictive error as well as the discounting rate *r* (Fig. 7).

Figure 8 shows the change scores of the respective methods versus time. The bottom three plots in Fig. 8 show that the two conventional methods clearly captured the latter two events but gave vague or delayed scores for the first one. On the other hand, LLR raised its scores not only for the latter two, but also for the first one. This is presumably because a continuous change occurred around the first period.

#### 6.2.3 Tube failure of industrial boiler

Finally, we examined the time series of forty sensors on an industrial boiler, as a typical multivariate time series. These data were provided by Toray Industries, Inc. The duration of the data was about three weeks, and the sampling rate was 1/30 Hz (the total length is \(N=59{,}041\)). The most important fact is that a tube failure in the boiler, likely due to its deterioration, was logged at the very end of the data. Moreover, the data oscillate with a period of eight hours under normal operation. Thus, the data are highly non-stationary over shorter ranges, but this non-stationarity is not essential with respect to the incident.

## 7 Concluding remarks

We proposed a novel model for continuously changing stochastic processes. We also described an online algorithm for estimating their characteristics, by employing a technique for localized linear regression. The estimate is invariant with respect to the parameterization and is computed with an *O*(1)-space and *O*(1)-time updating procedure. We then examined the statistical properties of the estimates and combined them into the novel algorithm for change detection. A criterion for choosing a hyper-parameter *r* of the algorithm was also proposed. In experiments with synthetic datasets, our method outperformed conventional methods in the trade-off between the true-positive rate and the false-positive rate. Specifically, we demonstrated that our method is better at detecting continuous changes, and more robust even in detecting discrete changes. In experiments with real datasets, on the other hand, we saw that continuous changes likely exist in real-life data and that our method is able to capture them well, as expected.

From a practical point of view, we recommend that practitioners employ the exponential family of distributions (Sect. 3.1), e.g., the (multivariate) Gaussian, Poisson, exponential, gamma, or multinomial distribution. It is computationally efficient and highly flexible for modeling the statistical nature of data. If one wishes to explicitly model the temporal dependence of data, then autoregressive models with Gaussian noise are available (Sect. 3.2).

One may wonder when the proposed approach can be applied to real-life data. Basically, it is designed to detect continuous and locally linear changes, but, as we have shown in Proposition 2, it can also be used for detecting abrupt changes. Moreover, the core theoretical analysis, such as the invariance with respect to parameterization, holds independently of the actual nature of data. On the other hand, the asymptotic validation of the method is based on the assumption that data are distributed according to the employed statistical model and that changes can be captured by linear regression. We consider that these assumptions are not very restrictive but are unfortunately difficult to verify in practice. We therefore recommend testing several statistical models on training datasets and choosing the one with the smallest predictive error (given in Sect. 5).

Scalability is also of great interest in practice. Regarding scalability in length, we have mentioned in Sect. 4 that our method runs at the optimal time rate (i.e., linear with respect to the length of the data). Scalability in dimension depends on the statistical model, and individual analyses are needed. For instance, at most quadratic time with respect to the dimensionality is required for multivariate Gaussian distributions, which is likely to be irreducible since just updating the likelihood costs quadratic time.

The following issues remain for future study:

- (1) *Further analysis of the statistical properties of the estimates.* The statistical distribution of the estimate \((\hat{\theta }(n), \hat{\delta }(n))\) plays an important role in our methodology. We have shown that it is approximated with a \(\chi ^2\) distribution. However, we feel the need for further analysis in cases of strongly correlated processes, specifically on the tail probability of \(\hat{z}(n)\). It induces the desirable value of threshold \(\beta _\alpha \) given the permissible rate of false alarms \(\alpha \).
- (2) *Extension toward a theory of predicting changes.* The starting point for continuous changes can be thought of as a symptom of a big change. In future research, we shall extend our framework to cover various other types of symptoms of changes.
- (3) *Methodology of detecting anomalous changes.* For multidimensional statistical models, changes should be localized to a specific dimension (or tuple of dimensions) of the parameter in order to understand the cause of the change. We demonstrated such a localization technique in an ad hoc manner in Sect. 6.2.3, but more comprehensive research on this topic is a future task. Moreover, it is necessary to discriminate whether such localized changes are anomalous, since some kinds of changes are not anomalous and are of no interest in practical situations.

## Notes

### Acknowledgements

This work was partially supported by JST, CREST. We express our sincere gratitude to LAC Corporation and Toray Industries, Inc. for providing datasets.

## References

- 1. Adams, R.P., MacKay, D.J.C.: Bayesian online changepoint detection (2007). arXiv:0710.3742
- 2. Basseville, M., Nikiforov, I.V.: Detection of Abrupt Changes: Theory and Application. Prentice-Hall, Englewood Cliffs (1993)
- 3. Bifet, A., Gavaldà, R.: Learning from time-changing data with adaptive windowing. In: Proceedings of SIAM International Conference on Data Mining, pp. 443–448 (2007)
- 4. Fawcett, T., Provost, F.: Activity monitoring: noticing interesting changes in behavior. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 53–62 (1999)
- 5. Fearnhead, P., Liu, Z.: On-line inference for multiple change point problem. J. R. Stat. Soc. Ser. B **69**(Part 4), 589–605 (2007)
- 6. Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: Proceedings of SBIA Brazilian Symposium on Artificial Intelligence, pp. 285–295 (2004)
- 7. Gama, J., Žliobaitė, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift adaptation. ACM Computing Surveys (CSUR) **46**(4), 44 (2014)
- 8. Guralnik, V., Srivastava, J.: Event detection from time series data. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 33–42 (1999)
- 9. Gustafsson, F.: The marginalized likelihood ratio test for detecting abrupt changes. In: IEEE Transactions on Automatic Control, vol. 41, pp. 66–78. IEEE (1996)
- 10. Hinkley, D.V.: Inference about the change-point in a sequence of random variables. Biometrika **57**(1), 1–17 (1970)
- 11. Hsu, D.A.: Tests for variance shift at an unknown time point. Appl. Stat. **26**, 279–284 (1977)
- 12. Huang, D.T.J., Koh, Y.S., Dobbie, G., Pears, R.: Detecting volatility shift in data streams. In: IEEE International Conference on Data Mining (2014)
- 13. Ide, T., Kashima, H.: Eigenspace-based anomaly detection in computer system. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 440–449 (2004)
- 14. Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the Thirtieth International Conference on VLDB, pp. 180–191 (2004)
- 15. Kleinberg, J.: Bursty and hierarchical structure in streams. Data Min. Knowl. Discov. **7**(4), 373–397 (2003)
- 16. Miyaguchi, K., Yamanishi, K.: On-line detection of continuous changes in stochastic processes. In: IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–9. IEEE (2015)
- 17. Page, E.: Continuous inspection schemes. Biometrika **41**(1/2), 100–115 (1954)
- 18. Takahashi, T., Tomioka, R., Yamanishi, K.: Discovering emerging topics in social streams via link anomaly detection. IEEE Trans. Knowl. Data Eng. **26**(1), 120–130 (2014)
- 19. Takeuchi, J., Yamanishi, K.: A unifying framework for detecting outliers and change points from time series. IEEE Trans. Knowl. Data Eng. **18**(18), 676–681 (2006)
- 20. Urabe, Y., Yamanishi, K., Tomioka, R., Iwai, H.: Real-time change-point detection using sequentially discounting normalized maximum likelihood coding. In: Advances in Knowledge Discovery and Data Mining, pp. 185–197. Springer (2011)
- 21. Yamanishi, K., Takeuchi, J.: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 676–681 (2002)
- 22. Žliobaitė, I.: Learning under concept drift: an overview. Technical Report, Faculty of Mathematics and Informatics, Vilnius University, Vilnius, Lithuania (2009)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.