1 Introduction

The volume of automatically generated data is constantly increasing [39], which leads to an ever more urgent demand for real-time analysis [34]. Conventional statistical and data mining techniques are usually designed for offline settings and are therefore often inappropriate for real-time analysis [21]. Thus, the data mining community is continuously developing streaming algorithms that address a range of real-time tasks, such as clustering, filtering, cardinality estimation, estimation of moments or quantiles, prediction, dimensionality reduction, and anomaly detection [19, 34].

Concept drift, a well-known issue when working with data streams, refers to unforeseeable changes in the underlying distribution of streaming data over time [28]. Therefore, the prediction error for a machine learning method trained on historic data usually increases when faced with concept drift. Consequently, machine learning methods should be retrained on more recent data to adapt to the concept drift. A range of different methods has been developed to detect and adapt to concept drift, and a review is provided in Section 2.

In this paper, we address the problem of adapting to concept drift when the objective is to track quantiles of the data stream distribution. A data stream that changes from slow to rapid variations, or that experiences changes in the scale of the data over time, is a typical example of concept drift. To the best of our knowledge, this is the first work in the literature to address this research problem. When quantiles are estimated in real time, the problem is usually referred to as quantile tracking [16]. Quantile tracking has found a wide range of applications, a review of which is provided in Section 2. In this paper, we focus on a family of lightweight and efficient estimators called incremental quantile estimators, which perform a small update of the quantile estimate every time a new sample is received from the data stream. Incremental quantile estimators have documented state-of-the-art quantile tracking performance [14, 16].

Learning under concept drift is traditionally based on two steps. The first step involves a method for detecting concept drift, while the second involves a method for adapting when concept drift is detected, typically by retraining the machine learning method [28]. However, this two-step procedure is not optimal for making quantile tracking algorithms adapt to concept drift. First, retraining an incremental quantile tracking algorithm is computationally heavier than simply running it. Second, incremental quantile algorithms usually have only a few parameters, which opens the door to a fast, single-step approach instead of the two steps outlined above. More specifically, in this paper, we suggest a novel approach that optimizes the parameters of the quantile tracking algorithm in each iteration by monitoring the expected quantile loss, a loss function commonly used in quantile regression [20]. The main contributions of the paper can be summarized as follows:

  • We present two new methods for efficient quantile tracking under concept drift. The methods can be coupled with any quantile tracking algorithm to improve tracking performance under concept drift. To the best of our knowledge, these are the first methods in the literature to address the issue of quantile tracking under concept drift.

  • The methods have appealing theoretical properties in the sense that, for stationary data streams, the resulting procedures converge to the true quantile. We prove convergence using asymptotic theory.

  • Our experiments show that the suggested procedures clearly improve the tracking performance of existing quantile estimators for data streams with concept drift. The performance is close to the theoretically optimal performance, i.e., the performance when using the optimal values of the parameters in the quantile tracking algorithm in every iteration.

  • The real-life applicability of the procedures is demonstrated using three large datasets related to Twitter event detection, activity recognition, and stock trading.

The paper is organized as follows: Section 2 presents related work. Section 3 gives a short presentation of incremental quantile estimators. Section 4 explains how to estimate the current quantile tracking error and how to use it to adapt to concept drift. Section 5 establishes the asymptotic convergence properties of the suggested procedures. Sections 6 and 7 evaluate the procedures using synthetic and real-life data streams, while Section 8 provides some closing remarks.

2 Related work

2.1 Incremental quantile algorithms

Quantiles are useful for characterizing data stream distributions in a flexible and non-parametric way [29]. They have been used for a range of applied intelligence tasks, such as real-time classification [13], concept drift detection [14], anomaly detection [11], portfolio risk measurement in the stock market [1, 8], fraud detection [49], signal processing and filtering [44], climate change monitoring [50], Service Level Agreement (SLA) violation monitoring [42, 43], network monitoring [6, 27], structural health monitoring [9], non-parametric statistical testing [23], and Tukey depth estimation [11, 33].

Incremental algorithms are an important class of methods for addressing the problem of quantile tracking, but the research on such methods is still quite sparse. Tierney [46] introduced the concept of incremental quantile estimators, originally designed for static data streams. A more recent incremental quantile estimation approach is the Frugal algorithm of Ma et al. [30], which can also be applied to dynamically varying data streams. Yazidi and Hammer [48] suggested the DUMIQE algorithm, while Tiwari and Pandey [47] proposed DQTRE and DQTRSE, all of which can be seen as multiplicative variants of the Frugal algorithm. The multiplicative variants are more efficient at adapting to changes in the scale of the data. A weakness of the above estimators is that they do not use the magnitude of the observations when updating the current estimate, which can result in slow convergence. To address this issue, Hammer et al. [14] suggested the QEWA algorithm, where the update size is proportional to the current tracking error. The algorithm has documented efficient performance but, unlike most other incremental quantile estimators, is not robust to outliers in the data stream, since the observations enter directly into the update of the running estimate. One limitation of the estimators above is that they can only track a single quantile. Obviously, the estimators can be run in parallel to track many quantiles, but the monotone property of quantiles can then be violated. Some methods have recently been developed to address this issue by allowing joint tracking of multiple quantiles [15, 16]. We will not consider tracking of multiple quantiles in this paper, although our suggested concept drift adaptation methods can also be applied to these algorithms. There is a gap in the current research on incremental quantile estimation related to finding suitable values for the tuning parameters, and to adjusting those values when the properties of the data stream change (concept drift); this is the specific issue addressed in this paper.

2.2 Concept drift

Methods for adapting to concept drift mainly consist of two parts: i) a method to detect concept drift and ii) a method to adapt after concept drift is detected.

One of the first methods for concept drift detection was the Drift Detection Method (DDM) of [7], which is based on testing whether the prediction error significantly increased within a recent time window. However, a disadvantage of window-based approaches is that every sample in the window is weighted equally. Another computationally efficient approach that addresses this issue relies on tracking the current prediction error using the exponentially weighted moving average [12, 40].

For concept drift adaptation, the most natural approach is probably to retrain the machine learning method on recent data. An example is the paired learner, which combines a stable learner with a reactive learner that uses only recent data [4]. Other work integrates concept drift detection with retraining, such as [26] for the Extreme Learning Machine. However, a weakness of these approaches is that repeatedly retraining a machine learning method is computationally demanding. To address this issue, Sun et al. [45] suggest building an ensemble of trained models to reduce the amount of retraining required when faced with recurring drift. Pratama et al. [37] suggested updating only parts of the parameters of the machine learning method, thereby saving computational resources.

Class imbalance is another well-known issue in machine learning. Recent works like [2, 24, 38] address this challenge when faced with concept drift. Concept drift methods usually assume that class labels are immediately available. Mahdi et al. [31] addressed the problem of concept drift detection in the opposite case when class labels are not available, whereas the work reported in [25] suggests a method for making an ensemble of trained models as efficient as possible by introducing diversity measures.

3 Incremental quantile algorithms

Let \(X_{t} \sim f_{t}(x)\) represent possible outcomes from a data stream at time \(t\), \(x_{t}\) a random sample, and \(Q_{t,q}\) the quantile associated with probability \(q\), i.e., \(P(X_{t} \leq Q_{t,q}) = F_{t}(Q_{t,q}) = q\).

Incremental quantile algorithms update a quantile estimate every time a new observation is received. The algorithms are initiated with an estimate \(\widehat {Q}_{0,q}(\lambda _{t})\) that is further recursively updated

$$ \widehat{Q}_{t+1,q}(\lambda_{t}) \leftarrow \begin{cases} \widehat{Q}_{t,q}(\lambda_{t}) + \lambda_{t} D_{1}\left(q, \widehat{Q}_{t,q}(\lambda_{t})\right) & \text{if } x_{t} \geq \widehat{Q}_{t,q}(\lambda_{t}) \\ \widehat{Q}_{t,q}(\lambda_{t}) - \lambda_{t} D_{2}\left(q, \widehat{Q}_{t,q}(\lambda_{t})\right) & \text{if } x_{t} < \widehat{Q}_{t,q}(\lambda_{t}) \end{cases} $$
(1)

where the functions D1 and D2 are positive and can be deterministic or random. The estimation procedure is intuitive in the sense that, if the received observation is above (below) the current estimate, the estimate is increased (decreased). The functions are typically further constructed to ensure that the estimator converges to the underlying true quantile [46]. A prominent example is the deterministic-based multiplicative incremental quantile estimator (DUMIQE) [48], where \(D_{1}(q, \widehat{Q}_{t,q}(\lambda_{t})) = q \widehat{Q}_{t,q}(\lambda_{t})\) and \(D_{2}(q, \widehat{Q}_{t,q}(\lambda_{t})) = (1 - q) \widehat{Q}_{t,q}(\lambda_{t})\). Another example is the Frugal estimator [30], where \(D_{1}(q, \widehat{Q}_{t,q}(\lambda_{t})) = I(1 - q < U)\) and \(D_{2}(q, \widehat{Q}_{t,q}(\lambda_{t})) = I(q < U)\), where U denotes a uniformly distributed number on [0,1] and I(⋅) the indicator function.
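As a concrete illustration, the following minimal sketch implements one update step of (1) for the DUMIQE and Frugal estimators. The function and argument names are our own and not taken from the cited implementations; the DUMIQE variant assumes a positive-valued data stream, since its update is multiplicative.

```python
import random

def dumiqe_update(est, x, q, lam):
    # One DUMIQE step of (1): D1 = q*est, D2 = (1-q)*est (multiplicative,
    # so the data stream is assumed to be positive-valued).
    if x >= est:
        return est + lam * q * est
    return est - lam * (1 - q) * est

def frugal_update(est, x, q, lam):
    # One Frugal step of (1): D1 = I(1-q < U), D2 = I(q < U). Given
    # x >= est, a step of size lam is taken with probability q; given
    # x < est, with probability 1-q.
    u = random.random()
    if x >= est:
        return est + lam * (1 - q < u)
    return est - lam * (q < u)
```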

The tuning parameter λt determines the update size in each iteration. If the data stream distribution changes rapidly (slowly) with time, a high (small) value should be used. Furthermore, the step size should be adjusted to the scale of the data.

4 Adaptive quantile tracking under concept drift

We consider the problem of predicting Qt,q in every iteration using an incremental quantile algorithm. However, such an algorithm has one or more tuning parameters, e.g., λt from the previous section. If the properties of the data stream change (concept drift), the values of these parameters must be adjusted to maintain efficient tracking. See the bottom panel of Fig. 2 for an example where the data stream concept changes from rapid to slow variations at iteration 10,000.

To adapt to concept drift, we suggest monitoring the current quantile tracking error using expected quantile loss (EQL), a popular loss function in quantile regression. The quantile loss on iteration t is

$$ \text{QL}_{t}(\lambda_{t}) = \begin{cases} (1 - q)\left(x_{t} - \widehat{Q}_{t,q}(\lambda_{t})\right) & \text{if } x_{t} \geq \widehat{Q}_{t,q}(\lambda_{t}) \\ q\left(\widehat{Q}_{t,q}(\lambda_{t}) - x_{t}\right) & \text{if } x_{t} < \widehat{Q}_{t,q}(\lambda_{t}) \end{cases} $$
(2)

To estimate the current expected quantile loss, we use the exponentially weighted moving average [12]

$$ \widehat{\text{EQL}}_{t}(\lambda_{t}) = (1 - \gamma) \widehat{\text{EQL}}_{t-1}(\lambda_{t}) + \gamma \text{QL}_{t}(\lambda_{t}) $$
(3)
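In code, the per-observation loss (2) and the recursive estimate (3) are one-liners. A minimal sketch, with hypothetical function names of our own choosing:

```python
def quantile_loss(est, x, q):
    # Quantile (pinball) loss of (2) for one observation x and estimate est.
    return (1 - q) * (x - est) if x >= est else q * (est - x)

def update_eql(eql, est, x, q, gamma):
    # Exponentially weighted moving average of the quantile loss, Eq. (3).
    return (1 - gamma) * eql + gamma * quantile_loss(est, x, q)
```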

A common strategy for learning under concept drift is to 1) use the error measure \(\widehat{\text{EQL}}_{t}(\lambda_{t})\) to detect concept drift, and 2) adjust the values of the parameters of the algorithm when drift is detected. However, since the number of parameters of quantile tracking algorithms is usually small, we suggest skipping the detection step and instead continuously updating the values of the parameters of the quantile tracking algorithm in every iteration. The method therefore adapts more rapidly to concept drift, especially in cases where the drift occurs gradually. We call the method Oracle, and it is described below.

Oracle

Let the values \(\lambda_{1}^{\ast} < \lambda_{2}^{\ast} < \cdots < \lambda_{L}^{\ast}\) span all reasonable values of the tuning parameter and track \(\widehat{Q}_{t,q}(\lambda_{l}^{\ast})\), l = 1,…,L, using (1). For each \(\widehat{Q}_{t,q}(\lambda_{l}^{\ast})\), compute the associated EQL, \(\widehat{\text{EQL}}_{t}(\lambda_{l}^{\ast})\), l = 1,…,L, using (3). Let \(\lambda_{t} = \arg\min\limits_{\lambda_{l}^{\ast},\, l \in 1,\ldots,L} \widehat{\text{EQL}}_{t}(\lambda_{l}^{\ast})\) and let the current quantile estimate be given by \(\widehat{Q}_{t,q}(\lambda_{t})\). A code sketch of the procedure is given below.
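The following sketch implements the selection logic, reusing the illustrative dumiqe_update and update_eql helpers defined above (any other incremental estimator could be substituted); the class and attribute names are our own assumptions.

```python
class Oracle:
    # One tracker and one EQL estimate per candidate step size; each
    # iteration returns the estimate whose EQL is currently smallest.
    def __init__(self, lambdas, q, gamma, init_est=1.0):
        self.lambdas = list(lambdas)        # lambda*_1 < ... < lambda*_L
        self.q, self.gamma = q, gamma
        self.ests = [init_est] * len(self.lambdas)
        self.eqls = [0.0] * len(self.lambdas)

    def update(self, x):
        for l, lam in enumerate(self.lambdas):
            # Score the current estimate against x, then move the estimate.
            self.eqls[l] = update_eql(self.eqls[l], self.ests[l], x,
                                      self.q, self.gamma)
            self.ests[l] = dumiqe_update(self.ests[l], x, self.q, lam)
        best = min(range(len(self.lambdas)), key=self.eqls.__getitem__)
        return self.ests[best]              # quantile estimate for lambda_t
```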

Naturally, there may be large fluctuations in the selected values of λ, and updates could be limited to neighbouring values only (friction), but we have not explored this further in this paper. We denote this approach the Oracle approach, since we can imagine an Oracle that monitors the individual quantile tracking procedures and uses the estimated EQLs to select the best current quantile estimate, without disturbing the quantile or EQL tracking procedures. The procedure is illustrated in Fig. 1, where the objective is to track the q = 0.7 quantile of the data stream (gray dots). In the bottom left panel, the quantile is tracked with a small update size, \(\lambda_{1}^{\ast}\), resulting in a high estimation bias (the proportion of observations below the quantile estimates is far below the target q = 0.7). In the bottom right panel, the quantile is tracked with a large update size, \(\lambda_{L}^{\ast}\), resulting in high tracking variance. The estimator in the middle panel has both a fairly small bias and a fairly small variance, resulting in a small EQL.

Fig. 1
figure 1

An overview of the Oracle approach. In the bottom row, the gray dots show observations from the data stream. The black curves show tracking of the q = 0.7 quantile for different step sizes (\(\lambda _{1}^{\ast }\), \(\lambda _{2}^{\ast }\) and \(\lambda _{L}^{\ast }\)). The information in the panels is used to estimate expected quantile loss for each value of the step size using the procedure in Section 4. The expected quantile loss estimates are sent to the Oracle, which selects the currently best quantile estimate in each iteration

A potential challenge if this approach is used is that, with limited knowledge about the data stream, it may be difficult to select the range of λ’s. The solution is that, if the Oracle selects the quantile estimate for a λ close to \(\lambda _{1}^{\ast }\) (or \(\lambda _{L}^{\ast }\)), additional estimators using values of λ less than \(\lambda _{1}^{\ast }\) (or above \(\lambda _{L}^{\ast }\)) are included.

The approach entails storing a total of 2L values and performing a total of 2L operations (storing and updating the quantile estimate and the expected quantile loss) for each sample received from the data stream. A typical value for L is 100, and thus for extremely massive data streams, this may be a computational challenge. An example is when incremental quantile estimators are used to track depth contours, which requires the tracking of thousands or even millions of quantile estimates [11].

HIL

This approach follows the traditional two-step approach to handling concept drift, namely to first detect and then adapt to concept drift. The method is computationally less demanding than the Oracle approach, but at the cost of adapting less efficiently to concept drift. The approach only tracks the quantile for a high, intermediate and low (HIL) value of the update size, and therefore only entails storing a total of 2 ⋅ 3 values and performing a total of 2 ⋅ 3 operations for each received data stream sample. For convenience of notation, the subscript t is omitted.

  • Run three quantile tracking estimators in parallel using the tuning parameters \(\lambda_{\text{low}} = \lambda/a\), \(\lambda_{\text{intermediate}} = \lambda\) and \(\lambda_{\text{high}} = a\lambda\), with a > 1, and track the EQL for each of them.

  • Every M iterations, update λ:

    • If the EQL is smallest for \(\lambda_{\text{low}}\) (or \(\lambda_{\text{high}}\)), reduce (or increase) the value of the tuning parameter for the three estimators by setting \(\lambda \leftarrow \lambda/a\) (or \(\lambda \leftarrow a\lambda\)). Restart the three quantile estimators, initialized with the currently best quantile estimate, i.e., the one for \(\lambda_{\text{low}}\) (or \(\lambda_{\text{high}}\)).

    • If the EQL is smallest for \(\lambda_{\text{intermediate}}\), no update is performed.

The current quantile estimate is given by \(\widehat{Q}_{t,q}(\lambda)\). Concept drift usually happens slowly or rarely, so it makes sense to use a high value of M, say \(10^{3}\). Further, the value of the tuning parameter γ should be chosen so that, when an update of λ is performed, the estimate of the EQL has converged while using most of the information received since the last update of λ. A simple rule of thumb is to set \(\gamma = 1 - \sqrt[M]{0.01}\), which means that the weight of the Mth term of the exponentially weighted sum in (3) is 0.01. This worked well in our experiments. Since many data streams follow periodic patterns, it can be useful to randomly select when to update λ, so that the updates do not synchronize with such patterns. A code sketch of the procedure is given below.
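A minimal sketch of HIL under the same assumptions as the Oracle sketch above (for brevity, the update interval M is kept fixed here rather than randomized, and the class layout is our own illustration):

```python
class HIL:
    # Three trackers with step sizes lambda/a, lambda and a*lambda; every
    # M iterations the grid recenters on the winner and the trackers restart.
    def __init__(self, lam, a, M, q, gamma, init_est=1.0):
        self.lam, self.a, self.M = lam, a, M
        self.q, self.gamma, self.t = q, gamma, 0
        self._restart(init_est)

    def _restart(self, est):
        # (Re)initialize the three trackers from the currently best estimate.
        self.lambdas = [self.lam / self.a, self.lam, self.lam * self.a]
        self.ests, self.eqls = [est] * 3, [0.0] * 3

    def update(self, x):
        for l, lam in enumerate(self.lambdas):
            self.eqls[l] = update_eql(self.eqls[l], self.ests[l], x,
                                      self.q, self.gamma)
            self.ests[l] = dumiqe_update(self.ests[l], x, self.q, lam)
        self.t += 1
        if self.t % self.M == 0:
            best = min(range(3), key=self.eqls.__getitem__)
            if best != 1:                   # lambda_low or lambda_high won
                self.lam = self.lambdas[best]
                self._restart(self.ests[best])
        return self.ests[1]                 # estimate for lambda_intermediate
```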

A challenge posed by the HIL approach is that the quantile estimates and associated EQLs must be restarted after an update and must have converged before a new update is performed. This limits how rapidly and smoothly the procedure can adapt to concept drift. Further, since λ is rarely updated, a fairly large value of a, say 2, must be used, thereby limiting any fine tuning of λ. The Oracle approach is not burdened with these challenges.

5 Asymptotics

A natural requirement of the suggested procedures is that, if the data stream distribution does not change with time (stationary stream), the quantile estimate should converge to the true quantile as time goes to infinity. This is confirmed by the following theorem for the HIL approach (a similar theorem can also be set up for the Oracle approach, which is explained below).

Theorem 1

Assume a stationary data stream, and let Qq be the true quantile to be estimated. Further assume that D1 and D2 are selected such that \(\widehat{Q}_{t,q}(\lambda_{t})\) satisfies the assumptions for convergence of Markov processes [35, 36]. Using the HIL approach with \(a_{t} = (t + M)^{p}/t^{p}\), \(0 < p < 1\), then

$$ \begin{array}{@{}rcl@{}} &&\underset{t \to \infty}{\lim} \lambda_{t} = 0 \end{array} $$
(4)
$$ \begin{array}{@{}rcl@{}} &&\underset{t \to \infty}{\lim} \widehat{Q}_{t,q}(\lambda_{t}) = Q_{q} \end{array} $$
(5)

The proof is provided in the Appendix. The overall intuition is that, if the data stream is stationary, the HIL procedure will iteratively select a smaller and smaller step size, and the estimator will converge to the true quantile. For the Oracle approach, we suggested including values less than \(\lambda_{1}^{\ast}\) if the Oracle selects the quantile estimate for a λt close to \(\lambda_{1}^{\ast}\). Thus, a similar argument can be used to prove the convergence of the Oracle approach.

Most of the recently developed incremental quantile estimators satisfy the assumptions for convergence of Markov processes referred to in the theorem, see, e.g., the proofs in [14, 15, 48]. Therefore, by using one of these incremental quantile estimators in combination with either the HIL or the Oracle procedures, convergence is guaranteed.

6 Synthetic experiments

Consider a normally distributed data stream where the expectation changes between slow and rapid dynamics (concept drift)

$$ \begin{array}{@{}rcl@{}} f_{t}(x) &=& N\left(\mu + b \sin\left(\frac{2\pi}{\tau(n)}\, n\right), \sigma\right) \\ \tau(n) &=& \tau_{1}\, I(n \bmod 2T < T) + \tau_{2}\, I(T \leq n \bmod 2T < 2T) \end{array} $$
(6)

with τ1 = 500, \(\tau_{2} = 10^{4}\), μ = 8, b = 2, σ = 1 and \(T = 10^{4}\). We tracked this data stream using the DUMIQE algorithm. To estimate the step length λ in each iteration, we used the Oracle approach with the update sizes \(\lambda_{1}^{\ast} = \exp(-7), \lambda_{2}^{\ast} = \exp(-6.95), \ldots, \lambda_{L}^{\ast} = \exp(0)\). We estimated the expected quantile loss using (3) with \(\gamma = 1 - \sqrt[1000]{0.01} \approx 0.005\), following the rule of thumb suggested above.
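The following sketch, reusing the hypothetical Oracle class and helpers from the earlier code examples, generates the stream in (6) and tracks its q = 0.7 quantile:

```python
import math
import random

def stream_sample(n, mu=8.0, b=2.0, sigma=1.0, tau1=500, tau2=10**4, T=10**4):
    # One draw from (6): a sinusoidal mean whose period switches between
    # tau1 (fast dynamics) and tau2 (slow dynamics) every T iterations.
    tau = tau1 if n % (2 * T) < T else tau2
    return random.gauss(mu + b * math.sin(2 * math.pi * n / tau), sigma)

# Log-spaced grid exp(-7), exp(-6.95), ..., exp(0) and the rule-of-thumb gamma.
lambdas = [math.exp(-7 + 0.05 * i) for i in range(141)]
oracle = Oracle(lambdas, q=0.7, gamma=1 - 0.01 ** (1 / 1000))
estimates = [oracle.update(stream_sample(n)) for n in range(2 * 10**4)]
```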

Figure 2 shows the estimated expected quantile loss (top left), the estimated values of the step length (top right) in each iteration, and the actual quantile tracking (bottom panel). In the upper left panel, we see that the quantile loss quickly decreases and then stabilizes. The initial value of the quantile tracking algorithm is outside the data stream distribution. Therefore, in the initial phase, the quantile loss is high and a high value of λ (update size) is used, so that the tracking algorithm quickly moves into the support of the data stream distribution. Once the quantile tracking algorithm is within the data stream distribution, a smaller λ is optimal, and the value of λ is reduced. The expected quantile loss is reduced accordingly. The gray lines in the top right panel show the optimal value of λ for the data stream dynamics before and after the concept change at iteration 10,000. We see that a larger step size is needed to efficiently track rapid variations. We further see that the Oracle approach uses step lengths close to the optimal ones.

Fig. 2
figure 2

Diagnostics plots when tracking the q = 0.7 quantile using the Oracle approach. The top left panel shows the estimated QL in every iteration. The top right panel shows the resulting recursive updating of λ. The symbol λopt refers to the value resulting in the minimal estimated QL. The gray lines refer to theoretically optimal values of λ. The bottom panel shows the resulting tracking. The gray dots represent the data stream in (6) and the black line is tracking

We now analyze the performance of the suggested procedures in more detail for data streams with concept drift. In particular, we analyze how the efficiency of the concept drift adaptation depends on the parameter γ in (3). Consider again the data stream in (6) as well as a χ2 distributed stream

$$ \begin{array}{@{}rcl@{}} f_{t}(x) &=& \chi^{2}\left(\nu + b \sin\left(\frac{2\pi}{\tau(n)}\, n\right)\right) \\ \tau(n) &=& \tau_{1}\, I(n \bmod 2T < T) + \tau_{2}\, I(T \leq n \bmod 2T < 2T) \end{array} $$
(7)

using the same values of τ1, τ2, T and b as above, and ν = 6, where χ2(ν) denotes the χ2 distribution with ν degrees of freedom. The χ2 distributed stream is challenging since both the expectation and the variance change with time. For the HIL approach, a = 1.5 and M = 1000 + U were used, where U was uniformly distributed on the interval [0,1000], i.e., λ was updated on average every 1500 iterations. We tracked the q = 0.5, 0.7 and 0.9 quantiles. To make Monte Carlo error negligible, the data streams were run for a total of \(N = 10^{7}\) iterations, and the observed tracking MSE was computed:

$$ \text{MSE} = \frac{1}{N} \sum\limits_{t=1}^{N} \left( \widehat{Q}_{t,q} - Q_{t,q}\right)^{2} $$
(8)

Figure 3 shows the tracking error as a function of γ for the data streams above, which contain concept drift.

Fig. 3
figure 3

The left and right panels show results for the normal and χ2 distributed data streams. The rows from top to bottom show results from tracking of the q = 0.5,0.7 and 0.9 quantiles, respectively

Let MSE\(_{\tau_{1}}^{\text{theo}}\) and MSE\(_{\tau_{2}}^{\text{theo}}\) represent the theoretically minimal tracking MSE for a data stream with constant fast dynamics (τ1) and constant slow dynamics (τ2), respectively. These errors were found by running the DUMIQE algorithm for a range of values of λ to find the values that minimized the tracking error for fast and slow dynamics. The data streams in (6) and (7) consist of an equal amount of fast and slow dynamics. By tracking with the optimal values of λ for fast and slow dynamics and adapting instantaneously to concept drift, the minimal tracking error becomes \(0.5\, \text{MSE}_{\tau_{1}}^{\text{theo}} + 0.5\, \text{MSE}_{\tau_{2}}^{\text{theo}}\), which is shown as gray solid lines in Fig. 3. The gray solid lines therefore represent the theoretically minimal tracking error. We further computed the minimal tracking MSE using a constant value of λ, which is shown as gray dashed lines. The results show that, in all cases, the Oracle approach tracks the true quantile with an error only slightly above the theoretically optimal tracking error. The HIL approach also performs well, but not as well as the Oracle approach, as expected given the discussion at the end of Section 4. The optimal values of the tuning parameter γ are close to the suggested rule of thumb \(1 - \sqrt[M]{0.01} \approx 0.0031\).

7 Real-life data examples

In this section, we present three real-world data examples for benchmarking the procedures. The examples are related to Twitter streaming data, activity monitoring, and stock trading.

7.1 Twitter

Twitter data streams have been used for many interesting applications, such as predicting election outcomes and detecting natural disasters and other real world events [3, 17].

We consider a dataset consisting of the time stamp of every tweet posted by Norwegian users before and after the terrorist attack on July 22, 2011 [41]. The time stamps were given in whole seconds, so to reconstruct a representation of the true posting times, a uniformly drawn value between zero and one second was added to each time stamp.

Let \(T_{t}\) represent the time stamp when tweet number t was posted. We consider the problem of tracking quantiles of the quantity \(R_{t} = (T_{t} - T_{t-1})^{-1}\), which can be interpreted as the current frequency of posted tweets. The quantity can, for instance, be used for real-world event detection: if the number of posted tweets increases, \(R_{t}\) will increase.
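For illustration, a minimal sketch (a hypothetical helper of our own) that turns a sorted list of jitter-corrected time stamps into the stream \(R_{t}\):

```python
def tweet_frequencies(timestamps):
    # R_t = 1 / (T_t - T_{t-1}), in tweets per second; the added uniform
    # jitter ensures consecutive time stamps are distinct.
    return [1.0 / (t1 - t0) for t0, t1 in zip(timestamps, timestamps[1:])]
```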

The terrorist attack was initiated by a bomb exploding in Oslo on July 22 at 3:25 p.m. local time, which created a significant concept drift in the data stream distribution. We will evaluate how well the methods in this paper are able to adapt to the concept drift in the dataset. We tracked quantiles of Rt using the Frugal estimator [30] and the Oracle approach was used for adaptation. We used γ = 0.005 in accordance with the rule of thumb in Section 4.

The results are shown in Fig. 4. The black curve in the upper panel shows the tracking of the q = 0.7 quantile of Rt. The gray dots show the observed Rt. The gray and black curves in the bottom panel show the value of λ in every iteration and a moving average, respectively. The Frugal estimator was initiated with λ = 1 and the procedure rapidly adjusted the step length to a more suitable value.

Fig. 4
figure 4

Twitter data example. Upper panel: Gray dots show the observed data stream Rt (every 10th shown) and the black curve tracking of the q = 0.7 quantile using the Frugal estimator. Bottom panel: The values of λ are shown in gray, and the black curve is a moving average of the λ values

In the period before the bomb exploded, there is a clear tendency for λ to be adjusted to lower values during nighttime, which makes sense since both the scale and the dynamics of Rt are smaller then. When the bomb exploded, the values of Rt increased rapidly (concept drift), and the value of λ therefore increased rapidly to enable efficient tracking under the increased scale and dynamics.

7.2 Activity monitoring

Activity recognition is a popular machine learning task where the goal is to use sensors to automatically detect and identify the activity of a user. For instance, activity recognition could be used to assess whether a person is getting a healthy amount of exercise or to detect accidents such as falls. In this experiment, we track quantiles of accelerometer data of the kind available on almost any modern cell phone or smart watch. Tracking quantiles can be used to detect when a user changes activity [10] or to classify the current activity of the user [5].

We consider an accelerometer dataset from the Wireless Sensor Data Mining (WISDM) project [22]. Accelerations in x, y, and z directions were observed, with a frequency of 20 observations per second, while users were performing the following activities: walking, jogging, walking up a stairway and walking down a stairway.

Figure 5 shows the tracking of the x acceleration of an arbitrary user. We see that, when the accelerometer distribution is stable over time, the Oracle approach uses a small value of λ, e.g., for the first two activities. When the user changed activity, an immediate change in acceleration was observed, and the value of λ (step length) was rapidly increased so that the quantile tracking could efficiently adapt to the new accelerometer distribution. Further, the value of λ was gradually reduced as the accelerometer distribution stabilized for the new activity. From around 20 minutes, the user changed activity far more frequently, and higher values of λ were therefore used to ensure efficient quantile tracking.

Fig. 5
figure 5

Accelerometer data example. Upper panel: Gray dots show the accelerometer observations and the black curve the tracking of the q = 0.7 quantile using the Frugal estimator. The gray vertical lines show when the user changed activity. Bottom panel: The values of λ

7.3 Stock trading

High frequency data are highly abundant in stock trading and need to be analyzed efficiently in real time [18]. Figure 6 shows tracking of the q = 0.7 quantile of the number of traded Tesla, Inc. shares on the New York Stock Exchange [32]. When the number of traded shares increased rapidly, the value of λ was increased to adapt quickly to the change in scale and dynamics (concept drift). The value of λ then decreased slightly as the trade volume (dynamics) stabilized at a high level.

Fig. 6
figure 6

Stock trading example. Upper panel: Gray dots show the number of traded Tesla, Inc. shares, while the black curve shows the tracking of the q = 0.7 quantile using the Frugal estimator. Bottom panel: The values of λ

The three real-life data experiments demonstrate that the suggested procedure is able to rapidly adjust λ to maintain efficient quantile tracking under concept drift.

8 Closing remarks

Surprisingly little attention has been paid to the automatic adjustment of the parameter values of incremental quantile algorithms. In this paper, we develop two simple procedures to address this problem. Both procedures are based on estimating the current quantile tracking loss and using it to efficiently track the true quantiles. The Oracle approach tracks the quantile and the associated quantile loss for a wide range of parameter values and, in each iteration, selects the quantile estimate with the minimal estimated quantile loss. The second approach tracks the quantile for only three values of the tuning parameter and repeatedly discards the estimate with the highest estimated quantile loss, adding a quantile estimator for another value of the tuning parameter. Both methods are computationally and memory efficient, since only a limited set of quantities needs to be computed and stored in each iteration.

The results show that the methods are highly efficient for adjusting the value of λ to achieve efficient tracking. The synthetic experiments showed that the resulting tracking error is close to the theoretical minimum. The Oracle performs best, but at a higher computational cost. The real-life data examples demonstrated that the procedures were able to adapt to concept drift for complex and massive real-life data streams. In future work, we would like to test the methods on data with different properties, such as mental health activity data that often show rapid changes in the behaviour of the patient.