Introduction

Presently, there is an explosion of data in practical applications such as online shopping [5], email [4], stock exchange [14] and more. These data comprise continuously generated, real-time streaming data and differ significantly from traditional static datasets. Amidst this vast data landscape, data mining technology plays a crucial role in extracting useful insights, with mining knowledge from data streams being particularly significant. However, traditional offline algorithms [18] struggle to adapt swiftly to dynamic and continuous data streams, as they require iteratively solving specific optimization problems over static, fixed-size training data. In contrast, online algorithms [36] incorporate training sample points continuously and incrementally, and therefore excel in scenarios involving stream data mining.

The burgeoning importance of online learning from stream data lies in its capability to detect abnormal situations and learn from them, e.g. to identify abnormal traffic patterns [30] or network attacks [37] when they first occur and to pinpoint similar threats [35] thereafter. However, stream data mining faces challenges due to concept drift, i.e. the shift in the target concept caused by evolving data distributions over time [17]. Concept drift research can be categorized into three aspects [22]: understanding, detection, and adaptation. Concept drift understanding [19] retrieves the time, extent and region in which concept drift occurs and lasts. Concept drift detection [1, 23] determines whether concept drift has occurred based on changes in the data distribution or in time intervals. Yu et al. [34] designed an online learning algorithm that obtains a distribution-independent test statistic for detecting concept drift in multiple data streams. Concept drift adaptation [20] refers to updating the current model in response to drift. Yu et al. [33] developed a continuous support vector regression to address regression problems in nonstationary streaming data, incorporating an incremental similarity term into the quadratic programming problem. This paper belongs to the concept drift adaptation category: we believe the ultimate goal of stream data mining is to consistently adapt the model whenever new concepts appear, while detection and understanding can be seen as preliminary procedures serving concept drift adaptation.

A variety of online learning based concept drift adaptation strategies have been proposed in the literature. For instance, Yu et al. [32] employed a meta-learning approach and a learning-to-adapt framework to address concept drift. Liu et al. [21] proposed a fast switch Naive Bayes algorithm, resolving issues related to the impact of training data volume on the model. Li et al. [15] introduced an incremental semi-supervised learning algorithm based on neural networks tailored for stream data. Additionally, WIDSVM [6] is an algorithm utilizing weighted moving window techniques to derive an incremental-decremental Support Vector Machine (SVM) for traversing concept-drifting data streams. Among miscellaneous online learning algorithms, single-pass methods stand out for their efficiency: they process one sample point at a time and inspect it at most once. Currently, there exist online algorithms [9, 24] tailored for a single pass over stream data that convert the classification problem into a Minimum Enclosing Ball (MEB) problem. However, these methods mainly focus on expanding the ball to enclose new data. An excessively large ball may already cover data of a new concept, making it difficult to trigger the model updating process.

To address the challenges above, this paper presents a new MEB-based framework for concept drift adaptation on stream data, and then proposes three methods built upon this framework. When new sample points arrive, these methods adjust the ball dynamically by expanding or contracting it to avoid the issue of excessively large balls, and all of them require only a single pass through the data stream. We conduct experiments on 7 synthetic datasets and 5 real-world datasets, and compare the proposed methods with StreamSVM [24], Naive Bayes [38], Hoeffding Tree [11], ADWIN [2], and OSVM-SP [9]. Experimental results on prequential error, support vector percentage, and radius demonstrate that the SCDA-III method performs best in predictive performance, memory usage, and the scalability of the ball.

This paper is structured as follows: “Related work” presents an overview of related work in concept drift adaptation. “Preliminaries” reviews the formulations of SVM and MEB. “Methodology” provides an elaborate introduction of the proposed framework and methods. “Experiments” showcases the experimental results and analysis, and “Conclusion” summarizes the paper and outlines promising future directions.

Related work

Active, passive adaptation for concept drift

Concept drift adaptation is categorized into two approaches: active adaptation [10] and passive adaptation [8]. Active adaptation proactively detects drift upon data arrival in order to identify new concepts and update the model accordingly. However, determining the necessary parameters for active detection proves challenging. To address repeated concept drift and limited labeled data, Wen et al. [31] leveraged a specialized concept classifier pool and historical classifiers to label data, enhancing concept drift detection efficiency. In contrast, passive adaptation updates the model without actively detecting concept drift upon data arrival. Li et al. [15] introduced an incremental semi-supervised learning framework that resolves parameter issues and offers an efficient incremental semi-supervised algorithm. The methods proposed in this paper fall under passive adaptation.

Online, incremental learning

Learning a concept drift adaptive model from stream data involves two main methods: online learning [28] and incremental learning [16]. Incremental learning entails dividing data into blocks and updating the model when a specific data block threshold is reached. While block-updating models can capture data distributions, determining the necessary parameters for data blocks poses challenges. Wang et al. [29] successfully employed adaptive windowing, assigning weights to sample points to handle concept drift with notable effectiveness. Similarly, Silva et al. [25] introduced the SSOE-FP-ELM algorithm, which utilizes parameterized forgetting when traversing data blocks to detect concept drift. Online learning, in contrast, learns from individual samples continuously, enabling real-time model updates without the need to accumulate a specific data volume before updating.

Single pass, multiple passes online learning

Online classification algorithms scanning stream data can be categorized into two groups: multipass and single pass [12]. Multipass algorithms require multiple iterations through the data during model training to reach an optimal or convergent solution. However, this characteristic often renders the online learning process inefficient. In contrast, single-pass algorithms involve scanning the data only once during training, making them more suitable for large-scale real-time data processing scenarios.

StreamSVM is an adaptive method for handling concept drift. Utilizing a single pass, StreamSVM converts the classification problem into a minimum enclosing ball problem. When new sample points fall outside the existing ball, StreamSVM expands the ball to encompass the new point. However, this expansion strategy leads to unnecessary coverage of vast areas when the new point is far from the existing ball. StreamSVM’s limitation lies in its inability to shrink the ball, which allows unbounded expansion. Similarly, OSVM-SP adopts the minimum enclosing ball method, expanding the old ball’s radius to cover new sample points. However, like StreamSVM, OSVM-SP lacks the capability to shrink the ball, exhibiting limited flexibility. The SCDA framework created in this paper also performs single-pass traversal.

Preliminaries

This section introduces the formulations of batch-mode SVM and MEB, along with their equivalence, all of which bear close relevance to our methods. The primary symbols utilized in this article are presented in Table 1.

Table 1 List of symbols

Support vector machine

Let S be a training set with N labeled sample points, \(S = \{(\varvec{x}_i, y_i)|\varvec{x}_i\in \mathbb {R}^{d \times 1}, y_i\in \{1, -1\},i = 1,\ldots , N\}\). The objective of SVM is to find a hyperplane in a reproducing kernel Hilbert space (RKHS) which separates sample points of different classes in the training set and generalizes well on unseen data. According to the Karush–Kuhn–Tucker (KKT) theorem [13], the dual problem of SVM is given in Eqs. (1a)–(1b) [9].

$$\begin{aligned} \max \limits _{\varvec{\alpha }} \; - \sum _{i = 1}^{N}\sum _{j = 1}^{N}\alpha _i\alpha _jy_iy_j\left( k_\phi (\varvec{x}_i, \varvec{x}_j) + \frac{\delta _{ij}}{C}\right) \end{aligned}$$
(1a)
$$\begin{aligned} \text {s.t.} \quad \varvec{\alpha }\geqslant \varvec{0}, \quad \varvec{\alpha }'\varvec{1} = 1. \end{aligned}$$
(1b)

The variable \(\varvec{\alpha }\) in Eq. (1a) is a vector consisting of N Lagrange multipliers, \(\varvec{\alpha } = [\alpha _1, \ldots , \alpha _N]'\in \mathbb {R}^{N \times 1}\). For any sample point \(\varvec{x}\), SVM predicts its label by \(y = \text {sig}(f(\varvec{x}))\), in which \(\text {sig}(\cdot )\) represents the sign function, and function \(f(\varvec{x})\) is the output of SVM on sample point \(\varvec{x}\) which can be computed by:

$$\begin{aligned} f(\varvec{x}) = \sum _{i = 1}^{N}\alpha _iy_ik_\phi (\varvec{x}_i, \varvec{x}). \end{aligned}$$
(2)
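To make Eq. (2) concrete, the following is a minimal sketch of the SVM output and prediction functions. It assumes an RBF base kernel (a common choice, which also satisfies the constant-diagonal condition \(k_\phi (\varvec{x}_i, \varvec{x}_i) = \kappa _\phi\) used in “Equivalence of SVM and MEB”); the function names and the value of gamma are illustrative, not part of the paper.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    # k_phi(x, z) = exp(-gamma * ||x - z||^2); note k_phi(x, x) = 1 for all x,
    # so kappa_phi = 1 for this kernel.
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_output(x, X, y, alpha, gamma=0.5):
    # Eq. (2): f(x) = sum_i alpha_i * y_i * k_phi(x_i, x).
    return sum(a * yi * rbf_kernel(xi, x, gamma)
               for a, yi, xi in zip(alpha, y, X))

def svm_predict(x, X, y, alpha, gamma=0.5):
    # y = sig(f(x)): predict by the sign of the SVM output.
    return 1 if svm_output(x, X, y, alpha, gamma) >= 0 else -1
```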

Minimum enclosing ball

Let \(\{\varvec{x}_i\in \mathbb {R}^{d \times 1}|i = 1,\ldots , N\}\) be a training set with N sample points. The central idea of MEB is to find the minimal ball enclosing all of them. The primal problem of MEB is as follows [26, 27]:

$$\begin{aligned} \min \limits _{\varvec{c}, r} \; r^2 \end{aligned}$$
(3a)
$$\begin{aligned} \text {s.t.} \quad \left\| \varvec{c} - \varphi (\varvec{x}_i) \right\| ^2 \leqslant r^2, \quad i = 1, \ldots , N. \end{aligned}$$
(3b)

Here, we employ a different kernel mapping function \(\varphi (\cdot )\) in MEB to distinguish it from \(\phi (\cdot )\) in SVM. The hypersphere learned by MEB can be represented by \((\varvec{c}, r)\), where the center \(\varvec{c}\) is a high-dimensional feature vector in RKHS and \(r\in \mathbb {R}^+\) represents the radius. For any sample point \(\varvec{x}\), MEB determines whether it is inside the hypersphere according to the following inequality:

$$\begin{aligned} \left\| \varvec{c} - \varphi (\varvec{x})\right\| ^2 \leqslant r ^ 2. \end{aligned}$$
(4)
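As a toy illustration of the membership test in Eq. (4), the sketch below checks the inequality with an identity feature map in plain Euclidean space; the variable names are hypothetical.

```python
import numpy as np

def inside_ball(x, c, r):
    # Eq. (4): true iff ||c - phi(x)||^2 <= r^2 (here phi is the identity map).
    return np.sum((c - x) ** 2) <= r ** 2

c, r = np.array([0.0, 0.0]), 1.0
print(inside_ball(np.array([0.5, 0.5]), c, r))  # True: the point lies in the ball
```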

Equivalence of SVM and MEB

The supervised problem of SVM can be equivalently reformulated as an unsupervised MEB problem, so long as the label information of the former is encoded in the kernel mapping function of the latter. In other words, \(\varphi (\varvec{x}_i)\) is related to \(y_i\) and hence can also be denoted by \(\varphi (\varvec{x}_i,y_i)\). Specifically, Eqs. (1a)–(1b) and (3a)–(3b) are equivalent to each other if the following two conditions are satisfied [24, 27]:

1. \(\varphi (\varvec{x}_i,y_i) = [y_i\phi (\varvec{x}_i); \varvec{e}_i/\sqrt{C}]\);

2. \(k_\phi (\varvec{x}_i, \varvec{x}_i) = \kappa _\phi , \forall \varvec{x}_i\).

The specific derivation process of the formulas can be found in Appendix A.
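A small sketch may help connect the two conditions: under condition 1, the inner product of two augmented feature vectors is \(y_iy_jk_\phi (\varvec{x}_i, \varvec{x}_j) + \delta _{ij}/C\), which is exactly the kernel appearing in the SVM dual (1a). The sketch below assumes an RBF base kernel (so condition 2 holds with \(\kappa _\phi = 1\)); the function names are illustrative.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def meb_kernel(xi, yi, i, xj, yj, j, C=1.0, gamma=0.5):
    # <phi(x_i, y_i), phi(x_j, y_j)>
    #   = y_i * y_j * k_phi(x_i, x_j) + <e_i, e_j> / C
    #   = y_i * y_j * k_phi(x_i, x_j) + delta_ij / C,
    # i.e. the kernel of the SVM dual in Eq. (1a).
    delta = 1.0 if i == j else 0.0
    return yi * yj * rbf_kernel(xi, xj, gamma) + delta / C
```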

Fig. 1 Framework of SCDA

Methodology

Exchange of positive and negative annotations

Let \(\overline{S}\) be the mirror set of S obtained by exchanging the positive and negative annotations, \(\overline{S} = \{(\varvec{x}_i, y_i)|(\varvec{x}_i, -y_i)\in S,i = 1,\ldots , |S|\}\), and let \(\overline{\varvec{\alpha }}, \overline{f}(\cdot ), (\overline{\varvec{c}}, \overline{r})\) be the solutions learned on \(\overline{S}\). The two solution groups are correlated with each other as in Eq. (5), which can be easily derived according to “Preliminaries”.

$$\begin{aligned} \left\{ \begin{array}{l} \overline{f}(\cdot ) = -f(\cdot ), \\ \frac{\overline{\varvec{c}} + \varvec{c}}{2} = [\varvec{0}; \frac{1}{\sqrt{C}}\varvec{\alpha }], \\ \overline{r} = r, \\ \overline{\varvec{\alpha }}=\varvec{\alpha }. \\ \end{array} \right. \end{aligned}$$
(5)

According to the results in Eq. (5) we can see that:

1. For any input \(\varvec{x}\), the output functions \(f(\cdot ), \overline{f}(\cdot )\) learned on S and \(\overline{S}\) consistently return opposite label predictions. Given that the label of \(\varvec{x}\) is either 1 or \(-1\) in binary classification, exactly one of them returns the correct prediction.

2. The two centers \(\varvec{c}\) and \(\overline{\varvec{c}}\) share a static midpoint \(\varvec{A}\), \(\varvec{A}=[\varvec{0}; \frac{1}{\sqrt{C}}\varvec{\alpha }]\). Here ‘static’ indicates that \(\varvec{A}\) does not depend directly on the sample points.

3. The two radii r and \(\overline{r}\) are equal, hence we simply utilize r to denote the radii of both hyperspheres \((\varvec{c}, r)\) and \((\overline{\varvec{c}}, r)\). Likewise, \(\varvec{\alpha }\) is utilized to represent the vector of Lagrange multipliers of both functions \(f(\cdot )\) and \(\overline{f}(\cdot )\).

When a new sample point \((\varvec{x}_i, y_i)\) arrives for \(i=1,2,\ldots \), we use \(\varvec{z}^i\) to denote its corresponding feature vector in RKHS according to the first condition in “Equivalence of SVM and MEB”, that is, \(\varvec{z}^i = \varphi (\varvec{x}_i,y_i) = [y_i\phi (\varvec{x}_i); \varvec{e}_i/\sqrt{C}]\), and we use \(\overline{\varvec{z}}^i\) to represent the mirror point of \(\varvec{z}^i\), \(\overline{\varvec{z}}^i = \varphi (\varvec{x}_i,-y_i) = [-y_i\phi (\varvec{x}_i); \varvec{e}_i/\sqrt{C}]\). A mirror point is an alternative possible realization of the real sample point, possessing identical attributes but the opposite class. Since the framework proposed in this manuscript aims to cover the real sample points with the minimum enclosing ball, the importance of mirror points lies in serving as a reference: mirror points should be no nearer to the ball than the real sample points. Let the current balls (hyperspheres in RKHS) be \((\varvec{c}^{i-1}, r^{i-1})\) and \((\overline{\varvec{c}}^{i-1}, r^{i-1})\), in which the centers can be expressed as Eq. (6) [26].

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{c}^{i-1} = \sum _{j=1}^{i-1}\alpha _j^{i-1}\varvec{z}^j. \\ \overline{\varvec{c}}^{i-1} = \sum _{j=1}^{i-1}\alpha _j^{i-1}\overline{\varvec{z}}^j. \end{array} \right. \end{aligned}$$
(6)

The distances between \(\varvec{z}^i, \overline{\varvec{z}}^i\) and the two centers \(\varvec{c}^{i-1}, \overline{\varvec{c}}^{i-1}\) are denoted by \(d(\varvec{z}^i, \varvec{c}^{i-1})\), \(d(\varvec{z}^i, \overline{\varvec{c}}^{i-1})\), \(d(\overline{\varvec{z}}^i, \varvec{c}^{i-1})\) and \(d(\overline{\varvec{z}}^i, \overline{\varvec{c}}^{i-1})\), respectively. Moreover, the correlations among the four distances are listed in Eq. (7), which can be easily derived from their definitions.

$$\begin{aligned} \left\{ \begin{array}{l} d(\varvec{z}^i, \varvec{c}^{i-1}) = d(\overline{\varvec{z}}^i, \overline{\varvec{c}}^{i-1}). \\ d(\varvec{z}^i, \overline{\varvec{c}}^{i-1}) = d(\overline{\varvec{z}}^i, \varvec{c}^{i-1}). \end{array} \right. \end{aligned}$$
(7)

Equation (7) indicates that the distance between a feature vector and the center of a ball equals the distance between the corresponding mirror feature vector and the center of the mirror ball. In view of this, only the two distances between \(\varvec{z}^i, \overline{\varvec{z}}^i\) and one center \(\varvec{c}^{i-1}\) (\(d(\varvec{z}^i, \varvec{c}^{i-1})\) and \(d(\overline{\varvec{z}}^i, \varvec{c}^{i-1})\), which can be computed by Eq. (8)) are employed in the following sections, and we utilize \(d^{\text {min}}\) (\(d^{\text {max}}\)) in Eq. (9) to denote the smaller (larger) of the two.

$$\begin{aligned} d(\varphi (\varvec{x}_i,y), \varvec{c}^{i-1})&= \kappa _\phi + \frac{\left\| \varvec{\alpha }^{i-1}\right\| ^2+1}{C}\nonumber \\&\quad +\sum _{j,k = 1}^{i-1}\alpha _j^{i-1}\alpha _k^{i-1}y_jy_kk_\phi (\varvec{x}_j, \varvec{x}_k)\nonumber \\&\quad -2y\sum _{j=1}^{i-1}\alpha _j^{i-1}y_jk_\phi (\varvec{x}_i,\varvec{x}_j) \end{aligned}$$
(8)
$$\begin{aligned} \left\{ \begin{array}{l} d^{\text {min}} = \min (d(\varvec{z}^i, \varvec{c}^{i-1}), d(\overline{\varvec{z}}^i, \varvec{c}^{i-1}))\\ d^{\text {max}} = \max (d(\varvec{z}^i, \varvec{c}^{i-1}), d(\overline{\varvec{z}}^i, \varvec{c}^{i-1})) \end{array} \right. \end{aligned}$$
(9)
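The two quantities in Eq. (9) can be computed directly from the stored multipliers and raw data. The sketch below is a literal transcription of Eqs. (8)–(9), assuming an RBF base kernel (\(\kappa _\phi = 1\)) and Python lists X, y, alpha holding the sample points, labels and multipliers accumulated up to time \(i-1\); all names are illustrative.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def dist_to_center(x_new, label, X, y, alpha, C=1.0, gamma=0.5, kappa=1.0):
    # Eq. (8): d(phi(x_i, y), c^{i-1}) for label y in {+1, -1}.
    a = np.asarray(alpha, dtype=float)
    n = len(a)
    d = kappa + (np.dot(a, a) + 1.0) / C
    d += sum(a[j] * a[k] * y[j] * y[k] * rbf_kernel(X[j], X[k], gamma)
             for j in range(n) for k in range(n))
    d -= 2.0 * label * sum(a[j] * y[j] * rbf_kernel(x_new, X[j], gamma)
                           for j in range(n))
    return d

def d_min_max(x_new, y_new, X, y, alpha, **kw):
    # Eq. (9): the smaller/larger of the distances of z^i and its mirror
    # to the current center c^{i-1}.
    d_real = dist_to_center(x_new, y_new, X, y, alpha, **kw)
    d_mirror = dist_to_center(x_new, -y_new, X, y, alpha, **kw)
    return min(d_real, d_mirror), max(d_real, d_mirror)
```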

Learning online based on stream data

The framework of this article is depicted in Fig. 1. When the ith sample point \((\varvec{x}_i, y_i)\) of the data stream arrives, feature mapping is first employed to generate the sample point \(\varvec{z}^i\) and the mirror point \(\overline{\varvec{z}}^i\) (circle and triangle in Fig. 1), which then go through three different paths according to the individual situation. Specifically, (1) when the two points are both outside of the current ball (the black one in Fig. 1), “New sample point and its mirror are outside of two balls” elaborates how to learn a new ball (the red one in Fig. 1) to replace the current one (see the blue arrows in Fig. 1); moreover, three distinct methods (SCDA-I, SCDA-II and SCDA-III) are proposed in that section based on the different strategies employed; (2) when one of the two points is inside the current ball and the other is outside, “New sample point and its mirror are inside of separate balls” presents how the new ball is obtained by either shrinking the current one or keeping it unchanged (green arrows); (3) when the two points are both inside, “New sample point and its mirror are inside of intersection area of two balls” explains why the current ball should be kept unchanged (orange arrows). The pseudo code is listed in Algorithm 1.

Algorithm 1 Framework of SCDA

New sample point and its mirror are outside of two balls

When the new sample point \(\varvec{z}^i\) and its mirror \(\overline{\varvec{z}}^i\) are both outside of the current ball \((\varvec{c}^{i-1}, r^{i-1})\), we further compare their distances and update the model according to the individual situation.

Fig. 2 The sample point is closer to the ball: move and enlarge the ball to enclose the sample point

On the one hand, if \(\varvec{z}^i\) is nearer to the current ball than \(\overline{\varvec{z}}^i\) (see Fig. 2), the new ball \((\varvec{c}^{i}, r^{i})\) must be enlarged to enclose \(\varvec{z}^i\). Therefore, we enlarge the ball in the manner of StreamSVM [24] according to Eq. (10) (see lines 8–9 in Algorithm 1). The specific derivation of the formulas can be found in Appendix B.

$$\begin{aligned} \left\{ \begin{array}{l} \alpha _j^i\leftarrow \left\{ \begin{array}{ll} \left( 1 - r^{i-1} / d(\varvec{z}^i,\varvec{c}^{i-1})\right) / 2, & j=i, \\ (1 - \alpha _i^i)\alpha _j^{i-1}, & \forall j=1,\ldots ,i-1, \end{array} \right. \\ r^i\leftarrow (d(\varvec{z}^i,\varvec{c}^{i-1})+r^{i-1}) / 2. \end{array} \right. \end{aligned}$$
(10)
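As a minimal sketch, the update in Eq. (10) amounts to setting the multiplier of the new point first, rescaling the old multipliers, and moving the radius to the midpoint between the old boundary and the new point; the helper name is illustrative.

```python
def expand_ball(alpha, d, r):
    # Eq. (10): d = d(z^i, c^{i-1}), r = r^{i-1}; returns (alpha^i, r^i).
    alpha_i = (1.0 - r / d) / 2.0                     # multiplier of the new point
    alpha = [(1.0 - alpha_i) * a for a in alpha] + [alpha_i]
    return alpha, (d + r) / 2.0                       # new radius r^i
```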

On the other hand, if \(\varvec{z}^i\) is farther away from the current ball than \(\overline{\varvec{z}}^i\), we develop three different strategies (‘S’ for short) to deal with this situation, as follows:

S1. Enlarge the ball to enclose \(\varvec{z}^i\) in the same way as StreamSVM [24].

S2. Enlarge the ball slightly to either enclose \(\varvec{z}^i\) or keep it nearby.

S3. Keep the radius unchanged and slightly move the center towards \(\varvec{z}^i\) to either enclose it or keep it nearby.

The S1 strategy is employed by SCDA-I (see line 11 in Algorithm 1) to enlarge the ball as above. In order to change the current situation, SCDA-II utilizes the S2 strategy (see line 12 in Algorithm 1) to enlarge the radius as little as possible. We find that, if the inequality \(r^{i-1} \geqslant (2\bigstar - 1)d^\text {max}\) holds, the least enlargement is attained by Eq. (10), which encloses the sample point \(\varvec{z}^i\) (see Fig. 3a). Otherwise, the least enlargement is attained by Eq. (11), which takes the point \(\varvec{D}\) as the center and keeps the sample point \(\varvec{z}^i\) nearby (see Fig. 3b). Here \(\varvec{D}\) denotes the intersection of the perpendicular bisector of \(\varvec{z}^i, \overline{\varvec{z}}^i\) and the line segment joining \(\varvec{z}^i\) and \(\varvec{c}^{i-1}\). The specific derivation of the formulas can be found in Appendix C.

$$\begin{aligned} \alpha _j^i\leftarrow \left\{ \begin{array}{l} \bigstar \alpha _j^{i-1}, \forall j=1,\ldots ,i-1. \\ 1 - \bigstar , j=i. \end{array} \right. \end{aligned}$$
(11a)
$$\begin{aligned} r^i\leftarrow r^{i-1} + (1 - \bigstar )d^\text {max}. \end{aligned}$$
(11b)
$$\begin{aligned} \bigstar \leftarrow 4\kappa _\phi / ((d^\text {max})^2 + 4\kappa _\phi - (d^\text {min})^2). \end{aligned}$$
(11c)
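A sketch of the Eq. (11) branch of S2, under the same list-based conventions as the earlier sketches (the star symbol from the text is written as `star`); the helper name is illustrative.

```python
def s2_least_enlarge(alpha, d_min, d_max, r, kappa=1.0):
    # Eq. (11c): the scaling factor (the star symbol in the text).
    star = 4.0 * kappa / (d_max ** 2 + 4.0 * kappa - d_min ** 2)
    # Eq. (11a): rescale the old multipliers; the new point gets 1 - star.
    alpha = [star * a for a in alpha] + [1.0 - star]
    # Eq. (11b): the radius grows by (1 - star) * d_max.
    return alpha, r + (1.0 - star) * d_max
```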
Fig. 3 Two situations of ball enlargement according to the S2 strategy

Fig. 4 Two situations of ball movement according to the S3 strategy

The S3 strategy employed by SCDA-III (see line 13 in Algorithm 1) moves the center as little as possible to change the current situation while keeping the radius unchanged (\(r^i\leftarrow r^{i-1}\)). We find that, if the inequality \(r^{i-1}\geqslant \bigstar d^\text {max}\) holds, the least movement is attained by Eq. (12), which encloses the sample point \(\varvec{z}^i\) (see Fig. 4a); otherwise, the center is moved by Eq. (11a) and stops at point \(\varvec{D}\), keeping the sample point \(\varvec{z}^i\) nearby (see Fig. 4b). The specific derivation of the formulas can be found in Appendix D.

$$\begin{aligned} \begin{array}{l} \alpha _j^i\leftarrow \left\{ \begin{array}{l} r^{i-1}\alpha _j^{i-1} / d^\text {max}, \forall j=1,\ldots ,i-1. \\ 1 - r^{i-1} / d^\text {max}, j=i. \end{array} \right. \\ \end{array} \end{aligned}$$
(12)
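A corresponding sketch of the Eq. (12) branch of S3: only the multipliers change, and the radius is returned unchanged; the helper name is illustrative.

```python
def s3_move_center(alpha, d_max, r):
    # Eq. (12): shift the center toward z^i while keeping r^i = r^{i-1}.
    alpha = [(r / d_max) * a for a in alpha] + [1.0 - r / d_max]
    return alpha, r  # radius unchanged
```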

New sample point and its mirror are inside of separate balls

When \(d^{\text {min}}\leqslant r^{i-1} < d^{\text {max}}\), one point (\(\varvec{z}^i\) or \(\overline{\varvec{z}}^i\)) is inside of the current ball \((\varvec{c}^{i-1}, r^{i-1})\) and the other (\(\overline{\varvec{z}}^i\) or \(\varvec{z}^i\)) is outside. On the one hand, if \(\varvec{z}^i\) is inside and \(\overline{\varvec{z}}^i\) is outside (see Fig. 5), then \(\overline{\varvec{z}}^i\) is actually inside of the mirror ball \((\overline{\varvec{c}}^{i-1}, r^{i-1})\) according to Eq. (7). This implies that the new data coincide with the current concept and no drift happens. Therefore, there is no need to update the model in this case, and the variables are computed according to Eq. (13).

$$\begin{aligned} \left\{ \begin{array}{l} \alpha _j^i\leftarrow \left\{ \begin{array}{l} \alpha _j^{i-1}, \forall j=1,\ldots ,i-1. \\ 0, j=i. \end{array} \right. \\ r^i\leftarrow r^{i-1}. \\ \end{array} \right. \end{aligned}$$
(13)

Otherwise, \(\overline{\varvec{z}}^i\) is inside of \((\varvec{c}^{i-1}, r^{i-1})\) and \(\varvec{z}^i\) is outside, which can be interpreted as new knowledge contradicting the current concept. Therefore, the model must be adjusted to remove the contradiction. To be specific, the ball shrinks to a maximal inscribed ball according to Eq. (14) so as to uncover the point \(\overline{\varvec{z}}^i\) (see Fig. 6). Here a constant \(\text {COEF}\) is utilized to ensure that \(\overline{\varvec{z}}^i\) is located slightly outside of the new ball (not on it) after the update; \(\text {COEF}\) can be set slightly less than one, 0.999 for instance. The specific derivation of the formulas can be found in Appendix E.

$$\begin{aligned} \left\{ \begin{array}{l} \alpha _j^i\leftarrow \left\{ \begin{array}{ll} (1 - \spadesuit )\alpha _j^{i-1}, & \forall j=1,\ldots ,i-1, \\ \spadesuit , & j=i, \end{array}\right. \\ r^i\leftarrow \text {COEF}(d^{\text {min}}+r^{i-1}) / 2, \\ \spadesuit \leftarrow \text {COEF}/2 + (\text {COEF}-2)r^{i-1}/(2d^{\text {min}}). \end{array} \right. \end{aligned}$$
(14)
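A sketch of the shrinking update in Eq. (14), with the spade symbol from the text written as `spade`; setting COEF slightly below 1 (e.g. 0.999) leaves the mirror point strictly outside the new ball. The helper name is illustrative.

```python
def shrink_ball(alpha, d_min, r, coef=0.999):
    # Eq. (14): spade = COEF/2 + (COEF - 2) * r^{i-1} / (2 * d_min).
    spade = coef / 2.0 + (coef - 2.0) * r / (2.0 * d_min)
    alpha = [(1.0 - spade) * a for a in alpha] + [spade]
    return alpha, coef * (d_min + r) / 2.0            # shrunken radius r^i
```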
Fig. 5 The sample point is inside the ball

New sample point and its mirror are inside of intersection area of two balls

When \(d^{\text {max}}\leqslant r^{i-1}\), both \(\varvec{z}^i\) and \(\overline{\varvec{z}}^i\) are inside of the ball \((\varvec{c}^{i-1}, r^{i-1})\). According to Eq. (7), they are inside of the ball \((\overline{\varvec{c}}^{i-1}, r^{i-1})\) as well. In other words, they are inside the intersection area of the two balls. A nonempty intersection area (if one exists) indicates an essential region of the RKHS that an MEB has to enclose no matter what label annotations are set. Therefore, there is no need to update the model in this case, and the variables are computed according to Eq. (13).
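Putting the three cases together, the following is a minimal sketch of one step of the SCDA dispatch (cf. Algorithm 1), built on the update helpers sketched in the previous subsections; the sub-conditions of S2/S3 that choose between Eqs. (10)/(11) and (11a)/(12) (Figs. 3 and 4) are folded into the strategy argument and elided here for brevity.

```python
def scda_step(alpha, d_real, d_mirror, r, strategy):
    # One online update given the distances of z^i (d_real) and its mirror
    # (d_mirror) to the current center c^{i-1}.
    d_min, d_max = min(d_real, d_mirror), max(d_real, d_mirror)
    if d_max <= r:
        # Both points inside the intersection area: keep the model, Eq. (13).
        return alpha + [0.0], r
    if d_min <= r:
        if d_real <= r:
            # Sample inside, mirror outside: no drift, keep the model, Eq. (13).
            return alpha + [0.0], r
        # Mirror inside, sample outside: shrink the ball, Eq. (14).
        return shrink_ball(alpha, d_min, r)
    # Both points outside the ball.
    if d_real <= d_mirror:
        # Sample nearer: enlarge as in StreamSVM, Eq. (10).
        return expand_ball(alpha, d_real, r)
    # Mirror nearer: apply S1, S2 or S3 depending on SCDA-I/II/III.
    return strategy(alpha, d_min, d_max, r)
```

For instance, SCDA-I would pass `strategy = lambda a, dmin, dmax, r: expand_ball(a, dmax, r)`, while SCDA-II and SCDA-III would wrap `s2_least_enlarge` and `s3_move_center` together with their respective sub-conditions.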

Correlations of SCDA-I, SCDA-II and SCDA-III

SCDA-I, SCDA-II, and SCDA-III are three methods based on the SCDA framework, all capable of expanding and shrinking the minimum enclosing ball. When only the mirror point is within the ball, all three methods shrink the ball. When the sample point is closer to the ball, all three methods expand the ball to encompass it. When the mirror point is closer to the ball, SCDA-I expands the ball in the same manner as when the sample point is closer, SCDA-II minimizes the expansion of the radius, and SCDA-III maintains a constant radius. Thus both SCDA-I and SCDA-II increase the radius of the ball, to varying degrees.

Experiments

Benchmark datasets

To assess the effectiveness of our methods, we utilized seven synthetic and five real-world binary concept drift datasets. To ensure compatibility with SVM models, symbolic features were excluded, while numeric features underwent min–max normalization, scaling them to the range \([-1, 1]\). Beyond this normalization, no further preprocessing was performed, to avoid excessive manipulation that might affect the datasets' concept drift and the experimental outcomes. Details regarding these datasets are outlined in Table 2.

Since the drift types of the real-world datasets are unavailable, we utilize the Massive Online Analysis (MOA) toolbox [3] to generate synthetic datasets with various drift types: gradual, sudden and incremental. Specifically, we utilize three popular data generators (Sine, SEA and AGR) to generate datasets with either gradual (Sine(g), SEA(g), AGR(g)) or sudden (Sine(s), SEA(s), AGR(s)) drift, and employ Hyperplane to create data with incremental drift. Each synthetic dataset comprises 10,000 data instances with balanced classes. Sudden drifts were positioned at the 5000th instance, while gradual drifts spanned from the 5000th to the 6000th instance, with a width of 1000. The Hyperplane generator was used with its standard configuration to produce incremental drift.

Fig. 6 The mirror point is inside the ball: shrink the ball to uncover it

Table 2 Specifications of synthetic and real world datasets

Baseline methods

The three methods introduced in this paper employ distinct strategies for manipulating the enclosing ball. SCDA-I enlarges the ball to encapsulate both the prior ball and the incoming sample point. SCDA-II expands and repositions the ball just enough to move close to the new sample point. Meanwhile, SCDA-III keeps the ball’s radius unchanged while shifting the ball towards the new sample point until it encompasses it.

StreamSVM StreamSVM is an SVM algorithm utilizing a minimum enclosing ball technique to traverse streaming data. However, its model can only expand the original ball to include both the original ball and new sample points, lacking the ability to shrink.

Table 3 Prequential error (%)

OSVM-SP This method extends the ball so that it precisely covers the radius of the previous ball and the new sample point. Notably, OSVM-SP exclusively enlarges the enclosing ball.

Naive Bayes Naive Bayes operates on the assumption of attribute independence within the label, estimating prior and conditional probabilities for each attribute accordingly.

Hoeffding Tree The Hoeffding Tree represents an incremental decision tree algorithm designed to learn from a continuous stream of data. It relies on the Hoeffding bound to ensure the accuracy of the observed mean.

ADWIN ADWIN functions as an adaptive sliding window algorithm, dynamically altering window size based on data fluctuations. Notably, it shrinks the window size in response to significant data changes and enlarges it when changes are less pronounced.

Performance

Prequential error

This section presents a comparative analysis of prequential errors among the three methods introduced in this paper and five other methods across twelve datasets. The prequential error at time i, \(P_e(i)\), is used to evaluate the performance of online methods and can be computed by Eq. (15) according to [7]. For the sample point at time k, \(\hat{y}_k\) and \(y_k\) denote the predicted label and the ground truth, respectively. The results are shown in Table 3. The figures in bold highlight the best performance, while those in parentheses indicate the prequential error rankings among the eight methods. A method with a lower average rank demonstrates superior performance.

$$\begin{aligned} P_e(i) = \frac{1}{i}\sum _{k=1}^{i}e_k, e_k = {\left\{ \begin{array}{ll} 1, y_k \ne \hat{y}_k \\ 0, y_k = \hat{y}_k \\ \end{array}\right. } \end{aligned}$$
(15)
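As a direct transcription of Eq. (15), the running prequential error can be maintained as the cumulative mean of the 0/1 losses under predict-then-train evaluation; the function name is illustrative.

```python
def prequential_error(y_true, y_pred):
    # Eq. (15): P_e(i) = (1/i) * sum_{k<=i} e_k, where e_k is the 0/1 loss
    # at time k (1 if the prediction was wrong, 0 otherwise).
    errors = [0 if yt == yp else 1 for yt, yp in zip(y_true, y_pred)]
    running, out = 0, []
    for i, e in enumerate(errors, start=1):
        running += e
        out.append(running / i)
    return out
```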

From the results we can see that SCDA-I, SCDA-II and SCDA-III demonstrate strong predictive performance. Notably, SCDA-III achieves the lowest prequential error across all synthetic datasets and several real-world datasets such as Spam and Electricity. On the other real-world datasets, however, either SCDA-I or OSVM-SP slightly outperforms SCDA-III. The weakness of StreamSVM is attributed to its enlargement-only principle, resulting in comparatively weaker performance. Both the proposed methods and OSVM-SP attempt to modify StreamSVM with different strategies. Overall, SCDA-III emerges as the superior method, while the remaining three methods (Naive Bayes, Hoeffding Tree and ADWIN) yield moderate results. Moreover, for the three proposed methods, the prequential error on datasets with sudden concept drift is generally lower than on datasets with gradual concept drift.

Figures 7 and 8 illustrate the trend of prequential errors as the number of instances increases. These figures prominently showcase SCDA-III’s consistently lower prequential error across nearly all datasets. In the AGR(s) and Sine(s) datasets, the prequential error initially decreases and stabilizes as data accumulate; when sudden drifts occur, the error spikes and then gradually stabilizes. In the AGR(g) dataset, the prequential error initially diminishes with increasing instances and follows a stable trend; the error increases during the onset of gradual drift and then gradually stabilizes upon its conclusion. For the Sine(g), SEA, and Hyperplane datasets, most methods exhibit a declining prequential error initially, stabilizing later. StreamSVM displays overall stability across the synthetic datasets.

Fig. 7 Prequential error on synthetic datasets

Fig. 8 Prequential error on realworld datasets

Fig. 9 SV percent on synthetic datasets

SV percent

The second experiment compares the support vectors among StreamSVM, OSVM-SP, and the three methods proposed in this paper, all of which utilize SVM. The percentage of support vectors reflects memory usage, and typically increases as the dataset size grows. Figure 9 displays the support vectors of these five methods on the synthetic datasets only, since their drift timing and characteristics are known.

Observing the figure reveals that StreamSVM has the fewest support vectors. OSVM-SP slightly exceeds StreamSVM but remains below the methods introduced in this paper. Notably, the support vector ratios of the three proposed methods are nearly identical. Analysis of the figure, combined with insights from the first experiment, indicates significant model changes within the initial ten percent of the data; consequently, the support vector proportion in this initial segment is notably high. On datasets like AGR(s), AGR(g), and Sine(s), drift occurrences are prominent within the 50–60% range; this corresponds to a substantial influence of drift on the model, resulting in a higher proportion of support vectors there.

Radius

In this section, the minimum enclosing ball radii of the three methods proposed in this paper are compared with that of StreamSVM. As the methods introduced in this paper aim to enhance StreamSVM, it serves as the comparative benchmark, while OSVM-SP, which utilizes different techniques, is not included in this comparison. Figure 10 illustrates the radii for varying data quantities.

At the first sample point, the minimum enclosing ball consists of that point alone, resulting in a radius of 0. As subsequent sample points arrive, the radius begins to take values. By the tenth sample point, the radius stabilizes, undergoing slight adjustments thereafter. SCDA-I, SCDA-II, and SCDA-III maintain minimum enclosing ball models capable of both expansion and reduction, allowing their radii to increase or decrease. However, SCDA-III exhibits greater volatility, indicating higher sensitivity. The radii of SCDA-I and SCDA-II are nearly identical. In contrast, the radius of StreamSVM only increases with growing data volume, and appears significantly larger than the others, as depicted in the figure.

Fig. 10 Radius on synthetic datasets

Ablation study

Fig. 11 Prequential error on ablation study

The results of the ablation experiment are depicted in Fig. 11. The abscissa displays six methods, while the ordinate represents the prequential error. For each method, the average prequential error across the 12 datasets outlined in Table 2 is shown. The error bars indicate the range, spanning the upper and lower boundaries of each box. The rectangles span the first to the third quartile of the data, encompassing half of it, and the red horizontal line represents the median. The symbol ‘+’ along the horizontal axis denotes minimum enclosing ball expansion (corresponding to statement 9 in Algorithm 1), while ‘–’ indicates reduction (corresponding to statement 16 in Algorithm 1). ‘None’ signifies no adjustment, ‘Plus’ refers to the expansion method of statement 12 in Algorithm 1, and ‘Move’ represents the expansion method of statement 13 in Algorithm 1.

Upon comparing the first three methods, it is evident that shrinking the minimum enclosing ball significantly reduces the prequential error. Among the last four methods, the unadjusted prequential error remains notably high. Among SCDA-I, SCDA-II, and SCDA-III, all of which shrink the minimum enclosing ball, SCDA-III exhibits a slightly lower prequential error than the other two. Overall, the figure highlights SCDA-III as the optimal method.

Performance analysis

The three SCDA methods, StreamSVM, and OSVM-SP are all based on the transformation between SVM and MEB. StreamSVM expands the ball to a sufficient size after only a small number of sample points, making it difficult to trigger updates when subsequent sample points arrive. As shown in Fig. 7b, the StreamSVM curve grows fastest within the first ten percent of samples, and the subsequent curve stabilizes. OSVM-SP is optimized upon StreamSVM, with a slower initial growth curve, but it gradually approaches the StreamSVM curve later on. In contrast, the SCDA methods can both expand and contract the ball, and their prequential error is ultimately the lowest. From Fig. 9a, the lowest SV percent of StreamSVM is due to the difficulty in triggering model updates later on; the SV percent of OSVM-SP is slightly higher. From Fig. 10a, the radius of StreamSVM grows fastest within the first ten sample points and subsequently grows slowly since few updates are triggered. In contrast, the radius of the SCDA methods, especially SCDA-III, expands and contracts flexibly, so the final radius will not become excessively large as StreamSVM's does.

Conclusion

This paper introduces the SCDA framework utilizing SVM and minimum enclosing ball algorithms, and presents three distinct methods derived from this framework. The minimum enclosing ball within this framework dynamically adjusts its size, expanding or contracting based on specific conditions. Experimental analysis demonstrates the superior accuracy of our proposed methods over the comparative approaches. In particular, the SCDA-III method exhibits greater efficacy, showcasing the enhanced flexibility of the minimum enclosing ball’s scalability.

Moving forward, our future research will center on several key areas. First, this study exclusively addresses binary classification, prompting a meaningful exploration into methods for handling multi-class problems. Second, while this paper employs balanced-class synthetic datasets, investigating strategies for dealing with class imbalance presents an intriguing and challenging direction for further study. Finally, the availability of massive data also brings the challenge of selecting a small quantity of informative data for learning the model.