1 Introduction

In an open and unknown environment, the first step in our cognition of things is to group them into different categories according to the similarity of objects, and then to find an appropriate label for each category through feature extraction and other methods; this is an ability people possess from birth. As the number of new samples increases, our cognition of things continues to evolve: we use the learned experience to cluster the new samples, and these new samples in turn serve as prior knowledge for subsequent learning, i.e., the learning and using stages evolve simultaneously.

However, current machine learning methods, and clustering algorithms [1–4] in particular, do not work this way. Current clustering algorithms usually operate in two stages, “learning” and “using”. The using stage only applies the model obtained in the learning stage and does not adjust it. Yet much new data may arrive after the clustering model has been trained, and this data may differ significantly from the distribution seen in the learning stage. With limited learning samples, it is therefore difficult to obtain through prior knowledge a reasonable k value that can adapt to dynamically growing data, and a k value set a priori cannot be adjusted as the data changes. Once the k value set in the learning stage mismatches the data distribution in the using stage, all the previous training effort is wasted and training must be restarted, even though the cost of the earlier training is sometimes very high.

The problem of a fixed k is very obvious in batch clustering algorithms. For example, the K-means algorithm [5–7] and its optimized variants [8–13], DPC [14–17], and GMM [18] can achieve a good clustering effect on convex datasets when the number of clusters k is set accurately. The density-based DBSCAN [19–22] realizes clustering by connecting density core points, and the SOM [23], based on dynamic neurons, represents a class of samples through each neuron in the neural network and matches samples with the corresponding neurons through competitive learning. However, in an unknown and open environment, we cannot obtain all samples simultaneously; we only encounter gradually arriving new samples of unknown categories. A clustering model obtained by one-time learning on all data therefore struggles to adapt to clustering scenarios with dynamic categories and unknown inputs.

Thus, incremental clustering algorithms [24, 25] update the clustering results in real-time according to the new samples so as to adapt to the growing data. For example, incremental K-means judges the category of a new sample by a preset threshold, and decides according to that threshold whether the sample joins an existing cluster or forms a new one. But these algorithms’ model structure and parameter thresholds must be set in advance from empirical knowledge and cannot evolve as the data changes. They only label the new samples, and do not use the new information introduced by them to adjust the algorithm model and parameters. Dynamic clustering methods can also handle incremental data well; for example, Shafeeq [26] achieved dynamic data processing by increasing the number of clusters k on top of K-means. Lughofer [27] proposed a new splitting and merging method to address issues such as cluster fusion and splitting that may arise in data-stream learning. Aiming at the possible delay phenomenon in online classification learning, the CDTC algorithm [28] uses a specific timestamp protocol to represent the classification time, mitigating the impact of incremental data on the classification process in online learning. Zheng [29] fully considered the dynamic changes of information during K-means clustering and reduced the number of iterations and the learning time of the algorithm by lowering the set value of its termination condition. Dynamic clustering methods have been applied in many real-world scenarios with good results, such as in e-commerce [30] and retail business [31].

Although these algorithms can handle clustering as samples increase, they do not evolve: they fail to combine the learning stage with the using stage organically, and they ignore the memory characteristic of the clustering core itself, namely that the clustering core carries the memory of historical data. Furthermore, incremental and dynamic clustering methods find it difficult to cluster effectively data that grows from zero in an unknown environment. When clustering incremental data, we hope the algorithm can dynamically adjust the number of cluster cores according to changes in the data distribution, so that the number of cores and the parameters always match the data distribution well, rather than fixing the number of clusters before the algorithm runs. Moreover, the model and parameters of the algorithm should be able to evolve adaptively as the data changes, i.e., the learning and using of the clustering model should be integrated. After new samples are added, the algorithm should maintain a stable memory of the original learning results: it may modify and adjust the original memory according to the information introduced by the new samples, but it must not damage that memory catastrophically.

This paper presents a dynamic core evolutionary clustering algorithm (DCC) based on saturated memory for incremental data. The DCC algorithm follows the order in which data are encountered in the physical world, can process incremental data independently of time factors, and gives reasonable clustering results in real-time as data arrive. The DCC algorithm uses the dynamic core as the representative of a sample cluster and uses a Gaussian function as its activation function: the center of the dynamic core is the center of the Gaussian function, and the coverage radius of the dynamic core is its standard deviation. By adjusting the center position and coverage radius of the dynamic core in real-time, the algorithm adaptively fits the distribution of samples in the space. The DCC algorithm selects the winning dynamic core by competitive learning, and controls the activation frequency of the dynamic core through a memory saturation degree that simulates the human memory mechanism, in order to decide whether to adjust its parameters or split it.

The main contributions of this paper are as follows:

  • The clustering model of the DCC algorithm can evolve as the number of samples increases, realizing a dynamic clustering process without a priori parameters. In the clustering model, the dynamic cores can evolve from 0 under the stimulation of new samples, and adaptively fit the cluster structure in the limited samples through adjustment or splitting of the cores themselves.

  • The DCC algorithm can cluster new samples according to the existing clustering model, and revise that model with the new information the samples introduce, continuously enhancing the model’s ability to fit the cluster structure and effectively integrating the learning and using stages of the algorithm.

  • Extensive experimental results show that the DCC algorithm can discover the clusters in the learned samples through a single evolutionary pass as new samples arrive. Furthermore, comparative experiments and practical applications show that the DCC algorithm has good clustering performance and robustness.

The rest of this paper is organized as follows. Section 2 introduces the background of the evolutionary clustering algorithm and the source of the idea behind the DCC algorithm. In Sect. 3, we introduce the basic concepts of dynamic core evolutionary clustering and the design idea of memory saturation. Section 4 describes our DCC algorithm in detail. Section 5 presents our experimental results and the corresponding discussion. Finally, Sect. 6 concludes this paper.

2 Related background

2.1 Related works

2.1.1 Clustering algorithm with fixed structure

Partition-based clustering is a typical clustering method with a fixed structure. To obtain the clustering result with the minimum error, it usually optimizes a set objective function over a batch of data [32, 33]. Yao et al. [34] used multi-kernel learning in the K-means algorithm, selecting a representative subset of kernels into the algorithm framework. Density clustering algorithms usually need to determine the connection relationship between samples by computing densities once all samples are known. For example, the DPC algorithm determines the cluster core points according to the relative distance and relative density of samples, and Wang and Huang [35] used level-set density to cluster the data; both, however, must be carried out on the premise that all sample information is known.

Huang et al. [36] proposed an ensemble clustering method that weights each base clustering locally, effectively reducing the impact of low-quality base clusterings on the final result. Because the base clusterings are usually generated on batch data, the algorithm structure cannot evolve as data increase. Deep clustering [37] exploits the ability of deep neural networks to extract sample features well. It usually combines popular clustering algorithms such as K-means and trains the network model by joint optimization to cluster the samples. For example, Yang et al. [38] combined WGAN-GP and VAE to generate latent features of samples, and learned the model with SGVB and a stochastic optimizer. However, deep clustering usually needs to train the network on all samples, so it cannot cluster incrementally arriving samples in real-time.

2.1.2 Incremental clustering algorithm

Hierarchical clustering algorithms are incremental in form: they can grow a clustering tree through relevant rules as samples increase, and each layer of the tree is a clustering result, e.g., the PERCH and Grinch algorithms [39]. However, these hierarchical clustering algorithms usually need a priori parameters to be set manually. Xie and Li [40] proposed a density-based evolutionary clustering algorithm, DBEC, which clusters new samples in real-time through relevant evolution rules, allows the algorithm structure to evolve as samples increase, and achieves good clustering performance.

Yu et al. [41] proposed a semi-supervised clustering method based on incremental ensemble member selection, which selects base clusterings incrementally through designed global and local objective functions. Although the method has an incremental form, it is still based on batch data, and the incremental process only selects ensemble members. The Gm-SOINN algorithm [42] uses a Gaussian membership function to represent the probability of a node winning, to avoid excessive deletion or insertion of previously learned nodes. Although Gm-SOINN can handle clustering as samples arrive incrementally, it still requires a priori parameters to be set manually.

2.2 The idea of our algorithm

The main idea of the DCC algorithm comes from the human memory process, and human memory is closely related to the neurons in the brain’s hippocampus. Berg et al. [43] reported in Cell in 2019 that, in the hippocampal region of the mammalian brain, new functional neurons are continuously produced by neural stem cells, and memory is associated with the embedding of these new neurons into the existing neural network and the resulting structural changes. When people repeatedly encounter the same kind of object, the memory is continuously strengthened because the related memory neurons are continually activated until they split and generate new neurons. Inspired by this, the DCC algorithm treats the samples to be clustered as the objects to be remembered. If samples of the same class keep arriving, the clustering core corresponding to this kind of sample is activated continuously until it splits. The DCC algorithm fits the cluster structure as closely as possible by changing the number of dynamic cores; as shown in Fig. 1, cluster 1 contains a wide range of samples, so many dynamic cores are generated, whereas the sample distribution in cluster 2 is relatively concentrated and is represented by fewer cores.

Figure 1: The sample distribution matches the number of dynamic cores

Based on this biological inspiration, the DCC algorithm designs a dynamic core self-evolution mechanism. The DCC model is represented by a set of dynamic cores, whose number starts from 0 and evolves as samples keep arriving, thereby realizing the dynamic development of the number of clusters. When new samples are encountered, the DCC algorithm can label them using the existing clustering model and correct the model according to the new information they introduce, realizing the organic combination of learning and using. The correction consists in adjusting the parameters of, or splitting, the winning dynamic core selected by competitive learning after new samples are added, so that the DCC algorithm can capture the cluster structure in the learned samples with fewer cores.

3 Problem description and basic concepts

This section mainly introduces the symbols and related concepts used in the algorithm. On this basis, the evolution process of the algorithm model, i.e., the dynamic core set, is analyzed. Furthermore, the parameter representation and mechanism design are introduced in detail. Finally, the framework of the DCC algorithm is given, and the algorithm’s time complexity is analyzed.

3.1 Symbolic interpretation

The clustering model of the DCC algorithm is described by the dynamic cores representing each cluster, i.e., the set \({{N}_{i}}=\{{{n}_{1}},{{n}_{2}},\ldots,{{n}_{{{k}_{i}}}}\}\) of dynamic cores, where i represents the number of samples learned by the algorithm, \({{k}_{i}}\) represents the number of dynamic cores generated after learning the i-th sample, and \({{n}_{j}}\) represents the j-th dynamic core. \({{N}_{0}}=\varnothing \) represents the dynamic core set before any sample has been input into the algorithm for learning and training; as samples continue to arrive, the algorithm generates new dynamic cores in response to changes in the sample distribution, i.e., the number of dynamic cores evolves dynamically from 0.

We use the quaternion \({{n}_{j}}=({{n}_{j\_\mu }},{{n}_{j\_\sigma }},{{n}_{j\_m}},{{l}_{{{n}_{j}}}})\) to represent the j-th dynamic core of the algorithm model, in which parameter \({{n}_{j\_\mu }}\) represents the central position of dynamic core \({{n}_{j}}\), parameter \({{n}_{j\_\sigma }}\) represents the coverage radius of \({{n}_{j}}\), parameter \({{n}_{j\_m}}\) represents the memory saturation degree of \({{n}_{j}}\), and \({{l}_{{{n}_{j}}}}\) represents the category label of dynamic core \({{n}_{j}}\), i.e., the sample cluster of class j. The dynamic core updates the sample cluster it represents through changes of its central position and coverage radius, and regulates its splitting time through the memory saturation degree, i.e., the splitting operation is performed when the memory saturation degree of the dynamic core reaches the saturation threshold \({{s}_{t}}\). In the process of dynamic core splitting, we use the concepts of parent core and child core to express the relationship between dynamic cores. If dynamic core \({{n}_{j}}\) splits to generate a new dynamic core \({{n}_{j+1}}\), dynamic core \({{n}_{j}}\) is called the parent core of \({{n}_{j+1}}\), and \({{n}_{j+1}}\) is the child core of \({{n}_{j}}\).

Sample \({{\mathbf{x}}_{i}}= ( {{x}_{i1}},{{x}_{i2}},\ldots,{{x}_{ip}} )\) represents the i-th new sample learned by the evolutionary algorithm in sample space χ, which has p-dimensional characteristics. After algorithm learning, \(( {{\mathbf{x}}_{i}},{{l}_{{{\mathbf{x}}_{i}}}} )\) represents the learned sample \({{\mathbf{x}}_{i}}\) with category label \({{l}_{{{\mathbf{x}}_{i}}}}\). At this time, the clustering model is represented as \({{N}_{i}}=\{{{n}_{1}},{{n}_{2}},\ldots,{{n}_{{{k}_{i}}}}\}\), i.e., after the algorithm processes the new sample \({{\mathbf{x}}_{i}}\), the algorithm model contains \({{k}_{i}}\) dynamic cores. When a new sample \({{\mathbf{x}}_{i+1}}\) is added, the algorithm will update the existing clustering model \({{N}_{i}}\), mainly to update the quaternion of the dynamic core in model \({{N}_{i}}\) and update \({{N}_{i}}\) to \({{N}_{i+1}}\), and \({{k}_{i}}\) to \({{k}_{i+1}}\).
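For concreteness, the quaternion of Sect. 3.1 and the evolving model set can be held in a simple structure such as the following Python sketch; the class and field names are our own illustration and not part of the DCC specification.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DynamicCore:
    """Quaternion (mu, sigma, m, label) of one dynamic core n_j."""
    mu: np.ndarray     # n_{j_mu}: center position of the core
    sigma: float       # n_{j_sigma}: coverage radius (sigma of the Gaussian)
    m: float           # n_{j_m}: memory saturation degree
    label: int         # l_{n_j}: category label of the represented cluster

# N_0 is the empty set: the model starts with no dynamic cores and grows
# as samples x_1, x_2, ... are learned one by one.
N = []                 # current dynamic core set N_i
```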

3.2 Initial dynamic core

The clustering model of the DCC algorithm, i.e., the set of dynamic cores, can evolve from 0. Therefore, at the beginning of the algorithm, we set the initial dynamic core and first realize the evolution of the dynamic core set from 0 to 1, i.e., the first dynamic core is generated from the empty set \({{N}_{0}}\).

We define the initial dynamic core \({{n}_{1}}\) as the starting point of dynamic core splitting and evolution. The memory saturation degree of the initial dynamic core \({{n}_{1}}\) is set to the saturation threshold, \({{n}_{1\_m}}= {{s}_{t}}\), i.e., whenever the initial core \({{n}_{1}}\) wins the competition (for the same input sample, the dynamic core with the largest response value wins), it splits directly to generate a new dynamic core. Furthermore, the coverage domain of the initial core is set to global coverage so as to find cluster structures outside the regions where samples are concentrated. As shown in Fig. 2, when a new sample \({{\mathbf{x}}_{i+1}}\) arrives far away from the centers of dynamic cores \({{n}_{2}}\) and \({{n}_{3}}\), the initial core \({{n}_{1}}\) wins the competition and splits to generate a new dynamic core, which better represents the new cluster that may appear near sample \({{\mathbf{x}}_{i+1}}\).

Figure 2: Splitting mechanism of initial dynamic core

3.3 Competitive learning

We choose the Gaussian function as the activation function of the dynamic core. When a new sample \({{\mathbf{x}}_{i+1}}\) is added, the corresponding output generated by the sample stimulating the dynamic core \({{n}_{j}}\) is:

$$ \begin{aligned} {{O}_{n}}&=f({{n}_{j}},{{\mathbf{x}}_{i+1}}) \\ &= \frac{1}{\sqrt{2\pi }{{n}_{j\_\sigma }}}\exp \biggl(- \frac{\operatorname{Dis}{{({{\mathbf{x}}_{i+1}},{{n}_{j\_\mu }})}^{2}}}{2{{n}_{j\_\sigma }}^{2}}\biggr). \end{aligned} $$
(1)

The mean of the Gaussian function is the center of the dynamic core, which represents the center of the cluster the core represents; the standard deviation of the Gaussian function is used as the coverage radius of the dynamic core and represents the extent of that cluster. Here \(\operatorname{Dis}()\) is the similarity function used to measure the distance between the new sample and the dynamic core center. Because the input sample and the center of the dynamic core have the same dimension, their similarity can be measured by the Euclidean distance.

According to the Gaussian distribution, the samples within 3σ of the dynamic core center account for 99.73% of the coverage of the dynamic core. Thus, if the distance between the new sample \({{\mathbf{x}}_{i+1}}\) and the dynamic core center exceeds 3σ, the probability that \({{\mathbf{x}}_{i+1}}\) belongs to the cluster represented by that core is very low, so the assignment is abandoned. Furthermore, if the output of dynamic core \({{n}_{j}}\) is the largest when the new sample \({{\mathbf{x}}_{i+1}}\) is added, i.e., \({{n}_{j}}\) wins the competition, the category label \({{l}_{{{n}_{j}}}}\) of dynamic core \({{n}_{j}}\) is copied to the new sample \({{\mathbf{x}}_{i+1}}\), i.e., \({{l}_{{{\mathbf{x}}_{i+1}}}}={{l}_{{{n}_{j}}}}\).
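A minimal sketch of this activation and competition step, assuming the Euclidean distance for \(\operatorname{Dis}()\) and the 3σ exclusion rule; all identifiers are ours, and cores are represented simply as (center, radius) pairs.

```python
import numpy as np

def activation(x, mu, sigma):
    """Gaussian response of a dynamic core to sample x, Eq. (1)."""
    d = np.linalg.norm(x - mu)                      # Dis(): Euclidean distance
    return np.exp(-d ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def winner(x, cores):
    """Competitive learning: return the index of the core with the largest
    output, ignoring cores farther than 3*sigma from x; returns None when
    no core covers x (the sample is then handled by the initial core)."""
    best, best_out = None, 0.0
    for j, (mu, sigma) in enumerate(cores):
        if np.linalg.norm(x - mu) <= 3 * sigma:
            out = activation(x, mu, sigma)
            if out > best_out:
                best, best_out = j, out
    return best
```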

3.4 Memory saturation degree

3.4.1 Update mechanism

We use the memory saturation degree to describe the activation frequency (the number of times the same dynamic core has the highest response value under the stimulation of new samples) of the dynamic core and regulate the timing of its division. When a new sample is added, all dynamic cores respond according to the Gaussian function, and the dynamic core with the largest output is selected as the winning dynamic core. Assuming that the dynamic core \({{n}_{i}}\) wins due to the stimulation of the new sample \({{\mathbf{x}}_{i+1}}\), the DCC algorithm updates the memory saturation degree \({{n}_{i\_m}}\) of the winning dynamic core \({{n}_{i}}\) according to (2), where \(n_{{i\_m}}^{\prime }\) represents the memory saturation degree after the dynamic core \({{n}_{i}}\) is updated.

$$ \begin{aligned} & n_{{i\_m}}^{\prime }= {{n}_{i\_m}}+\Delta s, \\ & \Delta s= o(1-o), \\ & o={{e}^{- \frac{\operatorname{Dis}{{({{\mathbf{x}}_{i+1}},{{n}_{i\_\mu }})}^{2}}}{2{{n}_{i\_\sigma }}^{2}}}}. \end{aligned} $$
(2)

When updating the memory saturation degree of the winning dynamic core, we use the radial basis function \(y(\mathbf{x})=\exp (- \frac{\operatorname{Dis}{{(\mathbf{x},\mu )}^{2}}}{2{{\sigma }^{2}}})\) to calculate the variable o in the memory saturation increment. New samples that are too close to or too far from the dynamic core disturb its original stable memory very little. For samples very close to the winning core center, the difference between them and the dynamic core is so slight that the new information introduced is almost 0, so the increment to the core’s memory saturation degree is close to 0. Samples very far from the winning core center have little impact on the cluster represented by the winning core. Thus, we use the expression \(o(1-o)\) as the increment Δs of memory saturation.
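The increment Δs of Eq. (2) can be computed as in the following sketch (names are ours); note that \(o(1-o)\) reaches its maximum of 0.25 where o = 0.5, i.e., for samples at an intermediate distance from the core center, and vanishes at both extremes.

```python
import numpy as np

def saturation_increment(x, mu, sigma):
    """Increment Delta_s of the winning core's memory saturation, Eq. (2):
    o is the radial-basis response of the core to x; o*(1 - o) is largest
    for samples at an intermediate distance and vanishes for samples that
    are very close to or very far from the core center mu."""
    o = np.exp(-np.linalg.norm(x - mu) ** 2 / (2 * sigma ** 2))
    return o * (1.0 - o)
```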

As shown in Fig. 3, \({{n}_{i\_\mu }}\) is the center vector of the winning dynamic core \({{n}_{i}}\). When the new sample \({{\mathbf{x}}_{i+1}}\) is close to the center vector of \({{n}_{i}}\), the response of the dynamic core according to the radial basis function is close to 1, indicating that the core already represents the new sample well, and Δs tends to 0, i.e., the core does not need significant adjustment. If \({{\mathbf{x}}_{i+1}}\) is far from the dynamic core \({{n}_{i}}\), the response according to the radial basis function is close to 0, meaning that the core does not represent the new sample, and Δs again tends to 0, i.e., the core also does not need significant adjustment. Only when a new sample falls under the bimodal curve in Fig. 3 is the information it introduces considered significant to the winning core; the algorithm then promotes the splitting of the core by increasing its memory saturation.

Figure 3: Influence of distance between new sample and winning dynamic core \({{n}_{i}}\) center on variable o

3.4.2 Design of structural parameters

The memory saturation degree of a dynamic core indicates how much information the core carries. If dynamic core \({{n}_{i}}\) wins many competitions (indicating that new samples keep appearing within its coverage radius), i.e., the core is continuously stimulated, its memory saturation degree keeps increasing. When the memory saturation degree \({{n}_{i\_m}}\) of dynamic core \({{n}_{i}}\) exceeds the upper limit of its information-carrying capacity, indicating a high sample density in its coverage area, the core splits to generate a new core, and multiple dynamic cores are used to subdivide the original cluster. In the DCC algorithm, we call the threshold for dynamic core splitting the saturation threshold \({{s}_{t}}\). The saturation threshold is designed so that more dynamic cores represent dense areas of the sample distribution, which helps the dynamic cores find the local characteristics of a cluster during their dynamic position adjustment. Moreover, it avoids using a single core to represent a large group of samples, which also embodies the algorithm’s ability to subdivide the characteristics of the sample distribution.

When the winning dynamic core reaches the saturation threshold and splits, the DCC algorithm sets its memory saturation degree to the reset threshold \({{r}_{t}}\). That a dynamic core can win many competitions and finally split indicates that the core is in an active state, i.e., in the sample distribution area it represents, new samples are very likely to appear again. If the saturation were not reset, the core would split frequently; resetting it to 0 would be equivalent to discarding what the core has already learned. Therefore, we reset it to \({{r}_{t}}\) to maintain a stable memory of the original learning results while avoiding continuous splitting.

In the process of human memory, the timing of neuronal division has nothing to do with the type of object encountered, but only with the neuron’s own characteristics. Likewise, the saturation threshold and reset threshold in the memory saturation degree are inherent structural parameters of the algorithm: they do not change with the clustering task, and there is no need to tune them manually for different datasets. Through this design, the clustering results of the algorithm are not sensitive to parameter settings, which improves the scalability and robustness of the algorithm.

3.4.3 Forgetting mechanism

Furthermore, forgetting occurs during memory. According to the forgetting curve proposed by Ebbinghaus, things that have been remembered are easily forgotten if they do not reappear for a long time. Inspired by this, we designed a forgetting mechanism for the dynamic core based on its evolution process in the DCC algorithm. When a dynamic core is about to reach the threshold but has not yet split and the stimulation from this kind of object disappears, the core starts the forgetting mechanism to inhibit excessive splitting, gradually reducing its memory of this kind of object until it encounters the same kind of object again.

Due to the disorder of new samples, a dynamic core will not always win the competition, i.e., its memory saturation degree will not increase monotonically. When the memory saturation degree of a dynamic core exceeds the reset threshold but has not reached the splitting threshold, it is attenuated toward the reset threshold beat by beat. As shown in Fig. 4, the ordinate represents the memory saturation degree of the dynamic core, and the abscissa represents the new samples added in turn. The dotted line in Fig. 4 indicates that the increase of memory saturation is not linear but depends on the newly added samples; a red arrow indicates that the dynamic core wins after the sample is added, and a black line indicates that it does not win. It can be seen from Fig. 4 that when the sixth sample is added and the dynamic core does not win, its memory saturation degree decreases according to the beat. When the seventh sample is added, the dynamic core wins and its memory saturation increases. When the ninth sample is added, the dynamic core does not win, but its memory saturation is already at the reset threshold, so it remains unchanged.

Figure 4: Forgetting mechanism of dynamic core memory saturation degree

4 Our algorithm

4.1 Evolution of dynamic core

Moving the dynamic core center dynamically adjusts the core’s position in space, so that the core can adaptively discover dense areas of the sample distribution. The adjustment mode of the dynamic core therefore determines the mapping relationship between core locations and clusters. After a new sample \({{\mathbf{x}}_{i+1}}\) is added, the dynamic cores can be in one of three states during competitive learning:

Status 1: initial core \({{n}_{1}}\) wins

Since the memory saturation degree of the initial core \({{n}_{1}}\) equals the saturation threshold \({{s}_{t}}\), \({{n}_{1}}\) is always ready to split, i.e., as long as it wins the competition, it splits immediately to generate a new dynamic core \({{n}_{{{k}_{{i+1}}}}}\). In an actual incremental environment, we cannot obtain all sample information in advance to determine the center of the space, so we take the position of the first new sample encountered by the algorithm as the center of the initial core and do not adjust it. Furthermore, the coverage domain of the initial core is set to global coverage, i.e., \({{n}_{1\_\sigma }}=\inf \), so that a new core can be generated for any discrete point that cannot be assigned to an existing dynamic core.

As shown in Fig. 5, the center of the new core \({{n}_{{{k}_{{i+1}}}}}\) generated by the initial core splitting is the position of the new sample \({{\mathbf{x}}_{i+1}}\), and the memory saturation degree is assigned as 0. The coverage domain is set to \(\frac{1}{3}\operatorname{Dis}({{n}_{1\_\mu }},{{\mathbf{x}}_{i+1}})\), i.e., the parent core is located at the 3σ boundary of the newly generated core, so that the newly generated child core can better represent the emerging sample cluster and inherit the sample cluster information learned by the parent core. Due to the generation of the new core, the new sample \({{\mathbf{x}}_{i+1}}\) belongs to the category represented by the new core. Therefore, the category of the new sample \({{\mathbf{x}}_{i+1}}\) is labelled as \({{l}_{{{\mathbf{x}}_{i+1}}}}={{l}_{{{n}_{{{k}_{{i+1}}}}}}}\).

Figure 5: The initial core \({{n}_{1}}\) wins and splits to generate a new core \({{n}_{{{k}_{{i+1}}}}}\)

Status 2: dynamic core \({{n}_{g(g\in [2,{{k}_{i}}])}}\) wins

First, update the memory saturation degree of dynamic core \({{n}_{g}}\) to \(n_{g\_m}^{\prime }\) according to (2), and then compare the updated memory saturation degree of the winning core with the saturation threshold:

  • \(n_{g\_m}^{\prime }<{{s}_{t}}\). Adjust the center of dynamic core \({{n}_{g}}\) according to \(n_{g\_\mu }^{\prime }=n_{g\_m}^{\prime }\cdot {{n}_{g\_\mu }}+(1-n_{g\_m}^{\prime }) \cdot {{\mathbf{x}}_{i+1}}\), where \(n_{g\_\mu }^{\prime }\) represents the adjusted center position; the coverage domain \({{n}_{g\_\sigma }}\) remains unchanged. The new sample is then represented by dynamic core \({{n}_{g}}\), and the category of the new sample \({{\mathbf{x}}_{i+1}}\) is labelled as \({{l}_{{{\mathbf{x}}_{i+1}}}}={{l}_{{{n}_{g}}}}\). The process of adjusting the center of the dynamic core is shown in Fig. 6.

    Figure 6: Dynamic core \({{n}_{g}}\) wins and the center of \({{n}_{g}}\) is adjusted

  • \(n_{g\_m}^{\prime }\ge {{s}_{t}}\). Perform a split operation to generate a new dynamic core \({{n}_{{{k}_{{i+1}}}}}\). Although the winning core \({{n}_{g}}\) splits, its center and coverage are not adjusted, and its memory saturation is reset to \({{r}_{t}}\). The center vector of the generated core \({{n}_{{{k}_{{i+1}}}}}\) is the position of the new sample \({{\mathbf{x}}_{i+1}}\), its coverage is \(\frac{1}{3}\operatorname{Dis}({{n}_{g\_\mu }},{{\mathbf{x}}_{i+1}})\), and its memory saturation degree is set to 0. The category of the new sample \({{\mathbf{x}}_{i+1}}\) is then labelled as \({{l}_{{{\mathbf{x}}_{i+1}}}}={{l}_{{{n}_{{{k}_{{i+1}}}}}}}\). The process of splitting to produce a new core is shown in Fig. 7.

    Figure 7: Dynamic core \({{n}_{g}}\) wins and splits to generate a new core \({{n}_{{{k}_{{i+1}}}}}\)

Status 3: dynamic core \({{n}_{g(g\in [2,{{k}_{i}}])}}\) does not win

According to the memory saturation degree of the dynamic core, judge whether to attenuate it. If \({{r}_{t}}<{{n}_{g\_m}}<{{s}_{t}}\), decay according to the set number of beats, e.g., \(n_{g\_m}^{\prime }= {{n}_{g\_m}}-\frac{1}{\tau }({{n}_{g\_m}}-{{r}_{t}})\) means decaying toward the reset threshold over τ beats. The center and coverage of the dynamic core are not adjusted. If \({{n}_{g\_m}}\le {{r}_{t}}\), the dynamic core does not change.
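Putting the three states together, one possible form of the evolution step is sketched below. The dynamic core set is assumed to already contain the initial core (center at the first sample, coverage \(\inf \), memory saturation \({{s}_{t}}\)); the numerical values of \({{s}_{t}}\), \({{r}_{t}}\) and τ are placeholders rather than values from the paper, the label bookkeeping is simplified (the \(\operatorname{Link}()\) step of Sect. 4.3 later groups cores into clusters), and all identifiers are ours.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Core:
    mu: np.ndarray   # center position
    sigma: float     # coverage radius (np.inf for the initial core n_1)
    m: float         # memory saturation degree
    label: int       # cluster label

def evolve(cores, x, s_t=1.0, r_t=0.8, tau=5.0):
    """One evolution step of the dynamic core set for a new sample x (a
    NumPy vector). cores[0] is the initial core; s_t, r_t and tau are
    placeholder values. Returns the label assigned to x."""
    dists = [np.linalg.norm(x - c.mu) for c in cores]
    # competitive learning: only cores covering x (dist <= 3*sigma) compete;
    # if none covers it, the globally covering initial core wins
    covering = [j for j in range(1, len(cores)) if dists[j] <= 3 * cores[j].sigma]
    new_label = max(c.label for c in cores) + 1
    if not covering:
        # Status 1: the initial core wins and splits immediately
        g = 0
        cores.append(Core(x.copy(), dists[0] / 3.0, 0.0, new_label))
        x_label = new_label
    else:
        outs = [np.exp(-dists[j] ** 2 / (2 * cores[j].sigma ** 2))
                / (np.sqrt(2 * np.pi) * cores[j].sigma) for j in covering]
        g = covering[int(np.argmax(outs))]
        # Status 2: core g wins; update its memory saturation by Eq. (2)
        o = np.exp(-dists[g] ** 2 / (2 * cores[g].sigma ** 2))
        cores[g].m += o * (1.0 - o)
        if cores[g].m < s_t:
            # adjust the center only, coverage unchanged
            cores[g].mu = cores[g].m * cores[g].mu + (1.0 - cores[g].m) * x
            x_label = cores[g].label
        else:
            # split: reset the winner to r_t and spawn a child core at x
            cores[g].m = r_t
            cores.append(Core(x.copy(), dists[g] / 3.0, 0.0, new_label))
            x_label = new_label
    # Status 3: other non-initial cores above r_t decay toward r_t over tau beats
    for j in range(1, len(cores)):
        if j != g and r_t < cores[j].m < s_t:
            cores[j].m -= (cores[j].m - r_t) / tau
    return x_label
```

A driver loop would create the initial core from the first arriving sample and then call this step once per new sample, followed by the \(\operatorname{Link}()\) and \(\operatorname{Clustering}()\) steps of Sect. 4.3.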

4.2 Model evolution algorithm framework

DCC algorithm can realize the evolution of the clustering model from 0, i.e., firstly, the clustering model needs to be obtained according to the model evolution algorithm. The model evolution algorithm determines the initial core \({{n}_{1}}\) through the setting method of the initial core, and then updates the clustering model and the quaternion information of the dynamic core contained in the model according to the evolution mechanism of the dynamic core. Before adding a new sample \({{\mathbf{x}}_{i+1}}\), the model evolution algorithm has obtained the dynamic core set \({{N}_{i}}\). The input of the model evolution algorithm is the new sample \({{\mathbf{x}}_{i+1}}\), and the output is the updated dynamic core set \({{N}_{i+1}}\) and the category label of sample \({{\mathbf{x}}_{i+1}}\). We give the pseudo-code of the model evolution process in the DCC algorithm, as shown in Algorithm 1.

Algorithm 1: Model_evolution

4.3 DCC algorithm framework

The clustering model that evolves as samples increase is obtained by the model evolution algorithm; the model contains only the quaternion information of each dynamic core. Building on the model evolution algorithm and the idea of evolution, the DCC algorithm also needs to cluster all the learned samples in the space to obtain the clustering result. The set of i unlabeled samples learned by the DCC algorithm is expressed as \({{C}^{i}}=\{{{\mathbf{x}}_{1}},{{\mathbf{x}}_{2}},\ldots,{{\mathbf{x}}_{i}} \}\), and the clustering result after learning, i.e., the sample set with cluster labels, is expressed as \({{D}^{i}}=\{({{\mathbf{x}}_{1}},{{l}_{{{\mathbf{x}}_{1}}}}),({{ \mathbf{x}}_{2}},{{l}_{{{\mathbf{x}}_{2}}}}),\ldots,({{\mathbf{x}}_{i}},{{l}_{{{ \mathbf{x}}_{i}}}})\}\). The input of the DCC algorithm is the new sample \({{\mathbf{x}}_{i+1}}\), and the output is the clustering result \({{D}^{i+1}}\) of the learned \(i+1\) samples. The pseudo-code is shown in Algorithm 2.

Algorithm 2: DCC

The dynamic core represents all samples within its coverage radius, i.e., samples within the 3σ range of a dynamic core should be clustered into the same class as that core. In order to cluster the samples, we use the function \(\operatorname{Link}()\) to label the cores of the generated dynamic core set \({{N}_{i+1}}\). As shown in Fig. 8(a), during the dynamic adjustment of core centers and coverage areas, the center of dynamic core B enters the coverage area of dynamic core A, so we mark dynamic cores A and B as the same category. As shown in Fig. 8(b), the centers of dynamic cores A and B do not enter each other’s coverage, so the algorithm marks dynamic core A as one class and B as another. Because the coverage and center of a dynamic core are adjusted dynamically as new samples arrive, the function \(\operatorname{Link}()\) labels the dynamic cores according to their updated centers and coverage, thereby grouping the dynamic cores into different classes.

Figure 8: Connection principle of dynamic core
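Following the connection principle of Fig. 8, \(\operatorname{Link}()\) can be realized, for example, as the transitive closure of the relation “the center of one core lies inside the 3σ coverage of the other”, computed with a small union-find; the globally covering initial core is assumed to be excluded, and the implementation details are ours.

```python
import numpy as np

def link(cores):
    """Group dynamic cores into clusters: two cores fall into the same
    cluster when the center of one lies inside the 3*sigma coverage of
    the other; clusters are the transitive closure of this relation."""
    parent = list(range(len(cores)))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for a in range(len(cores)):
        for b in range(a + 1, len(cores)):
            mu_a, sig_a = cores[a]
            mu_b, sig_b = cores[b]
            d = np.linalg.norm(mu_a - mu_b)
            if d <= 3 * sig_a or d <= 3 * sig_b:   # one center inside the other's coverage
                parent[find(a)] = find(b)
    return [find(a) for a in range(len(cores))]    # cluster id for each core
```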

When all dynamic cores are labelled, the categories of the learned \(i+1\) samples need to be updated. Because the DCC algorithm is based on an evolutionary pattern, the addition of each new sample can adjust the previous clustering results. Function \(\operatorname{Clustering}()\) takes all the learned samples as the input of the dynamic core in the updated clustering model, and clusters the samples into the cluster to which the dynamic core with the largest output belongs.
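A corresponding sketch of \(\operatorname{Clustering}()\), which re-labels every learned sample with the cluster of its highest-responding core; the interface and names are ours.

```python
import numpy as np

def clustering(samples, cores, core_cluster):
    """Assign every learned sample to the cluster of the core with the
    largest Gaussian output, Eq. (1); core_cluster[j] is the cluster id
    returned by link() for core j."""
    labels = []
    for x in samples:
        outs = [np.exp(-np.linalg.norm(x - mu) ** 2 / (2 * sig ** 2))
                / (np.sqrt(2 * np.pi) * sig) for mu, sig in cores]
        labels.append(core_cluster[int(np.argmax(outs))])
    return labels
```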

4.4 Time complexity analysis

In Algorithm 1, when a new sample \({{\mathbf{x}}_{i+1}}\) is added, the model evolution algorithm first takes it as the input of dynamic cores in competitive learning, and calculates the output according to the activation function. Therefore, the time complexity has a linear relationship with the number of cores, i.e., \({{t}_{1}}= {{k}_{i}}\). After determining the winning core, adjust its parameters or split the core, but this operation has nothing to do with the number of samples and cores, so it does not affect the time complexity.

In Algorithm 2, the DCC algorithm uses the function \(\operatorname{Link}()\) to mark the set of dynamic cores after the model evolution algorithm. It needs to compare the coverage and central location of all cores, so its time complexity is \({{t}_{2}}=\frac{{{k}_{i+1}}({{k}_{i+1}}-1)}{2}\). Then, the function \(\operatorname{Clustering}()\) is used to cluster all the learned \(i+1\) samples, with time complexity \({{t}_{3}}=(i+1){{k}_{i+1}}\). Therefore, the time complexity of clustering a new sample is \({{T}_{i+1}}={{t}_{1}}+{{t}_{2}}+{{t}_{3}}={{k}_{i}}+ \frac{{{k}_{i+1}}({{k}_{i+1}}-1)}{2}+(i+1){{k}_{i+1}}\). Because the number of cores \({{k}_{i+1}}\) is far smaller than the number of samples \(i+1\), this reduces to \({{T}_{i+1}}=O ( (i+1){{k}_{i+1}} )\).

5 Simulation

In this section, we verify the principle of the DCC algorithm by experiments and show its characteristics by comparing it with other algorithms. Furthermore, we analyze the sensitivity of the algorithm and its adaptability on unbalanced datasets, and apply it to a simple practical clustering scenario. All experiments are implemented in MATLAB 2020 on the same machine with an i7-9750H CPU at 2.60 GHz.

5.1 Preparation

Datasets: This paper mainly uses synthetic datasets to verify the principle of the algorithm, including data with uneven density distribution, datasets following a normal distribution, and datasets with a random distribution, in order to verify the algorithm’s effectiveness on datasets with different numbers of features. The details of the datasets are shown in Table 1.

Table 1 Datasets

Evaluation indicators: We select two commonly used clustering internal indicators and two external indicators as the criteria to evaluate the clustering performance, including DBI [44], CH [45], ACC, and NMI [46]. The higher the ACC, NMI, and CH scores, the better the clustering performance of the algorithm, and the lower the DBI score, the better the clustering result.
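The paper does not specify how these indicators are computed; one common way, sketched below, uses scikit-learn for DBI, CH and NMI and the Hungarian algorithm for ACC (cluster labels are assumed to be non-negative integers).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import (davies_bouldin_score, calinski_harabasz_score,
                             normalized_mutual_info_score)

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one matching between predicted and true labels,
    found with the Hungarian algorithm; labels must be non-negative ints."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                          # count co-occurrences
    row, col = linear_sum_assignment(-cost)      # maximize matched pairs
    return cost[row, col].sum() / len(y_true)

def evaluate(X, y_true, y_pred):
    return {"DBI": davies_bouldin_score(X, y_pred),
            "CH": calinski_harabasz_score(X, y_pred),
            "ACC": clustering_accuracy(y_true, y_pred),
            "NMI": normalized_mutual_info_score(y_true, y_pred)}
```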

Comparison algorithms: We compare against the commonly used incremental K-means algorithm [47], incremental DBSCAN algorithm [48], GNG algorithm [49], GWR algorithm [50] and SOINN algorithm [51]. The incremental K-means algorithm first clusters the existing data in batch with K-means, and then judges, according to a set threshold, whether a new category should be generated after a new sample is added. In the incremental DBSCAN algorithm, because a new sample only affects the density of the other samples in its neighborhood, updating the clustering model only requires changing the status of the core samples in that neighborhood. The Growing Neural Gas (GNG) algorithm uses a neural network to process incremental data and clusters newly added data through changes of network nodes, which gives it good scalability. The GWR algorithm differs from GNG in the way network nodes are inserted: GWR changes the state of new nodes through the activation level of neurons. To stabilize the clustering results, the SOINN algorithm processes noisy nodes and uses a threshold adjustment mechanism, achieving good results in online clustering. In the comparative experiments, the parameters of the five comparison algorithms are the optimal parameters selected through many experiments.

5.2 Sensitivity analysis

The parameters involved in the DCC algorithm are mainly the structural parameters of the dynamic core, namely the reset threshold and the forgetting beat in the memory saturation degree. These two parameters control the splitting of dynamic cores and affect the number of cores finally generated. Changing the forgetting beat is equivalent to adjusting the reset threshold, so we study the effect of the reset threshold on the algorithm while keeping the forgetting beat fixed. The reset threshold \({{r}_{t}}\) lies between 0 and the saturation threshold \({{s}_{t}}\). When \({{r}_{t}}\) is small, the dynamic core must grow again from a lower memory saturation degree after splitting, so splitting is infrequent; conversely, a larger \({{r}_{t}}\) means that the dynamic core can still split at a high frequency after splitting. The setting of the reset threshold therefore has an essential impact on the number of dynamic cores and the clustering results. On dataset DS3, we test how the number of dynamic cores and DBI change under different reset thresholds \({{r}_{t}}\), assessing the sensitivity of the DCC algorithm to this parameter. In the experiment, the reset threshold \({{r}_{t}}\) changes in increments of 0.02 from 0.4 to 0.9, and the samples are added randomly one by one. Under each reset threshold, to avoid the impact of the sample entry order on the results, we take the average DBI over 100 runs as the final DBI score, and the average number of dynamic cores over 100 runs as the number of dynamic cores under that threshold. Twenty groups of experiments were carried out at each threshold.

As shown in Fig. 9(a), the number of dynamic cores shows a clear upward trend as the reset threshold \({{r}_{t}}\) increases. When the reset threshold is close to the saturation threshold, the dynamic core can easily reach the saturation threshold and split after the next win, so the number of generated dynamic cores grows with \({{r}_{t}}\). As shown in Fig. 9(b), as \({{r}_{t}}\) and the number of dynamic cores increase, the DBI of the final clustering result gradually decreases. When the threshold \({{r}_{t}}\) is too small, few dynamic cores are generated; although the algorithm’s complexity is low, the memory of the dynamic cores for the learned results is reduced. Thus, considering both the algorithm complexity and the DBI, we set the structural parameter \({{r}_{t}}\) to 0.8, which maintains a good DBI level while using fewer dynamic cores to represent the sample clusters.

Figure 9: Changes of dynamic core number and DBI under different reset thresholds

5.3 Comparison and analysis of experimental results

Based on the adjustment of dynamic core center and coverage, the DCC algorithm is suitable for finding convex clusters in space. As shown in Fig. 10, we show the clustering results of the algorithm on the synthetic dataset, where “*” represents the position of the generated dynamic core in space. As shown in Fig. 10(a), more dynamic cores are generated in the dense sample distribution area, which is also in line with the original intention of the algorithm design, i.e., the dynamic core matches the sample distribution. In Fig. 10(b) and Fig. 10(c), the generated dynamic core can well fit the concentrated area of sample distribution, i.e., the adaptive adjustment feature of the dynamic core can well find the distribution of different feature clusters.

Figure 10: Clustering results of DCC algorithm

Taking DS3 as an example, we compare the DCC algorithm with the incremental K-means, incremental DBSCAN, GNG, GWR, and SOINN algorithms. In DS3, there are only ten samples initially, and the labels of these ten samples are known. New samples are then added in turn, and we compare the changes of the four indicators for the six algorithms as the samples increase, with the same sample entry order for all algorithms. As shown in Fig. 11(a) and Fig. 11(b), as samples keep arriving, the ACC and NMI of the incremental K-means and DBSCAN algorithms gradually increase, but the ACC and NMI scores of the DCC algorithm are higher than those of the other two algorithms and remain at a high level. As shown in Fig. 11(c), the DBI and CH of the DCC algorithm are on par with those of the incremental DBSCAN algorithm and better than those of the incremental K-means algorithm. Although the GNG, GWR, and SOINN algorithms use neurons to process new samples, they rely entirely on hyperparameters to control the division of neurons, resulting in pronounced ladder-like characteristics and severe oscillation in the curves of the four indicators. The stepped changes in the performance curves of GNG, GWR, and SOINN indicate that adding new samples has a significant impact on the historical clustering results of these algorithms; this phenomenon does not occur in the DCC algorithm. Although the DCC algorithm also updates its dynamic cores with new samples, it adopts a memory mechanism to control the splitting and generation of dynamic cores, which allows it to balance the historical clustering results against the new information introduced by new samples. Moreover, for all four indicators the curve of the DCC algorithm changes relatively smoothly, because when new samples are added the algorithm can fine-tune the results in real-time while inheriting the original learning results. The final performance comparison of the six algorithms after adding the last sample is shown in Table 2.

Figure 11: Comparison of algorithm results

Table 2 Comparison of performance indicators of different algorithms after adding the 400th sample

However, in an actual task we cannot obtain the distribution information and category labels of the samples. Therefore, we designed an experiment in which the DCC algorithm, the incremental K-means algorithm, and the incremental DBSCAN algorithm cluster randomly distributed samples without label information, evaluated by the internal index DBI. Figure 12(a) shows the clustering result of the DCC algorithm on 200 randomly distributed samples. Although the sample distribution has no regular pattern, the DCC algorithm still obtains a good clustering result, because it can adaptively search for the positions of clusters by adjusting the centers and coverage domains of the dynamic cores. Figure 12(b) shows the real-time DBI comparison of the three algorithms as samples keep arriving: the DBI of the DCC algorithm is significantly lower than that of the other two algorithms, and its trend is relatively stable. When the number of samples increases to 500, the performance of the DCC algorithm is still better than that of the incremental K-means and incremental DBSCAN algorithms, as shown in Figs. 12(c) and (d). The comparison of the final DBI indicators of the six algorithms after adding the last sample is shown in Table 3.

Figure 12: Comparison of results when samples are randomly distributed

Table 3 Comparison of DBI after adding the last sample

5.4 Robustness experiment

In order to verify the robustness of the DCC algorithm, we designed two groups of experiments on unbalanced datasets. In the first group, the dataset DS6 contains five clusters, each following a normal distribution with a variance of 0.5, but with different numbers of samples per cluster \((20, 40, 60, 80, 100)\). When the samples are added randomly, we show the DCC algorithm’s clustering results and DBI changes in Fig. 13. In Fig. 13(b), the DBI changes gently and remains around 0.15 after the addition of the 50th sample, indicating that as samples keep arriving, the clustering results obtained by the DCC algorithm stay at a good level. Although the clusters contain different numbers of samples, the design of the initial core enables the DCC algorithm to find clusters with few samples and represent them with new dynamic cores generated from the initial core.

Figure 13: Robust performance of DCC algorithm with different sample numbers

In the second group of experiments, to test the DCC algorithm’s clustering effect on clusters with uneven density distribution, we used dataset DS7, which contains 5 clusters of 100 samples each, where each cluster follows a normal distribution with a different variance \((0.1,0.3,0.5,0.7,0.9)\). In DS7, high-density clusters are distributed compactly and cover a small area in space, while low-density clusters cover a large area. By adjusting the coverage domain of each dynamic core, the DCC algorithm makes the dynamic cores in the final model cover clusters with different density distributions well. As shown in Fig. 14, in low-density clusters, the dynamic cores are numerous and have large coverage ranges; in high-density clusters, the cores are few and have small coverage ranges. Furthermore, as new samples keep arriving, the DBI score obtained by the DCC algorithm is small and changes gently. For example, although the DBI rises to 0.33 after the 46th sample is added, it gradually decreases as samples increase and fluctuates around 0.25 overall, showing that the DCC algorithm also achieves good results on clusters with uneven density distribution.

Figure 14: Clustering performance of DCC algorithm with different variance

5.5 Practical application

DCC algorithm has a good effect on solving the convex clustering problem, so we apply it to the pipeline for real-time detection and clustering of geometric objects (the geometric object images in the experiment are from the COIL-100 dataset). The objects on the pipeline are usually added one by one, which is consistent with the input data requirements of the DCC algorithm. Therefore, we capture the conveyor belt picture in real-time through the camera on the pipeline, extract the frame image, use the PCA method for feature extraction, and input the feature vector into the DCC algorithm for evolutionary clustering.
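As an illustration of this pipeline, the sketch below uses scikit-learn’s IncrementalPCA to turn flattened frames into feature vectors that could then be fed to the DCC evolution step; the frame size, number of components, and warm-up batch are our assumptions rather than values from the paper.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Placeholder stream of captured frames, each flattened into a row vector
# (the real frames would come from the camera above the conveyor belt).
frames = np.random.rand(200, 64 * 64)

ipca = IncrementalPCA(n_components=10, batch_size=50)
ipca.fit(frames[:100])                 # warm-up on the first captured frames

for frame in frames[100:]:
    x = ipca.transform(frame.reshape(1, -1))[0]   # p-dimensional feature vector
    # x would be fed to the DCC model evolution step (Algorithm 1) here
    pass
```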

For the clustering of objects on a pipeline, supervised learning is mainly used at present, i.e., many samples of the task are obtained before clustering, and a good classifier is then obtained by training network parameters. However, the classifier’s effectiveness drops significantly when enough training samples cannot be obtained. The evolutionary clustering DCC algorithm, by contrast, needs no prior knowledge about the task: it automatically adjusts the clustering results as samples keep arriving and learns knowledge in the process of adding new samples. Therefore, the evolutionary algorithm works normally even when the task sample size is small, and it gives reasonable clustering results for the learned samples in real-time by constantly adjusting its structure. The real-time clustering process of geometric objects on the conveyor line by the DCC algorithm is shown in DCC.GIF (see Additional file 1), in which the arriving geometric objects are new samples collected in real-time on the pipeline, and each line represents a cluster. During clustering, the number of lines in the video changes as samples increase, which means the DCC algorithm adjusts the previous clustering results in real-time after new samples are added and evolves the algorithm model simultaneously.

The final clustering result of the geometric objects on the conveyor line obtained by the DCC algorithm is shown in Fig. 15. There are five kinds of geometric objects, containing 20 samples, and the same kind of object is placed at different positions in different images; this information is unknown before clustering. It can be seen from Fig. 15 that the DCC algorithm groups the objects into six categories, mainly because it splits the second kind of object into two categories (lines 2 and 6 of Fig. 15). The second kind of object has a long strip shape, so when such objects are placed in crossed positions they are easily mistaken for two kinds of objects by the algorithm; nevertheless, no sample is clustered into a wrong category in the results. Figure 16 shows the change of the DBI score of the clustering results obtained in real-time by the DCC algorithm as objects keep arriving. The DBI index fluctuates considerably during the adjustment process because the number of samples in the early stage of evolutionary clustering is small, so newly added samples have a large impact on the DBI calculation, but it finally stabilizes around 0.6.

Figure 15: The final clustering result of geometric objects by the DCC algorithm in the pipeline task

Figure 16: Real-time DBI change of the DCC algorithm in geometric object clustering

6 Conclusion

Aiming at the defects caused by a fixed algorithm model in incremental clustering, this paper proposes a dynamic core evolutionary clustering algorithm, DCC, based on saturated memory. When new samples arrive, all dynamic cores in the algorithm model respond according to the Gaussian function, the winning core is determined by competitive learning, and the winning dynamic core is then adjusted. Furthermore, the DCC algorithm uses the memory saturation degree to record the memory of the dynamic core for the learned samples, reflecting how frequently the core is activated by samples, and controls core splitting through changes of the memory saturation degree, achieving a good match between the dynamic cores and the sample distribution. The structural parameters involved in the DCC algorithm are inherent attributes of all dynamic cores, and the clustering results are not sensitive to them. Extensive experiments show that the DCC algorithm can adaptively find the clusters in the sample space by updating the dynamic core centers, coverage, and memory saturation degrees, and has good clustering performance and robustness. Finally, the test of the DCC algorithm on the pipeline task also verifies its practical application value.

This paper focuses on the principle verification of the DCC algorithm. In future research, we will further improve the algorithm’s clustering accuracy and scalability, and extend it to more clustering tasks.