1 Introduction

Classification is a fundamental task in machine learning that consists of assigning data objects to apriori classes based on the values they assume for a set of features. It has received significant interest and has been extensively utilized in fields such as healthcare and medical diagnosis (Sivasankari et al., 2022; Malakouti, 2023), as well as image and video recognition (Wang et al., 2023; Chen et al., 2021).

The accuracy of a classifier depends on the availability of relevant and informative features, as well as on the choice of the classification algorithm. However, in many real-world scenarios, the data is complex, and the available features may not provide enough information to achieve high accuracy.

The statistical literature on supervised classification is growing rapidly. The use of a single algorithm often produces unsatisfactory results, typically because of the complex structure of the groups to be classified (e.g., unbalanced distributions, non-linear relationships among predictors, presence of anomalous values). In recent years, techniques integrating or merging multiple algorithms from both supervised and unsupervised learning have been developed to enhance the decision rules provided by the model (Soheily-Khah et al., 2018; Sarker, 2021).

The K-nearest neighbor (KNN) algorithm has gained recognition as a powerful tool in the field of machine learning, providing an effective and straightforward method for classification in various pattern recognition scenarios (Zhang, 2016; Taunk et al., 2019). The primary approach employed by KNN involves determining the class of query samples by measuring the distance to the objects in the training set. The label of the query sample is then set by majority voting on the membership of the k-nearest objects in the training set. Recently, several novel adaptations of the KNN algorithm have been developed (Zhang et al., 2017b; Luo et al., 2020; Rastin et al., 2021a).

The Euclidean distance is often used with the KNN algorithm to measure the dissimilarity between training and testing data. This procedure involves computing the dissimilarities, determining the k nearest neighbors based on them, and subsequently classifying the test sample according to the dominant class among the k neighbors. Although the Euclidean distance is easy to understand, it assigns equal importance to all features when calculating the distance. Such equal weighting can be a limitation, particularly in situations where different features have differing degrees of relevance to the classification objective. To tackle this problem, several studies have suggested alternative distance metrics that provide a more sophisticated method for calculating distances in KNN-based classification and have the potential to improve its performance (Chomboon et al., 2015; Ruan et al., 2021; Zhao & Yang, 2023).

This work explores the apriori classes and posits the presence of subclasses or hidden patterns within them; this idea forms the core motivation of our research. The limited attention paid to such hidden patterns in the existing literature further motivates our investigation. Although there have been numerous advancements in KNN recently (Gou et al., 2019a, b, c, 2022), none of them particularly addresses these hidden patterns.

The objective of our work is to bridge this gap by providing a comprehensive understanding of the complexities of apriori classes, with a specific emphasis on the unknown subclasses and hidden patterns that may exist within them.

In reality, many phenomena are often characterized by multiple sub-structures or sub-patterns. This implies that instances belonging to the same class can be distinguished by specific characteristics with varying relevance in the classification process.

The objective of this research is to improve the accuracy of the supervised classification (KNN) on high-dimensional data by integrating an unsupervised classification phase. The main idea is to extract relevant information from the original data by discovering sub-patterns that can aid in the classification task. However, integrating this information with a supervised classification algorithm in an efficient way poses a significant challenge.

This paper proposes a strategy based on three key points: (1) an algorithm based on the dynamic clustering (DC) algorithm (Diday, 1971) obtains subgroups from the initial labeled data, which are combined with the original patterns to form a new cluster space; (2) an appropriate weight system is sought, aiming to find optimal weights for the features of each subgroup, using adaptive distances; (3) a KNN classification method assigns labels to the data according to the cluster space derived from the DC partition.

The traditional objective function used in the best-known DC-based method, K-means, relies solely on the sum of the within-cluster deviance (Sinaga & Yang, 2020). This means that the objective function is related to the quadratic distances between each object in a cluster and the cluster representative. Despite Huygens' theorem, which states that minimizing the deviance within clusters is equivalent to maximizing the deviance between clusters, the optimized criterion is not designed to handle the constraints imposed by the presence of apriori groups, i.e., the labels in the training set.

To address these challenges, a new objective function is proposed for the DC method that relies on both inter-cluster and intra-cluster variability to improve the algorithm’s performance and robustness. It is optimized to provide the identification of more homogeneous patterns (subgroups) and better separation between subgroups.

We also propose the integration of adaptive distances into the clustering procedure to measure the importance of features in the classification process, especially for complex, high-dimensional data (Diday et al., 1981). Weights are assigned to the features for each cluster. This results in the selection of features, according to the values of the associated weights, that significantly influence the achieved clusters (Li & Wei, 2020).

Finally, two supervised KNN classifier variants are proposed to label new elements according to the clusters of the achieved partitions of the initial classes. Specifically, the first proposal assigns a new instance to an apriori class based on the minimum (adaptive) distance to the elements of the clusters, while the second proposal considers the proximity to the centroids (or representative elements) of the clusters rather than to the single elements, in order to significantly reduce the computational cost, especially when dealing with a large number of elements.

The search for clusters (new patterns in a priori classes) through a DC algorithm improves the performance and accuracy of the classifier results.

This work has the ambition to advance our understanding of the dynamic clustering algorithm and to provide an original approach in the field of classification. In particular, the proposal features two noteworthy innovations. Firstly, the paper highlights the benefits of using unsupervised classification techniques to identify new patterns in the original groups. This approach has a direct impact on the supervised classification's performance, and the combination of these techniques leads to more reliable predictive results. Secondly, the paper proposes an alternative objective function to be utilized during the clustering stage. The adoption of this approach is expected to yield richer results than minimizing the within-cluster deviance alone.

The structure of this paper is as follows. Section 2 provides a review of the main contributions in the current literature. Section 3 presents the proposed classification approach and the combination between DC and KNN, as well as a new objective function to ensure homogeneity within subgroups of the original dataset. Furthermore, Sect. 4 provides an application of the suggested approach on real datasets, along with a simulation study that covers six distinct scenarios. Section 5 offers concluding remarks and discusses future directions for further investigation.

2 Literature Review

Classification techniques, now increasingly developed in the field of machine learning, address the problem of assigning entities to predefined classes. Most classification methods construct models based on features that represent the characteristics of prior classes.

Many of them have been proposed with the aim of selecting the most discriminating features of groups of individuals and of deriving stable classification rules to predict the behavior of new entities. The most traditional classification algorithms, such as KNN (Fix & Hodges, 1989), the Naive Bayes classifier (Duda et al., 2006), C4.5 (Quinlan et al., 1996), logistic regression, classification and regression trees (Breiman, 2017), and the stochastic gradient boosting decision tree (Friedman, 2002), are recognized as having high accuracy. Due to the increasing volume of data associated with many real-world problems and their inherent complexity, novel learning classifiers have been proposed. These include variants of KNN, such as k-most similar neighbor (k-MSN), linear scan, and locality-sensitive hashing (LSH). Other classification algorithms designed for high-dimensional data encompass the extreme learning machine, sparse representation-based classification (Abavisani & Patel, 2019), and, of course, deep learning algorithms.

Numerous studies have been carried out to compare different classification methods in order to select the most appropriate classifier for specific problems; among the various papers with this aim, see Zhang et al. (2017a). One of the main considerations in comparing classification methods is that performance depends on the data analyzed and not only on the particular algorithm. Likewise, accuracy should not be understood as the only measure of algorithm performance. Strong attention must also be paid to feature selection, as it addresses dimensionality reduction.

One of the main challenges of classification methods concerns feature selection. Classes are typically not distinguished by explicit features. Despite the use of advanced feature selection algorithms, the number of dimensions can still be very large, making it challenging to accurately capture the similarity of classes. The KNN technique, like other classical methods, has inherent limitations that restrict its classification capacity. These limitations include the curse of dimensionality, the computational cost, sensitivity to outliers, the challenge of determining the optimal value of k, and its non-parametric nature, which affects the interpretation and generalization of results on new data. Although KNN is both simple and powerful, it is crucial to take these constraints into account when deciding whether it is a suitable method for a given task.

Numerous approaches have been proposed to overcome the limitations of traditional KNN by introducing new variants. To address the KNN algorithm's sensitivity to the choice of k, researchers have proposed different methods to dynamically determine the optimal k. One such approach, suggested by Gou et al. (2019b), involves two variations of the KNN rule: the weighted representation-based KNN rule (WRKNN) and the weighted local mean representation-based KNN rule (WLMRKNN). The experimental results indicate that the suggested methods demonstrate lower sensitivity to the neighborhood size k. The research conducted by Gou et al. (2019a) introduces the generalized mean distance-based KNN (GMDKNN) classifier as a method to enhance the choice of the number of neighbors k. They asserted that the proposed technique shows lower sensitivity to the parameter k compared to other KNN-based classifiers. Pan et al. (2020) propose a locally adaptive KNN algorithm based on discrimination class (AD-LAKNCN). This approach optimizes the values of k by taking into account the discrimination classes from the majority and second majority classes within the k neighborhood.

Furthermore, the KNN algorithm lacks a mechanism to assign varying weights to surrounding data points. To address this, Gou et al. (2012) introduced a new classification algorithm called distance-weighted KNN rule (DWKNN). This algorithm aims to overcome the sensitivity problem of selecting the neighborhood size and enhance classification performance. DWKNN utilizes a distance-weighted dual function and proves to be relatively robust to different choices of K, demonstrating good performance with a larger optimal K, as evidenced by experimental results on twelve real datasets. The performance of DWKNN surpasses other KNN-based methods currently considered state-of-the-art. Afterward, Rastin et al. (2021b) introduced a KNN stacking technique that employs a feature-weighted distance metric to mitigate the impact of irrelevant classes during stacking. Both of the aforementioned approaches take into account the weight of each adjacent point. However, it is important to note that when selecting weights, considering merely the distance information is inadequate.

In order to address the issue of the KNN algorithm’s sensitivity to noise points, Gou et al. (2019c) introduced a KNN approach called LMRKNN, which utilizes the multi-local mean vectors of the KNN belonging to the same class to linearly represent the testing sample. The referenced method employs local mean vectors effectively to reduce outlier influence, achieving notable classification accuracy. However, its performance is still sensitive to the selection of the k parameter.

Cherif (2018) proposed a K-means-based KNN algorithm that utilizes the K-means algorithm to partition the training dataset into a predetermined number of clusters. Subsequently, the centroids of each cluster are determined, resulting in a new training dataset consisting solely of these centroids. The 1-nearest neighbor algorithm is then applied to this new training dataset; therefore, the classification is achieved by selecting the closest neighbor in terms of distance. Uddin et al. (2022) provide a critical evaluation of different KNN variants, including the 1NN approach, in scenarios characterized by high levels of noise and outliers. The study's findings suggest that newer variants of KNN, potentially including those streamlined for efficiency like the K-means-based KNN, may not perform as effectively as the traditional KNN algorithm in complex, noisy environments. This underlines the importance of a careful selection of KNN variants depending on the specific characteristics of the dataset at hand, particularly when dealing with noise and outliers.

Maturo and Verde (2022) proposed a functional supervised classifier that combines functional data analysis with functional K-means and functional KNN methods, improving the supervised classifier's accuracy in classifying ECG signals. In an application on medical data, the authors showed that a clustering of labeled data was able to detect false positives and false negatives in the classification of healthy and sick patients that were not well identified by other classification methods, even ensemble ones.

It is important to note that, while approaches that combine K-means and KNN exist, they do not specifically focus on the theoretical aspects that emphasize the distances between the subgroups of the partitions inside the apriori classes. To take this into account, this research proposes a new objective function that maximizes the inter-cluster separation between the subclasses of an apriori cluster and all subclasses of the other apriori classes.

Inter-cluster separation is a crucial factor in ensuring that resulting clusters are meaningful and easily interpretable. The aim is to take into consideration the pattern structure of the data. This approach also revealed the presence of subsets of anomalous patterns within the classes, which are labeled in the same way even though they present different characteristics. Furthermore, various studies have investigated the application of weighting techniques to improve the performance of clustering algorithms.

As proposed by Diday and Govaert (1977), the concept of dynamic clustering with adaptive distances is to assign a distance to each cluster based on its intra-cluster structure. Recent developments in this area have focused on the use of adaptive distance metrics for symbolic data, such as multivariate aggregated data in the form of intervals, histograms, and other data types. Incorporating an adaptive distance metric in clustering algorithms can improve their performance in various aspects. For example, outliers or anomalous data can have a significant impact on the determination of the centroids, but by assigning lower weights to such points, their influence can be reduced, making the clustering process more robust to outliers. Adaptive distances for classical data have been mainly defined as Euclidean-weighted distances. Recent advancements in symbolic data analysis have led to the development of a range of adaptive distance metrics customized for DC on aggregated data. Notably, De Carvalho and Lechevallier (2009) introduced the adaptive City-Block and Hausdorff distances for the partition-based clustering of symbolic interval data. Furthermore, the squared Wasserstein distance, which is specifically designed for histogram data, has been described in detail in Irpino et al. (2014); Balzanella and Verde (2020). Moreover, Rodríguez and de Carvalho (2022) have contributed to this field by developing adaptive Euclidean and City-Block distances for interval-valued data.

Bao et al. (2018) presented a new approach for addressing interval-valued data clustering, wherein they proposed an adaptive fuzzy c-means algorithm that incorporates the consideration of interval membership across various clusters within the partition. de Carvalho et al. (2022) presented a batch self-organizing map (SOM) algorithm for distributional-valued data based on a weighted Wasserstein distance, where the weights are computed through the optimization of the clustering loss function.

3 Methodology

This section presents two novel approaches to supervised classification using clustering. The first step involves developing a new DC variant to cluster the apriori classes. The approach incorporates a new objective function that employs intra-cluster compactness, which uses an adaptive distance metric to compute the dispersion within the subgroups of a given class, as well as inter-cluster separation, which measures the distance between a subgroup of a specific class and the subgroups of the other apriori classes. Finally, the results obtained from DC are used for classification with the new KNN algorithm. In this process, the subgroup weights are utilized with the adaptive distance to measure the similarity between the testing sample and the neighboring subgroup centroids. Subsequently, the k nearest neighbors are determined based on the calculated similarities. The allocation of a new instance to a class is based on the majority vote among the k nearest subgroup centroids.

3.1 Dynamic Clustering Algorithm

Clustering is a widely utilized technique in various applications, including image processing (Chang et al., 2017), video processing (Alayrac et al., 2016), gene analysis (Dapas et al., 2020), healthcare (Liao et al., 2016), and community detection (Li et al., 2022), among others. It involves dividing a dataset into groups, or clusters, based on similarity criteria, where objects within the same cluster are more alike than those in different clusters.

In this paper, our focus is on DC, an unsupervised learning algorithm that aims to partition data into clusters while simultaneously finding cluster representatives consistent with the distance function used for allocating units. Typically, the representatives are obtained as the minimizers of the sum of distances. The classic K-means algorithm can be seen as a specific case of DC where the distance metric is the Euclidean distance, and the representatives (centroids) are calculated as cluster averages.

The original concept of dynamic clustering was introduced by Diday (1971) and involves a two-step process of constructing clusters and selecting the best prototype for each cluster based on an adequacy criterion (Diday & Simon, 1976). The advantages of this scheme are mainly its flexibility with respect to the nature of the analyzed data and the choice of the distance function and the focus on providing cluster representatives, named prototypes. For instance, DC methods exist for datasets described by interval variables (De Carvalho & Lechevallier, 2009) and histogram variables (Balzanella & Verde, 2020; de Carvalho et al., 2022).

Let \(X=\{X_{1},\ldots ,X_{i},\ldots ,X_{n}\}\) be a set of n objects, where each object \(X_{i}=\{x_{i1},\ldots ,x_{im}\}\) is described by a set of m features. The general DC looks for the partition \(G=\{C_1,\ldots , C_K\}\) in K clusters and the set \(Z=\{Z_1,\ldots , Z_K\}\) of K prototypes representing the clusters in G, such that the following \(\Delta\) fitting criterion between the set Z of prototypes and the partition G is minimized:

$$\begin{aligned} \Delta (G,Z)=\sum _{k=1}^{K}\sum _{X_i\in C_k}d(X_i,Z_k) \end{aligned}$$
(1)

The fitting criterion is defined as the sum of dissimilarities or distance measures between each object \(X_i\) belonging to a class \(C_k\in G\) and the class representation \(Z_k \in Z\).

In this context, the DC algorithm iteratively implements the following representation and allocation steps:

  1. The representation step describes the K clusters \((C_1,\dots ,C_K)\) of the partition G through a vector \(Z=(Z_1,\dots ,Z_K)\) of prototypes. Keeping the partition \(G=\hat{G}\) fixed for the current iteration of the algorithm, Z is obtained from the minimization of \(\Delta (\hat{G},Z)\), which is equivalent to finding the \(Z_k\;\;(k=1,\dots ,K)\) that minimize \(\sum _{X_i\in C_k}d(X_i,Z_k)\).

  2. The allocation step assigns each element \(X_i\) to a cluster \(C_k\) according to the proximity to the prototype \(Z_k\in Z\). Keeping \(Z=\hat{Z}\) fixed for the current iteration of the algorithm, it finds the partition G that minimizes \(\Delta (G,\hat{Z})\), by finding the clusters \(C_k=\{X_i\in X\mid d(X_i,Z_k)\le d(X_i,Z_l),\forall l=1,\dots ,K; l \ne k \}\).

3.1.1 DC as a Generalization of K-Means Algorithm

The K-means clustering methodology is broadly utilized as a partitioning strategy. The proposed dynamic clustering method is a generalization of the K-means algorithm, which puts forth the compelling notion that cluster centers need not be the centroids of the clusters in \(R^m\). Rather, it suggests substituting them with centers that can take various forms, based on the problem that needs to be addressed.

The K-means algorithm starts by selecting K initial cluster centers and then assigns each object to the closest cluster through the optimization of an objective function. As mentioned previously, the classical K-means algorithm only considers the intracluster compactness and the distances between the cluster centroids and individual data points. The membership matrix U, a \(n\times K\) binary matrix, indicates which objects are assigned to which clusters, and \(Z=\{Z_{1},\ldots , Z_{k},\ldots , Z_{K}\}\) represents the centroids of the K clusters, with elements \(Z_{k}=\{z_{k1},\ldots , z_{kj},\ldots , z_{km}\}\) for each feature \(j=1, \ldots , m\).

The objective function of the classic K-means without considering the inter-cluster separation is the following:

$$\begin{aligned} \begin{aligned} \Delta (U,Z) =&\sum _{k=1}^{K}\sum _{i=1}^{n}u_{ik}\sum _{j=1}^{m}(x_{ij}-z_{kj})^{2}, \end{aligned} \end{aligned}$$
(2)

such that \(\displaystyle \sum _{k=1}^{K} u_{ik}=1\), (with \(u_{ik}\in \{0,1\}\)), and \(X_i=\{ x_{i1},\ldots ,x_{ij},\ldots , x_{im} \}\) is an object of X described by m features.

The DC algorithm optimizes the objective function by alternating the representation and allocation steps:

  1. Representation step (the matrix of membership \(\hat{U}\) is fixed): the solution to the optimization problem \(\Delta (\hat{U},Z)\) is provided by the minimizer Z

    $$\begin{aligned} z_{kj}=\frac{\sum _{i=1}^{n}u_{ik}x_{ij}}{\sum _{i=1}^{n}u_{ik}}, \end{aligned}$$
    (3)

    where \(1\le k\le K\)

  2. Allocation step (the vector of centroids \(\hat{Z}\) is fixed): according to Chan et al. (2004), the minimizer U of the optimization problem \(\Delta (U,\hat{Z})\) is given by

    $$\begin{aligned} u_{ik} = \left\{ \begin{array}{ll} 1 &{} \text {if}\, {\displaystyle \sum _{j=1}^{m}(x_{ij}-z_{kj})^{2}\le \sum _{j=1}^{m}(x_{ij}-z_{lj})^{2}},\;\forall \, l=1,\dots ,K,\; l\ne k, \\ 0 &{} \text {otherwise.} \end{array}\right. \end{aligned}$$
    (4)

The partitioning criterion Eq. 2 decreases at each iteration, converging to a stationary value.
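
To make this alternation concrete, the following minimal sketch (in Python with NumPy; function and variable names are ours, not from the paper) implements the representation step of Eq. 3 and the allocation step of Eq. 4 for the classic K-means case.

```python
import numpy as np

def kmeans_dc(X, K, n_iter=100, seed=0):
    """Classic K-means viewed as dynamic clustering (Eqs. 2-4)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    labels = rng.integers(0, K, size=n)          # random initial partition (matrix U)
    for _ in range(n_iter):
        # representation step (Eq. 3): each centroid is the mean of its cluster
        Z = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                      else X[rng.integers(0, n)] for k in range(K)])
        # allocation step (Eq. 4): assign each object to the closest centroid
        dists = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)   # n x K
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # criterion of Eq. 2 is stationary
            break
        labels = new_labels
    return labels, Z
```

A call such as kmeans_dc(X, K=3) returns the final partition and the centroids; replacing the mean and the squared Euclidean distance with other prototypes and dissimilarities yields the general DC scheme described above.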

3.1.2 The Need for Adaptive Distances in DC

The central concept of dynamic clustering with adaptive distances is to assign a specific distance measure, denoted as \(d_k\), to each cluster \(C_k\) and to minimize the sum of distances \(d_k(X_i, Z_k)\) between objects \(X_i\) belonging to cluster \(C_k\) and the centroid \(Z_k\). Importantly, the distances employed in the DC algorithm are not fixed in advance but rather are tailored to each cluster.

In this clustering algorithm, a weighting step is introduced. It assigns a weight to each variable for each cluster, reflecting the relevance of the variable in a cluster. The use of adaptive distance can also be viewed as a means of automatically scaling variables, as scaling can greatly impact the dissimilarity values and clustering outcomes in clustering analysis.

The DC criterion, which incorporates adaptive distances, is expressed as follows:

$$\begin{aligned} \begin{aligned} \Delta (U,W,Z) =&\sum _{k=1}^{K}\sum _{X_i\in C_k}u_{ik}d_{k}(X_i,Z_k), \end{aligned} \end{aligned}$$
(5)

such that \(u_{ik}\in \{0,1\}\), \(\displaystyle \sum _{k=1}^{K} u_{ik}=1\)

In this context, distance \(d_{k}\) is a weighted sum of distances \(d_{w_{kj}}\)

$$\begin{aligned} \begin{aligned} d_{k}(X_i,Z_k) =&\sum _{j=1}^{m}d_{w_{kj}}(x_{ij},z_{kj}) = \sum _{j=1}^{m} w_{kj}d(x_{ij},z_{kj}) \end{aligned} \end{aligned}$$
(6)

The adaptivity of the distance \(d_{w_{kj}}\) is expressed by the vector of weights \(W_k\).
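
For illustration, the cluster-specific distance of Eq. 6 with squared Euclidean components can be written as the following sketch (hypothetical names, not code from the paper).

```python
import numpy as np

def adaptive_distance(x, z_k, w_k):
    """Weighted squared Euclidean distance d_k(X_i, Z_k) of Eq. 6:
    feature j contributes w_kj * (x_ij - z_kj)^2."""
    x, z_k, w_k = map(np.asarray, (x, z_k, w_k))
    return float(np.sum(w_k * (x - z_k) ** 2))
```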

When using adaptive distances, the representation step is divided into two stages, so that the global optimization scheme is

  1. Representation step

     1. Stage 1: fix the matrix \(\hat{U}\) of membership and the vector of weights \(\hat{W}\). Find the solution \(Z_k= \{z_{k1}, \ldots , z_{km}\}\) of the optimization problem \(\Delta (\hat{U},\hat{W},Z)\).

     2. Stage 2: fix the matrix \(\hat{U}\) of membership and the vector of centroids \(\hat{Z}\). Find the vector of weights \(W_k=\{w_{k1}, \ldots , w_{km}\}\) that minimizes the criterion \(\Delta (\hat{U},W,\hat{Z})\).

  2. Allocation step: fix the set of vectors of weights \(\hat{W}\) and the set of vectors of centroids \(\hat{Z}\). Find the membership matrix U that minimizes the criterion \(\Delta (U,\hat{W},\hat{Z})\).

The paper employs an adaptive distance metric, specifically a weighted Euclidean distance, to calculate the distances between the objects and the centroids of the subgroups within a given class. Explicit formulas for the optimum cluster centroids, as well as for the weights of the adaptive distances, are derived from a new objective function criterion. By integrating the procedures of data partitioning and centroid selection with adaptive distances, the DC algorithm provides a comprehensive and flexible approach to clustering analysis for apriori classes.

3.2 Dynamic Clustering Algorithm to Partition Apriori Groups

In this section, a new objective function is proposed to discover new information in the original data by combining both the intra-cluster compactness of the subgroups of the same apriori group and the inter-cluster separation between one subgroup and the subgroups of the other apriori classes, as illustrated in Fig. 1. Indeed, it may be ineffectual to evaluate the weights of the variables of a subgroup using only the variation within the groups of a dataset. Under these conditions, inter-cluster separation can play a significant role in differentiating the relevance of the various patterns and in accounting for the heterogeneity among the subgroups of each original group.

We apply inter-cluster separation by introducing the global subgroup centroids of a dataset. In contrast to the conventional DC, our proposed DC algorithm maximizes the distances between the centroid of each subgroup of an apriori group and the global centroid of the subgroups of the other apriori groups, while minimizing the distances between objects and their subgroup centroids.

Fig. 1 Scatter plot illustrating subgroups within two apriori groups

Let N be the total number of apriori classes, and \(U = \{U_{1}, \ldots ,U_{g}, \ldots , U_{N}\}\) the set of N membership matrices. Let \(n_g\) be the number of elements of class g and \(c_g\) the number of subgroups in class g. Each \(U_g\) is an \(n_g \times c_{g}\) indicator matrix containing the membership of each element i of the apriori class g to the subgroup p, where \(u_{gip}=1\) denotes that the i-th object belonging to group g is assigned to subgroup p, and \(u_{gip} = 0\) otherwise. Let \(Z=\{Z_1,\ldots ,Z_{g},\ldots ,Z_{N}\}\) be the collection of subgroup centroids of the original groups. For group g, let \(Z_{g} = \{Z_{g1}, \dots , Z_{gc_g}\}\) be the set of \(c_g\) vectors that represent the subgroups' centroids, and let \(W_g = \{W_{g1}, W_{g2}, \dots , W_{gc_g}\}\) be a set of weight vectors associated with the subgroups, where \(w_{gpj}\) represents the weight of the j-th variable related to the p-th subgroup of class g. Let \(\beta\) represent a parameter used for adjusting the weights.

With the aim of achieving both intra-cluster compactness and inter-cluster separation, the optimization process is performed using a DC algorithm in which the objective function is modified to emphasize the separation between clusters belonging to different apriori classes:

$$\begin{aligned} \begin{aligned} P(U,W,Z) =&\sum _{g=1}^{N}\sum _{p=1}^{c_{g}}\frac{\sum _{i=1}^{n_{g}}u_{gip}d^{2}_{g}(X_{i},Z_{gp})}{n_{g}d^{2}(Z_{gp},Z_{gG})}\\ =&\sum _{j=1}^{m}(\sum _{g=1}^{N}\sum _{p=1}^{c_{g}}\frac{w^{\beta }_{gpj}\sum _{i=1}^{n_{g}}u_{gip}(x_{ij}-z_{gpj})^{2}}{n_{g}(z_{gpj}-z_{gGj})^{2}}), \end{aligned} \end{aligned}$$
(7)

such that \(u_{gip}\in \{0,1\}\), \(\displaystyle \sum _{p=1}^{c_{g}} u_{gip}=1\), and \(\displaystyle \sum _{j=1}^{m}w_{gpj}=1\).

In the context of our study, the distance metric \(d_{g}\) is defined as a weighted sum of distances \(d_{w_{gpj}}\), where \(d_{w_{gpj}}\) represents the distance metric for the p-th subgroup of the g-th apriori class. The vector of weights \(W_{gp}\) demonstrates the adaptivity of the distance metric \(d_{w_{gpj}}\):

$$\begin{aligned} \begin{aligned} d_{g}(X_i,Z_{gp}) = \sum _{j=1}^{m}d_{w_{gpj}}(x_{ij},z_{gpj}) = \sum _{j=1}^{m} w_{gpj}(x_{ij}-z_{gpj})^2. \end{aligned} \end{aligned}$$
(8)

Let us assume that the present group is the \(g^{th}\) group. \(z_{gGj}\) represents the \(j^{th}\) feature of the global subgroups centroid of all other apriori groups, excluding the current group g.

We calculate \(z_{gGj}\) as

$$\begin{aligned} z_{gGj}=\frac{\displaystyle \sum _{h\in \{1,\ldots ,N\}\setminus \{g\}}c_{h}\sum _{q=1}^{c_{h}}z_{hqj}}{c_{1}+\cdots +c_{g-1}+c_{g+1}+\cdots +c_{N}}. \end{aligned}$$
(9)
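
For concreteness, the sketch below computes \(z_{gG}\) exactly as written in Eq. 9, given the subgroup centroids of all apriori groups (the data layout and names are our own assumptions).

```python
import numpy as np

def global_other_centroid(Z, g):
    """Eq. 9: global centroid z_gG built from the subgroup centroids of all
    apriori groups except g. Z is a list of arrays; Z[h] has shape (c_h, m)
    and stacks the c_h subgroup centroids of group h."""
    num = np.zeros(Z[0].shape[1])
    denom = 0.0
    for h, Zh in enumerate(Z):
        if h == g:
            continue
        c_h = Zh.shape[0]
        num += c_h * Zh.sum(axis=0)   # c_h * sum_q z_hqj (numerator of Eq. 9)
        denom += c_h                  # c_1 + ... + c_N, excluding c_g
    return num / denom
```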

To initiate the solution process of the objective function, it is necessary to initialize the parameters \(\hat{U}\), \(\hat{W}\), and \(\hat{Z}\) of all groups (for \(g=1, \ldots , N\)). Subsequently, the partition of group g is evaluated, reducing the minimization problem to

$$\begin{aligned} \begin{aligned} P(U,W,Z) =&\sum _{p=1}^{c_{g}}\sum _{i=1}^{n_{g}}u_{gip}\sum _{j=1}^{m}w^{\beta }_{gpj}\frac{(x_{ij}-z_{gpj})^{2}}{n_{g}(z_{gpj}-z_{gGj})^{2}}, \end{aligned} \end{aligned}$$
(10)

such that \(u_{gip}\in \{0,1\}\), \(\displaystyle \sum _{p=1}^{c_{g}} u_{gip}=1\) for \(1\le i\le n_g\), and \(\displaystyle \sum _{j=1}^{m}w_{gpj}=1\) for \(1\le p\le c_g\).

To minimize Eq. 10, it is necessary to solve the problems P1, P2, and P3 iteratively.

  1. The representation step requires solving two distinct problems, P1 and P2:

     Problem P1: fix \(U=\hat{U}\), \(W=\hat{W}\) and solve the reduced problem \(P(\hat{U}, Z, \hat{W})\).

     Problem P2: fix \(U=\hat{U}\), \(Z=\hat{Z}\) and solve the reduced problem \(P(\hat{U}, \hat{Z}, W)\).

  2. The allocation step requires solving the problem denoted as P3:

     Problem P3: fix \(Z=\hat{Z}\), \(W=\hat{W}\) and solve the reduced problem \(P(U, \hat{Z}, \hat{W})\).

To solve the problem P1, we calculate the gradient of P with respect to \(z_{gpj}\) as

$$\begin{aligned} \frac{\partial P(\hat{U},\hat{W},Z)}{\partial z_{gpj}}=-2w_{gpj}^{\beta }\sum _{i=1}^{n_{g}}u_{gip}\frac{(x_{ij}-z_{gpj})(z_{gpj}-z_{gGj})^{2}+(z_{gpj}-z_{gGj})(x_{ij}-z_{gpj})^{2}}{n_{g}(z_{gpj}-z_{gGj})^{4}}; \end{aligned}$$
(11)

by setting Eq. 11 to zero, we have:

$$\begin{aligned} z_{gpj}=\frac{\sum _{i=1}^{n_{g}}u_{gip}x_{ij}(x_{ij}-z_{gGj})}{\sum _{i=1}^{n_{g}}u_{gip}(x_{ij}-z_{gGj})}. \end{aligned}$$
(12)

The initial section of the supplementary material contains the proof and the necessary and sufficient conditions required for the realization of this finding.

It is worth noticing that \(z_{gpj}\), the representative (e.g., centroid) of \(C_{gp}\), can be interpreted as a weighted average of the elements of the \(p^{th}\) subgroup, with weights given by the differences between the \(x_{ij}\) and the global centroid \(z_{gGj}\), computed as in Eq. 9 on all other apriori groups, excluding the current group.

The higher the difference, the more the subgroup element contributes to the determination of the subgroup’s centroid. This result is due to the optimization of the discriminant component of the criterion which emphasizes the separation between classes.
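
As a toy numerical illustration (our own example, for a single feature j), consider a subgroup with values \(x_{1j}=2\) and \(x_{2j}=4\). With \(z_{gGj}=0\), Eq. 12 gives

$$\begin{aligned} z_{gpj}=\frac{2(2-0)+4(4-0)}{(2-0)+(4-0)}=\frac{20}{6}\approx 3.33, \end{aligned}$$

slightly above the plain mean 3 and pulled toward the element farther from the global centroid. If instead \(z_{gGj}=2.5\), closer to the subgroup mean, the same formula yields \(z_{gpj}=\frac{2(-0.5)+4(1.5)}{(-0.5)+(1.5)}=5\), which lies outside the interval of observed values [2, 4]; this anticipates the internality issue discussed next.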

The internality condition of the centroid \(z_{gpj}\) of the subgroup \(C_{gp}\) (for each variable j) is guaranteed under the conditions demonstrated in the Appendix, whereas the centroid can fall outside the cluster's interval of values (for each j) the closer \(z_{gGj}\) is to the mean of the elements of the cluster \(C_{gp}\).

The problem P2 is solved by setting up the Lagrangian of \(P(\hat{U}, \hat{Z}, W)\) with multiplier \(\lambda\). Let \(L(W, \lambda )\) be the Lagrangian

$$\begin{aligned} L(W, \lambda )=\sum _{p=1}^{c_{g}}\sum _{j=1}^{m}w^{\beta }_{gpj}D_{gpj}-\lambda (\sum _{j=1}^{m}w_{gpj}-1), \end{aligned}$$
(13)

where \(D_{gpj}=\sum _{i=1}^{n_{g}}u_{gip}\frac{(x_{ij}-z_{gpj})^{2}}{n_{g}(z_{gpj}-z_{gGj})^{2}}\). Setting the gradient of Eq. 13 with respect to \(w_{gpj}\) and \(\lambda\) to zero, we obtain

$$\begin{aligned} \frac{\partial L(W, \lambda )}{\partial w_{gpj}}=\beta w_{gpj}^{\beta -1}D_{gpj}-\lambda =0; \end{aligned}$$
(14)

from Eq. 14, we obtain

$$\begin{aligned} w_{gpj}=\left( \frac{\lambda }{\beta D_{gpj}}\right) ^{\frac{1}{\beta -1}}. \end{aligned}$$
(15)

The gradient with respect to \(\lambda\)

$$\begin{aligned} \frac{\partial L(W, \lambda )}{\partial \lambda }=-(\sum _{j=1}^{m}w_{gpj}-1)=0; \end{aligned}$$
(16)

substituting Eq. 15 into Eq. 16, we obtain

$$\begin{aligned} \lambda ^{\frac{1}{\beta -1}}=\displaystyle \frac{\beta ^{\frac{1}{\beta -1}}}{\displaystyle \sum _{j=1}^{m}D_{gpj}^{-\frac{1}{\beta -1}}}; \end{aligned}$$
(17)

substituting Eq. 17 into Eq. 15, we have

$$\begin{aligned} w_{gpj}=\frac{1}{\displaystyle (D_{gpj})^{\frac{1}{\beta -1}}\sum _{l=1}^{m}D_{gpl}^{-\frac{1}{\beta -1}}}, \end{aligned}$$
(18)

The minimizer \(W_{gp}\) of the optimization problem P2 is given by

$$\begin{aligned} w_{gpj} = {\left\{ \begin{array}{ll} 0 &{} \text {if}\, {\displaystyle (z_{gpj}-z_{gGj})^2=0}, \\ 0 &{} \text {if}\, {\displaystyle D_{gpj}\ne 0, \;\;\text {but}\;\;D_{gpl}=0,\;\;\text {for some}\;l},\\ \displaystyle \frac{1}{(D_{gpj})^{\frac{1}{\beta -1}}\displaystyle \sum _{l=1}^{m}D_{gpl}^{-\frac{1}{\beta -1}}} &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(19)

The problem P3 is solved by

$$\begin{aligned} u_{gip} = \left\{ \begin{array}{ll} 1 &{} \text {if}\, {\displaystyle \sum _{j=1}^{m}w^{\beta }_{gpj}{\frac{(x_{ij}-z_{gpj})^{2}}{n_{g}(z_{gpj}-z_{gGj})^{2}}}\le \sum _{j=1}^{m}w^{\beta }_{grj}\frac{(x_{ij}-z_{grj})^{2}}{n_{g}(z_{grj}-z_{gGj})^{2}}}, \\ 0 &{} \text {otherwise,} \end{array}\right. \end{aligned}$$
(20)

for all \(r=1,\dots ,c_{g}\), \(r\ne p\).

The same process is applied to the partitions of the other groups \(q \ne g\), for \(q=1, \ldots , N\), of the primary dataset, by optimally computing \(u_{qip}\), \(z_{qpj}\), and \(w_{qpj}\).

3.3 New DC Variant Algorithm

In this section, we provide a comprehensive explanation of the algorithm used in the novel DC variant clustering method. The aim of this algorithm is to create new subgroups from the available labeled data by detecting hidden patterns.

The algorithm is designed to associate a unique distance metric with each cluster, which is used to compare the objects with the cluster representatives. This distance measure is not fixed and varies from subgroup to subgroup, changing at each iteration until convergence. The adaptive nature of this distance measure offers the advantage of assigning higher weights to the variables that are more representative or informative of a particular cluster, resulting in a more accurate clustering algorithm.

This adaptive approach aims to identify a partition of each original class, denoted as \(G_{1},\ldots ,G_{N}\), respectively into \(n_{c_{1}}, \ldots ,n_{c_{N}}\) subgroups. Here, \(G_{g}=\{C_{g1},\ldots ,C_{gn_{c_{g}}}\}\) specifies the partitions for group g, and the corresponding centroids, denoted as \(Z_{g}=\{Z_{g1},\ldots , Z_{gn_{c_{g}}}\}\), for each group are computed using the formula for centroids Eq. 12. Additionally, for each subgroup, a set of weights is assigned from the set \(W_{g}=\{W_{g1},\ldots ,W_{gn_{c_{g}}}\}\).

The algorithm we propose looks for a local minimum of the objective function in Eq. 7.

It requires, as an input, the training dataset, with N apriori classes, and the number of subgroups for each apriori class. It starts from an initial random partitioning of the apriori classes into subgroups, then initializes the weights of variables for each subgroup to 1/m (where m is the number of variables). An initial set of centroids Z is computed according to Eq. 12, based on the initial random partition and on the weights in W.

The iterative part of the algorithm alternates, at each iteration t, the representation and allocation steps introduced in Sect. 3.2, in order to provide the partitions of the N apriori classes \(G_{1},\ldots ,G_{N}\) into \(n_{c_{1}}, \ldots ,n_{c_{N}}\) subgroups, the set of centroids Z, and the weights W.

At each iteration t, a check of the convergence of the algorithm is performed by evaluating the criterion \(P_t\):

$$\begin{aligned} P_{t} =&\sum _{g=1}^{N}\sum _{p=1}^{c_{g}}\frac{\sum _{i=1}^{n_{g}}u_{gip}d^{2}_{g}(X_{i},Z_{gp})}{n_{g}d^{2}(Z_{gp},Z_{gG})}, \end{aligned}$$
(21)

where \(X_i\), \(Z_{gp}\), and \(u_{gip}\) are defined as before.

The algorithm keeps iterating while \(\Vert P_{t+1}-P_{t}\Vert >0\), that is, while a further iteration still improves the criterion. In other words, the algorithm continues as long as there is a decrease in the intra-cluster distances within the subgroups of an original group and/or an increase in the inter-cluster distances.

Algorithm 1 displays the pseudocode of the suggested DC clustering algorithm.

Algorithm 1 Weighted dynamic clustering algorithm with adaptive Euclidean distance.
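
Algorithm 1 appears as an image in the published version. The following self-contained Python sketch (our own naming; the small eps guard and the handling of empty subgroups are our assumptions) illustrates one possible implementation of the alternation of Eqs. 12, 19, and 20 with the stopping rule of Eq. 21, restricted to a single apriori group g for which the global centroid \(z_{gG}\) of the other groups is given; in the full algorithm, \(z_{gG}\) is recomputed at each iteration from the current subgroup centroids of the other groups, as in Eq. 9.

```python
import numpy as np

def weighted_dc_group(Xg, c_g, z_gG, beta=2.0, max_iter=100, eps=1e-12, seed=0):
    """Sketch of the weighted DC step for one apriori group g (Eqs. 10, 12, 19-21).
    Xg: (n_g, m) objects of group g; c_g: number of subgroups;
    z_gG: (m,) global centroid of the other groups' subgroups (Eq. 9);
    beta: weight exponent; eps: guard against division by zero (assumption)."""
    rng = np.random.default_rng(seed)
    n_g, m = Xg.shape
    labels = rng.integers(0, c_g, size=n_g)              # random initial partition
    W = np.full((c_g, m), 1.0 / m)                       # weights initialized to 1/m
    P_prev = np.inf
    for _ in range(max_iter):
        # representation step, stage 1: subgroup centroids (Eq. 12)
        Z = np.empty((c_g, m))
        for p in range(c_g):
            Xp = Xg[labels == p]
            if Xp.shape[0] == 0:                         # reseed an empty subgroup
                Xp = Xg[rng.integers(0, n_g)][None, :]
            diff = Xp - z_gG                             # (x_ij - z_gGj)
            Z[p] = (Xp * diff).sum(axis=0) / (diff.sum(axis=0) + eps)
        # dispersion terms D_gpj of Eq. 13
        D = np.empty((c_g, m))
        for p in range(c_g):
            Xp = Xg[labels == p]
            sep = (Z[p] - z_gG) ** 2 + eps
            D[p] = ((Xp - Z[p]) ** 2).sum(axis=0) / (n_g * sep)
        # representation step, stage 2: adaptive weights (Eq. 19)
        inv = np.maximum(D, eps) ** (-1.0 / (beta - 1.0))
        W = inv / inv.sum(axis=1, keepdims=True)
        # allocation step (Eq. 20): per-object cost of each subgroup
        sep = (Z - z_gG) ** 2 + eps                      # (c_g, m)
        cost = (((Xg[:, None, :] - Z[None, :, :]) ** 2 / (n_g * sep))
                * W[None, :, :] ** beta).sum(axis=2)     # (n_g, c_g)
        labels = cost.argmin(axis=1)
        # convergence check on the criterion (Eq. 21)
        P_t = cost[np.arange(n_g), labels].sum()
        if abs(P_prev - P_t) == 0.0:
            break
        P_prev = P_t
    return labels, Z, W
```

The outer loop of Algorithm 1 applies this step to each apriori group in turn and stops when the overall criterion \(P_t\) no longer changes.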

3.4 KNN Classifier Based on Adaptive Distances and Novel Patterns

The K-nearest neighbor classifier is a supervised learning technique that, given training data and a predetermined value of k, computes distances to identify the k training points nearest to a query and assigns a class label to the query through the majority voting rule. Despite its simplicity, its efficacy is comparable to that of the most complex classifiers in the literature. This classifier relies heavily on measuring the distance or similarity between the tested examples and the training examples, which raises an essential question about which distance or similarity measure should be used for the KNN classifier out of the numerous options available. Therefore, we propose an adaptive distance parameterized by weight vectors. The weights are estimated during the first clustering step on the apriori classes, so that each subgroup is associated with its own weight vector. The main idea of the KNN classifier with adaptive distances is that the distance used to compare a test object with its nearest points in the training dataset changes with each training point; objects belonging to the same subgroup, however, share the same weight vector.

Let \(T = \{(X_{i}, y'_{ip})\}_{i=1}^{n}\) represent a training set consisting of n training instances, each belonging to one of N classes. Each training instance \(X_{i}\) is an element of an m-dimensional space \(R^{m}\), and its corresponding label \(y'_{ip}\) is obtained from the initial clustering step, where p represents the subgroup of \(X_i\) within the original apriori class \(y_i\). When a new query \(S_{h}\) is given, we first compute the adaptive distances between \(S_{h}\) and each training instance in T. The adaptive distance between a given data point \(X_{i}\) and the query \(S_{h}\) is defined as follows:

$$\begin{aligned} \begin{aligned} d_{y'_{ip}}(X_{i},S_{h})= d_{W_{y_{i}p}}(X_{i},S_{h}) = \sum _{j=1}^{m}w_{y_{i}pj}(x_{ij}-s_{hj})^{2}. \end{aligned} \end{aligned}$$
(22)

Here, \(W_{y_{i}p}\) is a vector of weights corresponding to the p-th subgroup of the apriori class \(y_{i}\).

The n distances are then arranged in ascending order. \(N_{K}(S_{h})=\{(X_{j},y'_{jp})\}_{j=1}^{K}\) denotes the K-nearest neighbors of \(S_{h}\), i.e., the K training instances with the smallest distances. Ultimately, the majority voting rule is used to assign the query \(S_{h}\) to the subgroup \(C_{gp}\):

$$\begin{aligned} C_{gp} = \underset{C_{ip'}}{{{\,\textrm{argmax}\,}}} \sum _{(x,y)\in N_{K}(S_{h})}\mathbbm {1}_{C_{ip'}}(y),\quad i=1,\dots ,N,\;\; p'=1,\dots ,c_i, \end{aligned}$$
(23)

where \(\mathbbm {1}_{C}(.)\) is the indicator function:

$$\begin{aligned} \mathbbm {1}_{C}(y)={\left\{ \begin{array}{ll} 1 &{} \text {if } y\in C,\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(24)
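
A minimal sketch of this classification rule (Eqs. 22-24) is given below; the representation of the augmented labels as (class, subgroup) pairs and the function name are our own assumptions.

```python
import numpy as np
from collections import Counter

def adaptive_knn_predict(s, X_train, sub_labels, W, K=5):
    """Classify a query s with the adaptive-distance KNN rule (Eqs. 22-24).
    X_train: (n, m) training objects; sub_labels: list of (g, p) pairs giving
    the apriori class g and subgroup p of each training object;
    W: dict mapping (g, p) to the (m,) weight vector of that subgroup."""
    s = np.asarray(s)
    # adaptive distance of Eq. 22, using the weights of each training point's subgroup
    dists = np.array([np.sum(W[gp] * (x - s) ** 2)
                      for x, gp in zip(X_train, sub_labels)])
    nearest = np.argsort(dists)[:K]                  # the K nearest training objects
    votes = Counter(sub_labels[i] for i in nearest)  # majority vote of Eq. 23
    g_win, p_win = votes.most_common(1)[0][0]
    return g_win, p_win                              # apriori class and subgroup
```

The predicted apriori class is the first component of the returned pair; in DC-KNN1 this is the label assigned to the new instance.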

3.5 The DC-KNN Combined Algorithm

DC-KNN (dynamic clustering and K-nearest neighbors) is an algorithm that combines the efficiency of the DC algorithm with the classification by KNN. The basic idea behind DC-KNN is to use DC with adaptive distances to re-cluster the classes of the training dataset into different subgroups and then use KNN to classify the test set based on the new labels obtained from the clustering step.

The algorithm starts by using DC to cluster each original group into a specific number of clusters; the optimal number of subgroups can be determined by the Silhouette or Elbow method. The resulting clusters are used as a preprocessing step for the KNN algorithm, allowing it to work on more homogeneous subsets of data. This combination improves the accuracy and efficiency of the KNN algorithm, as it enables the algorithm to exploit all the patterns in the dataset that are relevant to the learning process.
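
As an illustration of this choice, and assuming scikit-learn is available, the number of subgroups for one apriori class could be selected by maximizing the average Silhouette score over candidate values, roughly as in the following sketch; plain K-means is used here only as a stand-in for scoring the candidates, whereas the paper clusters with the weighted DC of Sect. 3.2.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_n_subgroups(X_class, c_max=7, seed=0):
    """Pick the number of subgroups for one apriori class by maximizing
    the average Silhouette score over candidate partitions."""
    best_c, best_score = 2, -1.0
    for c in range(2, c_max + 1):
        labels = KMeans(n_clusters=c, n_init=10, random_state=seed).fit_predict(X_class)
        score = silhouette_score(X_class, labels)
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```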

After the clustering step, KNN is used to classify new data points based on the newly discovered patterns. The KNN algorithm finds the K nearest neighbors of a given data point and determines the class of the majority of those neighbors. In this way, the DC-KNN algorithm is able to effectively combine the strengths of both DC and KNN, making it a powerful tool for data classification.

It is worth noting that the DC-KNN algorithm can be sensitive to the initial partitions. Hence, choosing the appropriate number of clusters and setting the parameters for the DC algorithm is essential.

The execution of the DC-KNN classification algorithm follows the steps outlined in Algorithm 2.

Algorithm 2 DC-KNN combining algorithm.
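
Algorithm 2 is likewise reported as an image. The simplified end-to-end sketch below illustrates the DC-KNN1 flow: cluster each apriori class, relabel the training points with subgroup labels, classify by KNN over those labels, and map the predictions back to the apriori classes. For brevity, per-class K-means and scikit-learn's standard KNN replace the weighted DC of Sect. 3.2 and the adaptive-distance rule of Eq. 22, which the full method would plug in (see the sketches given earlier); all names are our own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def dc_knn1_simplified(X_train, y_train, X_test, n_sub, K=5, seed=0):
    """Simplified DC-KNN1 pipeline. n_sub maps each apriori class label to its
    number of subgroups (e.g., chosen with the Silhouette method)."""
    sub_labels = np.empty(len(y_train), dtype=int)
    sub_to_class = {}                                # global subgroup id -> apriori class
    next_id = 0
    for g in np.unique(y_train):
        idx = np.where(y_train == g)[0]
        km = KMeans(n_clusters=n_sub[g], n_init=10, random_state=seed)
        sub_labels[idx] = km.fit_predict(X_train[idx]) + next_id
        for local in range(n_sub[g]):
            sub_to_class[next_id + local] = g
        next_id += n_sub[g]
    # KNN trained on the augmented (subgroup) labels
    knn = KNeighborsClassifier(n_neighbors=K).fit(X_train, sub_labels)
    pred_sub = knn.predict(X_test)
    # each predicted subgroup inherits the apriori class it was carved out of
    return np.array([sub_to_class[s] for s in pred_sub])
```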

3.6 The DC-KNN Using the Centroids as the Nearest Neighbors

A new variant of the KNN algorithm is introduced in this study for more accurate predictions of new data points based on the new subgroups. The proposed classification approach assigns instances to their nearest neighbor class by computing the distance between new instances and the subgroup centroids. The allocation is then determined using the label of the apriori class to which the subgroup belongs.

The DC-KNN classifier algorithm utilizes the centroids and weights of DC to determine the nearest neighbors of a new element. Here, K denotes the number of centroid neighbors closest to the new query. The optimum value of K is chosen from the range 1 to (\(\displaystyle \min _{1\le i\le N}(c_i)+1\)). The algorithm follows the sequential steps outlined in Algorithm 3.

Algorithm 3 DC-KNN combining algorithm using centroids as nearest neighbors.
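
Algorithm 3 is also shown as an image; the sketch below illustrates its core classification step, assuming the DC step has already produced the stacked subgroup centroids, their adaptive weight vectors, and the mapping from each subgroup to its apriori class (the data layout and names are our assumptions).

```python
import numpy as np
from collections import Counter

def dc_knn2_predict(s, centroids, weights, sub_class, K=1):
    """DC-KNN2: classify query s from its K nearest subgroup centroids.
    centroids: (C, m) subgroup centroids from the DC step; weights: (C, m)
    adaptive weight vector of each subgroup; sub_class: length-C array with
    the apriori class of each subgroup."""
    s = np.asarray(s)
    # adaptive (weighted squared Euclidean) distance to every subgroup centroid
    dists = np.sum(weights * (centroids - s) ** 2, axis=1)
    nearest = np.argsort(dists)[:K]                  # K closest centroids
    votes = Counter(sub_class[i] for i in nearest)   # majority vote over apriori classes
    return votes.most_common(1)[0][0]
```

With K = 1, the query simply inherits the apriori class of its closest subgroup centroid; since only C centroids are compared instead of all training points, the cost per query drops accordingly.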

4 Experiments

This section conducts extensive experiments on different real and synthetic datasets to validate the classification performance of the proposed DC-KNN classifiers. The DC-KNN approach is compared to the KNN and Kmeans-KNN methods, in terms of classification accuracy.

4.1 Experimental Results on Real Datasets

To thoroughly assess the performance and robustness of the proposed DC-KNN algorithms, experiments are conducted comparing them with classical KNN and Kmeans-KNN. The latter uses the K-means clustering algorithm as a first step and then applies KNN to the results of the clustering. The comprehensive experiments are conducted on real datasets sourced from the UCI Machine Learning Repository (Bache & Lichman, 2013), the KEEL attribute-noise datasets (Alcala-Fdez et al., 2011), and the UCR Time Series Classification Repository (Dau et al., 2018). Classification accuracy was used to measure the performance of all the approaches in each experiment. Note that DC-KNN1 denotes Algorithm 2, which employs data points as nearest neighbors. It is important to note that in the KNN algorithm, the k nearest neighbors are selected from the data points of all available classes. Similarly, in the Kmeans-KNN and DC-KNN1 algorithms, the nearest neighbors are selected from the data points of all the subgroups of the apriori classes. However, in the DC-KNN2 algorithm, the nearest neighbors are the centroids of the subgroups obtained from DC.

The objective of the experiments is to demonstrate the classification capabilities of the proposed techniques on different kinds of data: real datasets, noisy numerical datasets obtained from the KEEL machine learning repository, and time series datasets from the UCR database. The datasets used are outlined in Table 1; they vary in the numbers of total samples, features, classes, and test samples.

We utilize the Breast Cancer Wisconsin (Diagnostic) dataset, abbreviated as “Breast,” the “ILPD” dataset, and the noisy “Yeast” dataset from the UCI database. Additionally, we perform tests on the “Computers,” “ScreenType,” and “StarLightCurves” datasets from the UCR repository for our time series analysis. Following the approach employed in Maturo and Verde (2022), we utilize functional data analysis to describe the time series datasets and extract the coefficients of the B-spline decomposition as features. The six noise datasets from the KEEL repository are “Sonar,” “Iono,” “Heart,” “Pima,” “Spambase,” and “Iris.” The experiments employ the abbreviations Yeast-n, Sonar-n, Iono-n, Heart-n, Pima-n, Spambase-n, and Iris-n to distinguish the noisy data. The noise datasets are subject to a noise intensity of 10%: around 10% of the samples in each dataset are chosen at random, and the values of a particular attribute for these samples are replaced with random values drawn uniformly within the minimum and maximum of the attribute's domain. Around one-third of the total samples of each dataset are used as test samples, while the rest are designated as training samples. Each noise dataset within the KEEL repository has been divided into five unique subsets for training and testing purposes. The number of testing samples for each set is presented in Table 1. The final classification evaluation of each competing method is obtained by averaging the classification results over the five divisions of each noise dataset. In addition, the majority of the datasets have a small number of samples; however, these datasets can effectively be utilized to validate the classification performance in scenarios with a small sample size.
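
To make the noise protocol reproducible, the sketch below injects attribute noise as described above (10% of the samples, with one attribute replaced by a uniform draw within its observed range); the function name and the choice of perturbing one randomly selected attribute per affected sample are our assumptions about details not stated in the text.

```python
import numpy as np

def add_attribute_noise(X, noise_rate=0.10, seed=0):
    """Replace, for a random noise_rate fraction of the samples, the value of one
    randomly chosen attribute with a uniform draw between that attribute's
    observed minimum and maximum."""
    rng = np.random.default_rng(seed)
    Xn = X.copy()
    n, m = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    noisy = rng.choice(n, size=int(round(noise_rate * n)), replace=False)
    for i in noisy:
        j = rng.integers(0, m)                   # attribute picked per sample (assumption)
        Xn[i, j] = rng.uniform(lo[j], hi[j])
    return Xn
```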

The classical KNN algorithm is recognized for its effectiveness in scenarios with a clear separation between classes. However, accurate classification of datasets containing noise poses a greater challenge. To address this, we evaluated the proposed DC-KNN methods on selected noisy datasets from the KEEL repository. The evaluation is based on the classification accuracy as shown in Table 2.

Table 1 Experimental datasets from UCR, UCI, and KEEL repositories used in the study
Table 2 Classification accuracy (%) of different methods on various datasets: values of K with the number of subgroups are presented in parentheses

In the experiments, we assess the classification performance of the proposed DC-KNN methods by varying the neighborhood size (K) on each dataset. The parameters \((n_{c_1}, \ldots , n_{c_N})\) of the DC step, which represent the numbers of subgroups determined by the Silhouette method, are also taken into consideration. The values of K range from 1 to 20 (from 1 to 7 for DC-KNN2), in increments of 1, for all the datasets. The classification results of the suggested techniques, with different values of K, are shown in Fig. 2.

Fig. 2 The classification accuracy of each approach is evaluated on different datasets, with varying values of K

The DC-KNN2 algorithm (Algorithm 3), which uses the DC centroids for detecting the nearest neighbors, outperforms the other algorithms with smaller numbers of neighbors and achieves the highest accuracy on the majority of datasets.

Moreover, on most of the real datasets, the classification accuracy remains essentially constant as the value of K increases. This is because the DC-KNN2 algorithm requires the number of neighbors to be smaller than \((\displaystyle \min _{1\le i\le N}(c_i)+1)\). On the other hand, the DC-KNN1 algorithm (Algorithm 2) consistently achieves satisfactory classification results when varying the value of K in comparison with the classical methods, particularly at higher values of K. This implies that the suggested DC-KNN1 and DC-KNN2 algorithms are more robust when the values of K are changed, while still achieving accurate classification. This advantage may be attributed to the utilization of DC and the novel objective function in the initial phase of both approaches, which supports the good performance of the classifier. The classification results depicted in Fig. 2 clearly show the good classification performance of the two proposed approaches. In most instances, the proposed methods outperform the other comparison methods.

4.2 Experimental Simulation Results

In order to show the effectiveness of the proposed DC-KNN classifiers, we generate and adapt different models in this subsection. In this experiment, we performed simulations using various data generating processes (DGPs) with distinct characteristics. The specifications of the DGPs are detailed in the second section of the Supplementary Material and illustrated in Fig. 3. Specifically, we generated datasets based on different DGPs and examined scenarios where the number of clusters remained fixed at the true value, as well as scenarios where the number of clusters was estimated.

Fig. 3 Two-dimensional representation of the simulated datasets using principal component analysis (PCA)

In order to determine the number of subgroups inside each apriori class, we can employ either the Silhouette or the Elbow approach. Nevertheless, these traditional approaches do not guarantee an improvement in the method's performance; consequently, our forthcoming work will concentrate on this aspect.

The results of the simulations performed on the data generating processes (DGPs) are displayed in Table 3. The table presents a detailed analysis of the effectiveness of the new techniques, DC-KNN1 and DC-KNN2, which utilize the clustering results as stated in Algorithms 2 and 3. The comparison is made against the classical KNN and Kmeans-KNN methods. As expected, utilizing the DC-KNN algorithms leads to higher accuracy values. The unsatisfactory results of classical KNN can be attributed to the complexity and the overlap of the data. A more effective approach is to first use clustering to discover hidden patterns within the classes before performing classification.

Table 3 Classification accuracy (%) of different methods on simulated datasets: values of K with the number of subgroups are presented in parentheses

A detailed analysis of the classification effectiveness of DC-KNN techniques with varying K values can be found in Section 3 of the Supplementary Material.

The effectiveness of the proposed DC-KNN approaches in classifying diverse dataset types, including real data sets, time series data sets, noisy data sets, and simulated datasets, has been thoroughly shown through extensive experiments. Based on the results of these classification studies, it is essential to highlight important observations that emphasize the significant contributions of our research:

  1. The DC-KNN techniques are robust to changes in the neighborhood size K compared to their competitors. The experimental results show that the proposed DC-KNN1 and DC-KNN2 algorithms consistently outperform the other approaches and achieve good classification performance. In particular, the DC-KNN2 algorithm demonstrates strong and stable performance when the value of K is smaller than \((\displaystyle \min _{1\le i\le N}(c_i)+1)\); this indicates a sensitivity of the DC-KNN2 algorithm to the number of centroid neighbors.

  2. The clustering step with adaptive distances aims to uncover concealed patterns within the training groups. This is achieved by optimizing the newly proposed objective function and using the outcomes of the clustering step in conjunction with the KNN classifier, which effectively enhances the performance of KNN-based classification.

  3. DC-KNN exhibits strong performance in scenarios with limited training data. The performance of KNN-based classification can be significantly influenced by the selection of neighbors, particularly when working with diverse datasets and small sample sizes. Nevertheless, the experimental results in these situations show that our DC-KNN outperforms the competing methods.

  4. The DC-KNN algorithms exhibit greater resilience to noisy data, outperforming the existing algorithms when applied to data containing noise.

The excellent performance of our methods can be attributed to several factors. Firstly, we utilize DC to uncover hidden patterns by adapting the distances between data points, which allows us to effectively weight the features of the different subclasses. Secondly, we introduce a novel objective function that considers both the compactness within each apriori group and the separation between apriori classes, which enables us to accurately cluster the predefined training groups. Lastly, we adapt the KNN classifier to incorporate the augmented labels obtained from the clustering step, enhancing the accuracy of the classification task. Therefore, our DC-KNN algorithms exhibit strong potential as KNN-based classifiers due to their robustness and efficacy in pattern classification.

5 Conclusions and Future Works

This research presents the DC-KNN algorithm, a novel supervised approach that combines dynamic clustering and the K-nearest neighbor classifier. The unsupervised clustering phase is used to discover new information from the original datasets that can help to improve supervised classification accuracy. DC in the unsupervised phase uses a new objective function that takes into account both intra-cluster compactness and inter-cluster separation, with cluster weights for the variables being computed automatically and optimized as the algorithm converges. These weights can be used to identify important variables for clustering and to eliminate variables that could introduce noise in the classification process. In the supervised phase, the new weights are employed to determine the nearest neighbors of new data points. Overall, the DC-KNN algorithm provides a unique and effective classification technique by combining DC and KNN.

The applications on real datasets compared the usual K-means and the proposed dynamic clustering algorithm as tools to enhance the KNN supervised classification; better results were obtained when using dynamic clustering before training the supervised classifier. The reason is that the proposed method provides more precise information and more homogeneous clusters by using clustering in the first step, as a preprocessing step to discover the hidden patterns. As a result, the proposed method improves the classification results and increases the classifier's accuracy.

The focus of this study is to discover hidden information that can be captured through novel patterns, leading to the identification of subgroups of instances characterized by these new sub-patterns within the previously established classes. The objective is to ascertain whether the implementation of the DC-KNN methodology enhances the accuracy of classification. Initially, a DC algorithm is employed to identify novel patterns within the original classes. This research demonstrates the use of this algorithm on a range of datasets that are not strongly affected by outlying data points. It is worth noting that alternative metrics can be chosen to handle outliers when determining the adaptive distances between the data points and the centroids.

The primary aim of this investigation is to examine the theoretical aspect of integrating unsupervised and supervised classification and to evaluate whether this novel approach enhances classification performance compared to traditional classifiers. Additionally, several techniques can be employed to ascertain the optimal number of subgroups for each original class.

In this two-stage study, the unsupervised method utilized is DC, while the supervised strategy employed is KNN. Future research aims may concentrate on the combination of different clustering techniques with alternative classifiers to investigate the performance of combining unsupervised and supervised classification using various strategies, as well as determining how such combinations can impact the final outcome. Moreover, attempts could be made to formulate an objective function that condenses the process into a single-step strategy.