Introduction

Artificial intelligence and machine learning have been applied in many areas, such as face recognition [1, 2], optical character recognition (OCR) [3], medical image processing [4, 5], gesture recognition [6, 7], fault detection [8, 9], communication systems [10], and news classification [11]. Classification is an essential component of most of these applications.

It is essential to define convincing and applicable classifiers for small datasets. Deep learning-based classifiers currently obtain exciting results in different applications [12, 13]. However, deep learning-based methods need large datasets for training the network and determining its parameters, so they cannot be readily employed on small datasets. Many classifiers have been proposed, such as K-nearest neighbors (K-NN) [14], support vector machine (SVM) [15], and neural network-based classifiers [16]. K-NN-based classifiers are attractive because of their simplicity and competitive performance. In the conventional K-NN classifier, the K-nearest neighbors of the query sample are first determined based on Euclidean distance. Then, the query sample is classified by majority voting over the classes of the K selected neighbors.
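For concreteness, the following is a minimal sketch of the conventional K-NN rule described above; the function name and the use of NumPy are illustrative choices and not part of the original formulation.

```python
import numpy as np

def knn_classify(X, labels, y, K):
    """Conventional K-NN: Euclidean-distance neighbor search + majority vote.
    X: (D, N) training samples as columns, labels: (N,) class labels,
    y: (D,) query sample, K: number of neighbors."""
    distances = np.linalg.norm(X - y[:, None], axis=0)   # Euclidean distances to y
    nearest = np.argsort(distances)[:K]                  # indices of the K nearest samples
    votes, counts = np.unique(labels[nearest], return_counts=True)
    return votes[np.argmax(counts)]                      # majority vote over neighbor classes
```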

Recently, some K-NN-based classifiers have been introduced such as weighted representation-based K-NN (WRKNN) [17], weighted local mean representation-based K-NN (WLMRKNN) [17], collaborative representation-based nearest neighbor (CRNN) [18], distance-weighted K-NN (DWKNN) [19], multi-local means-based nearest neighbor (MLMNN) [20], local mean-based K-NN (LMKNN) [21], pseudo-nearest neighbor (PNN) [22], local mean-based pseudo-nearest neighbor (LMPNN) [23], generalized mean-distance-based k-nearest neighbor classifier (GMDKNN) [24], and representation coefficient-based k-nearest centroid neighbor method (RCKNCN) [25]. Generally, K-NN-based classifiers can be categorized into three groups: majority voting-based, mean-distance-based, and minimum reconstruction error-based classifiers.

The majority voting-based K-NN classifiers, such as conventional K-NN, DWKNN, and CRNN, predict the category of a query sample by majority voting over the classes of the K neighbors. In CRNN, the query sample is linearly reconstructed from all training data with a constraint on their Euclidean distances. Then, the K samples corresponding to the K largest reconstruction coefficients are selected as neighbors of the query sample. Finally, as in the conventional K-NN, the query sample is classified by majority voting over the classes of the K neighbors [18]. In DWKNN, K weights are calculated according to the distances of the K neighbors. Then, the query sample is assigned to the class whose representatives among the K-nearest neighbors have the greatest sum of weights [19].
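As an illustration of distance-weighted voting, the sketch below weights each of the K neighbors by the inverse of its distance and sums the weights per class; the exact weighting scheme of DWKNN [19] may differ, so the inverse-distance weight is an assumption used only for illustration.

```python
import numpy as np

def distance_weighted_vote(X, labels, y, K, eps=1e-12):
    """Distance-weighted K-NN vote (illustrative inverse-distance weights):
    the query is assigned to the class whose representatives among the
    K-nearest neighbors have the largest sum of weights."""
    d = np.linalg.norm(X - y[:, None], axis=0)
    nearest = np.argsort(d)[:K]
    weights = 1.0 / (d[nearest] + eps)          # closer neighbors get larger weights
    scores = {}
    for idx, w in zip(nearest, weights):
        scores[labels[idx]] = scores.get(labels[idx], 0.0) + w
    return max(scores, key=scores.get)          # class with the greatest weight sum
```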

In mean-distance-based K-NN classifiers, such as LMKNN, PNN, and LMPNN, the query sample is classified using pseudo-neighbors computed from the selected neighbors. In LMKNN, a pseudo-neighbor per class is calculated as the mean of the K selected neighbors of that class. The query sample is then assigned to the class whose pseudo-neighbor has the minimum distance to it [21]. Similarly, in PNN, after determining K neighbors per class, a pseudo-neighbor is calculated as a weighted mean of the K neighbors of each class [22]. Then, the query sample is classified into the class corresponding to the closest pseudo-neighbor. LMPNN is an extended version of the PNN classifier that uses local mean-based pseudo-neighbors [23]. GMDKNN uses multi-generalized mean and nested generalized mean distances, which are based on the characteristics of the generalized mean [24].
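A minimal sketch of the LMKNN rule described above (per-class mean of the K nearest neighbors, then nearest pseudo-neighbor); the function name is illustrative.

```python
import numpy as np

def lmknn_classify(X, labels, y, K):
    """LMKNN: per class, take the K nearest neighbors of y, average them into a
    local-mean pseudo-neighbor, and assign y to the class with the closest one."""
    classes = np.unique(labels)
    dists_to_means = []
    for c in classes:
        Xc = X[:, labels == c]                              # samples of class c (columns)
        d = np.linalg.norm(Xc - y[:, None], axis=0)
        nearest = np.argsort(d)[:K]
        local_mean = Xc[:, nearest].mean(axis=1)            # pseudo-neighbor of class c
        dists_to_means.append(np.linalg.norm(y - local_mean))
    return classes[int(np.argmin(dists_to_means))]
```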

In minimum reconstruction error-based K-NN classifiers, such as MLMNN, WRKNN, and WLMRKNN, the query sample is linearly reconstructed from the K neighbors of each class. Then, the query sample is assigned to the category with the minimum reconstruction error. In MLMNN, the reconstruction coefficients are constrained by an \({l}_{2}\)-norm penalty on their values [20]. WRKNN calculates the coefficients with a constraint on the Euclidean distances of the selected neighbors [17]. In WLMRKNN, first, local mean-based pseudo-neighbors are calculated from the neighbors. Then, similar to WRKNN, the reconstruction coefficients are calculated [17].

Except for the CRNN classifier, the mentioned K-NN-based classifiers all select neighbors of the query sample in the same way, while they classify the query sample with different criteria. This mismatch can decrease the performance of the classifiers.

Our motivation and contribution

It is necessary to introduce convincing and effective classifiers for small datasets. K-NN-based classifiers are relatively simple and efficient. In this manuscript, we try to improve their performance and increase the recognition rate. All types of K-NN-based classifiers select a subset of samples as neighbors of the query sample, so neighbor selection is an unavoidable part of K-NN-based classifiers and can be pivotal to their performance. Most K-NN-based classifiers select neighbors based on the minimum Euclidean distance. Euclidean distance-based selection of neighbors is rational for majority voting-based and mean-distance-based K-NN classifiers. However, it is not logical for minimum reconstruction error-based K-NN classifiers, which classify the query sample according to the minimum error value. A sample may be closer to the query sample than another sample yet reconstruct it poorly, which can reduce the performance of the minimum reconstruction error-based classifiers. Conversely, a sample may provide the minimum reconstruction error while not being in proximity to the query sample.

The minimum reconstruction error-based K-NN classifiers typically have the best performance [17]. In this manuscript, we propose a neighbor selection method based on minimizing the reconstruction error of the query sample. In the proposed method, a subset of the data that minimizes the reconstruction error is assigned as the neighbors of the query sample; Euclidean distance is not considered as a criterion for selecting the neighbors. An \({l}_{0}\)-based sparse representation scheme is introduced for determining the proposed neighbors. The proposed neighbor selection method is applicable to minimum reconstruction error-based classifiers. Three classifiers, \({l}_{0}\)-MLMNN, \({l}_{0}\)-WRKNN, and \({l}_{0}\)-WLMRKNN, are defined based on the proposed neighbor selection method.

The experiments are based on different databases, including the University of California Irvine (UCI) machine learning repository [26], the UCR time-series classification archive [27], and a small subset of the Modified National Institute of Standards and Technology (MNIST) handwritten digit database [28]. The results exhibit the suitable performance of the proposed method.

The rest of the manuscript is organized as follows. The system model and related works are presented in “System model and related works”. Next, the proposed neighbor selection method and proposed K-NN-based classifiers are described. In “Simulation results”, the simulation results and the discussion of the results are given. Finally, “Conclusion” concludes the manuscript.

System model and related works

Figure 1 shows the general scheme of minimum reconstruction error-based K-NN classifiers. All minimum reconstruction error-based K-NN classifiers select K samples as neighbors of the query sample per class. Then, based on the reconstruction errors, the category of the query sample is determined.

Fig. 1
figure 1

The general scheme of minimum reconstruction error-based K-NN classifiers

Generally, minimum reconstruction error-based K-NN classifiers include three steps: (1) selecting neighbors using minimum Euclidean distance, (2) calculating reconstruction coefficients, and (3) classifying based on the minimum error.

Suppose \({\varvec{X}}=\left[{{\varvec{x}}}_{1}, \ldots ,{{\varvec{x}}}_{N}\right]\in {\mathfrak{R}}^{D\times N}\) contains N labeled samples, \({\varvec{y}}\in {\mathfrak{R}}^{D}\) is the query sample, \(L=\left\{1, \ldots ,C\right\}\) is the label set of the data (\(C\) is the number of classes), and \({{\varvec{X}}}_{\mathrm{KNN}}^{j}=\left[{{\varvec{x}}}_{1\mathrm{NN}}^{j}, \ldots ,{{\varvec{x}}}_{\mathrm{KNN}}^{j}\right]\in {\mathfrak{R}}^{D\times K}\) denotes the K-nearest neighbors belonging to the jth class. MLMNN, WRKNN, and WLMRKNN are presented in the following.

MLMNN classifier

First, K neighbors of the query sample are determined per class based on the minimum Euclidean distance. Then, K local mean pseudo-neighbors per class are calculated as

$${\overline{{\varvec{x}}} }_{i\mathrm{NN}}^{j}=\frac{1}{i}\sum_{l=1}^{i}{{\varvec{x}}}_{l\mathrm{NN}}^{j},\quad i=1,\ldots ,K.$$
(1)

Then, the query sample is linearly reconstructed per class using the K local mean pseudo-neighbors with an \({l}_{2}\)-norm constraint on the reconstruction coefficients (\({{\boldsymbol{\beta}}}^{j}\)) as

$${{{\boldsymbol{\beta}}}^{j}}^{*}=\underset{{{\boldsymbol{\beta}}}^{j}}{\text{arg min}}\left\{{\Vert {\varvec{y}}-{\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}{{\boldsymbol{\beta}}}^{j}\Vert }_{2}^{2}+\mu {\Vert {{\boldsymbol{\beta}}}^{j}\Vert }_{2}^{2}\right\},$$
(2)

where \(\mu \) is a regularization parameter, \({\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}=\left[{\overline{{\varvec{x}}} }_{1{\mathrm{NN}}}^{j}, {\overline{{\varvec{x}}} }_{2{\mathrm{NN}}}^{j},\ldots ,{\overline{{\varvec{x}}} }_{\mathrm{KNN}}^{j}\right]\), and \({{\boldsymbol{\beta}}}^{j}=\left[{\beta }_{1}^{j},{\beta }_{2}^{j},\ldots ,{\beta }_{K}^{j}\right]\) is the reconstruction coefficients vector of the jth class. The optimum \({{\boldsymbol{\beta}}}^{j}\) can be calculated as a closed-form solution

$${{{\boldsymbol{\beta}}}^{j}}^{*}={\left({\left({\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}\right)}^{\mathrm{T}}{\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}+\mu {\varvec{I}}\right)}^{-1}{\left({\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}\right)}^{\mathrm{T}}{\varvec{y}}.$$
(3)

Then, reconstruction errors are computed as

$${r}_{\mathrm{MLMNN}}^{j}\left({\varvec{y}}\right)={\Vert {\varvec{y}}-{\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}{{{\boldsymbol{\beta}}}^{j}}^{*}\Vert }_{2}^{2}.$$
(4)

Finally, the query sample is classified into the class with minimum reconstruction error [20].
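A minimal sketch of MLMNN following Eqs. (1)–(4), using the closed-form ridge solution of Eq. (3) directly; the function name and default value of \(\mu\) are illustrative.

```python
import numpy as np

def mlmnn_classify(X, labels, y, K, mu=0.5):
    """MLMNN: per class, build K cumulative local-mean pseudo-neighbors (Eq. 1),
    solve the l2-regularized reconstruction (Eq. 3), and classify y to the
    class with minimum reconstruction error (Eq. 4)."""
    classes = np.unique(labels)
    errors = []
    for c in classes:
        Xc = X[:, labels == c]
        d = np.linalg.norm(Xc - y[:, None], axis=0)
        nn = Xc[:, np.argsort(d)[:K]]                          # K nearest neighbors, shape (D, K)
        # Eq. (1): the i-th pseudo-neighbor is the mean of the first i neighbors
        Xbar = np.cumsum(nn, axis=1) / np.arange(1, K + 1)
        # Eq. (3): closed-form ridge solution for the coefficients
        beta = np.linalg.solve(Xbar.T @ Xbar + mu * np.eye(K), Xbar.T @ y)
        errors.append(np.sum((y - Xbar @ beta) ** 2))          # Eq. (4)
    return classes[int(np.argmin(errors))]
```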

WRKNN and WLMRKNN classifiers

Similar to MLMNN, first, K neighbors of the query sample are calculated per class. Then, reconstruction coefficients (\({{\boldsymbol{\eta}}}^{j}\)) of the query sample are calculated per class with a constraint on Euclidean distances of K-nearest neighbors (\({{\varvec{X}}}_{\mathrm{KNN}}^{j}\)) as

$${{{\boldsymbol{\eta}}}^{j}}^{*}=\underset{{{\boldsymbol{\eta}}}^{j}}{\text{arg min}}\left\{{\Vert {\varvec{y}}-{{\varvec{X}}}_{\mathrm{KNN}}^{j}{{\boldsymbol{\eta}}}^{j}\Vert }_{2}^{2}+\gamma {\Vert {{\varvec{T}}}^{j}{{\boldsymbol{\eta}}}^{j}\Vert }_{2}^{2}\right\},$$
(5)

where, \(\gamma \) is a regularization parameter, and \({{\varvec{T}}}^{j}\) is a diagonal matrix of Euclidean distances as

$${{\varvec{T}}}^{j}=\left[\begin{array}{c@{\quad}c@{\quad}c}{\Vert {\varvec{y}}-{{\varvec{x}}}_{1{\mathrm{NN}}}^{j}\Vert }_{2}& \cdots & 0\\ \vdots & \ddots & \vdots \\ 0& \cdots & {\Vert {\varvec{y}}-{{\varvec{x}}}_{\mathrm{KNN}}^{j}\Vert }_{2}\end{array}\right],$$
(6)

and \({{\boldsymbol{\eta}}}^{j}\) can be solved in a closed-form per class as

$${{{\boldsymbol{\eta}}}^{j}}^{*}= {\left({\left({{\varvec{X}}}_{\mathrm{KNN}}^{j}\right)}^{\mathrm{T}}{{\varvec{X}}}_{\mathrm{KNN}}^{j}+\gamma {\left({{\varvec{T}}}^{j}\right)}^{\mathrm{T}}{{\varvec{T}}}^{j}\right)}^{-1}{\left({{\varvec{X}}}_{\mathrm{KNN}}^{j}\right)}^{\mathrm{T}}{\varvec{y}}.$$
(7)

After computing the optimum \({{\boldsymbol{\eta}}}^{j}\) per class, the query sample is classified to the class with minimum reconstruction error, which is calculated as

$${r}_{\mathrm{WRKNN}}^{j}\left({\varvec{y}}\right)={\Vert {\varvec{y}}-{{\varvec{X}}}_{\mathrm{KNN}}^{j}{{{\boldsymbol{\eta}}}^{j}}^{*}\Vert }_{2}^{2} .$$
(8)

Most steps of WLMRKNN are similar to WRKNN. WLMRKNN employs local mean pseudo-neighbors (\({\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}\), Eq. (1)) instead of nearest neighbors (\({{\varvec{X}}}_{\mathrm{KNN}}^{j}\)). The decision is also made based on the minimum reconstruction error of the query sample versus the pseudo-neighbors per class as

$${r}_{\mathrm{WLMRKNN}}^{j}\left({\varvec{y}}\right)={\Vert {\varvec{y}}-{\overline{{\varvec{X}}} }_{\mathrm{KNN}}^{j}{{{\varvec{S}}}^{j}}^{*}\Vert }_{2}^{2} ,\quad j=1,\ldots ,C,$$
(9)

where \({{\varvec{S}}}^{j}\) is the reconstruction coefficients vector of the jth class [17].
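A sketch of WRKNN following Eqs. (5)–(8); WLMRKNN is obtained by replacing the neighbor matrix with the local-mean pseudo-neighbors of Eq. (1). The function name and default value of \(\gamma\) are illustrative.

```python
import numpy as np

def wrknn_classify(X, labels, y, K, gamma=0.5):
    """WRKNN: per class, reconstruct y from its K nearest neighbors with a
    distance-weighted l2 penalty (Eqs. 5-7) and pick the class with the
    minimum reconstruction error (Eq. 8)."""
    classes = np.unique(labels)
    errors = []
    for c in classes:
        Xc = X[:, labels == c]
        d = np.linalg.norm(Xc - y[:, None], axis=0)
        order = np.argsort(d)[:K]
        Xk, T = Xc[:, order], np.diag(d[order])                        # neighbors and Eq. (6)
        eta = np.linalg.solve(Xk.T @ Xk + gamma * T.T @ T, Xk.T @ y)   # Eq. (7)
        errors.append(np.sum((y - Xk @ eta) ** 2))                     # Eq. (8)
    return classes[int(np.argmin(errors))]
```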

It has been shown that the minimum reconstruction error-based K-NN classifiers, i.e., WRKNN and WLMRKNN, have the best performance [17].

Proposed \({l}_{0}\)-based neighbor selection

In minimum reconstruction error-based K-NN classifiers, the neighbors are selected based on the Euclidean distance, while the decision metric is the minimum reconstruction error. There is no guarantee that the K Euclidean distance-based nearest samples yield the minimum reconstruction error. On the other hand, it is intuitive that samples from the same category as the query sample often provide the best representation of the query sample and minimize the reconstruction error. Therefore, choosing the neighbors based on the minimum reconstruction error can reduce the reconstruction error value and improve the accuracy of the classifiers. This statement can be justified by an example. Suppose there are four two-dimensional samples (Fig. 2).

Fig. 2
figure 2

An example to explain the proposed method

Assign sample #1 as the query sample and the three remaining samples as its neighbors. Although samples #2 and #3 are closer to the query sample than sample #4, the reconstruction error using sample #4 is less than the reconstruction error using samples #2 and #3. In fact, based on the minimum reconstruction error criterion, sample #4 can better represent the query sample. In this manuscript, a neighbor selection method is proposed based on the following principles:

  • There is no constraint on the distance of the neighbors.

  • Samples from the same category as the query sample tend to yield the minimum reconstruction error.

In the proposed method, a subset of the training data that minimizes the reconstruction error of the query sample is selected as the neighbors. There is no constraint on the Euclidean distance of the neighbors. The proposed formulation is defined as

$$\left\{{{\varvec{X}}}_{K}^{\boldsymbol{*}},{\boldsymbol{\alpha }}_{K}^{\boldsymbol{*}}\right\}=\underset{{{\varvec{X}}}_{K},{\boldsymbol{\alpha }}_{K}}{\text{arg min}}\left\{{\Vert {\varvec{y}}-{{\varvec{X}}}_{K}{\boldsymbol{\alpha }}_{K}\Vert }_{2}\right\},$$
(10)

where \({{\varvec{X}}}_{K}\in {\mathfrak{R}}^{D\times K}\) is a matrix of K samples selected from \({\varvec{X}}\), and \({\boldsymbol{\alpha }}_{K}\in {\mathfrak{R}}^{K\times 1}\) is the vector of coefficients. Equation (10) can be rewritten as

$${\boldsymbol{\alpha }}^{\boldsymbol{*}}=\underset{\boldsymbol{\alpha }}{\text{arg min}}\left\{{\Vert {\varvec{y}}-{\varvec{X}}\boldsymbol{\alpha }\Vert }_{2}\right\}, \quad\text{s.t.}\ {\Vert \boldsymbol{\alpha }\Vert }_{0}=K,$$
(11)

where \({\boldsymbol{\alpha }}\in {\mathfrak{R}}^{N\times 1}\) is a sparse coefficient vector with K nonzero values, and \({{\varvec{X}}}_{K}^{\boldsymbol{*}}\) is determined from \({\varvec{X}}\) according to the nonzero entries of \({\boldsymbol{\alpha }}^{\boldsymbol{*}}\). Equation (11) is an \({l}_{0}\)-based sparse representation problem; it is NP-hard, and the orthogonal matching pursuit (OMP) method provides a sub-optimal solution. Suppose \({{\varvec{X}}}_{K-{l}_{0}}=\left[{{\varvec{x}}}_{1-{l}_{0}}, {{\varvec{x}}}_{2-{l}_{0}}, \ldots , {{\varvec{x}}}_{K-{l}_{0}}\right]\) is the set of K \({l}_{0}\)-based selected neighbors, K is the number of neighbors, \({\varvec{y}}\) is the query sample, and \(\varnothing \) denotes the empty set. Algorithm 1 shows the steps of the proposed \({l}_{0}\)-based neighbor selection method. \({{\varvec{X}}}_{K-{l}_{0}}\) contains the K samples that reconstruct \({\varvec{y}}\) with minimum error.

figure a
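Because the concrete steps of Algorithm 1 appear only as an image in the original, the sketch below implements the \({l}_{0}\)-based neighbor selection of Eq. (11) with the standard orthogonal matching pursuit recursion, stopping after exactly K selections; the function name is illustrative.

```python
import numpy as np

def l0_neighbor_selection(X, y, K):
    """l0-based neighbor selection via orthogonal matching pursuit (Eq. 11):
    greedily pick the K columns of X that minimize the reconstruction error of y.
    X: (D, N) candidate samples as columns, y: (D,) query sample.
    Returns the selected column indices and their coefficients."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(K):
        corr = np.abs(X.T @ residual)            # correlation with the current residual
        corr[support] = -np.inf                  # do not reselect an already chosen sample
        support.append(int(np.argmax(corr)))
        Xs = X[:, support]
        alpha, *_ = np.linalg.lstsq(Xs, y, rcond=None)   # least-squares refit on the support
        residual = y - Xs @ alpha                # update the residual
    return np.array(support), alpha
```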

In the proposed method, the K samples that yield the minimum reconstruction error are selected as neighbors. There is no constraint on the Euclidean distances of the neighbors, and any subset of K samples can be assigned as neighbors. Therefore, the reconstruction error obtained with the proposed method is lower than the reconstruction error obtained with neighbors selected by minimum Euclidean distance. Figure 3 shows the diagram of the proposed neighbor selection method.

Fig. 3
figure 3

The diagram of the proposed \({l}_{0}\)-based neighbor selection method

In the following, K selected samples are assigned as neighbors of minimum reconstruction error-based K-NN classifiers. Based on the proposed neighbor selection method, \({l}_{0}\)-MLMNN, \({l}_{0}\)-WRKNN, and \({l}_{0}\)-WLMRKNN are introduced in Algorithms 2–4, respectively.

figure b
figure c
figure d

In WRKNN and WLMRKNN, the neighbors are close to the query sample. However, in the proposed \({l}_{0}\)-WRKNN and \({l}_{0}\)-WLMRKNN, there is no constraint on the distance of the neighbors from the query sample, and it can vary over a wide range. Therefore, we normalize the Euclidean distance matrix in \({l}_{0}\)-WRKNN (step 3) and \({l}_{0}\)-WLMRKNN (step 4).
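A sketch of \({l}_{0}\)-WRKNN along these lines; the exact normalization of the distance matrix in step 3 of Algorithm 3 is not reproduced in the text, so dividing by the maximum distance is an assumption, and l0_neighbor_selection refers to the OMP sketch above.

```python
import numpy as np

def l0_wrknn_classify(X, labels, y, K, gamma=0.5):
    """l0-WRKNN sketch: per class, choose K neighbors by minimum reconstruction
    error (Algorithm 1 / OMP), normalize their Euclidean distances, solve the
    weighted reconstruction of Eq. (7), and classify by minimum error (Eq. 8)."""
    classes = np.unique(labels)
    errors = []
    for c in classes:
        Xc = X[:, labels == c]
        idx, _ = l0_neighbor_selection(Xc, y, K)     # l0-based neighbors
        Xk = Xc[:, idx]
        d = np.linalg.norm(Xk - y[:, None], axis=0)
        T = np.diag(d / (d.max() + 1e-12))           # normalized distance matrix (assumed form)
        eta = np.linalg.solve(Xk.T @ Xk + gamma * T.T @ T, Xk.T @ y)
        errors.append(np.sum((y - Xk @ eta) ** 2))
    return classes[int(np.argmin(errors))]
```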

In the following, the computational complexity of the proposed \({l}_{0}\)-based neighbor selection method and the proposed classifiers is investigated. Generally, determining the K-nearest data points to the query sample is the same in all K-NN-based classifiers. With a brute-force neighbor search, the time complexity of the K-NN-based classifiers is \(O(N)\). The proposed method (Algorithm 1) consists of two nested loops: a loop with K repetitions (for determining the K neighbors) and a loop with N repetitions (for performing step 1 of the algorithm). Therefore, the \({l}_{0}\)-based neighbor selection increases the computational complexity: the complexity of the proposed method is \(O(N\times K)\), while that of the Euclidean distance-based neighbor selection is \(O(N)\).

Also, the MLMNN, WRKNN, and WLMRKNN classifiers consist of two nested loops: a loop with C repetitions (C is the number of categories) and a loop for determining K neighbors with \(O(N)\). Therefore, the computational complexity of MLMNN, WRKNN, and WLMRKNN is \(O(C\times N)\). In contrast, the proposed \({l}_{0}\)-MLMNN, \({l}_{0}\)-WRKNN, and \({l}_{0}\)-WLMRKNN classifiers consist of three nested loops: a loop with C repetitions and two nested loops with \(O(N\times K)\) for determining the \({l}_{0}\)-based neighbors. Therefore, the computational complexity of the proposed \({l}_{0}\)-based classifiers is \(O(C\times N\times K)\).

Simulation results

The performance of the proposed \({l}_{0}\)-based neighbor selection method is investigated on the UCI machine learning repository, the UCR time-series classification archive, and a small subset of the MNIST handwritten digit database. In [17], it has been shown that the minimum reconstruction error-based K-NN classifiers have the best performance among K-NN-based classifiers. It is shown here that the proposed \({l}_{0}\)-based neighbor selection method improves the performance of the minimum reconstruction error-based K-NN classifiers and increases their precision. The regularization parameters are set as \(\mu =\gamma =\delta =0.5\). Also, the results are compared with the SVM classifier on the evaluated databases.

The results of the evaluation on UCI and UCR datasets

The proposed method is evaluated on seven datasets of the UCI machine learning repository and five datasets of the UCR time-series classification archive. Characteristics of the employed UCI and UCR datasets are given in Tables 1 and 2, respectively.

Table 1 The characteristics of the seven UCI datasets [26]
Table 2 The characteristics of the five UCR time-series datasets [27]

Each UCI dataset is randomly divided into training (66.7%) and test (33.3%) subsets. The recognition rates of each UCI dataset are obtained for each K by averaging the results of 50 independent iterations. Then, the average recognition rate over the seven datasets is calculated for K = 1,…,15. Figures 4, 5 and 6 show the mean recognition rates on the seven UCI datasets. Also, the standard-deviation values of the accuracy on the seven UCI datasets are given for different numbers of neighbors in Table 3. The standard-deviation values of the proposed classifiers are lower than those of the investigated minimum reconstruction error-based K-NN classifiers for most numbers of neighbors. The accuracy and standard-deviation values show that the proposed \({l}_{0}\)-based method improves the performance of the classifiers on almost all seven UCI datasets.
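For reference, the following is a sketch of the UCI evaluation protocol described above (random 66.7%/33.3% split, 50 independent repetitions, accuracy averaged per K); the classifier interface and the use of scikit-learn's train_test_split are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def evaluate_protocol(X, labels, classify, K_values=range(1, 16), n_runs=50):
    """Repeat: random 2/3 train / 1/3 test split, classify every test sample for
    each K, and average the accuracy over the runs. `classify(X_train, y_train, x, K)`
    is assumed to be one of the (l0-)K-NN classifiers sketched above."""
    acc = np.zeros((len(K_values), n_runs))
    for r in range(n_runs):
        Xtr, Xte, ytr, yte = train_test_split(X, labels, test_size=1/3, random_state=r)
        for ki, K in enumerate(K_values):
            preds = np.array([classify(Xtr.T, ytr, x, K) for x in Xte])  # samples as columns
            acc[ki, r] = np.mean(preds == yte)
    return acc.mean(axis=1), acc.std(axis=1)    # mean and standard deviation per K
```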

Fig. 4
figure 4

The mean recognition rates on all seven UCI datasets using \({l}_{0}\)-MLMNN and MLMNN classifiers

Fig. 5
figure 5

The mean recognition rates on all seven UCI datasets using \({l}_{0}\)-WRKNN and WRKNN classifiers

Fig. 6
figure 6

The mean recognition rates on all seven UCI datasets using \({l}_{0}\)-WLMRKNN and WLMRKNN classifiers

Table 3 The standard-deviation values of the accuracy for different numbers of the neighbors using the proposed classifiers on seven datasets of UCI

UCR consists of time-series datasets. For UCR, the recognition rates are calculated using the given training and test subsets of each dataset. Then, the average recognition rate over the five datasets is calculated for K = 1,…, 15. Figures 7, 8 and 9 show the mean recognition rates on the five UCR datasets using the three conventional and the proposed \({l}_{0}\)-based minimum reconstruction error-based K-NN classifiers. Also, Table 4 shows the standard-deviation values of the accuracy for different numbers of neighbors on the five UCR datasets. The higher recognition rates and lower standard-deviation values demonstrate that the proposed method improves the performance of the minimum reconstruction error-based K-NN classifiers on all five UCR datasets.

Fig. 7
figure 7

The mean recognition rates on the five UCR datasets using \({l}_{0}\)-MLMNN and MLMNN classifiers

Fig. 8
figure 8

The mean recognition rates on the five UCR datasets using \({l}_{0}\)-WRKNN and WRKNN classifiers

Fig. 9
figure 9

The mean recognition rates on the five UCR datasets using \({l}_{0}\)-WLMRKNN and WLMRKNN classifiers

Table 4 The standard-deviation values of the accuracy for different numbers of the neighbors using the proposed classifiers on five datasets of UCR

It should be noted that Fig. 5 shows the performance of \({l}_{0}\)-WRKNN is worse than that of WRKNN when the number of neighbors is greater than 12. In the WRKNN classifier, the distances between the query sample and the neighbors influence the calculation of the coefficient vector and, consequently, the reconstruction error value (Eqs. 5 and 6). In the proposed \({l}_{0}\)-based neighbor selection method, when the number of neighbors increases, samples with high Euclidean distances from the query sample can be selected as neighbors, because there is no constraint on the Euclidean distance of the neighbors. Neighbors with extremely high Euclidean distances are less effective in reconstructing the query sample. On the other hand, increasing the number of neighbors provides more freedom to reconstruct the query sample. Consequently, the WRKNN classifier sometimes performs better than \({l}_{0}\)-WRKNN for larger numbers of neighbors. The WLMRKNN classifier is similar to WRKNN; however, in WLMRKNN, the local mean-based neighbors (Eq. 1) are used to reconstruct the query sample. The Euclidean distances of the local mean-based neighbors are not very high because of the mean operation on the pre-selected neighbors. Thus, increasing the number of neighbors does not reduce the performance of \({l}_{0}\)-WLMRKNN (Figs. 6 and 9).

In addition, McNemar’s statistical test is used to compare the proposed \({l}_{0}\)-based classifiers with the mentioned minimum reconstruction error-based K-NN classifiers. McNemar’s test is a statistical method for comparing the performance of two classifiers on the same test set. Suppose there are two classifiers, A and B. In McNemar’s test, the null hypothesis is that classifiers A and B have the same error rate (i.e., \({n}_{01}={n}_{10}\)), and the alternative hypothesis is that their performances differ. The following quantities are defined for classifiers A and B:

  • \({n}_{01}\): number of test samples misclassified by A but not by B,

  • \({n}_{10}\): number of test samples misclassified by B but not by A.

Then, the \({\chi }^{2}\) statistic is computed as \({\chi }^{2}={\left({n}_{01}-{n}_{10}\right)}^{2}/\left({n}_{01}+{n}_{10}\right)\), which follows a chi-squared distribution with one degree of freedom under the null hypothesis. For a significance level of 0.05 (p-value = 0.05), if the \({\chi }^{2}\) statistic is greater than 3.84, the null hypothesis is rejected, and there is a significant difference between classifiers A and B. In the following, the results of McNemar’s test (\({\chi }^{2}\) statistic values) on the five UCR datasets are given for different numbers of neighbors in Table 5.
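A minimal sketch of the McNemar computation described above (no continuity correction, matching the formula in the text); the function and variable names are illustrative.

```python
import numpy as np

def mcnemar_chi2(y_true, pred_a, pred_b):
    """McNemar's test statistic for two classifiers on the same test set.
    Returns chi^2 = (n01 - n10)^2 / (n01 + n10); compare with 3.84
    (alpha = 0.05, one degree of freedom)."""
    err_a = np.asarray(pred_a) != np.asarray(y_true)
    err_b = np.asarray(pred_b) != np.asarray(y_true)
    n01 = np.sum(err_a & ~err_b)      # misclassified by A only
    n10 = np.sum(~err_a & err_b)      # misclassified by B only
    if n01 + n10 == 0:
        return 0.0                    # the two classifiers err on exactly the same samples
    return (n01 - n10) ** 2 / (n01 + n10)

# A chi^2 value greater than 3.84 indicates a significant difference at the 0.05 level.
```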

Table 5 The results of McNemar’s test (\({\chi }^{2}\) statistic value) of three paired MLMNN and \({l}_{0}\)-MLMNN, WRKNN and \({l}_{0}\)-WRKNN, and WLMRKNN and \({l}_{0}\)-WLMRKNN classifiers for different numbers of the neighbors on five UCR datasets

The results demonstrate a significant difference, and the recognition rate values (Figs. 7, 8 and 9) show the superiority of the proposed \({l}_{0}\)-based classifiers, especially \({l}_{0}\)-WLMRKNN, for most numbers of neighbors on most of the investigated UCR datasets. Note that the results of McNemar’s test are not provided for the UCI datasets, because in those experiments the recognition rates of each UCI dataset are obtained for each K by averaging the results of 50 independent iterations, and each UCI dataset is randomly divided into training and test subsets in each iteration.

The results of the evaluation on MNIST handwritten digit database

In this manuscript, a small subset of MNIST is used to evaluate the proposed method. The MNIST database includes 60,000 training and 10,000 test samples of English handwritten digit images [28]. A training subset with 10,000 samples and a test subset with 5000 samples are used in our experiments; they are randomly selected from the training and test samples, respectively. The recognition rates of the \({l}_{0}\)-MLMNN, \({l}_{0}\)-WRKNN, and \({l}_{0}\)-WLMRKNN classifiers are given in Figs. 10, 11 and 12, respectively. The results demonstrate that the proposed method improves the performance of all three reconstruction error-based K-NN classifiers. Also, the results of McNemar’s test on the MNIST dataset are given for different numbers of neighbors in Table 6. The results demonstrate a significant difference between the investigated pairs of classifiers, especially for numbers of neighbors less than 9. Again, the recognition rate values show that the proposed \({l}_{0}\)-based classifiers, especially \({l}_{0}\)-WLMRKNN, have the best performance.

Fig. 10
figure 10

The recognition results on the MNIST database using \({l}_{0}\)-MLMNN and MLMNN classifiers

Fig. 11
figure 11

The recognition results on the MNIST database using \({l}_{0}\)-WRKNN and WRKNN classifiers

Fig. 12
figure 12

The recognition rates on the MNIST database using \({l}_{0}\)-WLMRKNN and WLMRKNN classifiers

Table 6 The results of McNemar’s test (\({\chi }^{2}\) statistic value) of three paired MLMNN and \({l}_{0}\)-MLMNN, WRKNN and \({l}_{0}\)-WRKNN, and WLMRKNN and \({l}_{0}\)-WLMRKNN classifiers for different numbers of the neighbors on MNIST dataset

Figures 10 and 11 show that the proposed method has recognition rates similar to the conventional minimum reconstruction error-based K-NN classifiers for large K. A larger number of neighbors causes more samples to be involved in reconstructing the query sample. Therefore, there is more freedom in reconstructing the query sample, which can bring the performance of the classifiers closer to each other. However, the performance depends on the data distribution and classifier type.

Also, as can be inferred, the proposed \({l}_{0}\)-based neighbor selection method can be more effective on datasets with a small number of samples or high variability per class. The results on MNIST show that the proposed \({l}_{0}\)-based classifiers and the mentioned minimum reconstruction error-based K-NN classifiers perform the same for large values of K when the training subset includes more than 10,000 samples. When a dataset has a large number of samples per class, the K-nearest neighbors of the query sample are likely to yield the minimum reconstruction error.

For further investigation, the results on the whole MNIST digit database are evaluated in the following. The recognition rate results are given in Figs. 13, 14 and 15. The results are similar to those obtained on the small subset of MNIST (Figs. 10, 11 and 12) and demonstrate the better performance of the proposed method, especially for smaller numbers of neighbors. For numbers of neighbors greater than 12, the performance of WRKNN is better than that of \({l}_{0}\)-WRKNN, but \({l}_{0}\)-WLMRKNN and \({l}_{0}\)-MLMNN perform similarly to WLMRKNN and MLMNN, respectively. The reasons have been described in the paragraph above Table 4. Also, in MLMNN, as in WLMRKNN, the local mean-based neighbors are used to reconstruct the query sample. Therefore, the high Euclidean distances of the \({l}_{0}\)-based neighbors do not have an adverse effect on calculating and controlling the reconstruction coefficient values (Eq. 2). On the other hand, increasing the number of neighbors provides more freedom to reconstruct the query sample, which can make the performance of MLMNN similar to that of \({l}_{0}\)-MLMNN.

Fig. 13
figure 13

The recognition results on the whole MNIST database using \({l}_{0}\)-MLMNN and MLMNN classifiers

Fig. 14
figure 14

The recognition results on the whole MNIST database using \({l}_{0}\)-WRKNN and WRKNN classifiers

Fig. 15
figure 15

The recognition results on the whole MNIST database using \({l}_{0}\)-WLMRKNN and WLMRKNN classifiers

The results of the evaluation using SVM classifier

Furthermore, the results of the proposed classifiers are compared with the SVM classifier. The recognition rates are given in Table 7. The results demonstrate the better performance of the proposed classifiers compared to the SVM classifier on the UCI, UCR, and MNIST databases. Also, the results of the proposed K-NN-based classifiers are approximately constant for K > 5, which indicates their low sensitivity to the initial choice of the number of neighbors. Generally, the proposed method is suitable for classifying small datasets.

Table 7 The results of the recognition rates using \({l}_{0}\)-WLMRKNN, \({l}_{0}\)-WRKNN, \({l}_{0}\)-MLMNN, and SVM classifiers

Discussion of the results

The minimum reconstruction error-based K-NN classifiers have the best performance among K-NN-based classifiers and are less sensitive to the number of neighbors. Usually, the neighbors are selected based on the minimum Euclidean distance of the data from the query sample. However, different kinds of K-NN-based classifiers make their decisions using their own specific criteria. Therefore, selecting neighbors according to the decision criterion of the classifier can improve performance. The proposed \({l}_{0}\)-based neighbor selection method decreases the reconstruction errors of the minimum reconstruction error-based K-NN classifiers and can improve their performance. Figure 16 shows the sum of the reconstruction error values of the query samples with respect to the samples of their own category on the Chlorine Concentration dataset of UCR. The minimum reconstruction error values of the query samples (i.e., \(r\left({\varvec{y}},{{\varvec{X}}}_{K}\right)={\Vert {\varvec{y}}-{{\varvec{X}}}_{K}{\boldsymbol{\omega}}\Vert }_{2}^{2}\)) are calculated using K training samples from the same category as each query sample. The results are provided for K = 2,…, 15 using the WRKNN and \({l}_{0}\)-WRKNN classifiers. The Chlorine Concentration dataset includes 3840 query samples and three categories (C = 3). The results demonstrate that the reconstruction errors are decreased by the proposed neighbor selection method.

Fig. 16
figure 16

The sum of the reconstruction error values of all query samples according to their category samples on the Chlorine Concentration dataset using WRKNN and \({l}_{0}\)-WRKNN classifiers

The reconstruction error of the query sample, i.e., \(r\left({\varvec{y}},{{\varvec{X}}}_{K}\right)={\Vert {\varvec{y}}-{{\varvec{X}}}_{K}{\boldsymbol{\omega}}\Vert }_{2}^{2}\), is the decision metric in all reconstruction error-based K-NN classifiers, where \({{\varvec{X}}}_{K}\) is the matrix of the K selected neighbors and \({\boldsymbol{\omega}}\) is the reconstruction coefficient vector. \(\frac{\partial r}{\partial {{\varvec{x}}}_{ik}}\) expresses the contribution of \({{\varvec{x}}}_{ik}\) to classifying \({\varvec{y}}\). In \({l}_{0}\)-WRKNN, the weighted contribution of each neighbor can be calculated per class as

$$\begin{aligned}\frac{\partial {r}^{j}}{\partial {{\varvec{x}}}_{i-{l}_{0}}^{j}}&=\frac{\partial {\bigg\Vert {\varvec{y}}-{{\varvec{X}}}_{K-{l}_{0}}^{j}{{\boldsymbol{\eta}}}_{K-{l}_{0}}^{j}\bigg\Vert }_{2}^{2}}{\partial {{\varvec{x}}}_{i-{l}_{0}}^{j}}\\ &=-2{\eta }_{i-{l}_{0}}^{j}\left({\varvec{y}}-{{\varvec{X}}}_{K-{l}_{0}}^{j}{{\boldsymbol{\eta}}}_{K-{l}_{0}}^{j}\right),\end{aligned}$$
(12)

where \({{\varvec{x}}}_{i-{l}_{0}}^{j}\) is the ith \({l}_{0}\)-based neighbor from the jth class, and \({\eta }_{i-{l}_{0}}^{j}\) is its corresponding coefficient. Equation (12) expresses behavior similar to WRKNN and shows that the neighbors contribute differently to the classification, in proportion to their reconstruction coefficients. This conclusion also holds for \({l}_{0}\)-WLMRKNN and \({l}_{0}\)-MLMNN.

Besides, by differentiating r with respect to the reconstruction coefficients, i.e., \(\frac{\partial r}{\partial {w}_{iK}}\), the weighted contribution of the reconstruction coefficients can be evaluated. In \({l}_{0}\)-WRKNN, \(\frac{\partial {r}^{j}}{\partial {\eta }_{i-{l}_{0}}^{j}}\) is calculated as

$$\begin{aligned}\frac{\partial {r}^{j}}{\partial {\eta }_{i-{l}_{0}}^{j}}&=\frac{\partial {\Vert {\varvec{y}}-{{\varvec{X}}}_{K-{l}_{0}}^{j}{{\boldsymbol{\eta}}}_{K-{l}_{0}}^{j}\Vert }_{2}^{2}}{\partial {\eta }_{i-{l}_{0}}^{j}}\\ &=-2{\left({{\varvec{x}}}_{i-{l}_{0}}^{j}\right)}^{\mathrm{T}}\left({\varvec{y}}-{{\varvec{X}}}_{K-{l}_{0}}^{j}{{\boldsymbol{\eta}}}_{K-{l}_{0}}^{j}\right).\end{aligned}$$
(13)

Equation (13) shows that the weighted contribution of \({\eta }_{i-{l}_{0}}^{j}\) depends on both the corresponding sample and the reconstruction error value. Because the neighbors are selected based on the minimum reconstruction error, the proposed \({l}_{0}\)-based K-NN classifiers can be less sensitive to the reconstruction coefficients. Generally, Eq. (13) can be applied to evaluate the reconstruction coefficients in any reconstruction error-based K-NN classifier. \(f\left(K,t\right)=\sum_{j=1}^{C}\sum_{i=1}^{K}\left|\frac{\partial {r}_{t}^{j}}{\partial {\eta }_{i}^{j}}\right|\) is introduced to compare the sensitivity of the minimum reconstruction error-based K-NN classifiers to the reconstruction coefficients. \(\sum_{i=1}^{K}\left|\frac{\partial {r}_{t}^{j}}{\partial {\eta }_{i}^{j}}\right|\) is the sum of the K absolute values of \(\frac{\partial {r}_{t}^{j}}{\partial {\eta }_{i}^{j}}\) over the K neighbors of the tth query sample, and \(f\left(K,t\right)\) is the sum of these values over the C categories. \({r}_{t}\) is the reconstruction error of the tth query sample. A smaller \(f\left(\cdot ,\cdot \right)\) indicates lower sensitivity of the reconstruction error-based K-NN classifiers to the reconstruction coefficients.
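As a sketch, \(f\left(K,t\right)\) for one query sample can be computed from Eq. (13) as follows; the absolute value makes the sign of the derivative immaterial, and the data structures and function name are assumptions.

```python
import numpy as np

def sensitivity_f(y, neighbors_per_class, coeffs_per_class):
    """Sensitivity measure f(K, t) for one query sample: sum over classes and
    neighbors of |dr/d eta_i| = |2 x_i^T (y - X_K eta)| (Eq. 13, in absolute value).
    neighbors_per_class[j]: (D, K) neighbor matrix of class j,
    coeffs_per_class[j]: its (K,) coefficient vector."""
    f = 0.0
    for Xk, eta in zip(neighbors_per_class, coeffs_per_class):
        residual = y - Xk @ eta                       # reconstruction error vector of class j
        f += np.sum(np.abs(2.0 * Xk.T @ residual))    # sum_i |dr/d eta_i|
    return f
```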

Figure 17 shows \(f\left(5,.\right)\) for the query samples of the Chlorine Concentration dataset of UCR using WRKNN and \({l}_{0}\)-WRKNN. Also, the \(\sum_{t=1}^{{N}_{t}}f\left(K,t\right)\) values using WRKNN, \({l}_{0}\)-WRKNN, WLMRKNN, and \({l}_{0}\)-WLMRKNN are given in Fig. 18 for K = 2,…, 15, where \({N}_{t}\) is the number of query samples. Figures 17 and 18 demonstrate that the proposed minimum reconstruction error-based K-NN classifiers are less sensitive to the reconstruction coefficients.

Fig. 17
figure 17

\(f\left(5,.\right)\) values of the query samples of the Chlorine Concentration dataset of UCR using a WRKNN and b \({l}_{0}\)-WRKNN

Fig. 18
figure 18

\(\sum_{t=1}^{{N}_{t}}f\left(K,t\right)\) values for K = 2,…, 15 on the Chlorine Concentration dataset of UCR using WRKNN, \({l}_{0}\)-WRKNN, WLMRKNN, and \({l}_{0}\)-WLMRKNN

The minimum reconstruction error-based K-NN classifiers decide about a query sample based on the minimum reconstruction error. Generally, the reconstruction error of a sample with respect to samples of the same class is less than the reconstruction error with respect to samples of a different class. On the other hand, the neighbors selected by Euclidean distance cannot always reconstruct the query sample with minimum error. The proposed \({l}_{0}\)-based neighbor selection method selects, from each class, the K samples that yield the minimum reconstruction error. Therefore, it is more probable that the samples of the same class as the query sample provide the least reconstruction error.

Conclusion

Deep learning obtains exciting results in many applications, such as the classification of electroencephalography (EEG) signals [29], text [13], time-series data [30], and remote sensing images [31]. However, training a deep neural network requires large datasets. In this manuscript, a robust and powerful K-NN-based classifier is proposed for small datasets. There are different types of K-NN-based classifiers. Neighbor selection is the first and one of the most significant steps of K-NN-based classifiers. Selecting neighbors according to the decision criterion of the classifier can improve its performance. In most K-NN-based classifiers, the neighbors are selected using Euclidean distance, which does not correspond to their decision criteria. In this manuscript, an \({l}_{0}\)-based neighbor selection method has been introduced (Algorithm 1) for minimum reconstruction error-based K-NN classifiers. There is no constraint on the distance of the selected neighbors, and the neighbors are determined through a sparse representation scheme.

Based on the proposed neighbor selection method, \({l}_{0}\)-MLMNN, \({l}_{0}\)-WRKNN, and \({l}_{0}\)-WLMRKNN classifiers have been introduced. Steps of the proposed classifiers have been given in Algorithms 2–4. The reconstruction error of the query sample versus the neighbors has significantly been decreased using \({l}_{0}\)-based neighbors; therefore, the performance of the minimum reconstruction error-based K-NN classifiers has been improved.

However, the computational complexity of the proposed neighbor selection method (\(O(N\times K)\)) is higher than that of the conventional minimum Euclidean distance-based method (\(O(N)\)). Also, the orthogonal matching pursuit algorithm used here is a sub-optimal solution of (11), which can reduce the performance of the proposed method. Furthermore, the proposed neighbor selection method is only applicable to minimum reconstruction error-based classifiers.

The proposed \({l}_{0}\)-based neighbor selection method is suitable for data with few samples or high variability per class. The performance of the \({l}_{0}\)-based neighbor selection method and the conventional Euclidean distance-based method is the same for data with a large number of samples per class or low variability. Evaluations on the UCI machine learning repository (Figs. 4, 5 and 6, and Table 3), the UCR time-series classification archive (Figs. 7, 8 and 9, and Tables 4, 5), and the subset of the MNIST handwritten digit database (Figs. 10, 11, 12, 13, 14 and 15, and Table 6) demonstrate the suitable performance of the proposed classifiers. It has been shown that the proposed reconstruction error-based K-NN classifiers are less sensitive to the reconstruction coefficients than the conventional minimum reconstruction error-based K-NN classifiers. Also, the proposed classifiers perform better than the SVM classifier on all three databases. For future research, the performance of the K-NN-based classifiers can be evaluated using different distance metrics.