1 Introduction

The k-Nearest Neighbor (kNN) classification algorithm is one of the most popular approaches used by researchers and practitioners in the areas of Pattern Recognition and Machine Learning. Together with the Support Vector Machine (SVM), it is considered a firm representative of the classification-by-analogy principle [4].

Generally speaking, kNN only needs one parameter to be adjusted, k, which represents how many of the closest neighbors are considered to classify an unseen object. Once this parameter is set, two main approaches are followed to classify an object: (i) a majority vote of the k neighbors, and (ii) a weighted vote of all k neighbors that takes into account the distance from each of them to the object to classify. Following these two ideas, the kNN algorithm has been successfully applied in learning tasks as diverse as data mining [14], image processing [6], and recommender systems [7].

For classification purposes, all kNN variants have so far assumed that, independently of the voting strategy they follow (by majority or weighted), all objects in the training set are equal in their classification power. For instance, if two objects from different classes are at exactly the same distance from a test object, both will contribute the same amount to the final decision. Another way to put this is to say that the two training objects have the same relevance. In this work, we are interested in proposing some ideas to alter this behavior. Motivated by how massive bodies exert an influence on nearby objects, we propose assigning a mass to each of the objects in the training set.

There are several application scenarios that make us hypothesize that assigning a mass to the training objects could have a positive effect on the classification performance of the kNN algorithm. In particular, this could be of interest when some natural aspect of the problem needs to be considered. For example, within the field of Natural Language Processing (NLP), for the task of news classification, capturing the temporal aspect may be relevant, i.e. more recent news could be more informative (or have more context) than older ones. In this case, we could let the more recent news have a larger influence, and thus a larger mass. Another application of this approach is the recognition of highly heterogeneous categories. In that case it is common that the majority of the neighbors of the object to classify vote for a wrong label. With objects of different masses it would be possible to overturn this decision, provided that the objects of the right class carry enough mass.

In this work we pursue these ideas by proposing two different ways to calculate a mass for a given object. We reformulate the kNN algorithm to take this mass into consideration by using a voting strategy based on Newton’s gravitational force. We tested our proposal on 13 benchmark data sets and contrasted the results against the regular kNN and the weighted kNN algorithms.

2 Related Work

The literature reports several ways in which the performance of the kNN algorithm can be improved. Naturally, finding an optimal value of k has been one of the questions that several works have attempted to solve [16, 17]. Besides finding this k value, there is an open question regarding which distance metric is the most suitable to use. In this regard, previous works have evaluated new and traditional metrics in a variety of classification problems [2, 8, 15].

Using a weighting scheme was first proposed by Dudani [5] in the 1970s; this variant of kNN is called the Distance-Weighted k-Nearest-Neighbor Rule (DWkNN). Since then, different weighting schemes have been proposed. Among the most recent works, Tan [12] proposed the Neighbor-Weighted k-Nearest Neighbor (NWkNN) algorithm, which applies a weighting strategy based on the distribution of classes. When working with unbalanced data sets, NWkNN gives a smaller weight to objects of majority classes and a larger weight to objects of under-represented classes. For the case of text classification, Soucy and Mineau [11] proposed a weighting based on the similarity of texts (objects), measured by the cosine similarity between their bag-of-words representations. Mateos-García et al. [9] developed a technique, similar to those used in Artificial Neural Networks, to optimize weights that indicate the importance of each neighbor with respect to the test objects. Finally, Parvinnia et al. [10] also computed a weight for each training object, based on a matching strategy between the training and testing data sets.

3 Proposed Algorithm

In this section we present two approaches to calculate a mass for a given object in the training set. We then explain the complete kNN framework that exploits the concept of mass, by considering Newton’s gravitational force.

3.1 Mass Assignment

Approach 1: Circled by Its Own Class (CC). This approach is based on an instance selection strategy known as Edited Nearest Neighbor (ENN), originally proposed by Wilson [13]. The rationale of ENN is to keep an instance that is surrounded (or circled) by other instances of its same class. For the CC approach, the mass of an object x is directly proportional to the number of objects from its same class that surround it. By doing this, we aim to give less importance to objects that lie in regions of the feature space that are more likely to represent a different class. In other words, the idea is to penalize rare objects and, as a consequence, make the classifier more robust to outliers. To calculate the mass via CC we apply Eq. 1.

$$\begin{aligned} m(x\in c_i)=\log _2(SN_k(x,c_i)+2) \end{aligned}$$
(1)

where x is a training object, \(c_i\) is its class, and the function \(SN_k()\) counts how many of the k closest objects to x belong to its same class. The \(\log_2()\) function serves as a smoothing factor; we add the constant 2 to avoid undefined values and masses equal to zero.

Approach 2: Circled by Different Classes (CD). This approach is the opposite of the CC approach. It gives more mass to objects that are surrounded by objects from different classes, that is, the mass is inversely proportional to the number of neighbors of the same class. CD aims to preserve the discriminative power of an outlier object, since it could be relevant for classifying another outlier in the testing set. It also allows better modeling of heterogeneous classes formed by several small subgroups of objects. To assign a mass following this approach we apply Eq. 2; the interpretation of its elements is the same as in Eq. 1. A small illustrative code sketch of both approaches is given after the equation.

$$\begin{aligned} m(x\in c_i)=\log _2(k-SN_k(x,c_i)+2) \end{aligned}$$
(2)
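To make the mass assignment concrete, the following Python sketch computes \(SN_k\) and both masses for every training object. It is a minimal illustration under our own naming conventions (the function assign_masses and the use of NumPy/SciPy are assumptions); only the formulas in Eqs. 1 and 2 come from the text above.

```python
# Minimal sketch of the CC and CD mass assignments (Eqs. 1 and 2).
# Function and variable names are illustrative, not from the original paper.
import numpy as np
from scipy.spatial.distance import cdist

def assign_masses(X, y, k, approach="CC"):
    """Return one mass per training object in X (n x d array) with labels y (length-n array)."""
    dists = cdist(X, X, metric="euclidean")
    np.fill_diagonal(dists, np.inf)                  # an object is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]     # indices of the k closest objects
    same_class = (y[neighbors] == y[:, None]).sum(axis=1)   # SN_k(x, c_i)
    if approach == "CC":        # Eq. 1: more mass when circled by its own class
        return np.log2(same_class + 2)
    if approach == "CD":        # Eq. 2: more mass when circled by different classes
        return np.log2(k - same_class + 2)
    raise ValueError("approach must be 'CC' or 'CD'")
```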

3.2 Weighted Attraction Force kNN Algorithm (WAF-kNN)

The traditional weighted kNN algorithm is as follows: given a set of training objects \(\{(x_1,f(x_1)),\ldots ,(x_n,f(x_n))\}\) (where \(x_i\) is an object and \(f(x_i)\) its label), an unlabeled object \(x_q\), and the set of the k closest neighbors to \(x_q\) in the training set, \(\{x_1,\ldots ,x_k\}\), the class of \(x_q\) is determined by Eq. 3:

$$\begin{aligned} f(x_q)\leftarrow \arg \max _{c\in C} \sum _{i=1}^k weight(x_i)\times \delta (c,f(x_i)) \end{aligned}$$
(3)

where C represents the set of classes, \(weight(x_i)\) indicates the weight of the vote from object \(x_i\), and \(\delta (c,f(x_i))\) is a function that returns 1 if \(x_i\) belongs to class c and 0 otherwise.

Building on this framework, our proposal, which we call Weighted Attraction Force kNN, or simply WAF-kNN, uses a weighting scheme based on the Law of Universal Gravitation, as presented in Eq. 4.

$$\begin{aligned} weight(x_i)=G\frac{m(x_q)m(x_i)}{dist^2(x_q,x_i)}\simeq \frac{m(x_i)}{dist^2(x_q,x_i)} \end{aligned}$$
(4)

where \(weight(x_i)\) is the attraction force, or voting amount, exerted by the training object \(x_i\) on the object \(x_q\) to be classified. \(m(x_q)\) and \(m(x_i)\) are the masses of the testing and training objects, respectively, and \(dist(\cdot ,\cdot )\) is a distance metric between the two objects. Note that two constants can be omitted to simplify the original equation, namely G and \(m(x_q)\): they are the same for all k neighbors of \(x_q\) and therefore only scale the votes, without affecting the arg max in Eq. 3. The mass \(m(x_i)\) can be calculated by either of the two approaches, CC or CD, presented in Sect. 3.1.
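A sketch of the resulting decision rule is given below. It assumes NumPy and the masses computed with the assign_masses sketch above; as in Eq. 4, G and \(m(x_q)\) are dropped, and the small eps term is our addition to guard against division by zero when a neighbor coincides with the query object.

```python
# Minimal sketch of the WAF-kNN decision rule (Eqs. 3 and 4).
def waf_knn_predict(X_train, y_train, masses, x_q, k, eps=1e-12):
    dists = np.linalg.norm(X_train - x_q, axis=1)   # Euclidean distances to x_q
    nn = np.argsort(dists)[:k]                      # k closest training objects
    votes = {}
    for i in nn:
        # Gravitational weight: mass over squared distance (Eq. 4),
        # with G and m(x_q) omitted since they do not change the arg max.
        w = masses[i] / (dists[i] ** 2 + eps)
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + w
    return max(votes, key=votes.get)                # class with the largest total force
```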

4 Experiments and Results

4.1 Experimental Configuration

For the evaluation of the proposed approach we considered 13 different data sets from the UCI data repository. All these data sets exclusively contain numeric features and have no missing values. These data sets are commonly used in classification tasks. Table 1 presents some statistics on these data sets, such as the number of instances, features, and classes.

Table 1. Data sets characteristics.

We applied a common experimental setting across all the collections. First, we considered three different values for k, namely 3, 5, and 7. Then, we standardized the data by means of their z-scores. In all the experiments we used the Euclidean distance as the distance measure, and employed the \(\mathrm {F}_1\) score as the main evaluation metric due to its appropriateness for describing results on unbalanced data sets. A 10-fold cross-validation procedure was applied to obtain the results. Finally, we applied the non-parametric Bayesian Signed-Rank (BSR) test [1] to analyze the statistical significance of the obtained results. A sketch of this protocol is shown below.
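The following scikit-learn sketch reproduces this protocol (z-score standardization, Euclidean distance, k in {3, 5, 7}, 10-fold cross-validation, \(\mathrm {F}_1\) score) for the distance-weighted baseline only; WAF-kNN itself is not available in scikit-learn. The macro-averaged \(\mathrm {F}_1\), the Iris stand-in data set, and the fold shuffling seed are our assumptions, not details taken from the paper.

```python
# Hedged sketch of the experimental protocol, using scikit-learn's
# distance-weighted kNN (the DWkNN baseline) on a stand-in data set.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)        # stand-in for one of the 13 UCI data sets
for k in (3, 5, 7):
    clf = make_pipeline(
        StandardScaler(),                # z-score standardization
        KNeighborsClassifier(n_neighbors=k, metric="euclidean", weights="distance"),
    )
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = cross_val_score(clf, X, y, scoring="f1_macro", cv=cv)
    print(f"k={k}: mean macro-F1 = {scores.mean():.3f}")
```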

4.2 Results

Table 2 presents a first comparison of the two approaches used to calculate the masses (CC and CD), each employed within the WAF-kNN algorithm. The table is organized by the three k values that were evaluated, and the best results for each k are shown in bold face. Globally, the CD approach slightly outperforms the CC approach, this being most evident when \(k=7\); nevertheless, there are data sets where the CC approach is better for all k values, e.g. Arcene and Ecoli. The analysis of the Ecoli data set tells us that its classes form reasonably well-defined, homogeneous clusters. In this situation, the CD approach gives more mass to outliers, causing a larger classification error than CC, which assigns less mass to objects far from the main centroid of their class, thereby reducing noise. Both approaches, CC and CD, aim to offer a better weighting scheme to improve classification performance, but which one to use will ultimately depend on the distribution of classes in the data set of interest.

Table 2. \(\mathrm {F}_1\) scores of WAF-kNN, using the two approaches for mass assignment.

To evaluate our proposal against the kNN and DWkNN algorithms, we chose the CD approach given its more consistent performance in the previous experiment. This comparison is presented in Table 3, where it can be observed that our proposal outperforms the baseline methods in the majority of data sets. This behavior is consistent across the three values of k that were considered. Again, the best performance is obtained with \(k=7\).

Table 3. Comparison of kNN, DWkNN and WAF-kNN using CD masses.

To further analyze these results, we applied the non-parametric BSR test [3]. According to this test, three scenarios are possible for a given pairwise comparison of methods A and B: (scenario 1) A outperforms B, (scenario 2) both methods show the same performance, or (scenario 3) B outperforms A. The BSR test computes the probability of occurrence of each scenario when approaches A and B are applied to a given data set. Table 4 presents the probabilities of occurrence of each scenario when comparing the baseline approaches, kNN and DWkNN, with our proposed WAF-kNN approach.

Table 4. BSR output probabilities. A refers to the baseline methods, kNN and DWkNN respectively, whereas B refers to the proposed WAF-kNN approach.

Looking at the performance of the WAF-kNN algorithm on each data set (with \(k=7\)), the largest improvement and the largest decrement with respect to the baseline methods were obtained in Ionosphere and Ecoli, respectively. Visualizing these data sets reveals some characteristics that shed light on the behavior of the method.

Fig. 1. t-SNE mapping of the Ionosphere and Ecoli data sets. (Color figure online)

Figure 1 shows the distribution of objects in these two data sets using t-distributed Stochastic Neighbor Embedding (t-SNE). The Ionosphere data set is composed of two classes. Class 1, shown in red, is grouped in two well-defined clusters located in the upper and lower sections of the space. Class 2, shown in blue, is mainly spread over the mapping space, with an identifiable cluster on the right side of the figure. In this case, the CD approach favors the classification of objects of class 2 by assigning more mass to training objects located in the central and upper-left regions, which are clearly circled by objects of class 1, thus obtaining correct label assignments even in regions where the majority of objects belong to a different class. On the other hand, in the Ecoli data set, CD gives more mass to presumably noisy objects located away from the bulk of their own class (see the blue and white objects over the green cluster), thereby negatively affecting the classifier.

5 Conclusions

In this work we introduced the WAF-kNN algorithm, a variant of the weighted kNN algorithm based on the attraction force that exists between two objects. We presented two methods for assigning mass to training objects, namely Circled by Its Own Class (CC) and Circled by Different Classes (CD). For testing purposes, 13 well-known data sets were employed. The comparisons indicate that our proposal obtains better classification results than kNN and is statistically competitive with DWkNN. These results were validated with the non-parametric BSR test.