Multifunctional nearest-neighbour classification
Abstract
The k nearest-neighbour (kNN) algorithm has attracted much attention since its inception as an intuitive and effective classification method. Many further developments of kNN have been reported, such as those integrated with fuzzy sets, rough sets and evolutionary computation. In particular, the fuzzy and rough modifications of kNN have shown significant enhancement in performance. This paper presents another significant improvement, leading to a multifunctional nearest-neighbour (MFNN) approach that is conceptually simple to understand. It employs an aggregation of fuzzy similarity relations and class memberships as the decision qualifier that performs the classification. Because a large and varied choice of aggregation methods and similarity metrics is available, the new method offers considerable adaptivity in dealing with different classification problems, and the proposed approach can be implemented in a variety of forms. Both theoretical analysis and empirical evaluation demonstrate that conventional kNN and fuzzy nearest neighbour, as well as two recently developed fuzzy-rough nearest-neighbour algorithms, can be considered as special cases of MFNN. Experimental results also confirm that the proposed approach works effectively and generally outperforms many state-of-the-art techniques.
Keywords
Aggregation, Classification, Nearest-neighbour, Similarity relation
1 Introduction
Classification systems have played an important role in many application problems, including design, analysis, diagnosis and tutoring (Duda et al. 2001). The goal of developing such a system is to find a model that minimises classification error on data that have not been used during the learning process. Generally, a classification problem can be solved from a variety of perspectives, such as probability theory (Kolmogorov 1950) [e.g. Bayesian networks (John and Langley 1995)], decision tree learning (Breiman et al. 1984) [e.g. C4.5 (Quinlan 1993)] and instance-based learning [e.g. k nearest neighbours or kNN (Cover and Hart 1967)]. In particular, variations of kNN have been successfully applied to a wide range of real-world problems. It is generally recognised that such instance-based learning is both practically more effective and intuitively more realistic than many other learning classifier schemes (Daelemans and den Bosch 2005).
Central to the kNN approach and its variations is a non-linear classification technique that categorises objects based on the k closest training objects in a given feature space (Cover and Hart 1967). As a type of instance-based learning, it works by assigning a test object to the decision class that is most common amongst its k nearest neighbours, i.e. the k training objects that are closest to the test object. A fuzzy extension of kNN, known as fuzzy nearest neighbours (FNN), is proposed in Keller et al. (1985); it exploits the varying degrees of class membership embedded in the training data objects in order to improve classification performance. Also, an ownership function (Sarkar 2007) has been integrated with the FNN algorithm, producing class confidence values that do not necessarily sum to one. This method uses fuzzy-rough sets (Dubois and Prade 1992; Yao 1998) (and is abbreviated to FRNNO hereafter), but it does not utilise the central concepts of lower and upper approximations in rough set theory (Pawlak 1991).
Fuzzy-rough nearest neighbour (FRNN) (Jensen and Cornelis 2011) further extends the kNN and FNN algorithms by using a single test object's nearest neighbours to construct the fuzzy upper and lower approximations for each decision class. The approach offers many different ways in which to construct these approximations. These include the traditional implicator- or T-norm-based models (Radzikowska and Kerre 2002), as well as more advanced methods that utilise vaguely quantified rough sets (VQRS) (Cornelis et al. 2007). Experimental results show that a nearest-neighbour classifier based on VQRS, termed VQNN, performs robustly in the presence of noisy data. However, the mathematical sophistication required to understand the concepts underlying this and related techniques often hinders their application.
This paper presents a multifunctional nearest-neighbour (MFNN) classification approach in order to strengthen the efficacy of existing advanced nearest-neighbour techniques. An aggregation of both fuzzy similarity and class membership, measured over the selected nearest neighbours, is employed as the decision qualifier. Many similarity metrics [e.g. fuzzy tolerance relations (Das et al. 1998), fuzzy T-equivalence relations (Baets and Mesiar 1998, 2002), Euclidean distance, etc.] and aggregation operators [e.g. S-norms, cardinality measure, Addition operator and OWA (Yager 1988), etc.] may be adopted for this purpose. Furthermore, this paper provides a theoretical analysis showing that, with specific implementations of the aggregator, the similarity relation and the class membership function, FRNN, VQNN, kNN and FNN all become particular instances of MFNN. This indicates that MFNN provides a unifying framework for existing nearest-neighbour classification methods. Given such a wide range of potential choices, the resulting MFNN can adapt well to different classification problems. The performance of the proposed approach is evaluated through a series of systematic experimental investigations. In comparison with alternative nearest-neighbour methods, including kNN, FNN and FRNNO, and with other state-of-the-art classifiers such as PART (Witten and Frank 1998, 2000) and J48 (Quinlan 1993), versions of MFNN that are implemented with commonly adopted similarity metrics and aggregation operators generally offer improved classification performance.
The remainder of this paper is organised as follows. The theoretical foundation of the multifunctional nearest-neighbour approach, including the properties of MFNN and a worked example, is introduced in Sect. 2. The relationship between MFNN and the FRNN and VQNN algorithms is analysed in Sect. 3, supported with theoretical proofs, and that between MFNN and the kNN and FNN algorithms is discussed in Sect. 4. The proposed approach is systematically compared with existing work through experimental evaluations in Sect. 5. Finally, the paper is concluded in Sect. 6, together with a discussion of potential further research.
2 Multifunctional nearest-neighbour classifier
For completeness, the kNN (Cover and Hart 1967) and fuzzy nearest-neighbour (FNN) (Keller et al. 1985) techniques are briefly recalled first. The multifunctional nearest-neighbour classification is then presented, including its properties and a worked example.
Notationally, in the following, \(\mathbb {U}\) denotes a given training dataset that involves a set of features P; y denotes the object to be classified; and a(x) denotes the value of an attribute \(a, a\in P\), for an instance \(x\in \mathbb {U}\).
2.1 K nearest neighbour
2.2 Fuzzy nearest neighbour
2.3 Multifunctional nearest-neighbour algorithm
 1.
The fuzzy similarity between the test object and each existing (training) object is calculated, and the k nearest neighbours are selected as the objects with the k greatest similarities. This step is common to most nearest-neighbour-based methods.
 2.
The fuzzy similarities and class membership degrees of the k nearest neighbours are aggregated into the decision qualifier for each class, using a chosen aggregation operator. This aggregation generalises the decision indicator used by a particular nearest-neighbour-based method, such as kNN or FNN, which may help to improve the quality of the classification results (see later).
 3.
To maximise the correctness of decision-making, the test object is finally assigned to the class that returns the highest value of the decision qualifier.
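The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `mfnn_classify` and `abs_sim` are hypothetical, each training object carries a dictionary of class membership degrees, and similarity and membership are combined with the minimum before aggregation (the exact combination inside the decision qualifier is an assumption of this sketch).

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signatures for this sketch.
Similarity = Callable[[Sequence[float], Sequence[float]], float]
Aggregator = Callable[[List[float]], float]
TrainingObject = Tuple[Sequence[float], Dict[str, float]]  # (features, class memberships)


def mfnn_classify(train: List[TrainingObject], y: Sequence[float], k: int,
                  sim: Similarity, aggregate: Aggregator) -> str:
    """Assign y to the class with the highest aggregated decision qualifier."""
    # Step 1: similarity to every training object; keep the k most similar ones.
    scored = sorted(((sim(x, y), memb) for x, memb in train),
                    key=lambda pair: pair[0], reverse=True)
    neighbours = scored[:k]
    classes = sorted({c for _, memb in train for c in memb})
    # Step 2: per class, aggregate min(similarity, membership) over the neighbours
    # (the min combination is an assumption of this sketch).
    qualifier = {
        c: aggregate([min(s, memb.get(c, 0.0)) for s, memb in neighbours])
        for c in classes
    }
    # Step 3: the class with the highest decision qualifier wins.
    return max(classes, key=lambda c: qualifier[c])


def abs_sim(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical similarity for features scaled to [0, 1]."""
    return 1.0 - max(abs(a - b) for a, b in zip(x, y))


# Usage: one object of class A, two of class B, all with crisp membership.
train = [((0.0,), {"A": 1.0}), ((1.0,), {"B": 1.0}), ((0.9,), {"B": 1.0})]
print(mfnn_classify(train, (0.4,), 3, abs_sim, max))  # Goedel (max) aggregation
print(mfnn_classify(train, (0.4,), 3, abs_sim, sum))  # Addition aggregation
```

Here the aggregator is the only thing that changes: with `max` the single most similar object (class A, similarity 0.6) decides, whereas with `sum` the two moderately similar B objects (0.4 + 0.5) outweigh it.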
2.4 Properties of MFNN

Łukasiewicz S-norm: \(S(x,y)=\min (x+y,1)\);

Gödel S-norm: \(S(x,y)=\max (x,y)\);

Algebraic S-norm: \(S(x,y)=x+y-x\cdot y\);

Einstein S-norm: \(S(x,y) = (x + y) / (1 + x\cdot y)\).
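The four S-norms listed above are binary operators; since S-norms are associative and have 0 as their identity element, they extend to any number of neighbours by folding. A minimal sketch, with illustrative function names:

```python
from functools import reduce
from typing import Callable, Iterable

SNorm = Callable[[float, float], float]


def lukasiewicz(x: float, y: float) -> float:
    return min(x + y, 1.0)


def goedel(x: float, y: float) -> float:
    return max(x, y)


def algebraic(x: float, y: float) -> float:
    # Probabilistic sum: x + y - x*y.
    return x + y - x * y


def einstein(x: float, y: float) -> float:
    return (x + y) / (1.0 + x * y)


def fold_snorm(snorm: SNorm, values: Iterable[float]) -> float:
    """Aggregate any number of degrees in [0, 1]; S(x, 0) = x, so start from 0."""
    return reduce(snorm, values, 0.0)


sims = [0.2, 0.7, 0.4]
for name, s in [("Lukasiewicz", lukasiewicz), ("Goedel", goedel),
                ("Algebraic", algebraic), ("Einstein", einstein)]:
    print(name, round(fold_snorm(s, sims), 4))
```

On this input the results are 1.0, 0.7, 0.856 and 0.904, consistent with the known ordering Gödel \(\le\) algebraic \(\le\) Einstein \(\le\) Łukasiewicz.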
Training data for the example
Object  a  b  c  q 

1  −0.4  0.2  −0.5  Yes 
2  −0.4  0.1  −0.1  No 
3  0.2  −0.3  0  No 
4  0.2  0  0  Yes 
Note that for the MFNN algorithm with crisp class membership, the Gödel S-norm is the most notable of the typical S-norms in terms of its behaviour. This is because of the way in which the k nearest neighbours are generated: the maximum similarity between the test object and the objects in the universe is also the maximum amongst the k nearest neighbours, for any \(k\geq 1\). Therefore, with the Gödel S-norm, MFNN always assigns a test object to the class of the training object (whose membership degree to that class is one) that has the highest similarity to the test object, regardless of the number of nearest neighbours. In other words, when using the Gödel S-norm and crisp class membership, the classification accuracy of MFNN is not affected by the choice of the value for k.
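The k-invariance argument can be checked numerically. The sketch below (hypothetical function names, with randomly generated similarities standing in for a real similarity relation) evaluates MFNN with the Gödel S-norm and crisp membership for every k from 1 to the size of the universe; the Addition operator is included for contrast.

```python
import random
from typing import Dict, List


def decide(sims: List[float], labels: List[str], k: int, aggregator: str) -> str:
    """MFNN with crisp membership over the k most similar objects.
    aggregator: 'goedel' (max) or 'addition' (sum)."""
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    scores: Dict[str, float] = {}
    for i in nearest:
        # With crisp membership, only the neighbour's own class receives its similarity.
        prev = scores.get(labels[i], 0.0)
        scores[labels[i]] = max(prev, sims[i]) if aggregator == "goedel" else prev + sims[i]
    return max(scores, key=lambda c: scores[c])


rng = random.Random(0)
n = 50
sims = [rng.random() for _ in range(n)]
labels = [rng.choice("ABC") for _ in range(n)]

goedel_decisions = {decide(sims, labels, k, "goedel") for k in range(1, n + 1)}
addition_decisions = {decide(sims, labels, k, "addition") for k in range(1, n + 1)}
best = max(range(n), key=lambda i: sims[i])
print(goedel_decisions, labels[best])  # one class: that of the most similar object
print(addition_decisions)              # may contain more than one class
```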
Computationally, there are two loops in the MFNN algorithm: one iterates through the classes and the other through the neighbours. Thus, in the worst case (where the nearest neighbourhood covers the entire universe of discourse \(\mathbb {U}\)), the complexity of MFNN is \(O(|\mathcal {C}|\cdot |\mathbb {U}|)\) per test object, where \(\mathcal {C}\) denotes the set of decision classes.
2.5 Worked example
Test data for the example
Object  a  b  c  q 

t1  0.3  −0.3  0  No 
t2  −0.3  0.3  −0.3  Yes 
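The worked example can be reproduced in outline using the training and test data above. The similarity below is an assumption made only for illustration and is not necessarily relation (3)/(4): the per-attribute similarity \(1-|a(x)-a(y)|/\mathrm{range}(a)\), clipped at 0, with attribute ranges taken from the training data and combined across attributes by the minimum. Crisp class membership is used with all four training objects as neighbours, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Training and test data from the tables above: ((a, b, c), q).
TRAIN: List[Tuple[Tuple[float, float, float], str]] = [
    ((-0.4, 0.2, -0.5), "Yes"), ((-0.4, 0.1, -0.1), "No"),
    ((0.2, -0.3, 0.0), "No"), ((0.2, 0.0, 0.0), "Yes"),
]
TEST = {"t1": (0.3, -0.3, 0.0), "t2": (-0.3, 0.3, -0.3)}

# Attribute ranges estimated from the training data (assumption).
RANGES = [max(x[i] for x, _ in TRAIN) - min(x[i] for x, _ in TRAIN) for i in range(3)]


def similarity(x, y) -> float:
    """Assumed similarity: min over attributes of 1 - |a(x) - a(y)| / range(a), clipped at 0."""
    return min(max(0.0, 1.0 - abs(xi - yi) / r) for xi, yi, r in zip(x, y, RANGES))


def qualifiers(y, aggregate: Callable[[List[float]], float]) -> Dict[str, float]:
    """Aggregate the similarities of each class's members (crisp membership, k = |U|)."""
    per_class: Dict[str, List[float]] = {"Yes": [], "No": []}
    for x, q in TRAIN:
        per_class[q].append(similarity(x, y))
    return {q: aggregate(s) for q, s in per_class.items()}


for name, y in TEST.items():
    for agg_name, agg in [("Goedel", max), ("Addition", sum)]:
        scores = qualifiers(y, agg)
        print(name, agg_name, {q: round(v, 3) for q, v in scores.items()})
```

Under this assumed similarity both aggregators favour No for t1, matching its label. For t2, objects 1 and 2 are equally similar (0.6), so the Gödel qualifiers of the two classes tie, whereas the Addition operator also counts object 4 and favours Yes.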
3 Relationship between MFNN and FRNN/VQNN
The flexible framework of the multifunctional nearest-neighbour algorithm allows it to cover many nearest-neighbour approaches as special cases. This is demonstrated below by analysing its relationship with the state-of-the-art fuzzy-rough nearest-neighbour classification algorithms.
3.1 Fuzzy-rough nearest-neighbour classification
The original fuzzy-rough nearest-neighbour (FRNN) algorithm was proposed in Jensen and Cornelis (2011). In contrast to approaches such as fuzzy-rough ownership nearest neighbour (FRNNO) (Sarkar 2007), FRNN employs the central rough set concepts in their fuzzified forms: fuzzy upper and lower approximations. These concepts are used to determine the class membership assigned to a given test object.
3.2 FRNN and VQNN as special instances of MFNN
Conceptually, MFNN is much simpler, as it does not directly involve complicated mathematical definitions such as those of the fuzzy upper and lower approximations. Nevertheless, it is general enough to cover both FRNN and VQNN as specific cases. Of the many methods for aggregating similarities, the Gödel (maximum) S-norm and the Addition operator are arguably amongst the most commonly used. Using these two aggregators, two particular implementations of MFNN can be devised, denoted MFNN_G (MFNN with the Gödel S-norm) and MFNN_A (MFNN with the Addition operator). If the fuzzy similarity defined in (3) and the crisp class membership function are employed to implement MFNN_G and MFNN_A, then these two methods are equivalent to FRNN and VQNN, respectively, in terms of classification outcomes. The proofs are given as follows.
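The claimed equivalence between MFNN_G and FRNN (with crisp class membership) can be checked numerically. The sketch assumes the usual FRNN class score, the average of the fuzzy lower and upper approximations computed over the neighbours, with the Kleene-Dienes implicator \(I(a,b)=\max(1-a,b)\) for the lower approximation and the minimum T-norm for the upper one. Random similarities stand in for relation (3), and all names are hypothetical; this is a sanity check, not a proof.

```python
import random
from typing import Dict, List


def frnn_scores(sims: List[float], labels: List[str]) -> Dict[str, float]:
    """Assumed FRNN score per class: (lower + upper) / 2 with crisp membership."""
    scores = {}
    for c in sorted(set(labels)):
        member = [1.0 if l == c else 0.0 for l in labels]
        lower = min(max(1.0 - s, m) for s, m in zip(sims, member))  # Kleene-Dienes
        upper = max(min(s, m) for s, m in zip(sims, member))        # minimum T-norm
        scores[c] = (lower + upper) / 2.0
    return scores


def mfnn_g_scores(sims: List[float], labels: List[str]) -> Dict[str, float]:
    """MFNN_G: Goedel (max) aggregation of similarities within each class."""
    return {c: max(s for s, l in zip(sims, labels) if l == c) for c in sorted(set(labels))}


def argmax(scores: Dict[str, float]) -> str:
    return max(scores, key=lambda c: scores[c])


rng = random.Random(42)
trials, agree = 1000, 0
for _ in range(trials):
    k = rng.randint(1, 20)
    sims = [rng.random() for _ in range(k)]
    labels = [rng.choice("ABC") for _ in range(k)]
    agree += argmax(frnn_scores(sims, labels)) == argmax(mfnn_g_scores(sims, labels))
print(f"{agree}/{trials} identical decisions")
```

The agreement is not accidental under these assumptions. Let \(s_1\) be the largest similarity overall and \(m'\) the largest similarity outside its class; that class then scores \((1-m'+s_1)/2>1/2\) (or \((1+s_1)/2\) if all neighbours belong to it), whereas any other class c, whose largest similarity is \(m_c<s_1\), scores \((1-s_1+m_c)/2<1/2\).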
Theorem 1
With the use of an identical fuzzy similarity metric, MFNN_G achieves the same classification accuracy as FRNN.
Proof
Theorem 2
With the use of an identical fuzzy similarity metric, MFNN_A achieves the same classification accuracy as VQNN.
Proof
Given a set \(\mathbb {U}\) and a test object y, let M be the class, within the selected nearest neighbours, such that \(\sum \nolimits _{x\in M}\mu _{R_P}(x,y)=\max \limits _{X\in \mathcal {C}}\{\sum \nolimits _{x\in X}\mu _{R_P}(x,y)\}\), where \(\mathcal {C}\) denotes the set of decision classes.
It is noteworthy that, when implemented with the above configuration, MFNN_G and MFNN_A are equivalent to FRNN and VQNN, respectively, only in the sense that they achieve the same classification outcome and hence the same classification accuracy. The underlying predicted probabilities and models of MFNN_G/MFNN_A and FRNN/VQNN are different. Therefore, for other performance indicators that are determined by the numerical predicted probabilities, such as root-mean-squared error (RMSE), the equivalence between MFNN_G/MFNN_A and FRNN/VQNN may not hold.
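A similar check illustrates both points for VQNN: the decisions of MFNN_A and VQNN coincide, yet their numerical scores differ. The sketch assumes the VQRS quantifier \(Q_{(\alpha,\beta)}\) of Cornelis et al. with the commonly used settings \(Q_u=Q_{(0.1,0.6)}\) and \(Q_l=Q_{(0.2,1)}\), crisp membership (so both approximations reduce to a quantifier applied to the share of the neighbourhood's total similarity that falls in the class), and a VQNN score equal to the average of the two approximations. Function names are hypothetical.

```python
import random
from typing import Callable, Dict, List


def quantifier(alpha: float, beta: float) -> Callable[[float], float]:
    """Smooth fuzzy quantifier Q_(alpha, beta), non-decreasing from 0 to 1."""
    def q(x: float) -> float:
        if x <= alpha:
            return 0.0
        if x >= beta:
            return 1.0
        if x <= (alpha + beta) / 2:
            return 2 * (x - alpha) ** 2 / (beta - alpha) ** 2
        return 1 - 2 * (x - beta) ** 2 / (beta - alpha) ** 2
    return q


Q_U, Q_L = quantifier(0.1, 0.6), quantifier(0.2, 1.0)


def vqnn_scores(sims: List[float], labels: List[str]) -> Dict[str, float]:
    """Assumed VQNN score: average of Q_l and Q_u applied to the class's similarity share."""
    total = sum(sims)
    scores = {}
    for c in sorted(set(labels)):
        ratio = sum(s for s, l in zip(sims, labels) if l == c) / total
        scores[c] = (Q_L(ratio) + Q_U(ratio)) / 2.0
    return scores


def mfnn_a_scores(sims: List[float], labels: List[str]) -> Dict[str, float]:
    """MFNN_A: Addition operator over the similarities of each class's neighbours."""
    return {c: sum(s for s, l in zip(sims, labels) if l == c) for c in sorted(set(labels))}


def argmax(scores: Dict[str, float]) -> str:
    return max(scores, key=lambda c: scores[c])


rng = random.Random(7)
trials, agree = 1000, 0
for _ in range(trials):
    k = rng.randint(1, 20)
    sims = [rng.random() for _ in range(k)]
    labels = [rng.choice("ABC") for _ in range(k)]
    agree += argmax(vqnn_scores(sims, labels)) == argmax(mfnn_a_scores(sims, labels))

print(f"{agree}/{trials} identical decisions")
print(mfnn_a_scores([0.9, 0.5], ["A", "B"]), vqnn_scores([0.9, 0.5], ["A", "B"]))
```

In the last line the MFNN_A qualifiers 0.9 and 0.5 become VQNN scores of roughly 0.80 and 0.30: the ranking is preserved but the values are reshaped by the quantifiers, which is why probability-based measures such as RMSE can differ.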
4 Popular nearest-neighbour methods as special cases of MFNN
To further demonstrate the generality of MFNN, this section discusses the relationship between MFNN and k nearest neighbours, and that between MFNN and fuzzy nearest neighbours.
4.1 K nearest neighbour and MFNN
As introduced previously, the k nearest-neighbour (kNN) classifier (Cover and Hart 1967) assigns a test object to the class that is best represented amongst its k nearest neighbours.
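One configuration that turns MFNN into kNN can be sketched as follows. This is an illustrative assumption with hypothetical names: each of the k selected neighbours is given similarity 1, class membership is crisp and the Addition operator is used, so each class's decision qualifier is simply its number of votes amongst the k neighbours.

```python
import random
from collections import Counter
from typing import List


def knn_vote(neighbour_labels: List[str]) -> str:
    """Classic kNN: majority class; ties go to the alphabetically first class."""
    counts = Counter(neighbour_labels)
    return max(sorted(counts), key=lambda c: counts[c])


def mfnn_as_knn(neighbour_labels: List[str]) -> str:
    """MFNN with similarity 1 for every selected neighbour, crisp membership
    and the Addition operator: qualifier(c) = sum of min(1, [label == c])."""
    classes = sorted(set(neighbour_labels))
    qualifier = {c: sum(min(1.0, 1.0 if l == c else 0.0) for l in neighbour_labels)
                 for c in classes}
    return max(classes, key=lambda c: qualifier[c])


rng = random.Random(1)
mismatches = 0
for _ in range(500):
    labels = [rng.choice("ABC") for _ in range(rng.randint(1, 15))]
    mismatches += knn_vote(labels) != mfnn_as_knn(labels)
print(mismatches, mfnn_as_knn(["A", "B", "B"]))
```

Both functions break ties in the same (alphabetical) way, so the comparison isolates the effect of the configuration itself.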
4.2 Fuzzy nearest neighbour and MFNN
In fuzzy nearest neighbour (FNN) (Keller et al. 1985), the k nearest neighbours of the test instance are determined first. Then, the test instance is assigned to the class for which the decision qualifier (2) is maximal.
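A sketch of how FNN fits the MFNN template, assuming the standard form of Keller et al.'s decision qualifier, in which the test object's membership to each class is an inverse-distance-weighted average of the neighbours' memberships with fuzzifier m (m = 2 below). Treating the weight \(1/\lVert y-x_j\rVert^{2/(m-1)}\) as the similarity, combining it with membership by a product and aggregating with the Addition operator gives the numerator of FNN; the denominator is the same for every class, so the decisions coincide. The function names and the small distance guard are assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Neighbour = Tuple[Sequence[float], Dict[str, float]]  # (features, class memberships)


def _weights(y: Sequence[float], neighbours: List[Neighbour], m: float) -> List[float]:
    # Inverse-distance weights; the 1e-12 guard avoids division by zero (assumption).
    return [1.0 / max(math.dist(y, x), 1e-12) ** (2.0 / (m - 1.0)) for x, _ in neighbours]


def fnn_memberships(y, neighbours: List[Neighbour], m: float = 2.0) -> Dict[str, float]:
    """FNN: normalised, weighted average of the neighbours' class memberships."""
    w = _weights(y, neighbours, m)
    classes = sorted({c for _, memb in neighbours for c in memb})
    return {c: sum(wj * memb.get(c, 0.0) for wj, (_, memb) in zip(w, neighbours)) / sum(w)
            for c in classes}


def mfnn_fnn_qualifiers(y, neighbours: List[Neighbour], m: float = 2.0) -> Dict[str, float]:
    """MFNN view: Addition over similarity x membership, without normalisation."""
    w = _weights(y, neighbours, m)
    classes = sorted({c for _, memb in neighbours for c in memb})
    return {c: sum(wj * memb.get(c, 0.0) for wj, (_, memb) in zip(w, neighbours))
            for c in classes}


def argmax(d: Dict[str, float]) -> str:
    return max(d, key=lambda c: d[c])


neighbours = [((0.0, 0.0), {"A": 0.8, "B": 0.2}),
              ((1.0, 0.0), {"A": 0.3, "B": 0.7}),
              ((0.0, 2.0), {"A": 0.1, "B": 0.9})]
y = (0.2, 0.1)
print(fnn_memberships(y, neighbours), argmax(mfnn_fnn_qualifiers(y, neighbours)))

rng = random.Random(3)
mismatches = 0
for _trial in range(300):
    nb: List[Neighbour] = []
    for _j in range(5):
        a = rng.random()
        nb.append(((rng.random(), rng.random()), {"A": a, "B": 1.0 - a}))
    q = (rng.random(), rng.random())
    mismatches += argmax(fnn_memberships(q, nb)) != argmax(mfnn_fnn_qualifiers(q, nb))
```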
5 Experimental evaluation
This section presents a systematic experimental evaluation of MFNN. After an introduction to the experimental setup, the results and discussions are divided into four parts. The first part investigates how the number of nearest neighbours influences the MFNN algorithm and its classification performance. The second compares MFNN with five other nearest-neighbour methods in terms of classification accuracy. It also demonstrates empirically that MFNN can be equivalent to the fuzzy-rough set-based FRNN and VQNN approaches, supporting the earlier formal proofs. The third and fourth parts compare the performance of several versions of the MFNN algorithm against four state-of-the-art classifier learners, again with regard to classification accuracy.
5.1 Experimental setup
Datasets used for evaluation
Dataset  Objects  Attributes 

Cleveland  297  13 
Ecoli  336  7 
Glass  214  9 
Handwritten  1593  256 
Heart  270  13 
Liver  345  6 
Multifeat  2000  649 
Olitos  120  25 
Pageblock  5473  10 
Satellite  6435  36 
Sonar  208  60 
Water 2  390  38 
Water 3  390  38 
Waveform  5000  40 
Wine  178  14 
Wisconsin  683  9 
For the present study, MFNN employs the popular relation of (4) and the Kleene-Dienes T-norm (Dienes 1949; Kleene 1952) to measure the fuzzy similarity as per (3) in the following experiments. Whilst this does not allow individual MFNN methods to be fine-tuned, it ensures that all methods are compared on an equal footing. For simplicity, the class membership functions used in MFNN are consistently set to be crisp-valued in the following experiments.
Stratified \(10\times 10\)-fold cross-validation (10-FCV) is employed throughout the experimentation. In 10-FCV, the original dataset is partitioned into 10 subsets of data objects. Of these 10 subsets, a single subset is retained as the testing data for the classifier, and the remaining 9 subsets are used for training. This is repeated 10 times, so that each subset serves as the testing data exactly once, and the 10 sets of results are averaged to produce a single estimate. In \(10\times 10\)-fold cross-validation, this whole procedure is in turn repeated 10 times with different random partitions. The advantage of 10-FCV over random subsampling is that all objects are used for both training and testing, and each object is used for testing exactly once per 10-fold run. The stratification of the data prior to its division into folds ensures that each class label is represented (as far as possible) in the same proportion in all folds, thereby helping to alleviate bias/variance problems (Bengio and Grandvalet 2005).
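The stratified \(10\times 10\)-fold procedure described above can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names, not the tooling used in the experiments: within each class, objects are shuffled and dealt round-robin into folds, and the whole partition is redrawn with a different seed for each of the 10 repetitions.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], n_folds: int, seed: int) -> List[int]:
    """Return the fold index of every object, spreading each class evenly."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            fold_of[idx] = (offset + j) % n_folds
        offset += len(members)  # continue the round-robin to balance fold sizes
    return fold_of


def repeated_stratified_cv(labels: Sequence[str], n_folds: int = 10,
                           n_repeats: int = 10) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (repeat, train_indices, test_indices) for every fold of every repeat."""
    for r in range(n_repeats):
        fold_of = stratified_folds(labels, n_folds, seed=r)
        for f in range(n_folds):
            test = [i for i, g in enumerate(fold_of) if g == f]
            train = [i for i, g in enumerate(fold_of) if g != f]
            yield r, train, test


labels = ["Yes"] * 30 + ["No"] * 20
splits = list(repeated_stratified_cv(labels))
print(len(splits))  # 10 repeats x 10 folds = 100 train/test splits
```

Accuracy would be computed on each of the 100 test folds and averaged. Within each repetition every object appears in exactly one test fold, and with 30 Yes and 20 No objects every test fold receives 3 and 2 of them, respectively.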
Classification accuracy: MFNN_G versus others
Dataset  MFNN_G  FRNN  VQNN  FNN  FRNNO  kNN 

Cleveland  53.44  53.44  58.46  49.75  46.85*  55.73 
Ecoli  80.57  80.57  86.85v  86.55v  77.95  86.20v 
Glass  73.54  73.54  68.95  68.57  71.70  63.23* 
Handwritten  91.13  91.13  91.37  91.40  89.94*  90.18 
Heart  76.63  76.63  82.19v  66.11*  66.00*  81.30 
Liver  62.81  62.81  66.26  69.52  62.37  61.25 
Multifeat  97.57  97.57  97.95  94.34*  96.96  97.88 
Olitos  78.67  78.67  80.75  63.25*  67.58*  81.50 
Pageblock  96.04  96.04  95.99  95.94  96.53  95.19* 
Satellite  90.92  90.92  90.99  90.73  90.89  90.30 
Sonar  85.25  85.25  79.38*  73.21*  85.06  75.25 
Water2  84.38  84.38  85.15  77.97*  79.79*  84.26 
Water3  79.82  79.82  81.28  74.64*  73.21*  80.90 
Waveform  73.77  73.77  81.55v  83.19v  79.71v  80.46v 
Wine  97.47  97.47  97.14  96.40  95.62  96.07 
Wisconsin  96.38  96.38  96.69  97.20  96.00  96.92 
Summary  (v/ /*)  (0/16/0)  (3/12/1)  (2/8/6)  (1/9/6)  (2/11/3) 
Classification accuracy: MFNN_A versus others
Dataset  MFNN_A  FRNN  VQNN  FNN  FRNNO  kNN 

Cleveland  58.46  53.44  58.46  49.75*  46.85*  55.73 
Ecoli  86.85  80.57*  86.85  86.55  77.95  86.20 
Glass  68.95  73.54  68.95  68.57  71.70  63.23* 
Handwritten  91.37  91.13  91.37  91.40  89.94*  90.18* 
Heart  82.19  76.63*  82.19  66.11*  66.00*  81.30 
Liver  66.26  62.81  66.26  69.52  62.37  61.25* 
Multifeat  97.95  97.57  97.95  94.34*  96.96*  97.88 
Olitos  80.75  78.67  80.75  63.25*  67.58*  81.50 
Pageblock  95.99  96.04  95.99  95.94  96.53  95.19* 
Satellite  90.99  90.92  90.99  90.73  90.89  90.30* 
Sonar  79.38  85.25v  79.38  73.21*  85.06v  75.25 
Water2  85.15  84.38  85.15  77.97*  79.79*  84.26 
Water3  81.28  79.82  81.28  74.64*  73.21*  80.90 
Waveform  81.55  73.77*  81.55  83.19v  79.71*  80.46 
Wine  97.14  97.47  97.14  96.40  95.62  96.07 
Wisconsin  96.69  96.38  96.69  97.20  96.00  96.92 
Summary  (v/ /*)  (1/12/3)  (0/16/0)  (1/8/7)  (1/7/8)  (0/11/5) 
5.2 Influence of number of neighbours
As mentioned previously, when using the Gödel S-norm and crisp class membership, the classification accuracy of MFNN is not affected by the value of k. This is not generally the case when other operators are employed. To demonstrate this empirically, the impact of different values of k on MFNN is investigated on two datasets, heart and olitos. The Einstein S-norm and the Addition operator, as well as the Gödel S-norm, are used to implement MFNN, with the resulting classifiers denoted MFNN_E, MFNN_A and MFNN_G, respectively. In this investigation, k is initially set to \(|\mathbb {U}|\) (the total number of objects in the dataset) and then decremented by \(|\mathbb {U}|/30\) each time, with an extra round for \(k=1\). This results in 31 runs for each of the two datasets. For each value of k, \(10\times 10\)-fold cross-validation is performed. The results for these two datasets are shown in Figs. 1 and 2.
It can be seen that MFNN_G is unaffected by the choice of k. However, as the value of k decreases, the classification performance of MFNN_E and MFNN_A first improves and then degrades, for both datasets. Therefore, the value of k is an important consideration when using an aggregator other than the Gödel S-norm, and an appropriate k should be carefully selected offline before MFNN is applied (unless MFNN_G is used). This is consistent with the general findings in the kNN literature.
5.3 Comparison with other nearest-neighbour methods
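The schedule of k values can be made concrete with a short helper. The name is hypothetical, and rounding to the nearest integer is an assumption, since the text does not say how fractional steps are handled; within a cross-validation fold, a k larger than the training set presumably means the neighbourhood covers all training objects. For heart (270 objects) the schedule is 270, 261, ..., 9 and finally 1, i.e. 30 evenly spaced values plus the extra round, giving the 31 runs.

```python
from typing import List


def k_schedule(n_objects: int, steps: int = 30) -> List[int]:
    """k starts at |U| and drops by |U|/steps each round; a final round uses k = 1."""
    ks = [round(n_objects * (steps - i) / steps) for i in range(steps)]
    ks.append(1)
    return ks


for name, size in [("heart", 270), ("olitos", 120)]:
    ks = k_schedule(size)
    print(name, len(ks), ks[:3], ks[-3:])
```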
Classification accuracy: MFNN_E versus others
Dataset  MFNN_E  FRNN  VQNN  FNN  FRNNO  kNN 

Cleveland  53.64  53.44  58.46  49.75  46.85*  55.73 
Ecoli  81.93  80.57*  86.85v  86.55v  77.95  86.20v 
Glass  74.29  73.54  68.95  68.57  71.70  63.23* 
Handwritten  91.20  91.13  91.37  91.40  89.94*  90.18 
Heart  76.70  76.63  82.19v  66.11*  66.00*  81.30 
Liver  63.07  62.81  66.26  69.52  62.37  61.25 
Multifeat  97.59  97.57  97.95  94.34*  96.96  97.88 
Olitos  78.83  78.67  80.75  63.25*  67.58*  81.50 
Pageblock  96.18  96.04  95.99  95.94  96.53  95.19* 
Satellite  91.52  90.92*  90.99  90.73*  90.89*  90.30* 
Sonar  85.35  85.25  79.38*  73.21*  85.06  75.25* 
Water2  85.21  84.38  85.15  77.97*  79.79*  84.26 
Water3  80.77  79.82  81.28  74.64*  73.21*  80.90 
Waveform  74.97  73.77*  81.55v  83.19v  79.71v  80.46v 
Wine  97.47  97.47  97.14  96.40  95.62  96.07 
Wisconsin  96.38  96.38  96.69  97.20  96.00  96.92 
Summary  (v/ /*)  (0/13/3)  (3/12/1)  (2/7/7)  (1/8/7)  (2/10/4) 
Comparing the three MFNN classifiers themselves, MFNN_A achieves the best performance overall, with the exception of the sonar dataset. MFNN_E also performs slightly better than MFNN_G. This is because the classification decision made by MFNN_G relies on only one training object, the one most similar to the test object. Consequently, MFNN_G is sensitive to noisy data, whilst MFNN_A, which aggregates evidence from all k neighbours, is more robust in the presence of noise.
Overall, considering all of these experimental results, MFNN_A is significantly more accurate than each of FRNN, FNN, FRNNO and kNN on more datasets than it is significantly less accurate, and it matches VQNN on every dataset. This makes MFNN_A a potentially good candidate for many classification tasks.
5.4 Comparison with the state of the art: use of different aggregators

PART (Witten and Frank 1998, 2000) generates rules by repeatedly creating partial decision trees from the data. The algorithm adopts a separate-and-conquer strategy, removing the instances already covered by the current rule set during learning. Essentially, a rule is created by building a pruned tree for the current set of instances; the branch leading to the leaf with the highest coverage is promoted to a classification rule. In this paper, PART is run with a confidence factor of 0.25.

J48 is the Weka implementation of C4.5 (Quinlan 1993), a successor of ID3. It creates decision trees by choosing the most informative features and recursively partitioning a training data table into sub-tables based on the values of such features. Each node in the tree represents a feature, with subsequent nodes branching from the possible values of this feature according to the current sub-table. Partitioning stops when all data items in a sub-table have the same classification, and a leaf node is then created to represent this classification. In this paper, J48 is set with the pruning confidence threshold \(C=0.25\).

SMO (Smola and Schölkopf 1998) is an algorithm for efficiently solving the optimisation problems that arise during the training of a support vector machine (Cortes and Vapnik 1995). It breaks the optimisation problem into a series of the smallest possible sub-problems, which are then solved analytically. In this paper, SMO is set with \(C=1\), tolerance \(L=0.001\) and round-off error \(\epsilon =10^{-12}\), using normalised data and a polynomial kernel.

NB (Naive Bayes) (John and Langley 1995) is a simple probabilistic classifier, directly applying Bayes’ theorem (Papoulis 1984) with strong (naive) independence assumptions. Depending on the precise nature of the probability model used, naive Bayesian classifiers can be trained very efficiently in a supervised learning setting. The learning only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification.
Classification accuracy of MFNN_G
Dataset  MFNN_G  PART  J48  SMO  NB 

Cleveland  53.44  52.44  53.39  58.31  56.06 
Ecoli  80.57  81.79  82.83  83.48  85.50v 
Glass  73.54  69.12  68.08  57.77*  47.70* 
Handwritten  91.13  79.34*  76.13*  93.58v  86.19* 
Heart  76.63  77.33  78.15  83.89v  83.59v 
Liver  62.81  65.25  65.84  57.98  54.89 
Multifeat  97.57  94.68*  94.62*  98.39v  95.27* 
Olitos  78.67  67.00*  65.75*  87.92v  78.50 
Pageblock  96.04  96.93v  96.99v  92.84*  90.01* 
Satellite  90.92  86.63*  86.41*  86.78*  79.59* 
Sonar  85.25  77.40*  73.61*  76.60*  67.71* 
Water2  84.38  83.85  83.18  83.64  69.72* 
Water3  79.82  82.72  81.59  87.21v  85.49v 
Waveform  73.77  77.62v  75.25  86.48v  80.01v 
Wine  97.47  92.24*  93.37  98.70  97.46 
Wisconsin  96.38  95.68  95.44  97.01  96.34 
Summary  (v/ /*)  (2/8/6)  (1/10/5)  (6/6/4)  (4/5/7) 
Classification accuracy of MFNN_E
Dataset  MFNN_E  PART  J48  SMO  NB 

Cleveland  53.64  52.44  53.39  58.31  56.06 
Ecoli  81.93  81.79  82.83  83.48  85.50 
Glass  74.29  69.12  68.08  57.77*  47.70* 
Handwritten  91.20  79.34*  76.13*  93.58v  86.19* 
Heart  76.70  77.33  78.15  83.89v  83.59v 
Liver  63.07  65.25  65.84  57.98  54.89 
Multifeat  97.59  94.68*  94.62*  98.39v  95.27* 
Olitos  78.83  67.00*  65.75*  87.92v  78.50 
Pageblock  96.18  96.93v  96.99v  92.84*  90.01* 
Satellite  91.52  86.63*  86.41*  86.78*  79.59* 
Sonar  85.35  77.40*  73.61*  76.60*  67.71* 
Water2  85.21  83.85  83.18  83.64  69.72* 
Water3  80.77  82.72  81.59  87.21v  85.49v 
Waveform  74.97  77.62v  75.25  86.48v  80.01v 
Wine  97.47  92.24*  93.37  98.70  97.46 
Wisconsin  96.38  95.68  95.44  97.01  96.34 
Summary  (v/ /*)  (2/8/6)  (1/10/5)  (6/6/4)  (3/6/7) 
Classification accuracy of MFNN_A
Dataset  MFNN_A  PART  J48  SMO  NB 

Cleveland  58.46  52.44*  53.39  58.31  56.06 
Ecoli  86.85  81.79*  82.83*  83.48  85.50 
Glass  68.95  69.12  68.08  57.77*  47.70* 
Handwritten  91.37  79.34*  76.13*  93.58v  86.19* 
Heart  82.19  77.33  78.15  83.89  83.59 
Liver  66.26  65.25  65.84  57.98*  54.89* 
Multifeat  97.95  94.68*  94.62*  98.39  95.27* 
Olitos  80.75  67.00*  65.75*  87.92v  78.50 
Pageblock  95.99  96.93v  96.99v  92.84*  90.01* 
Satellite  90.99  86.63*  86.41*  86.78*  79.59* 
Sonar  79.38  77.40  73.61  76.60  67.71* 
Water2  85.15  83.85  83.18  83.64  69.72* 
Water3  81.28  82.72  81.59  87.21v  85.49v 
Waveform  81.55  77.62*  75.25*  86.48v  80.01 
Wine  97.14  92.24*  93.37  98.70  97.46 
Wisconsin  96.69  95.68  95.44  97.01  96.34 
Summary  (v/ /*)  (1/7/8)  (1/9/6)  (4/8/4)  (1/7/8) 
Classification accuracy of MFNN_AW
Dataset  MFNN_AW  PART  J48  SMO  NB 

Cleveland  58.46  52.44*  53.39  58.31  56.06 
Ecoli  87.12  81.79*  82.83*  83.48  85.50 
Glass  66.76  69.12  68.08  57.77*  47.70* 
Handwritten  91.32  79.34*  76.13*  93.58v  86.19* 
Heart  82.78  77.33  78.15  83.89  83.59 
Liver  64.65  65.25  65.84  57.98*  54.89* 
Multifeat  98.05  94.68*  94.62*  98.39  95.27* 
Olitos  82.42  67.00*  65.75*  87.92  78.50 
Pageblock  96.40  96.93v  96.99v  92.84*  90.01* 
Satellite  90.76  86.63*  86.41*  86.78*  79.59* 
Sonar  81.31  77.40  73.61*  76.60  67.71* 
Water2  85.90  83.85  83.18  83.64  69.72* 
Water3  81.15  82.72  81.59  87.21v  85.49v 
Waveform  81.09  77.62*  75.25*  86.48v  80.01 
Wine  96.53  92.24  93.37  98.70  97.46 
Wisconsin  96.66  95.68  95.44  97.01  96.34 
Summary  (v/ /*)  (1/8/7)  (1/8/7)  (3/9/4)  (1/7/8) 
It can be seen from these results that, in general, all three MFNN implementations perform well. In particular, even MFNN_G, the weakest of the three, achieves the highest accuracy of all the classifiers on the glass, satellite, sonar and water 2 datasets, and on satellite and sonar its advantage over all four other classifiers is statistically significant. MFNN_G performs less well on datasets such as ecoli and waveform.
Amongst the three MFNN implementations, MFNN_A is again the best performer. It generally improves on the classification accuracies of both MFNN_G and MFNN_E for the cleveland, ecoli, heart, liver and waveform datasets, although from a statistical perspective its overall performance is similar to that of the other two. In terms of accuracy, MFNN_A is statistically better than PART, J48 and NB on 8, 6 and 8 datasets, respectively, and against SMO it wins and loses on four datasets each.
5.5 Comparison with the state of the art: use of different similarity metrics
It was mentioned previously that the MFNN approach offers significant flexibility, as it allows the use of different similarity metrics and aggregation operators. To provide a more comprehensive view of the performance of MFNN, this section investigates the effect of employing different similarity metrics. In particular, kernel-based fuzzy similarity metrics (Qu et al. 2015) are employed. Such similarity metrics are induced by stationary kernel functions and draw on the notion of robustness in statistics. Specifically, MFNN_A, as the overall best performer, is modified to use either the wave kernel function or the rational quadratic kernel function; the resulting classifiers are denoted MFNN_AW and MFNN_AR, respectively.
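For illustration, the two kernels can be turned into similarity functions of the Euclidean distance \(d=\lVert x-y\rVert\) using their standard stationary forms: the wave kernel \((\theta/d)\sin(d/\theta)\) and the rational quadratic kernel \(1-d^2/(d^2+\theta)\). The exact parametrisation of Qu et al. (2015) is not reproduced here, so the width parameter theta, the clipping of negative wave-kernel values to 0 and the function names are all assumptions. Either function could be passed as the similarity of an Addition-based MFNN.

```python
import math
from typing import Sequence


def wave_similarity(x: Sequence[float], y: Sequence[float], theta: float = 1.0) -> float:
    """Wave kernel (theta/d) * sin(d/theta), equal to 1 at d = 0."""
    d = math.dist(x, y)
    if d == 0.0:
        return 1.0
    # The wave kernel oscillates and can be negative; clip at 0 (assumption).
    return max(0.0, (theta / d) * math.sin(d / theta))


def rational_quadratic_similarity(x: Sequence[float], y: Sequence[float],
                                  theta: float = 1.0) -> float:
    """Rational quadratic kernel 1 - d^2 / (d^2 + theta), decreasing in d."""
    d2 = math.dist(x, y) ** 2
    return 1.0 - d2 / (d2 + theta)


for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(d, round(wave_similarity((0.0,), (d,)), 3),
          round(rational_quadratic_similarity((0.0,), (d,)), 3))
```

The rational quadratic similarity decays smoothly and stays positive, whereas the wave kernel oscillates with distance, which is why some handling of its negative values is needed when it is used as a fuzzy similarity.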
Classification accuracy of MFNN_AR
Dataset  MFNN_AR  PART  J48  SMO  NB 

Cleveland  57.21  52.44  53.39  58.31  56.06 
Ecoli  87.68  81.79*  82.83*  83.48*  85.50 
Glass  74.98  69.12  68.08  57.77*  47.70* 
Handwritten  91.29  79.34*  76.13*  93.58v  86.19* 
Heart  81.93  77.33  78.15  83.89  83.59 
Liver  64.88  65.25  65.84  57.98*  54.89* 
Multifeat  97.75  94.68*  94.62*  98.39v  95.27* 
Olitos  80.67  67.00*  65.75*  87.92v  78.50 
Pageblock  96.09  96.93v  96.99v  92.84*  90.01* 
Satellite  90.01  86.63*  86.41*  86.78*  79.59* 
Sonar  77.02  77.40  73.61  76.60  67.71* 
Water2  85.03  83.85  83.18  83.64  69.72* 
Water3  80.82  82.72  81.59  87.21v  85.49v 
Waveform  80.14  77.62*  75.25*  86.48v  80.01 
Wine  95.97  92.24  93.37  98.70  97.46 
Wisconsin  96.35  95.68  95.44  97.01  96.34 
Summary  (v/ /*)  (1/9/6)  (1/9/6)  (5/6/5)  (1/7/8) 
In summary, the results show that when a kernel-based fuzzy similarity metric is employed, MFNN is significantly more accurate than PART, J48 and NB on more datasets than it is significantly less accurate, and it performs on a par with SMO.
6 Conclusion
This paper has presented a multifunctional nearestneighbour approach (MFNN). In this work, the combination of fuzzy similarities and class memberships may be performed using different aggregators. Such an aggregated similarity measure is then employed as the decision qualifier for the process of decisionmaking in classification.
The wide and variable choice of aggregators and fuzzy similarity metrics gives the proposed approach high flexibility and considerable generality. For example, using appropriate similarity relations, aggregators and class membership functions, MFNN can perform the tasks of classical kNN and FNN. This helps to ensure that the resulting MFNN can adapt to different classification problems, given a wide range of choices. Furthermore, using the Gödel S-norm and the Addition operator, the resulting specific MFNN implementations (with crisp membership) achieve the same classification accuracy as two advanced fuzzy-rough set methods: fuzzy-rough nearest neighbour (FRNN) and vaguely quantified nearest neighbours (VQNN). That is, both traditional nearest-neighbour methods and advanced fuzzy-rough-based classifiers can be seen as special cases of MFNN, so MFNN provides a unifying framework for existing nearest-neighbour classification methods. These results are established by theoretical analysis and supported by empirical results.
To demonstrate the efficacy of the MFNN approach, systematic experiments have been carried out from the perspective of classification accuracy. The results of the experimental evaluation are very promising. They demonstrate that, when implemented with specific aggregators, particularly the Addition operator, MFNN can generally outperform a range of state-of-the-art learning classifiers in terms of classification accuracy.
Topics for further research include a more comprehensive study of how the proposed approach would perform in regression or other prediction tasks, where the decision variables are not crisp. Also, techniques have recently been proposed for efficient information aggregation and unsupervised feature selection that exploit the concept of nearest-neighbour-based data reliability (Boongoen and Shen 2010; Boongoen et al. 2011). How the present work could be used to perform such tasks, or perhaps (semi-)supervised feature selection (Jensen and Shen 2008), remains an active topic of research.
Acknowledgements
This work was jointly supported by the National Natural Science Foundation of China (No. 61502068) and the China Postdoctoral Science Foundation (Nos. 2013M541213 and 2015T80239). The authors are also grateful for the financial support provided by Aberystwyth University, and to colleagues in the Advanced Reasoning Group within the Department of Computer Science, Institute of Mathematics, Physics and Computer Science at Aberystwyth University, UK.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
References
Armanino C, Leardi R, Lanteri S, Modi G (1989) Chemometric analysis of Tuscan olive oils. Chemom Intell Lab Syst 5:343–354
Baets BD, Mesiar R (1998) T-partitions. Fuzzy Sets Syst 97:211–223
Baets BD, Mesiar R (2002) Metrics and T-equalities. J Math Anal Appl 267:531–547
Bengio Y, Grandvalet Y (2005) Bias in estimating the variance of K-fold cross-validation. In: Duchesne P, Rémillard B (eds) Statistical modeling and analysis for complex data problems. Springer, Boston, pp 75–95
Blake CL, Merz CJ (1998) UCI repository of machine learning databases. University of California, School of Information and Computer Sciences, Irvine
Boongoen T, Shang C, Iam-On N, Shen Q (2011) Nearest-neighbor guided evaluation of data reliability and its applications. IEEE Trans Syst Man Cybern Part B Cybern 41:1705–1714
Boongoen T, Shen Q (2010) Nearest-neighbor guided evaluation of data reliability and its applications. IEEE Trans Syst Man Cybern Part B Cybern 40:1622–1633
Borkowski L (ed) (1970) Selected works by Jan Łukasiewicz. North-Holland Publishing Co., Amsterdam
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks, Monterey, CA
Cornelis C, Cock MD, Radzikowska A (2007) Vaguely quantified rough sets. In: Lecture notes in artificial intelligence, vol 4482, pp 87–94
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
Daelemans W, den Bosch AV (2005) Memory-based language processing. Cambridge University Press, Cambridge
Das M, Chakraborty MK, Ghoshal TK (1998) Fuzzy tolerance relation, fuzzy tolerance space and basis. Fuzzy Sets Syst 97:361–369
Dienes SP (1949) On an implication function in many-valued systems of logic. J Symb Logic 14:95–97
Dubois D, Prade H (1992) Putting rough sets and fuzzy sets together. In: Intelligent decision support. Springer, Dordrecht, pp 203–232
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, Hoboken
Jensen R, Cornelis C (2011) A new approach to fuzzy-rough nearest neighbour classification. In: Transactions on rough sets XIII, LNCS, vol 6499, pp 56–72
Jensen R, Shen Q (2008) Computational intelligence and feature selection: rough and fuzzy approaches. Wiley, Indianapolis
Jensen R, Shen Q (2009) New approaches to fuzzy-rough feature selection. IEEE Trans Fuzzy Syst 17:824–838
John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Eleventh conference on uncertainty in artificial intelligence, pp 338–345
Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 15:580–585
Kleene SC (1952) Introduction to metamathematics. Van Nostrand, New York
Kolmogorov AN (1950) Foundations of the theory of probability. Chelsea Publishing Co., Chelsea
Papoulis A (1984) Probability, random variables, and stochastic processes, 2nd edn. McGraw-Hill, New York
Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer Academic Publishing, Norwell
Qu Y, Shang C, Shen Q, Mac Parthaláin N, Wu W (2015) Kernel-based fuzzy-rough nearest-neighbour classification for mammographic risk analysis. Int J Fuzzy Syst 17:471–483
Quinlan JR (1993) C4.5: programs for machine learning. The Morgan Kaufmann series in machine learning. Morgan Kaufmann, Burlington
Radzikowska AM, Kerre EE (2002) A comparative study of fuzzy rough sets. Fuzzy Sets Syst 126:137–155
Sarkar M (2007) Fuzzy-rough nearest neighbors algorithm. Fuzzy Sets Syst 158:2123–2152
Smola AJ, Schölkopf B (1998) A tutorial on support vector regression. NeuroCOLT2 Technical Report Series, NC2-TR-1998-030
Witten IH, Frank E (1998) Generating accurate rule sets without global optimisation. In: Proceedings of the 15th international conference on machine learning, San Francisco. Morgan Kaufmann
Witten IH, Frank E (2000) Data mining: practical machine learning tools with Java implementations. Morgan Kaufmann, Burlington
Yager RR (1988) On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Trans Syst Man Cybern 18:183–190
Yao YY (1998) A comparative study of fuzzy sets and rough sets. Inf Sci 109:227–242
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.