Abstract
The fuzzy k-nearest neighbor (FKNN) algorithm, one of the most well-known and effective supervised learning techniques, has often been used in data classification problems but rarely in regression settings. This paper introduces a new, more general fuzzy k-nearest neighbor regression model. The generalization is based on using the Minkowski distance instead of the usual Euclidean distance. The Euclidean distance is often not the optimal choice for practical problems, and better results can be obtained by generalizing it. Using the Minkowski distance allows the proposed method to obtain more reasonable nearest neighbors to the target sample. Another key advantage of this method is that the nearest neighbors are weighted by fuzzy weights based on their similarity to the target sample, leading to a more accurate prediction through a weighted average. The performance of the proposed method is tested with eight real-world datasets from different fields and benchmarked against the k-nearest neighbor and three other state-of-the-art regression methods. The Manhattan distance-based and Euclidean distance-based FKNNreg methods are also implemented, and the results are compared. The empirical results show that the proposed Minkowski distance-based fuzzy regression (MdFKNNreg) method outperforms the benchmarks and can be a good algorithm for regression problems. In particular, the MdFKNNreg model gave a significantly lower overall average root mean square error (0.0769) than all other regression methods used. As a special case of the Minkowski distance, the Manhattan distance yielded the optimal conditions for MdFKNNreg and achieved the best performance for most of the datasets.
Introduction
In machine learning, a regression problem refers to estimating a real-valued continuous response (output) based on the values of one or more input variables. By determining the relationships between output and input variables, a regression method numerically predicts a target value. In the literature, various regression techniques have been introduced for a wide range of machine learning problems. Among them, k-nearest neighbor regression (KNNreg) (Benedetti 1977; Stone 1977; Turner 1977) has become one of the most widely used regression techniques due to its simplicity and robustness (Buza et al. 2015). This method is an adapted version of the k-nearest neighbor (KNN) model that was initially introduced by Cover and Hart (1967) for classification problems. The main idea of the KNNreg is to predict the output value for a given test sample by averaging the output values of the nearest neighbor samples (Hu et al. 2014).
Though the KNN method has many significant advantages, it intuitively suffers from some weaknesses, for example, giving equal importance to all nearest neighbors (even if some of them are quite far from the test sample) in the classification process. To improve the model and alleviate such issues, Keller et al. (1985) introduced the idea of using the degree of membership in the KNN method to propose its fuzzy version, called the fuzzy k-nearest neighbor (FKNN) classifier. Thanks to its capability of tackling uncertainty issues in the data, the FKNN model has proven promising for classification problems (Chen et al. 2013; Yu et al. 2002) compared to the classical KNN method. Although the FKNN classifier has received much attention in terms of classification, it has received less attention in the context of regression. This motivated us to establish the fuzzy k-nearest neighbor regression (FKNNreg) model in this research by modifying the original FKNN rule.
Typically, the distance metric is one of the main components of distance-based classifiers such as the KNN and FKNN methods (Rastin et al. 2021). Even though the Euclidean distance is the most common distance metric used in such methods to measure the similarity between two data samples, it is often not optimal for every problem domain (Cai et al. 2020; Nguyen et al. 2016). Several research papers have reported better results with a more general choice of distance metric (Chang et al. 2006; Dettmann et al. 2011; Jenicka and Suruliandi 2011; Kaski et al. 2001; Koloseni et al. 2012, 2013). Besides, the Euclidean distance has several drawbacks. For example, two data samples that have no feature values in common might have a shorter distance than other sample pairs that share the same feature values (Shirkhorshidi et al. 2015). These facts encouraged us to examine the effectiveness of the Minkowski distance in the FKNN rule in the regression setting for low- and high-dimensional datasets.
The main goal of this study is to introduce the FKNNreg using the Minkowski distance metric and to examine its efficiency. The combination of the Minkowski distance metric and the FKNNreg has not been studied in the literature before. This led us to create the Minkowski distance-based fuzzy k-nearest neighbor regression (MdFKNNreg) algorithm. A key advantage of this method is that the nearest neighbors are weighted by fuzzy weights considering their similarity to the test sample, leading to a more accurate prediction through a weighted average. Also, using the Minkowski distance allows greater flexibility for obtaining more relevant neighboring samples close to the target sample.
Intuitively, most available regression models (e.g., multiple linear regression [MLR], least absolute shrinkage and selection operator [LASSO] regression) are based on assumptions regarding the distribution of the data. However, it is rarely confirmed that these assumptions apply to real-world problems. That said, an interesting fact about the KNNreg methods is that they do not explicitly make any assumptions about the underlying data (Yao and Ruzzo 2006) or the model's components and simply use training data to make predictions. Another advantage is that they are, in general, relatively easy to implement and interpret and can potentially be applied even to nonlinear problems (Hu et al. 2014). Moreover, support vector regression (SVR) is recognized as one of the well-known methods applied to nonlinear regression problems. However, its use is restricted in various disciplines due to the difficulty of selecting suitable parameters for the model (Liu et al. 2013). In this regard, FKNNreg methods could be better alternatives in the regression context, and the proposed new KNNreg method is found to be significant for nonlinear regression problems.
To study the performance of the proposed MdFKNNreg model, we conducted an experiment using real-world data from various applications. We compared the regression performance of the proposed variant with the KNNreg, LASSO, SVR, and multiple linear regression models. In addition, the Manhattan distance-based fuzzy k-nearest neighbor regression (ManFKNNreg) and Euclidean distance-based fuzzy k-nearest neighbor regression (EucFKNNreg) methods were implemented, and the results were compared. To evaluate the regression performance, we used root mean square error (RMSE) and the coefficient of determination (\(R^2\)) values as the evaluation metrics. We also tested whether there was a statistically significant difference between the regression results for the MdFKNNreg and baseline methods.
The main contributions of this paper can be summarized as follows:

(1) We propose a new regression approach based on the FKNN algorithm.

(2) We introduce the Minkowski distance into the nearest neighbors search in the proposed algorithm and investigate its efficiency and robustness.

(3) We demonstrate the performance of the proposed regression model on low- and high-dimensional real-world data coming from different domains.

(4) We analyze, compare, and benchmark the regression results of the proposed method with selected well-known state-of-the-art regression methods.

The remainder of this paper is organized as follows. Section 2 discusses the background information related to the present study. Section 3 briefly provides the theoretical underpinning of the KNNreg and FKNNreg models and the Minkowski distance measure. Section 4 proposes the MdFKNNreg method. Section 5 introduces the data used and the experiment setting for the proposed method and presents and discusses the empirical results obtained with the proposed method and benchmarks. Section 6 summarizes the main findings and provides concluding remarks.
Related work
The KNNreg model has the potential to tackle linear and nonlinear problems in an effective way (Cai et al. 2020) and performs especially well in a high-dimensional space. Accordingly, the growing popularity of the KNNreg method can be seen in various fields, including renewable energy (Hu et al. 2014; Huang and Perry 2016; Zhou et al. 2020), physics research (Durbin et al. 2021), biological studies (Yao and Ruzzo 2006), transportation (Cai et al. 2020; Dell’Acqua et al. 2015), robotics (Chen and Lau 2016), and telecommunication (Adege et al. 2018). In addition, some studies have also employed the KNNreg model with other approaches to develop effective hybrid models for specific applications. For example, Chen and Hao (2017) proposed an integrated framework by employing support vector machine (SVM) and KNNreg for stock market prediction. Salari et al. (2015) also presented a novel hybrid approach with a combination of a genetic algorithm (GA), the KNNreg method, and artificial neural network (ANN) for classification problems. Cheng et al. (2019) utilized the same idea as the KNNreg to introduce a novel approach for missing value imputations. Furthermore, the simplicity and strength of the KNNreg algorithm have encouraged researchers to develop different enhanced variants (for example, see Buza et al. 2015; Guillen et al. 2010; Nguyen et al. 2016; Song et al. 2017) and to construct mathematical estimations (Biau et al. 2012).
An ideal distance measure must have the ability to precisely detect the similarity between two samples while allowing the researchers to understand how to compare, classify, or cluster those samples. Therefore, such metrics have great potential to influence the outcomes of the models used (Bergamasco and Nunes 2019). Accordingly, some previous studies focused only on which similarity measure best fit the particular situation (for example, see Rodrigues 2018; Moghtadaiee and Dempster 2015; Huo et al. 2021). The Minkowski distance is the most investigated measure among the frequently applied techniques for measuring the similarity between instances in machine learning-based applications (Bergamasco and Nunes 2019; Cordeiro and Makarenkov 2016; Gueorguieva et al. 2017). The Minkowski distance is the main focus of this research because it offers the opportunity to compute the distance between two instances in several different ways and holds several well-known distances as special cases, e.g., the Manhattan and Euclidean distances.
The concept of the fuzzy theory, originally introduced by Zadeh (1965), can operate under uncertainty and has advanced in many different ways in various applications (for examples, see Chen et al. 1990; Chen and Hsiao 2005; Chen and Chen 2007; Chen and Chang 2010; Chen et al. 2009; Horng et al. 2005; Zeng et al. 2019). The FKNN classifier (Keller et al. 1985) was derived from fuzzy theory and has been one of the most effective techniques in supervised machine learning tasks. Nikoo et al. (2018) applied the FKNN classifier to a regression application without modifying its original algorithm explicitly (i.e., it was operated as a classification task). However, to the best of our knowledge, no one has attempted to utilize the FKNN model in the regression setting. Thus, the effectiveness of FKNNreg for machine learning applications requires further investigation.
Preliminaries
This section briefly discusses the KNNreg method, the FKNN method, and the Minkowski distance measure.
K-nearest neighbor regression
KNNreg (Benedetti 1977; Stone 1977; Turner 1977) is a simple, effective, and robust nonlinear regression method. The basic idea of KNNreg is to predict an output value for a given input sample based on a fixed number (k) of its nearest neighbors found from the input-output training samples. The k is a smoothing parameter, and its value controls the adaptability of the KNNreg method (Hu et al. 2014). A distinctive property of KNNreg is that it requires no explicit training step; it only needs the inputs and outputs of the initial dataset. The notion of the KNNreg model can be formally defined as follows.
Let \(T=\{(X_i, y_i)\}_{i=1}^N\) be a training dataset with N samples, where \(X_i=\{x_1^i, x_2^i, \ldots , x_m^i\}\in \mathbb {R}^m\) is an input sample i from the m-dimensional feature space, and its output value (response variable) is \(y_i \in Y\), where \(Y=\{y_1, y_2, \ldots , y_N \}\) denotes the set of output values. For a given new data sample X, the goal is to learn the predictor function h(X) from the training dataset such that \(\hat{y}\approx h(X)\), where \(\hat{y}\) is the estimated value for the output y of X. The KNNreg starts with measuring the distance (d) between the test sample X and each sample \(X_i\) in T. In this case, the Euclidean distance is the most commonly adopted distance metric, and its formulation for the distance between \(X=\{x_1, x_2, \ldots , x_m\}\) and \(X_i\) is presented by Eq. (1).
\[ d(X, X_i) = \sqrt{\sum _{j=1}^{m}\big (x_j - x_j^i\big )^2} \tag{1} \]
Next, the set of k nearest neighbors, \(N_X^{k}=\{(X_i, y_i)\}_{i=1}^k\) for X, is found from the training samples in T reordered by increasing Euclidean distance. Finally, the output value y for X is estimated by taking the arithmetic mean of the output values (\(y_1, y_2, \ldots , y_k\)) of the nearest neighbors (Song et al. 2017; Biau et al. 2012; Györfi et al. 2002) as follows:
\[ \hat{y} = \frac{1}{k}\sum _{i=1}^{k} y_i \tag{2} \]
This is based on the assumptions that training samples in the \(N_X^{k}\) have similar output values to h(X) (Kramer 2011) and also that all nearest neighbors in the \(N_X^{k}\) have equal importance in the prediction (Cover and Hart 1967).
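The KNNreg rule described above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the experiments; the function name `knn_regress` and the toy data are assumptions made for this example.

```python
import math

def knn_regress(train_X, train_y, x, k):
    """Predict the output for x as the mean output of its k nearest neighbors."""
    # Euclidean distance from x to every training sample, paired with its output.
    dists = [(math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)]
    # Reorder the training samples by increasing distance and keep the first k.
    dists.sort(key=lambda pair: pair[0])
    neighbors = dists[:k]
    # Uniform weighting: every neighbor contributes equally to the estimate.
    return sum(y for _, y in neighbors) / len(neighbors)

# Toy data (hypothetical): two samples near the query, two far away.
train_X = [[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]]
train_y = [1.0, 2.0, 9.0, 10.0]
prediction = knn_regress(train_X, train_y, [0.05, 0.05], k=2)
```

With k = 2 only the two nearby samples are averaged, which illustrates why distant training samples have no influence once they fall outside the neighborhood.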
Fuzzy k-nearest neighbor classification method
Unlike the KNN algorithm, the FKNN method uses an unbiased weighting scheme in the decision rule based on the distances between the test sample and the nearest neighbor samples. Put differently, the FKNN model computes a membership of the test sample in each class and makes the class decision according to the highest membership degree (Keller et al. 1985). These fuzzy memberships have excellent potential for accurate predictions (Kumbure et al. 2020). The membership degree of a given new sample X in a class i that is represented by the k nearest neighbors is measured as follows:
\[ u_i(X) = \frac{\sum _{j=1}^{k} u_{ij}\Big (1/\Vert X - X_j\Vert ^{2/(q-1)}\Big )}{\sum _{j=1}^{k}\Big (1/\Vert X - X_j\Vert ^{2/(q-1)}\Big )} \tag{3} \]
where \(q\in (1, +\infty )\) is the fuzzy strength parameter that controls the Euclidean distance \(\Vert X - X_j\Vert ^{2}\) between X and \(X_j\) to weigh the contribution of each nearest neighbor to the membership value. Also, \(u_{ij}\) is the membership of the sample \(X_j\) from the training data to the class i among the k nearest neighbors. Two methods are used to measure the \(u_{ij}\): crisp memberships and fuzzy memberships. More details about these methods can be found in the work by Chen et al. (2011).
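As an illustration of the FKNN membership rule with crisp memberships (\(u_{ij}=1\) if \(X_j\) belongs to class i and 0 otherwise), the following sketch may help. The function name `fknn_memberships`, the small constant `eps` that guards against a zero distance, and the toy data are assumptions of this example rather than part of the original method description.

```python
import math

def fknn_memberships(train_X, train_labels, x, k, q=2.0, eps=1e-12):
    """Return the fuzzy membership of x in each class among its k nearest neighbors."""
    # k nearest neighbors by Euclidean distance, each kept with its class label.
    neighbors = sorted(
        ((math.dist(x, xi), label) for xi, label in zip(train_X, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    exponent = 2.0 / (q - 1.0)
    # Inverse-distance weights; eps (an assumption) avoids division by zero.
    weights = [1.0 / max(d, eps) ** exponent for d, _ in neighbors]
    total = sum(weights)
    memberships = {}
    for w, (_, label) in zip(weights, neighbors):
        # Crisp memberships: a neighbor contributes only to its own class.
        memberships[label] = memberships.get(label, 0.0) + w / total
    return memberships

# Toy data (hypothetical): one feature, two classes.
train_X = [[0.0], [1.0], [3.0]]
train_labels = ["a", "a", "b"]
memberships = fknn_memberships(train_X, train_labels, [2.0], k=3, q=2.0)
predicted_class = max(memberships, key=memberships.get)
```

The memberships sum to one, and the class decision is the label with the highest membership degree.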
Minkowski distance
The Minkowski distance measure (also called \(L_p\) norm space) is a class of distance functions that are formed by the parameter p. For two given samples \(X_i\) and \(X_j\) where \(X_i=\{x_1^i, x_2^i, \ldots , x_m^i\}\in \mathbb {R}^m\) and \(X_j=\{x_1^j, x_2^j, \ldots , x_m^j\}\in \mathbb {R}^m\), the Minkowski distance metric is defined as follows:
\[ d_{Md}(X_i, X_j) = \Big (\sum _{t=1}^{m}\vert x_t^i - x_t^j\vert ^p\Big )^{1/p} \tag{4} \]
From this metric, we can specify different distance functions by changing the value of p. For example, we can obtain the Manhattan distance (also known as the city block distance or \(L_1\) norm) by setting \(p=1\) and the Euclidean distance, also referred to as \(L_2\) norm (see also Eq. (1)) by setting \(p=2\).
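The special cases mentioned above can be checked numerically. The sketch below uses a hypothetical helper, `minkowski_distance`, and a toy pair of samples chosen for this example.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("samples must have the same number of features")
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

a, b = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski_distance(a, b, p=1)  # 7.0, the city block distance
euclidean = minkowski_distance(a, b, p=2)  # 5.0, the Euclidean distance
in_between = minkowski_distance(a, b, p=1.5)
```

For a fixed pair of samples the distance does not increase as p grows, so intermediate values such as p = 1.5 fall between the Euclidean and Manhattan distances.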
Proposed fuzzy k-nearest neighbor regression model using Minkowski distance
In this research, we focus on fuzzy k-nearest neighbor regression. To this end, we define the FKNN method for regression together with the Minkowski distance. In this way, the novel regression method, MdFKNNreg, is introduced. This method aims to achieve a reliable prediction for the predictor function by allowing the Minkowski distance to be adapted to the particular context with the optimal conditions. The procedure of the MdFKNNreg method includes four main steps: measuring the distances, identifying the nearest neighbors, computing the fuzzy weights, and making the prediction. The detailed process of this method is presented below using the same notation as in Sect. 3.1.
Step 1: Determine the Minkowski distance \(d_{Md}(X, X_j)\) between X and each \(X_j\) in T according to: \(d_{Md}(X, X_j) = \big (\sum _{t=1}^{m}\vert x_{t} - x_{t}^j\vert ^p\big )^{1/p}\).
Step 2: Find the set of k nearest neighbors \(N_X^{k}\) from the training data samples ranked in order of increasing Minkowski distance. Here, we used a grid-based search to find the optimal parameter p for the Minkowski distance and k that best fit a particular dataset.
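Step 3: Calculate the fuzzy weight (\(w_j\)) for each nearest neighbor j using \(d_{Md}(X, X_j)\) as follows:
\[ w_j = \frac{1}{d_{Md}(X, X_j)^{2/(q-1)}} \tag{5} \]
where q is a fuzzy strength parameter, and \((\frac{2}{q-1})\) indicates the fuzziness exponent. Because the weight is the inverse of the distance raised to this exponent, the effect of q depends on the distance: for distances below 1 unit, the closer q is to 1, the larger the weights are, whereas for distances over 1 unit, the closer q is to 1, the smaller the weights are.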
The purpose of these weights is to define a comprehensive linear predictor for the output value y such that \(h(X)=W^TY\), where \(W=\{w_1, w_2, \ldots , w_k\} \in \mathbb {R}^k\). The weighted value \(w_j\) (\(0\le w_j \le 1\)) of the nearest neighbor \(X_j\) reflects its relative importance to Y.
Step 4: Predict the output value \(\hat{y}\) for X by taking the weighted average (with the fuzzy weights) of the outputs \(y_{j}\) for \(j=1, 2, \ldots , k\) in the \(N_X^k\) according to the following equation:
\[ \hat{y} = \frac{\sum _{j=1}^{k} w_j y_j}{\sum _{j=1}^{k} w_j} \tag{6} \]
In this method, the Minkowski distance is used not only to find the nearest neighbors but also to compute the weights. Accordingly, the Minkowski distance plays a critical role in the proposed framework. Besides, the MdFKNNreg method is adaptive in that the number of nearest neighbors k and the parameter p of the distance function are allowed to vary across the iterations of its search for a particular problem. This characteristic allows the method to expand the search area to a broader domain. The steps of the MdFKNNreg method discussed above are summarized as pseudocode in Algorithm 1. In addition, the pseudocode for the grid search method used is presented in Algorithm 2.
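The four steps can be sketched as a single prediction function. This is an illustrative reading of the procedure, not a reproduction of Algorithm 1: the names `minkowski_distance` and `md_fknn_regress`, the default q = 1.5, and the small constant `eps` that guards against a zero distance are assumptions of this example.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def md_fknn_regress(train_X, train_y, x, k, p, q=1.5, eps=1e-12):
    """Fuzzy-weighted k-nearest neighbor regression with the Minkowski distance."""
    # Step 1: Minkowski distance from x to every training sample.
    dists = [(minkowski_distance(x, xj, p), yj) for xj, yj in zip(train_X, train_y)]
    # Step 2: keep the k samples with the smallest distances.
    neighbors = sorted(dists, key=lambda pair: pair[0])[:k]
    # Step 3: fuzzy weights, the inverse distance raised to 2 / (q - 1).
    exponent = 2.0 / (q - 1.0)
    weights = [1.0 / max(d, eps) ** exponent for d, _ in neighbors]
    # Step 4: weighted average of the neighbors' outputs.
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# Toy data (hypothetical): one feature, output equal to the feature value.
train_X = [[0.0], [1.0], [5.0]]
train_y = [0.0, 1.0, 5.0]
prediction = md_fknn_regress(train_X, train_y, [2.0], k=2, p=1, q=2.0)
```

In this toy case the neighbor at distance 1 receives weight 1 and the neighbor at distance 2 receives weight 0.25, so the prediction (0.8) is pulled toward the closer neighbor, whereas a plain average of the two outputs would give 0.5.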
In the KNNreg method, the prediction of the output for a new sample is made through a uniform weighting scheme (Cheng 1984). This means it makes the prediction by taking the simple average of the outputs of the nearest neighbor samples and does not consider the distances between the new sample and its k nearest neighbors (i.e., all nearest neighbors have equal influence on the prediction) (Kramer 2011). In contrast, the FKNNreg uses an inverse weighting scheme that assigns higher weights to the closer training samples, allowing them more influence over the prediction. Moreover, the fuzzy strength parameter q controls how heavily the distance is weighted when determining the contribution of each nearest neighbor to the target sample (Keller et al. 1985). For example, when \(q=2\), the fuzziness exponent \(2/(q-1)\) equals 2, so the contributions of the nearest neighbors are weighted by the reciprocal of their squared distances to the target sample. Regarding the distance, the adopted distance metric plays a crucial role in achieving the best possible nearest neighbors and weighting them. Accordingly, the Minkowski distance is utilized in the proposed MdFKNNreg method since it is more general than the Euclidean and Manhattan distances. It has shown superior performance with supervised and unsupervised machine learning models compared to other distance measures (for example, see Aggarwal et al. 2001; Ranmya and Sasikala 2019). Considering the above facts, the proposed MdFKNNreg is expected to produce better results than the KNNreg and FKNNreg methods.
Experiment and results
This section presents the descriptions of the data sets used and the empirical procedures of the experiments conducted to investigate the regression performance of the proposed MdFKNNreg model.
Data description
For our experiment, we selected eight real-world datasets that are freely available at the UCI Machine Learning repository (Dheeru and Taniskidou 2017) and at the Knowledge Extraction based on Evolutionary Learning (KEEL) repository (Alcalá-Fdez et al. 2011). As summarized in Table 1, each of these datasets holds different characteristics in terms of the number of instances and features. The application area of each dataset is given in the “Domain” column of Table 1.
Experimental setting
In each collected dataset, the data samples were divided into \(40\%\) for training, \(40\%\) for validation, and \(20\%\) for testing based on the works of Kumbure et al. (2019, 2020). Before data splitting, all the datasets were normalized into the unit interval [0, 1] to avoid differences in scale between features with small and large ranges. For cross-validation, we adopted the holdout method (Arlot and Celisse 2010), in which the training and validation datasets were randomly sampled 30 times, and mean performance measures were computed from the results. The examination of the proposed method can be divided into two phases: training and validation, and testing. In the training and validation phase, we trained the model by optimizing the parameter p of the Minkowski distance and the number of nearest neighbors k. Here, we used the mean regression error to determine the optimal parameter values. To find the best possible values for the parameters, we deployed a grid search technique during the training and validation. Then, we evaluated the performance of the model with optimal parameters in the testing phase. The steps of this process are summarized by the flowchart in Fig. 1.
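The training and validation phase can be sketched as follows. This is a simplified illustration of the protocol, not the MATLAB code used in the experiments or a reproduction of Algorithm 2: all function names, the synthetic data, the reduced grids, and the choice to normalize the output column together with the features are assumptions of this example.

```python
import math
import random

def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns are mapped to 0."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def fknn_predict(train, x, k, p, q=1.5, eps=1e-12):
    """Fuzzy-weighted k-NN prediction with the Minkowski distance of order p."""
    neighbors = sorted(
        ((sum(abs(a - b) ** p for a, b in zip(x, xi)) ** (1.0 / p), yi)
         for xi, yi in train),
        key=lambda pair: pair[0],
    )[:k]
    weights = [1.0 / max(d, eps) ** (2.0 / (q - 1.0)) for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

def validation_rmse(train, valid, k, p):
    """RMSE of the predictor on the validation samples."""
    errors = [(fknn_predict(train, x, k, p) - y) ** 2 for x, y in valid]
    return math.sqrt(sum(errors) / len(errors))

def grid_search(samples, p_grid, k_grid, repeats=30, seed=0):
    """Pick (p, k) with the lowest mean validation RMSE over repeated holdout splits."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 5                 # 20% set aside for testing
    test, rest = shuffled[:n_test], shuffled[n_test:]
    totals = {(p, k): 0.0 for p in p_grid for k in k_grid}
    for _ in range(repeats):
        rng.shuffle(rest)                       # new random training/validation split
        half = len(rest) // 2                   # 40% training, 40% validation
        train, valid = rest[:half], rest[half:]
        for p, k in totals:
            totals[(p, k)] += validation_rmse(train, valid, k, p)
    best = min(totals, key=totals.get)
    return best, totals[best] / repeats, test

# Synthetic data (hypothetical): two features, output equal to their sum.
data_rng = random.Random(1)
raw = [[a, b, a + b] for a, b in ((data_rng.random(), data_rng.random()) for _ in range(60))]
samples = [(row[:-1], row[-1]) for row in min_max_normalize(raw)]
(best_p, best_k), best_rmse, test = grid_search(samples, [1, 1.5, 2], range(1, 6), repeats=5)
```

The held-out `test` samples are returned untouched so that the selected (p, k) pair can later be evaluated on data that played no part in the parameter search.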
The proposed MdFKNNreg, ManFKNNreg, and EucFKNNreg models were implemented using MATLAB 2019b software. The KNNreg was implemented from scratch. The SVR, LASSO, and MLR models were developed using the Statistics and Machine Learning Toolbox in MATLAB. The computer used had an Intel® Core™ i5 1.8 GHz processor and 16 GB of RAM and ran the Microsoft Windows 10 operating system.
Baseline models
We compared the performance of the developed MdFKNNreg method with the classical KNNreg, ManFKNNreg, and EucFKNNreg methods. In addition, we also selected three more commonly used regression techniques, namely SVR (Drucker et al. 1997), LASSO regression (Tibshirani 1996; Wang et al. 2018), and MLR (Montgomery et al. 2012). The basic concepts of these methods are briefly presented.
SVR, a variant of SVM (Cortes and Vapnik 1995), is a nonlinear kernel-based regression approach that performs the regression by constructing a hyperplane in a high-dimensional space. For a given test sample X, SVR develops a predictor function: \(h(X)=\sum _{i=1}^{N}(\alpha _i - \alpha _i^*) K(X, X_i)+b\) by mapping training samples onto the high-dimensional feature space. Here, \(\alpha _i\) and \(\alpha _i^*\) are nonzero Lagrange multipliers, b is a bias constant, and K is the kernel function that represents the inner product of X and training sample \(X_i\).
LASSO is a regularization-based linear regression model. The model is selected to minimize the objective function: \(\sum _{i=1}^{n}\big (w_0+\sum _{j=1}^{m}w_jx_{j}^i - y_i\big )^2+\lambda \sum _{j=1}^{m}\vert w_j\vert \), where \(\lambda\) is the regularization parameter that balances the empirical error against the penalty on the coefficients. As the \(\lambda\) value increases, LASSO shrinks more coefficients to zero (Wang et al. 2018).
MLR (also referred to as ordinary least squares regression) is one of the oldest and most frequently employed techniques for analyzing the relationship between the response variable and multiple input variables. The general form of the MLR model can be given by \(h(X_i) = w_0 + \sum _{j=1}^{m}w_jx_j^i + \epsilon\), where \(h(X_i)\) is the predictor function for the sample \(X_i=x_1^i, x_2^i,\ldots , x_m^i\), \(w_0\) is the constant, \(w_j\) is the coefficient for the variable j, and \(\epsilon\) is the error term (\(\sim N(0, \sigma ^2)\)) of the model.
Parameter settings
The detailed parameter settings for the proposed method and benchmarks are presented in this subsection. In the MdFKNNreg algorithm, the value for p of the Minkowski distance was selected from \(\{1, 1.5, \ldots , 5\}\). The number of nearest neighbors k was chosen from the range \(\{1, 2, \ldots , 25\}\) for all nearest neighbor regression approaches. The value of the fuzzy strength parameter q was kept constant at \(q=1.5\) according to Arif et al. (2010) for both the MdFKNNreg and FKNNreg models.
The kernel function is the most critical ingredient in the SVR model (Ali and Smith-Miles 2006) because it helps the model achieve robust mapping from training samples to the prediction. Accordingly, we tested the performance of the SVR model using three different kernels: linear, Radial Basis Function (RBF), and polynomial based on Ali and Smith-Miles (2006). For the LASSO model, the regularization parameter \(\lambda\) was tuned from the range \(\{0.001, 0.01, \ldots , 100\}\) by following the experiments of Saccoccio et al. (2014). Here, we attempted to balance the \(\lambda\) values because low values of \(\lambda\) favor predictor functions that achieve a small training error, while larger values tend to yield simpler prediction functions (Wang et al. 2018). With the multiple regression model, we tested four different model types from the toolbox: linear, interaction, purequadratic, and quadratic. Using these different types of models, we were able to generate more sophisticated MLR versions for the comparison even though high-order terms of the features were not used with the nearest neighbor methods. The rest of the parameter values in the SVR, LASSO, and MLR models were set to the default values according to the toolbox specifications.
Evaluation metrics
To evaluate the prediction performance of the proposed regression method and benchmarks, we adopted two frequently applied measures: RMSE and \(R^2\). RMSE is the square root of the average squared difference between the model’s predictions and the true values. \(R^2\) is the proportion of the variation in the response variable that is “explained” by the regression model compared to the mean (Kurz-Kim and Loretan 2014). It is a statistical measure that indicates how closely the data points in the response variable fit the values of the regression model. In general, higher \(R^2\) values and smaller RMSE values reflect better performance of the regression model (Pham 2019). The formulas used for both evaluation metrics are defined as follows:
\[ \text {RMSE} = \sqrt{\frac{1}{n}\sum _{i=1}^{n}\big (Y_i - \hat{Y_i}\big )^2} \tag{7} \]
\[ R^2 = \Bigg (1 - \frac{\sum _{i=1}^{n}\big (Y_i - \hat{Y_i}\big )^2}{\sum _{i=1}^{n}\big (Y_i - \bar{Y}\big )^2}\Bigg )\times 100\% \tag{8} \]
where n is the number of samples in the test data, \(\hat{Y_i}\) indicates the predicted value, \(Y_i\) indicates the true value of the ith test sample, and \(\bar{Y}\) is the average of the true values. As shown in Eq. (8), the percentage values of \(R^2\) are considered.
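Both metrics are straightforward to compute. The sketch below follows the definitions given above, with \(R^2\) expressed as a percentage; the function names and the toy values are assumptions of this example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_percent(y_true, y_pred):
    """Coefficient of determination, expressed as a percentage."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return (1.0 - ss_res / ss_tot) * 100.0

# Toy values (hypothetical): three exact predictions and one that is off by 2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]
error = rmse(y_true, y_pred)
explained = r2_percent(y_true, y_pred)
```

Here the single large error gives an RMSE of 1 and an \(R^2\) of about 20%, showing how strongly both metrics react to one poor prediction in a small test set.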
When it is necessary to apply several models to a particular problem and pick the best one, the usual method is to use several evaluation metrics to measure the models’ performance and select the best model with the highest performance. However, when trying to prove that one model outperforms another for a particular problem, we must use a statistical test of significance and validate the claim of improved performance (Borovicka et al. 2012).
Following Chen et al. (2011), we adopted the paired t test, one of the most commonly used statistical tests in machine learning. This analysis tested the null hypothesis that there is no significant difference between two mean RMSE values at the 0.05 significance level. Here we considered the error samples from the holdout method (size of \(30\times 1\)) for each regression model when the optimal parameter values were employed. The standard deviations were also computed.
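A paired t test on two sets of 30 holdout errors can be sketched as follows. The function name, the synthetic error values, and the use of a tabulated critical value in place of an exact p-value are assumptions of this example; 2.045 is the standard two-sided critical value of Student's t distribution for 29 degrees of freedom at the 0.05 level.

```python
import math
import statistics

T_CRIT_DF29 = 2.045  # two-sided critical value, 29 degrees of freedom, alpha = 0.05

def paired_t_statistic(errors_a, errors_b):
    """t statistic of the paired differences between two matched error samples."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    return mean_diff / (sd_diff / math.sqrt(n))

# Synthetic RMSE samples (hypothetical) from 30 holdout repetitions of two models.
errors_baseline = [0.10 + 0.001 * (i % 5) for i in range(30)]
errors_proposed = [0.08 + 0.001 * ((3 * i) % 7) for i in range(30)]

t_stat = paired_t_statistic(errors_baseline, errors_proposed)
reject_null = abs(t_stat) > T_CRIT_DF29
```

The test is paired because both models are evaluated on the same 30 random splits, so the comparison is made on the per-split differences rather than on two independent samples.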
Results and discussion
In this subsection, we first present the results of the proposed method compared with the baseline methods from the training and validation step. After that, optimal parameter values observed for each model are discussed. Then, the performances of the fitted models for the training and validation data are evaluated with the test datasets.
Results with the validation data samples
Table 2 summarizes the results of all the methods for each of the datasets from the training and validation step. In the table, we report the minimum mean RMSE and the standard deviation (STD) of the RMSE as the performance measures. Note that these mean RMSEs and standard deviations are the result of the holdout method with 30 repetitions.
From Table 2, it is apparent that the proposed MdFKNNreg method outperformed all benchmarks for six datasets and had the second-best performance for the rest of the datasets. Also, the standard deviations of the proposed method were reasonable for all cases. In particular, the MdFKNNreg method achieved significantly improved performance compared with the EucFKNNreg method even though the results of the two methods were comparable in some cases, for example, in the cases of Servo and Stock. Additionally, the MdFKNNreg and ManFKNNreg models produced the same results for six datasets. Moreover, the KNNreg and SVR models achieved the lowest mean RMSE results in the cases of Servo and Laser, respectively. Finally, neither the LASSO nor the MLR models offered competitive results for the datasets compared to the other models.
Figure 2 illustrates the \(R^2\) values of the optimized regression models from the training and validation step for each dataset. Note that the \(R^2\) values shown by the bar heights in the graphs refer to the maximum of the mean \(R^2\) values obtained from the holdout cross-validation. In each bar graph, the blue bar represents the highest \(R^2\), while the other bars indicate the remaining values. Fig. 2 gives similar indications about the regression performance of the proposed MdFKNNreg method and benchmarks as Table 2.
Evaluation of the optimal parameter values
During the training and validation, we evaluated the regression results by tuning the parameters of the MdFKNNreg and benchmark models. Figure 3 displays the impacts of different combinations of the parameters k and p in the MdFKNNreg method on the RMSE (and \(R^2\)) with the Stock dataset. Figure 3 demonstrates how these parameters maintain the good performance of the MdFKNNreg method for the Stock data set.
Corresponding to the results in Table 2, the optimal parameter values observed for the proposed MdFKNNreg and benchmark models are presented in Table 3. The table shows that \(p=1\) (i.e., the Manhattan distance) obtained the best performance with the MdFKNNreg method for the majority of the datasets. We also applied the Manhattan distance-based FKNNreg and received the same results as for the MdFKNNreg method with \(p=1\). This finding is consistent with the implications of Aggarwal et al. (2001), who showed that the Manhattan distance (\(L_1\) norm) is the best option for high-dimensional applications. Additionally, it seems that relatively low k values (varying from 1 to 13) are better suited to the original KNNreg method, while high k values (ranging from 2 to 25) work better for its fuzzy versions. Moreover, the RBF kernel appears to be the most suitable for the SVR model because it achieved high performance for all datasets except for the Baseball and Servo data. The LASSO model produced estimates close to the least-squares estimates in most instances; we believe this is because the lowest value, \(\lambda =0.001\), was optimal. The optimal type of MLR model varied depending on the particular dataset. Taking the parameters and other results together, it is evident that the fuzzy variants have more potential for linear and nonlinear problems. For instance, the case with the Baseball data could be considered a linear problem since both SVR and MLR hold linear parameters, but high performance was achieved with the MdFKNNreg, ManFKNNreg, and EucFKNNreg methods.
Results with the test data samples
This subsection presents the regression results of the MdFKNNreg and baseline models with the testing data samples that were initially split from the original datasets. In this testing phase, we used the optimized parameter values and training data samples stored from the cross-validation step to evaluate the models’ performances with the previously unseen test data samples.
Table 4 summarizes the results with the mean RMSEs and the standard deviations (STD) of the proposed MdFKNNreg and benchmark models for the selected datasets. In addition, the average computational time (Com. time, in seconds) of each method in the testing phase is also reported.
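The three quantities reported per method (mean RMSE, STD, and average testing time) can be collected with a small helper such as the one below. It is a sketch under stated assumptions: `summarize_runs` and its arguments are hypothetical names, the sample standard deviation is assumed for the STD, and the constant predictor only serves to make the numbers easy to check by hand.

```python
import math
import statistics
import time


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true))


def summarize_runs(predict, runs):
    """Summarize repeated test runs.

    predict: callable mapping a list of inputs to a list of predictions.
    runs: iterable of (X_test, y_test) pairs, one per repetition.
    """
    errors, times = [], []
    for X_test, y_test in runs:
        start = time.perf_counter()
        y_pred = predict(X_test)
        times.append(time.perf_counter() - start)  # prediction time only
        errors.append(rmse(y_test, y_pred))
    return {
        "mean_rmse": statistics.mean(errors),
        # Sample standard deviation (an assumption); needs at least two runs.
        "std_rmse": statistics.stdev(errors) if len(errors) > 1 else 0.0,
        "mean_time_s": statistics.mean(times),
    }


# Hypothetical example: a predictor that always returns 0.0.
runs = [([1, 2], [1.0, 1.0]), ([3, 4], [3.0, 3.0])]
summary = summarize_runs(lambda X: [0.0] * len(X), runs)
```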
The results in Table 4 show that the MdFKNNreg method outperformed all benchmarks for all datasets, supporting the effectiveness of the proposed MdFKNNreg method for regression problems. Compared with the ManFKNNreg results, the MdFKNNreg model achieved somewhat higher accuracy for the Airfoil and Qsar Fish datasets. Additionally, the MdFKNNreg showed significantly better performance than the EucFKNNreg method across almost all datasets. This indicates that introducing the Minkowski distance in the learning part can result in finding more reasonable nearest neighbors, leading to better performance compared to the Manhattan and Euclidean distances. Considering the KNNreg and SVR models, even though these achieved the lowest errors in some cases during training and validation, the MdFKNNreg method outperformed them in all testing cases. This suggests that the proposed MdFKNNreg model is less prone to overfitting than both the KNNreg and SVR models. Moreover, the lowest regression performance for the testing data, as for the validation data, occurred for the LASSO and MLR models.
Based on the testing times, the computational cost of the FKNNreg methods was relatively high compared with the other methods used. This might be because of the inclusion of the weight generation process based on the inverse of the distances and the fuzzy strength parameter. Additionally, the proposed MdFKNNreg method required more time (in seconds) than the EucFKNNreg and ManFKNNreg methods to deliver the results since it includes an additional computation with the parameter p.
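The extra work attributed to the parameter p can be seen from the standard definition of the Minkowski distance. For two samples \(x\) and \(y\) with m features (the notation is an assumption chosen to match the notes of this paper), the general form and its two special cases are:

```latex
\[
d_p(x, y) = \Bigl( \sum_{j=1}^{m} |x_j - y_j|^{p} \Bigr)^{1/p}, \qquad
d_1(x, y) = \sum_{j=1}^{m} |x_j - y_j|, \qquad
d_2(x, y) = \Bigl( \sum_{j=1}^{m} (x_j - y_j)^{2} \Bigr)^{1/2}.
\]
```

For a general p, every coordinate difference is raised to a real power and the sum is followed by a \(1/p\) root, whereas \(p=1\) needs only absolute values and \(p=2\) only squares and one square root. In addition, when p is tuned, the distances have to be recomputed for each candidate value.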
To validate the test results statistically, a paired t test was applied, and the observed results for the P values and test statistics are presented in Table 5. The t test results demonstrate whether there is a statistically significant difference between the mean RMSEs of the compared regression methods. From the evidence in the table, it is apparent that the MdFKNNreg method yielded statistically significantly better performance than the benchmarks in terms of the lowest error in most cases. However, the proposed method did not produce statistically significant differences for the Servo dataset (compared with the EucFKNNreg and KNNreg methods) or for the Laser dataset (compared with the EucFKNNreg and SVR methods). It should also be noted that we did not find a statistically significant difference between the MdFKNNreg and ManFKNNreg results for any dataset: the test results of the two methods showed no difference for six datasets, and the t test produced no evidence of a significant difference for the Airfoil and Qsar Fish cases.
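As a worked illustration of the paired t test, the sketch below computes the test statistic and degrees of freedom from per-run RMSEs of two methods evaluated on the same splits. The function name and the RMSE values are hypothetical and are not taken from Table 5. The P value additionally requires the Student t distribution, which the Python standard library does not provide; a routine such as `scipy.stats.ttest_rel` returns both quantities.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two paired samples with at least two observations")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("differences have zero variance; t is undefined")
    n = len(diffs)
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical per-run RMSEs for two methods on the same four test splits.
rmse_method_a = [0.10, 0.12, 0.11, 0.13]
rmse_method_b = [0.12, 0.15, 0.12, 0.17]
t_stat, dof = paired_t_statistic(rmse_method_a, rmse_method_b)
```

A negative statistic here means the first method had the lower mean RMSE; whether the difference is significant depends on the P value at the chosen level.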
Conclusions
This paper proposed a new generalized regression model based on the FKNN rule and investigated its effectiveness on different regression problems from various domains. The Minkowski distance metric was introduced into the nearest neighbor search in the proposed algorithm to examine how it improves accuracy. Accordingly, the proposed method was named the MdFKNNreg model. The effectiveness of the MdFKNNreg method was evaluated in comparative experiments with the standard nearest neighbor and three well-known regression methods, namely SVR, LASSO, and MLR. In addition, ManFKNNreg and EucFKNNreg methods were implemented, and their results were compared. For the experiments, we used eight real-world datasets that are freely available from machine learning repositories. The regression performance of each model for those datasets was discussed in terms of the RMSE, \(R^2\) and standard deviation. The results showed that the MdFKNNreg method outperformed the benchmarks and is a suitable choice for regression problems. In our experiments, MdFKNNreg gave the lowest overall average RMSE of 0.0769.
The results of the experiments showed that the proposed MdFKNNreg method achieved statistically significantly higher performance than the other methods in almost all cases in terms of the RMSE means. Additionally, we found that the Minkowski distance with \(p=1\) yielded the optimal MdFKNNreg model, which then achieved the best performance for the majority of the datasets. In other words, the ManFKNNreg showed promising results for regression in general, supporting the indications in the study by Aggarwal et al. (2001).
However, it should be noted that the computational complexity of the proposed method is relatively high compared with the EucFKNNreg and KNNreg methods because an additional calculation with the Minkowski distance is included in the learning algorithm. Despite that, this research has offered some insight into further investigations. For example, it would be interesting to see how the MdFKNNreg method adapts and performs in regression applications in which KNNreg was previously utilized (e.g., Hu et al. 2014; Cai et al. 2020; Yao and Ruzzo 2006; Huang and Perry 2016; Zhou et al. 2020; Durbin et al. 2021; Dell’Acqua et al. 2015). Furthermore, future research directions may also test the effect of combining MdFKNNreg with other efficient variants, such as SVM (Chen and Hao 2017) and ANN (Salari et al. 2015) in a hybrid framework.
Availability of data and materials
The data used in the manuscript are freely available from the UCI and KEEL repositories.
Code availability
Notes
1. The set of nearest neighbors here is defined in terms of a classification problem, which means each nearest neighbor i contains m feature values and a class label \(c_i\) (i.e., \(X^i=\{x_1^i, x_2^i, \ldots , x_m^i, c_i\}\)). X is also shaped by similar characteristics.
2. This can be defined in terms of the “metric space”: Minkowski metric space \(\sim\) (\(\mathbb {R}^m\), \(\mathbb {L}^p\)), Euclidean space \(\sim\) (\(\mathbb {R}^m\), \(\mathbb {L}^2\)), and Manhattan space \(\sim\) (\(\mathbb {R}^m\), \(\mathbb {L}^1\)).
3. In machine learning, regularization is a more sophisticated technique that is used to solve model overfitting problems.
4. Includes an intercept, linear term for each variable, and all products of pairs of distinct variables.
5. Includes an intercept term and linear and squared terms for each variable.
6. Includes a constant term, linear and squared terms of each variable, and all products of pairs of distinct variables.
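The three MLR model types described in notes 4 to 6 differ only in which terms enter the design matrix. The sketch below expands one sample into those terms; the function name and the labels `interactions`, `purequadratic`, and `quadratic` are hypothetical names chosen for this illustration, and the ordering of the terms is arbitrary.

```python
from itertools import combinations


def design_row(x, kind):
    """Expand one sample x into regression terms for the given model type.

    kind: 'interactions' (note 4), 'purequadratic' (note 5), or 'quadratic' (note 6).
    """
    if kind not in ("interactions", "purequadratic", "quadratic"):
        raise ValueError("unknown model type: " + repr(kind))
    terms = [1.0] + list(x)  # intercept and linear terms
    if kind in ("interactions", "quadratic"):
        # products of all pairs of distinct variables
        terms += [a * b for a, b in combinations(x, 2)]
    if kind in ("purequadratic", "quadratic"):
        # squared term for each variable
        terms += [a * a for a in x]
    return terms


row_inter = design_row([2.0, 3.0], "interactions")
row_pure = design_row([2.0, 3.0], "purequadratic")
row_quad = design_row([2.0, 3.0], "quadratic")
```

Stacking such rows for all training samples gives the design matrix on which an ordinary least-squares fit would be run.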
References
Adege AB, Yayeh Y, Berie G, Lin H, Yen L, Li YR (2018) Indoor localization using k-nearest neighbor and artificial neural network back propagation algorithms. In: 27th Wireless and Optical Communication Conference (WOCC), pp 1–2
Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional space. Database Theory ICDT 2001. Springer, Berlin, pp 420–434
Alcala-Fdez J, Fernandez A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult-Valued Log Soft Comput 17:255–287, https://sci2s.ugr.es/keel/datasets.php
Ali S, Smith-Miles KA (2006) A meta-learning approach to automatic kernel selection for support vector machines. Neurocomputing 70:173–186
Arif M, Akram MU, Minhas FA (2010) Pruned fuzzy k-nearest neighbor classifier for beat classification. J Biomed Sci Eng 3:380–389
Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79
Benedetti JK (1977) On the nonparametric estimation of regression functions. J R Stat Soc Series B 39:248–253
Bergamasco LCC, Nunes FLS (2019) Intelligent retrieval and classification in three-dimensional biomedical images: a systematic mapping. Comput Sci Rev 31:19–38
Biau G, Devroye L, Dujmović V, Krzyżak A (2012) An affine invariant k-nearest neighbor regression estimate. J Multivar Anal 112:24–34
Borovicka T, Jirina MJ, Kordik P, Jirina M (2012) Selecting representative data sets. In: Advances in Data Mining Knowledge Discovery and Applications. Rijeka, Croatia, pp 43–70
Buza K, Nanopoulos A, Nagy G (2015) Nearest neighbor regression in the presence of bad hubs. Knowl Based Syst 86:250–260
Cai L, Yu Y, Zhang S, Song Y, Xiong Z, Zhou T (2020) A sample-rebalanced outlier-rejected \(k\)-nearest neighbor regression model for short-term traffic flow forecasting. IEEE Access 8:22686–22696
Chang H, Yeung DY, Cheung WK (2006) Relaxational metric adaptation and its application to semi-supervised clustering and content-based image retrieval. Pattern Recognit 39:1905–1917
Chen SM, Chang YC (2010) Multivariable fuzzy forecasting based on fuzzy clustering and fuzzy rule interpolation techniques. Inf Sci 180:4772–4783
Chen S, Chen L (2007) A fuzzy hierarchical clustering method for clustering documents based on dynamic cluster centers. J Chin Inst Eng 30:169–172
Chen Y, Hao Y (2017) A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction. Expert Syst Appl 80:340–355. https://doi.org/10.1016/j.eswa.2017.02.044
Chen SM, Hsiao HR (2005) A new method to estimate null values in relational database systems based on automatic clustering techniques. Inf Sci 169:47–69
Chen J, Lau HYK (2016) Learning the inverse kinematics of tendon-driven soft manipulators with k-nearest neighbors regression and Gaussian mixture regression. In: 2nd International conference on control, automation and robotics (ICCAR), pp 103–107
Chen HL, Liu DY, Yang B, Wang SJ (2011) An adaptive fuzzy k-nearest neighbor method based on parallel particle swarm optimization for bankruptcy prediction. In: Lecture Notes in computer science 6634 LNAI (PART 1), pp 249–264
Chen SM, Ke JS, Chang JF (1990) Knowledge representation using fuzzy petri nets. IEEE Trans Knowl Data Eng 2:311–319
Chen SM, Wang NY, Pan JS (2009) Forecasting enrollments using automatic clustering techniques and fuzzy logical relationships. Expert Syst Appl 36:11070–11076
Chen HL, Huang CC, Yu XG, Xu X, Sun X, Wang G, Wang SJ (2013) An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach. Expert Syst Appl 40(1):263–271
Cheng PE (1984) Strong consistency of nearest neighbor regression function estimators. J Multivar Anal 15:63–72
Cheng CH, Chan CP, Sheu YJ (2019) A novel purity-based k nearest neighbors imputation method and its application in financial distress prediction. Eng Appl Artif Intell 81:283–299
Cordeiro R, Makarenkov V (2016) Applying subclustering and Lp distance in weighted k-means with distributed centroids. Neurocomputing 173:700–707. https://doi.org/10.1016/j.neucom.2015.08.018
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13:21–27
Dell’Acqua P, Bellotti F, Berta R, De Gloria A (2015) Time-aware multivariate nearest neighbor regression methods for traffic flow prediction. IEEE Trans Intell Transp Syst 16:3393–3402
Dettmann E, Becker C, Schmeiser C (2011) Distance functions for matching in small samples. Comput Stat Data Anal 55:1942–1960
Dheeru D, Taniskidou EK (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Drucker H, Burges CJC, Kaufman L, Smola A, Vapnik V (1997) Support vector regression machines. Neural Inf Proc Syst 9:155–161
Durbin M, Wonders MA, Flaska M, Lintereur AT (2021) K-nearest neighbors regression for the discrimination of gamma rays and neutrons in organic scintillators. Nucl Instrum Methods Phys Res A 987:164826
Gueorguieva N, Valova I, Georgiev G (2017) M&MFCM: fuzzy c-means clustering with Mahalanobis and Minkowski distance metrics. Proc Comput Sci 114:224–233. https://doi.org/10.1016/j.procs.2017.09.064
Guillen A, Herrera LJ, Rubio G, Pomares H, Lendasse A, Rojas I (2010) New method for instance or prototype selection using mutual information in time series prediction. Neurocomputing 73:2030–2038
Györfi L, Kohler M, Krzyzak A, Walk H (2002) A distribution free theory of nonparametric regression. Springer, New York
Horng YJ, Chen SM, Chang YC, Lee CH (2005) A new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. IEEE Trans Fuzzy Syst 13:216–228
Hu C, Jain G, Zhang P, Schmidt C, Gomadam P, Gorka T (2014) Data-driven method based on particle swarm optimization and k-nearest neighbor regression for estimating capacity of lithium-ion battery. Appl Energy 129:49–55
Huang J, Perry M (2016) A semi-empirical approach using gradient boosting and k-nearest neighbors regression for GEFCom2014 probabilistic solar power forecasting. Int J Forecast 32:1081–1086
Huo J, Ma Y, Lu C, Li C, Duan K, Li H (2021) Mahalanobis distance based similarity regression learning of NIRS for quality assurance of tobacco product with different variable selection methods. Spectrochimica Acta A Mol Biomol Spectrosc 251:119364
Jenicka S, Suruliandi A (2011) Empirical evaluation of distance measures for supervised classification of remotely sensed image with modified multivariate local binary pattern. In: International conference on emerging trends in electrical and computer technology (ICETECT), pp 762–767
Kaski S, Sinkkonen J, Peltonen J (2001) Bankruptcy analysis with self-organizing maps in learning metrics. IEEE Trans Neural Netw Learn Syst 12(4):936–947
Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern Syst 15:580–585
Koloseni D, Lampinen J, Luukka P (2012) Optimized distance metrics for differential evolution based nearest prototype classifier. Expert Syst Appl 39(12):10564–10570
Koloseni D, Lampinen J, Luukka P (2013) Differential evolution based nearest prototype classifier with optimized distance measures for the features in the data set. Expert Syst Appl 40(10):4075–4081
Kramer O (2011) Dimensionality reduction by unsupervised K-nearest neighbor regression. In: Proceedings of the 10th International Conference on Machine Learning and Applications, ICMLA, pp 275–278
Kumbure MM, Luukka P, Collan M (2019) An enhancement of fuzzy k-nearest neighbor classifier using multi-local power means. In: Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019), Atlantis Press, pp 83–90
Kumbure MM, Luukka P, Collan M (2020) A new fuzzy k-nearest neighbor classifier based on the Bonferroni mean. Pattern Recognit Lett 140:172–178. https://doi.org/10.1016/j.patrec.2020.10.005
Kurz-Kim JR, Loretan M (2014) On the properties of the coefficient of determination in regression models with infinite variance variables. J Econ 181:15–24
Liu X, Beyrend-Dur D, Dur G, Ban S (2013) Effects of temperature on life history traits of Eodiaptomus japonicus (Copepoda: Calanoida) from Lake Biwa (Japan). Limnology 15:85–97
Moghtadaiee V, Dempster AG (2015) Determining the best vector distance measure for use in location fingerprinting. Pervasive Mob Comput 23:59–79. https://doi.org/10.1016/j.pmcj.2014.11.002
Montgomery DC, Peck EA, Vining GG (2012) Introduction to linear regression analysis. John Wiley & Sons, Hoboken
Nguyen B, Morell C, Baets BD (2016) Large-scale distance metric learning for k-nearest neighbors regression. Neurocomputing 214:805–814
Nikoo MR, Kerachian R, Alizadeh MR (2018) A fuzzy KNN-based model for significant wave height prediction in large lakes. Oceanologia 60:153–168
Pham H (2019) A new criterion for model selection. Mathematics 7:1215
Ranmya R, Sasikala T (2019) An efficient Minkowski distance-based matching with Merkle hash tree authentication for biometric recognition in cloud computing. Soft Comput 23:13423–13431
Rastin N, Jahromi MZ, Taheri M (2021) A generalized weighted distance k-nearest neighbor for multi-label problems. Pattern Recognit 114:107526. https://doi.org/10.1016/j.patcog.2020.107526
Rodrigues EO (2018) Combining Minkowski and Chebyshev: new distance proposal and survey of distance metrics using k-nearest neighbours classifier. Pattern Recognit Lett 110:66–71
Saccoccio M, Wan TH, Chen C, Ciucci F (2014) Optimal regularization in distribution of relaxation times applied to electrochemical impedance spectroscopy: ridge and lasso regression methods: a theoretical and experimental study. Electrochim Acta 147:470–482
Salari N, Shohaimi S, Najafi F, Nallappan M, Karishnarajah I (2015) Timeaware multivar. nearest neighbor regression methods for traffic flow prediction. IEEE Trans Intell Transp Syst 16:3393–3402
Shirkhorshidi AS, Aghabozorgi S, Wah TY (2015) A comparison study on similarity and dissimilarity measures in clustering continuous data. PLOS ONE 10(12):1–20
Song Y, Liang J, Lu J, Zhao X (2017) An efficient instance selection algorithm for k nearest neighbor regression. Neurocomputing 251:26–34
Stone CJ (1977) Consistent nonparametric regression. Ann Stat 5:595–645
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol 58:267–288
Turner T (1977) Exploratory data analysis. Addison Wesley, Reading
Wang S, Ji B, Zhao J, Liu W, Xu T (2018) Predicting ship fuel consumption based on LASSO regression. Transp Res D Transp 65:817–824
Yao Z, Ruzzo W (2006) A regression-based k nearest neighbor algorithm for gene function prediction from heterogeneous data. BMC Bioinformatics 7:S11
Yu S, De Backer S, Scheunders P (2002) Genetic feature selection combined with composite fuzzy nearest neighbor classifiers for hyperspectral satellite imagery. Pattern Recognit Lett 23(1):183–190
Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
Zeng S, Chen SM, Teng MO (2019) Fuzzy forecasting based on linear combinations of independent variables, subtractive clustering algorithm and artificial bee colony algorithm. Inf Sci 484:350–366
Zhou Y, Huang M, Pecht M (2020) Remaining useful life estimation of lithium-ion cells based on k nearest neighbor regression with differential evolution optimization. J Clean Prod 249:119409. https://doi.org/10.1016/j.jclepro.2019.119409
Funding
Open access funding provided by LUT University (previously Lappeenranta University of Technology (LUT)).
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mailagaha Kumbure, M., Luukka, P. A generalized fuzzy k-nearest neighbor regression model based on Minkowski distance. Granul. Comput. (2021). https://doi.org/10.1007/s41066-021-00288-w
Keywords
 Fuzzy k-nearest neighbor
 Regression
 Minkowski distance
 Machine learning
 Performance measures