Abstract
The well-known k-nearest neighbor decision rule is widely used for classification tasks, where the label of a new sample is estimated from a similarity criterion defined by an appropriate distance function. It has also been used successfully for regression problems, where the goal is to predict a continuous numeric label. However, some alternative neighborhood definitions, such as the surrounding neighborhood, require that the neighbors fulfill not only the proximity property but also a spatial location criterion. In this paper, we explore the use of the k-nearest centroid neighbor rule, which is based on the concept of surrounding neighborhood, for regression problems. Two support vector regression models were run as reference methods. Experiments over a wide collection of real-world data sets, using fifteen different odd values of k, show that the regression algorithm based on the surrounding neighborhood significantly outperforms the traditional k-nearest neighbor method and also a support vector regression model with an RBF kernel.
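The abstract does not spell out how the neighbors are chosen, so the following is only a minimal sketch of the nearest centroid neighbor idea as it is commonly defined in the surrounding neighborhood literature: the first neighbor is the nearest training point, and each subsequent neighbor is the point whose inclusion brings the centroid of all selected neighbors closest to the query. The function names `ncn_indices` and `ncn_regress`, the Euclidean distance, and the unweighted mean of the neighbors' targets are assumptions made for illustration, not details taken from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ncn_indices(X: List[Point], query: Point, k: int) -> List[int]:
    """Return the indices of the k nearest centroid neighbors of `query`.

    Hypothetical helper: neighbors are picked greedily so that the centroid
    of the selected set stays as close to the query as possible.
    """
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of training points")
    chosen: List[int] = []
    running_sum = [0.0] * len(query)  # coordinate-wise sum of chosen neighbors
    for _ in range(k):
        best_idx, best_d = -1, math.inf
        for i, x in enumerate(X):
            if i in chosen:
                continue
            # Centroid of the already chosen neighbors plus candidate x.
            m = len(chosen) + 1
            centroid = [(s + xi) / m for s, xi in zip(running_sum, x)]
            d = _dist(centroid, query)
            if d < best_d:
                best_idx, best_d = i, d
        chosen.append(best_idx)
        running_sum = [s + xi for s, xi in zip(running_sum, X[best_idx])]
    return chosen


def ncn_regress(X: List[Point], y: Sequence[float], query: Point, k: int) -> float:
    """Predict a numeric label as the mean target of the k NCN neighbors
    (unweighted averaging is an assumption of this sketch)."""
    idx = ncn_indices(X, query, k)
    return sum(y[i] for i in idx) / k


# Toy data: two points on one side of the query, one on the opposite side.
X = [(1.0, 0.0), (1.1, 0.0), (-1.5, 0.0), (0.0, 3.0)]
y = [10.0, 12.0, 20.0, 50.0]
print(ncn_indices(X, (0.0, 0.0), 2))     # [0, 2]
print(ncn_regress(X, y, (0.0, 0.0), 2))  # 15.0
```

In this toy case, plain k-nearest neighbors with k = 2 would select the two points at x = 1.0 and x = 1.1, both on the same side of the query. The centroid criterion instead picks the point at x = -1.5 as the second neighbor, so that the selected points surround the query.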
Acknowledgements
This work was partially supported by the Generalitat Valenciana under Grant [PROMETEOII/2014/062] and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].
Cite this article
García, V., Sánchez, J.S., Marqués, A.I. et al. A regression model based on the nearest centroid neighborhood. Pattern Anal Applic 21, 941–951 (2018). https://doi.org/10.1007/s10044-018-0706-3
DOI: https://doi.org/10.1007/s10044-018-0706-3