A regression model based on the nearest centroid neighborhood

  • V. García
  • J. S. Sánchez
  • A. I. Marqués
  • R. Martínez-Peláez
Original Article


The renowned k-nearest neighbor decision rule is widely used for classification tasks, where the label of a new sample is estimated according to a similarity criterion defined by an appropriate distance function. It has also been used successfully for regression problems, where the goal is to predict a continuous numeric target. However, some alternative neighborhood definitions, such as the surrounding neighborhood, require neighbors to satisfy not only a proximity criterion but also a spatial location criterion. In this paper, we explore the use of the k-nearest centroid neighbor rule, which is based on the concept of surrounding neighborhood, for regression problems. Two support vector regression models were used as reference baselines. Experiments over a wide collection of real-world data sets, using fifteen different odd values of k, demonstrate that the regression algorithm based on the surrounding neighborhood significantly outperforms both the traditional k-nearest neighbor method and a support vector regression model with an RBF kernel.
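As a minimal sketch, and not the authors' implementation, the contrast between the two neighborhood definitions can be illustrated in Python. The sketch assumes Euclidean distance and, as in plain k-NN regression, an unweighted mean of the selected neighbors' targets as the estimate. The k-NCN selection follows the usual formulation: the first neighbor is the ordinary nearest neighbor of the query, and each subsequent neighbor is the unused training sample whose inclusion places the centroid of the selected set closest to the query. The function names `knn_regress`, `ncn_indices` and `kncn_regress` are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance (an assumed metric for this sketch)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def _check_k(k: int, n: int) -> None:
    """Reject neighborhood sizes outside [1, n]."""
    if not 1 <= k <= n:
        raise ValueError(f"k must be in [1, {n}], got {k}")


def knn_regress(X: Sequence[Point], y: Sequence[float], query: Point, k: int) -> float:
    """Baseline k-NN regression: mean target of the k closest training samples."""
    _check_k(k, len(X))
    order = sorted(range(len(X)), key=lambda i: _dist(X[i], query))
    return sum(y[i] for i in order[:k]) / k


def ncn_indices(X: Sequence[Point], query: Point, k: int) -> List[int]:
    """Indices of the k nearest centroid neighbors (NCN) of `query`.

    The first neighbor is the ordinary nearest neighbor; each later one is
    the unused sample whose addition puts the centroid of the selected set
    closest to `query`.
    """
    _check_k(k, len(X))
    remaining = list(range(len(X)))
    first = min(remaining, key=lambda i: _dist(X[i], query))
    selected = [first]
    remaining.remove(first)
    coord_sum = list(X[first])  # running coordinate sum of selected neighbors
    while len(selected) < k:
        m = len(selected) + 1  # size of the set once a candidate is added
        # Pick the candidate whose inclusion moves the centroid closest to the query.
        best = min(
            remaining,
            key=lambda i: _dist([(s + xi) / m for s, xi in zip(coord_sum, X[i])], query),
        )
        selected.append(best)
        remaining.remove(best)
        coord_sum = [s + xi for s, xi in zip(coord_sum, X[best])]
    return selected


def kncn_regress(X: Sequence[Point], y: Sequence[float], query: Point, k: int) -> float:
    """k-NCN regression, assuming an unweighted mean of the neighbors' targets."""
    idx = ncn_indices(X, query, k)
    return sum(y[i] for i in idx) / k


if __name__ == "__main__":
    # Toy 1-D data with target y = x; the true value at the query 0.0 is 0.0.
    X = [[1.0], [1.5], [2.0], [-2.0]]
    y = [1.0, 1.5, 2.0, -2.0]
    print(knn_regress(X, y, [0.0], k=2))   # prints 1.25 (both neighbors to the right)
    print(kncn_regress(X, y, [0.0], k=2))  # prints -0.5 (neighbors surround the query)
```

On this toy data, k-NN takes both neighbors from the same side of the query, whereas k-NCN picks one point on each side, so the centroid of its neighborhood lies close to the query. This only illustrates the spatial location criterion; it is not evidence for the experimental results reported in the abstract.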


Keywords: Nearest neighborhood · Regression analysis · Surrounding neighborhood · Symmetry criterion



This work has been partially supported by the Generalitat Valenciana under Grant [PROMETEOII/2014/062] and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. División Multidisciplinaria en Ciudad Universitaria, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez, Mexico
  2. Department of Computer Languages and Systems, Institute of New Imaging Technologies, Universitat Jaume I, Castelló de la Plana, Spain
  3. Department of Business Administration and Marketing, Universitat Jaume I, Castelló de la Plana, Spain
  4. Facultad de Tecnologías de la Información, Universidad De La Salle Bajío, León, Mexico
