A regression model based on the nearest centroid neighborhood

  • Original Article
  • Published:
Pattern Analysis and Applications

Abstract

The renowned k-nearest neighbor decision rule is widely used for classification tasks, where the label of a new sample is estimated according to a similarity criterion defined by an appropriate distance function. It has also been applied successfully to regression problems, where the goal is to predict a continuous numeric label. However, alternative neighborhood definitions, such as the surrounding neighborhood, require that neighbors satisfy not only the proximity property but also a spatial location criterion. In this paper, we explore the use of the k-nearest centroid neighbor rule, which is based on the concept of surrounding neighborhood, for regression problems. Two support vector regression models were run as reference baselines. Experiments over a wide collection of real-world data sets, using fifteen different odd values of k, demonstrate that the regression algorithm based on the surrounding neighborhood significantly outperforms the traditional k-nearest neighbor method and also a support vector regression model with an RBF kernel.
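
To make the surrounding-neighborhood idea concrete, the following is a minimal sketch of a k-nearest centroid neighbor regressor. It assumes the usual iterative definition of nearest centroid neighbors (the first neighbor is the nearest sample; each later neighbor is the one whose inclusion brings the centroid of the selected set closest to the query), Euclidean distance, and an unweighted mean of the neighbors' targets as the prediction. The function names `ncn_indices` and `kncn_predict` are hypothetical, and the aggregation rule is an assumption rather than the authors' exact formulation.

```python
import numpy as np


def ncn_indices(X, query, k):
    """Return the indices of the k nearest centroid neighbors of `query` in X."""
    X = np.asarray(X, dtype=float)
    query = np.asarray(query, dtype=float)
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and the number of samples")

    chosen = []
    running_sum = np.zeros(X.shape[1])       # sum of the neighbors chosen so far
    available = np.ones(len(X), dtype=bool)  # samples not yet selected

    for i in range(1, k + 1):
        # Centroid that each candidate would form with the neighbors chosen so far.
        # For i == 1 this is the candidate itself, i.e. the plain nearest neighbor.
        centroids = (running_sum + X) / i
        dist = np.linalg.norm(centroids - query, axis=1)
        dist[~available] = np.inf
        best = int(np.argmin(dist))
        chosen.append(best)
        running_sum += X[best]
        available[best] = False
    return chosen


def kncn_predict(X, y, query, k):
    """Predict a continuous target as the mean over the k nearest centroid neighbors.

    Assumption: unweighted averaging; a distance-weighted variant is equally plausible.
    """
    idx = ncn_indices(X, query, k)
    return float(np.mean(np.asarray(y, dtype=float)[idx]))
```

On a toy one-dimensional set with samples at 1, 2 and -3 and a query at 0, plain 2-NN would pick the samples at 1 and 2, both on the same side of the query. The sketch instead picks 1 and -3, because their centroid (-1) lies closer to the query than the centroid of 1 and 2 (1.5), which illustrates the spatial location criterion.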

Acknowledgements

This work has partially been supported by the Generalitat Valenciana under Grant [PROMETEOII/2014/062] and the Spanish Ministry of Economy, Industry and Competitiveness under Grant [TIN2013-46522-P].

Author information

Corresponding author

Correspondence to J. S. Sánchez.

Appendix

Tables 8 and 9 report the average RMSE results for all the data sets and for each value of k. In addition, Friedman's rankings are given in the last row of each table.

Table 8 Average RMSE results on 31 real regression data sets with k-NNR
Table 9 Average RMSE results on 31 real regression data sets with k-NCNR
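
As a rough illustration of how such tables can be summarized, the sketch below computes the RMSE of a set of predictions and the Friedman-style average rank of each column of a score matrix. It assumes rows are data sets, columns are the configurations being compared (for example, values of k), lower RMSE is better, and ties share the average of the tied ranks. The helper names `rmse` and `friedman_average_ranks` are hypothetical and not taken from the article.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def friedman_average_ranks(scores):
    """Average rank of each column over all rows (rank 1 = lowest score in a row)."""
    scores = np.asarray(scores, dtype=float)
    ranks = np.empty_like(scores)
    for r, row in enumerate(scores):
        order = np.argsort(row, kind="stable")
        row_ranks = np.empty(len(row))
        row_ranks[order] = np.arange(1, len(row) + 1)
        # Tied scores receive the mean of the ranks they jointly occupy.
        for value in np.unique(row):
            tie = row == value
            row_ranks[tie] = row_ranks[tie].mean()
        ranks[r] = row_ranks
    return ranks.mean(axis=0)
```

The column with the lowest average rank is the configuration that performed best overall across the data sets; the Friedman test then asks whether the differences between these average ranks are larger than chance would explain.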

About this article

Cite this article

García, V., Sánchez, J.S., Marqués, A.I. et al. A regression model based on the nearest centroid neighborhood. Pattern Anal Applic 21, 941–951 (2018). https://doi.org/10.1007/s10044-018-0706-3

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10044-018-0706-3

Keywords
