Selecting promising classes from generated data for an efficient multi-class nearest neighbor classification
The nearest neighbor rule is one of the most widely used algorithms for supervised learning because of its simplicity and fair performance in most cases. However, this technique has a number of disadvantages, the most prominent being its low computational efficiency. This paper presents a strategy to overcome this obstacle in multi-class classification tasks. The strategy uses Prototype Reduction algorithms that generate a new, smaller training set from the original one, aiming to retain the same information with fewer samples. Over this reduced set, the classes closest to the input sample are estimated; these are referred to as promising classes. Finally, the input sample is classified with the nearest neighbor rule over the original training set, restricted to the samples of the promising classes. Experiments on several datasets, supported by significance tests, show that classification accuracy similar to that obtained with the full original training set can be achieved with significantly higher efficiency.
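The three steps described above (reduce the training set, rank classes on the reduced set, run the nearest neighbor rule only over the promising classes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reduction step is a hypothetical stand-in that builds `m` prototypes per class with a few iterations of k-means, rather than any of the Prototype Reduction algorithms evaluated in the paper; distances are Euclidean; and the function names `reduce_training_set`, `promising_classes`, `classify` and the parameter `c` (number of promising classes kept) are hypothetical.

```python
import math
from collections import defaultdict


def reduce_class(points, m, iters=10):
    """Hypothetical prototype generation: m prototypes for one class via k-means."""
    centroids = [list(p) for p in points[:m]]  # deterministic initialisation
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            groups[j].append(p)
        for i, g in enumerate(groups):
            if g:  # move each centroid to the mean of its assigned points
                centroids[i] = [sum(coord) / len(g) for coord in zip(*g)]
    return [tuple(c) for c in centroids]


def reduce_training_set(X, y, m=2):
    """Return the reduced set as (prototype, label) pairs plus samples grouped by class."""
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    reduced = [(proto, label)
               for label, pts in by_class.items()
               for proto in reduce_class(pts, m)]
    return reduced, by_class


def promising_classes(query, reduced, c):
    """Rank classes by their closest prototype in the reduced set; keep the top c."""
    best = {}
    for proto, label in reduced:
        d = math.dist(query, proto)
        if label not in best or d < best[label]:
            best[label] = d
    return sorted(best, key=best.get)[:c]


def classify(query, by_class, reduced, c=2):
    """1-NN over the ORIGINAL samples, scanning only the promising classes."""
    best_label, best_d = None, math.inf
    for label in promising_classes(query, reduced, c):
        for x in by_class[label]:  # samples of non-promising classes are never visited
            d = math.dist(query, x)
            if d < best_d:
                best_label, best_d = label, d
    return best_label


X = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (21, 0), (20, 1)]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
reduced, by_class = reduce_training_set(X, y, m=2)
print(promising_classes((0.2, 0.3), reduced, 2))    # ['a', 'b']
print(classify((0.2, 0.3), by_class, reduced, c=2))  # a
```

The saving comes from the last step: the full search over the original data is limited to the classes in the promising set, so the cost of the final scan shrinks roughly in proportion to the fraction of the training set those classes hold, at the price of a cheaper search over the reduced set. If `c` equals the number of classes, the sketch degenerates to a plain 1-NN over the whole original training set.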
Keywords: Nearest neighbor classification; Prototype Reduction; Promising classes
This work has been supported by the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU programme (UAFPU2014–5883), the Spanish Ministerio de Educación, Cultura y Deporte through an FPU Fellowship (Ref. AP2012–0939) and the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by EU FEDER funds).
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.