Soft Computing, Volume 21, Issue 20, pp 6183–6189

Selecting promising classes from generated data for an efficient multi-class nearest neighbor classification

  • Jorge Calvo-Zaragoza
  • Jose J. Valero-Mas
  • Juan R. Rico-Juan
Methodologies and Application

Abstract

The nearest neighbor rule is one of the most widely used algorithms for supervised learning because of its simplicity and fair performance in most cases. However, the technique has a number of disadvantages, the most prominent being its low computational efficiency. This paper presents a strategy to overcome this obstacle in multi-class classification tasks. The strategy uses Prototype Reduction algorithms, which generate a new training set from the original one that aims to convey the same information with fewer samples. Over this reduced set, the classes closest to the input sample are estimated; these are referred to as promising classes. Classification is then performed with the nearest neighbor rule on the original training set, but restricted to the promising classes. Our experiments with several datasets and significance tests show that this approach obtains a classification accuracy similar to that of using the full original training set, with significantly higher efficiency.
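The following is a minimal sketch in Python of the promising-class strategy described above, assuming Euclidean distance; the names (promising_class_nn, n_promising, X_red) are hypothetical, and the reduced set is simply taken as input rather than produced by a specific Prototype Reduction algorithm. It is not the authors' reference implementation.

    import numpy as np

    def promising_class_nn(x, X_train, y_train, X_red, y_red, n_promising=3):
        """Classify x with the nearest neighbor rule on the original training
        set, restricted to the classes ranked closest in the reduced set."""
        # 1) Rank classes by their nearest distance to x in the reduced set.
        d_red = np.linalg.norm(X_red - x, axis=1)
        best = {}
        for d, c in zip(d_red, y_red):
            best[c] = min(best.get(c, np.inf), d)
        promising = sorted(best, key=best.get)[:n_promising]

        # 2) Nearest neighbor search on the ORIGINAL training set,
        #    considering only samples belonging to the promising classes.
        mask = np.isin(y_train, promising)
        d_full = np.linalg.norm(X_train[mask] - x, axis=1)
        return y_train[mask][np.argmin(d_full)]

    # Toy usage: the reduced set here is hand-picked for illustration; in the
    # paper it would be generated by a Prototype Reduction algorithm.
    X_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]])
    y_train = np.array([0, 0, 1, 1, 2])
    X_red = np.array([[0.05, 0.0], [5.05, 5.0], [9.0, 0.0]])
    y_red = np.array([0, 1, 2])
    print(promising_class_nn(np.array([0.2, 0.1]), X_train, y_train,
                             X_red, y_red, n_promising=2))  # -> 0

The point of the two-stage search is that the expensive distance computations over the full training set are performed only against samples of the few promising classes, rather than against every stored sample.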

Keywords

Nearest neighbor classification · Prototype Reduction · Promising classes


Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Jorge Calvo-Zaragoza (1)
  • Jose J. Valero-Mas (1)
  • Juan R. Rico-Juan (1)

  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
