Pattern Analysis and Applications, Volume 13, Issue 4, pp 367–381

Data pre-processing through reward–punishment editing

Theoretical Advances

Abstract

The nearest neighbor (NN) classifier is one of the most popular non-parametric classification approaches and has been applied successfully to many pattern recognition problems. Its two main limitations are its computational complexity and its sensitivity to outliers in the training set. While the first problem has been largely mitigated by inexpensive memory and fast processors, the second persists, and several editing and condensing techniques have been proposed to select a proper set of prototypes from the training set. In this work, an editing technique is proposed, based on the idea of rewarding the patterns that contribute to a correct classification and punishing those that contribute to a wrong one. The analysis is carried out at both the local and the global level, by examining the training set at different scales. A score is computed for each pattern, and the patterns whose score is lower than a predefined threshold are edited out. Extensive experiments have been conducted on several classification problems, both to evaluate the efficacy of the proposed technique with respect to other editing approaches and to investigate the advantage of using reward–punishment editing in combination with condensing techniques or as a pre-processing stage for classifiers other than the NN.
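As a concrete illustration of the procedure described above, the following is a minimal sketch of one possible reward–punishment editing pass. It assumes a Euclidean metric, unit reward and punishment increments, and a small set of neighborhood sizes playing the role of the different scales; the function name, parameter names, and default values are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def reward_punishment_edit(X, y, ks=(3, 5, 7), threshold=0.0):
    """Edit out training patterns whose accumulated reward-punishment
    score falls below `threshold` (names and defaults are illustrative)."""
    n = len(X)
    dists = cdist(X, X)                # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)    # leave-one-out: a pattern never matches itself
    order = np.argsort(dists, axis=1)  # neighbors of each pattern, nearest first
    scores = np.zeros(n)
    for k in ks:                       # analyze the training set at several scales
        for i in range(n):
            for j in order[i, :k]:     # the k nearest neighbors of pattern i
                if y[j] == y[i]:
                    scores[j] += 1.0   # reward: j votes for i's correct class
                else:
                    scores[j] -= 1.0   # punish: j votes for a wrong class
    keep = scores >= threshold
    return X[keep], y[keep], scores
```

Under these assumptions, `reward_punishment_edit(X_train, y_train)` would return the retained prototypes together with the per-pattern scores; the edited set can then be passed to a condensing step or used to train a classifier other than the NN.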

Keywords

Editing · Nearest neighbor classifier


Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

DEIS, IEIIT, Università di Bologna, Bologna, Italy
