Neural Computing and Applications

Volume 28, Issue 9, pp 2415–2424

Prototype generation on structural data using dissimilarity space representation

  • Jorge Calvo-Zaragoza
  • Jose J. Valero-Mas
  • Juan R. Rico-Juan
IBPRIA 2015

Abstract

Data reduction techniques play a key role in instance-based classification by lowering the amount of data to be processed. Among the existing approaches, prototype selection (PS) and prototype generation (PG) are the most representative. These two families differ in how the reduced set is obtained from the initial one: the former selects the most representative elements of the set, whereas the latter creates new data out of it. Although PG is considered to delimit decision boundaries more efficiently, the operations it requires are not well defined in scenarios involving structural data such as strings, trees, or graphs. This work studies the possibility of using dissimilarity space (DS) methods as an intermediate step that maps the initial structural representation to a statistical one, thereby allowing the use of PG methods. A comparative experiment over string data is carried out in which our proposal is compared against PS methods applied in the original space. Results show that the proposed strategy achieves results statistically similar to those of PS in the initial space, thus standing as a clear alternative to the classic approach, with some additional advantages derived from the DS representation.
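The mapping described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the edit distance as the string dissimilarity, the hand-picked pivot strings, and the per-class mean used as a stand-in for a PG method are all assumptions made only for illustration, as are the function names.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def to_dissimilarity_space(items, pivots, dist=edit_distance):
    """Represent each item as the vector of its dissimilarities to the pivots."""
    return [[float(dist(x, p)) for p in pivots] for x in items]


def class_centroids(vectors, labels):
    """Stand-in for a PG method (assumption): one mean vector per class."""
    groups = defaultdict(list)
    for v, y in zip(vectors, labels):
        groups[y].append(v)
    return {y: [sum(col) / len(vs) for col in zip(*vs)]
            for y, vs in groups.items()}


def classify(x, pivots, prototypes):
    """Map x to the DS and return the label of the nearest generated prototype."""
    v = to_dissimilarity_space([x], pivots)[0]
    return min(prototypes,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(v, prototypes[y])))


# Toy string data; the pivots are chosen by hand here (assumption).
train = ["aaaa", "aaab", "abaa", "bbbb", "bbba", "babb"]
labels = ["A", "A", "A", "B", "B", "B"]
pivots = ["aaaa", "bbbb"]

vectors = to_dissimilarity_space(train, pivots)
prototypes = class_centroids(vectors, labels)
print(classify("aaba", pivots, prototypes))
```

Once the strings are vectors, operations such as averaging become well defined, which is what makes PG applicable; in the original string space the analogous operation would require something like a median string.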

Keywords

  • kNN classification
  • Prototype generation
  • Structural pattern recognition
  • Dissimilarity space

Notes

Acknowledgments

This work was partially supported by the Spanish Ministerio de Educación, Cultura y Deporte through an FPU fellowship (AP2012–0939), the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU program (UAFPU2014–5883), and the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by EU FEDER funds).

Copyright information

© The Natural Computing Applications Forum 2016

Authors and Affiliations

  • Jorge Calvo-Zaragoza (1)
  • Jose J. Valero-Mas (1)
  • Juan R. Rico-Juan (1)
  1. Department of Software and Computing Systems, University of Alicante, Alicante, Spain
