An Efficient Prototype Selection Algorithm Based on Dense Spatial Partitions

  • Joel Luís Carbonera
  • Mara Abel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10842)


To deal with big data, prototype selection techniques have been applied to reduce the computational resources needed by data mining approaches. However, most proposed prototype selection approaches have high time complexity and therefore cannot be applied to big data. In this paper, we propose an efficient approach for prototype selection. It adopts the notion of spatial partition to efficiently divide the dataset into sets of similar instances. In a second step, the algorithm extracts a prototype from each of the densest spatial partitions previously identified. The approach was evaluated on 15 well-known datasets in a classification task, and its performance was compared with that of 6 state-of-the-art algorithms on two measures: accuracy and reduction. The results show that, in general, the proposed approach provides a good trade-off between accuracy and reduction, with a significantly lower running time than the other approaches.
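The two steps described in the abstract (partition the space, then take one prototype from each of the densest partitions) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the equal-width grid, the per-class ranking of cells, the use of the cell centroid as the prototype, and the names `select_prototypes`, `n_intervals`, `keep_fraction` and `reduction` are all assumptions made for the example. The abstract does not define the reduction measure; the helper uses the fraction of instances removed, which is a common definition and is likewise an assumption here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Sequence[float]


def select_prototypes(
    X: List[Point],
    y: List[Hashable],
    n_intervals: int = 5,        # hypothetical parameter: grid cells per dimension
    keep_fraction: float = 0.5,  # hypothetical parameter: share of densest cells kept per class
) -> List[Tuple[List[float], Hashable]]:
    """Return (prototype, label) pairs taken from the densest grid cells of each class."""
    dims = len(X[0])
    lows = [min(p[d] for p in X) for d in range(dims)]
    highs = [max(p[d] for p in X) for d in range(dims)]

    def cell_of(p: Point) -> Tuple[int, ...]:
        # Map a point to the index of its equal-width grid cell.
        idx = []
        for d in range(dims):
            width = (highs[d] - lows[d]) / n_intervals
            i = int((p[d] - lows[d]) / width) if width > 0 else 0
            idx.append(min(i, n_intervals - 1))  # the maximum value falls in the last cell
        return tuple(idx)

    # Step 1: one pass over the data groups instances by class and by grid cell.
    cells: Dict[Hashable, Dict[Tuple[int, ...], List[Point]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for p, label in zip(X, y):
        cells[label][cell_of(p)].append(p)

    # Step 2: per class, keep the most populated cells and emit their centroids.
    prototypes: List[Tuple[List[float], Hashable]] = []
    for label, by_cell in cells.items():
        ranked = sorted(by_cell.values(), key=len, reverse=True)
        k = max(1, int(keep_fraction * len(ranked)))
        for members in ranked[:k]:
            centroid = [sum(p[d] for p in members) / len(members) for d in range(dims)]
            prototypes.append((centroid, label))
    return prototypes


def reduction(n_original: int, n_selected: int) -> float:
    """Fraction of the original instances that were removed (assumed definition)."""
    return (n_original - n_selected) / n_original
```

Because each instance is assigned to its cell in a single pass and no pairwise distances are computed, a scheme of this kind scales roughly linearly with the number of instances, which is the sort of property the abstract appeals to when it contrasts the approach with methods of high time complexity.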


Prototype selection, Data reduction, Data mining, Machine learning, Big data



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IBM Research, Rio de Janeiro, Brazil
  2. UFRGS, Porto Alegre, Brazil
