An Efficient Prototype Selection Algorithm Based on Dense Spatial Partitions
Prototype selection techniques reduce the computational resources required to apply data mining approaches to big data. However, most existing prototype selection approaches have high time complexity, which prevents their use on large datasets. In this paper, we propose an efficient approach for prototype selection. It uses the notion of spatial partition to efficiently divide the dataset into sets of similar instances. In a second step, the algorithm extracts a prototype from each of the densest spatial partitions previously identified. The approach was evaluated on 15 well-known datasets in a classification task, and its performance was compared with that of 6 state-of-the-art algorithms on two measures: accuracy and reduction. The results show that, in general, the proposed approach provides a good trade-off between accuracy and reduction, with a significantly lower running time than the other approaches.
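The two steps described above (partition the space, then take one prototype from each of the densest partitions) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify how partitions are built, how density is measured, or how a prototype is extracted. The sketch assumes an equal-width grid over the feature ranges, measures density as the number of same-class instances in a cell, and uses the centroid as the prototype. The function name `select_prototypes` and the parameters `n_bins` and `keep_fraction` are hypothetical.

```python
import math
from collections import defaultdict

import numpy as np


def select_prototypes(X, y, n_bins=4, keep_fraction=0.5):
    """Illustrative grid-based prototype selection (assumed design, see lead-in)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Step 1: map every instance to a grid cell (assumed equal-width bins).
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant features
    cells = np.minimum(((X - lo) / span * n_bins).astype(int), n_bins - 1)

    prototypes, labels = [], []
    for label in np.unique(y):
        # Group the instances of this class by the cell they fall into.
        groups = defaultdict(list)
        for i in np.flatnonzero(y == label):
            groups[tuple(cells[i])].append(i)

        # Step 2: rank partitions by density (instance count), densest first,
        # and keep an assumed fraction of them (at least one per class).
        ranked = sorted(groups.values(), key=len, reverse=True)
        n_keep = max(1, math.ceil(keep_fraction * len(ranked)))

        # One prototype per kept partition: the centroid of its members.
        for members in ranked[:n_keep]:
            prototypes.append(X[members].mean(axis=0))
            labels.append(label)

    return np.array(prototypes), np.array(labels)
```

Because each instance is assigned to a cell in a single pass and no pairwise distances are computed, a grid of this kind runs in time roughly linear in the number of instances, which is the sort of saving the abstract attributes to spatial partitioning. Reduction here is controlled by `n_bins` and `keep_fraction`, both of which are illustrative knobs rather than parameters taken from the paper.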
Keywords: Prototype selection, Data reduction, Data mining, Machine learning, Big data