Fast Tree-Based Classification via Homogeneous Clustering
Data reduction, achieved by collecting a small subset of representative prototypes from the original patterns, aims to alleviate the computational burden of training a classifier without sacrificing performance. We extend the Reduction by finding Homogeneous Clusters algorithm, which uses the k-means method to produce a set of homogeneous cluster centers that serve as representative prototypes. Specifically, we propose two new classifiers that recursively produce homogeneous clusters and achieve higher performance than current homogeneous clustering methods, with a significant speedup. The key idea is a tree data structure that holds the constructed clusters. Internal tree nodes hold clustering models, while leaves correspond to homogeneous clusters and store the corresponding class label. Classification is performed by simply traversing the tree from the root to a leaf. The two algorithms differ in the clustering method used to build tree nodes: the first uses k-means, while the second applies EM clustering. The proposed algorithms are evaluated on a variety of datasets and compared with well-known methods. The results demonstrate very good classification performance combined with large computational savings.
Keywords: Classification, k-means, EM, Prototype generation
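The tree construction and traversal described in the abstract can be illustrated with a minimal, dependency-free sketch of the k-means variant. It rests on assumptions the abstract does not state: each non-homogeneous node seeds k-means with one centroid per class placed at that class's mean (the seeding used by the Reduction by finding Homogeneous Clusters algorithm), a sample is routed to the child of its nearest centroid, and a node that k-means cannot split further becomes a leaf holding its majority label. All names (`Node`, `build`, `predict`, `kmeans`) are hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch of a homogeneous-clustering tree (k-means variant).
# Names and design choices are illustrative assumptions, not the authors' code.
from collections import Counter
from dataclasses import dataclass, field
from typing import Hashable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def nearest(centroids: List[Point], x: Sequence[float]) -> int:
    """Index of the centroid closest to x (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], x))


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100) -> List[Point]:
    """Lloyd's k-means started from the given initial centroids."""
    centroids = list(init)
    for _ in range(max_iter):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


@dataclass
class Node:
    label: Hashable  # majority class; the prediction when this node is a leaf
    centroids: List[Point] = field(default_factory=list)  # empty for leaves
    children: List[Optional["Node"]] = field(default_factory=list)


def build(points: List[Point], labels: List[Hashable]) -> Node:
    """Recursively split a cluster until every leaf is homogeneous."""
    counts = Counter(labels)
    majority = counts.most_common(1)[0][0]
    if len(counts) == 1:
        return Node(majority)  # homogeneous cluster -> leaf
    # Assumed seeding: one initial centroid per class, at that class's mean
    # (labels must be sortable so the class order is deterministic).
    classes = sorted(counts)
    init = [mean([p for p, t in zip(points, labels) if t == c]) for c in classes]
    centroids = kmeans(points, init)
    parts: List[Tuple[List[Point], List[Hashable]]] = [([], []) for _ in centroids]
    for p, t in zip(points, labels):
        xs, ts = parts[nearest(centroids, p)]
        xs.append(p)
        ts.append(t)
    if max(len(xs) for xs, _ in parts) == len(points):
        return Node(majority)  # k-means made no progress (e.g. duplicates)
    children = [build(xs, ts) if xs else None for xs, ts in parts]
    return Node(majority, centroids, children)


def predict(node: Node, x: Sequence[float]) -> Hashable:
    """Descend to the child of the nearest centroid until a leaf is reached."""
    while node.centroids:
        child = node.children[nearest(node.centroids, x)]
        if child is None:
            break  # cluster had no training points: use this node's majority
        node = child
    return node.label


if __name__ == "__main__":
    X = [(0.0,), (1.0,), (10.0,), (4.0,), (5.0,)]
    y = ["a", "a", "a", "b", "b"]
    tree = build(X, y)
    print([predict(tree, x) for x in X])  # ['a', 'a', 'a', 'b', 'b']
```

On this toy data the first k-means split still leaves both clusters mixed, so the sketch recurses one more level before every leaf is homogeneous; prediction then costs one nearest-centroid search per level rather than a scan over all prototypes. The EM variant would replace the `kmeans` step with Gaussian-mixture EM clustering; for instance, scikit-learn's `GaussianMixture` accepts initial component means through its `means_init` parameter, and its `predict` method could then route a sample to a child.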