Ensemble Clustering Based Dimensional Reduction
A distance metric over a given data space should reflect how similar objects actually are. When data points are represented by a large number of features, the Euclidean distance often fails to capture the actual relationship between them. However, objects in the same cluster often share common attributes even when their geometric distance is fairly large. In this study, we propose a new method that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the cluster membership of the points over multiple runs of clustering algorithms. To assess the suggested method, we integrated it within the framework of the Decision Trees, K Nearest Neighbors, and Random Forest classifiers. The results obtained by applying EC to 10 datasets confirmed our hypothesis that embedding the EC space as a distance metric would improve performance and reduce the feature space substantially.
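The idea of tracking cluster membership over multiple runs can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain k-means as the base clustering algorithm, varies the number of clusters between runs, and uses the hypothetical names `kmeans_labels` and `ec_space`. Each point is mapped to a tuple of cluster labels, one per run, and points with identical tuples fall into the same EC group.

```python
import random
from collections import defaultdict


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def ec_space(points, ks, seed=0):
    """Map each point to its tuple of cluster labels, one label per run."""
    rng = random.Random(seed)
    runs = [kmeans_labels(points, k, rng) for k in ks]
    return [tuple(run[i] for run in runs) for i in range(len(points))]


# Toy data (assumed for illustration): two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 5.1), (5.1, 5.3)]
codes = ec_space(points, ks=[2, 3])

# Points sharing the same label tuple form one group in the EC space.
groups = defaultdict(list)
for index, code in enumerate(codes):
    groups[code].append(index)
```

In this sketch the original numeric features are replaced by a short categorical code per point, so the number of distinct codes, rather than the number of original features, determines the size of the transformed space.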
Keywords: Decision trees, Ensemble clustering, Classification
This research was supported by the Max Stern Yezreel Valley College for LA and by Zefat Academic College for MY.