Ensemble Clustering Based Dimensional Reduction

  • Loai Abddallah
  • Malik Yousef
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 903)


A distance metric over a given data space should reflect the true similarity between objects. The Euclidean distance between data points represented by a large number of features often fails to capture the actual relationship between those points. However, objects in the same cluster often share common attributes even when their geometric distance is fairly large. In this study, we propose a new method that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the membership of the points over multiple runs of clustering algorithms. To assess the proposed method, we integrated it within the framework of the Decision Trees, K Nearest Neighbors, and Random Forest classifiers. The results obtained by applying EC to 10 datasets confirmed our hypothesis that embedding the EC space as a distance metric improves performance and dramatically reduces the feature space.
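The idea of an EC space can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means with varying k is assumed here, and the helper names (`kmeans_labels`, `ec_space`, `ec_distance`, `nearest_neighbor_label`) are hypothetical. Each point is mapped to a tuple of cluster memberships, one per clustering run, and two points are compared by counting the runs in which they were separated.

```python
import random

def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means (an assumed choice); returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it unchanged if empty.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def ec_space(points, ks, seed=0):
    """Map each point to a categorical tuple: its cluster membership in every run."""
    rng = random.Random(seed)
    runs = [kmeans_labels(points, k, rng) for k in ks]
    return [tuple(run[i] for run in runs) for i in range(len(points))]

def ec_distance(u, v):
    """Hamming-style distance: number of runs that put the two points in different clusters."""
    return sum(a != b for a, b in zip(u, v))

def nearest_neighbor_label(query, train_codes, train_labels):
    """1-nearest-neighbor prediction using the EC distance instead of Euclidean distance."""
    best = min(range(len(train_codes)), key=lambda i: ec_distance(query, train_codes[i]))
    return train_labels[best]

# Toy data: two well-separated groups of 2-D points (illustrative values only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
codes = ec_space(points, ks=[2, 3])
```

In this sketch the original features are replaced by `len(ks)` categorical features, which is where the reduction of the feature space comes from. Clustering labeled and unlabeled points together, as done here, is a transductive setup and is also an assumption of the sketch.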


Keywords: Decision trees, Ensemble clustering, Classification



This research was supported by the Max Stern Yezreel Valley College for LA and by Zefat Academic College for MY.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Information Systems, The Max Stern Yezreel Valley Academic College, Jezreel, Israel
  2. Department of Community Information Systems, Zefat Academic College, Zefat, Israel
