Forests of unstable hierarchical clusters for pattern classification


Pattern classification is a key ability shared by intelligent systems, and the classifier is one of the crucial components of a pattern classification pipeline. Many classifiers have been proposed in the literature, and ensembles of decision trees have recently been shown to perform well and to generalize well to unseen test data. In this paper, we propose a novel ensemble classifier that consists of a diverse group of hierarchical clusterings of the data. The proposed algorithm is fast to train, is fully automatic, and, in our experiments, outperforms existing decision tree ensemble techniques and other state-of-the-art classifiers. We show the effectiveness of the algorithm empirically by evaluating it on four publicly available datasets.
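The abstract does not say how the hierarchical clusterings are built or combined, so the following is only a minimal sketch of the general idea and should not be read as the algorithm proposed in the paper. It assumes that each tree is grown by recursive 2-means clustering on a random feature subset, that growth stops at label-pure nodes, and that leaves vote with their class proportions. All function names and parameters (`build_tree`, `forest_fit`, `n_feats`, `max_depth`, and so on) are hypothetical.

```python
import random
from collections import Counter

def _dist2(a, b, feats):
    # Squared Euclidean distance restricted to the chosen feature subset.
    return sum((a[f] - b[f]) ** 2 for f in feats)

def _closer_to_second(x, centers, feats):
    # 1 if x is strictly closer to centers[1] than to centers[0], else 0.
    return int(_dist2(x, centers[1], feats) < _dist2(x, centers[0], feats))

def _two_means(X, feats, rng, iters=5):
    # A few Lloyd iterations of 2-means, started from two random points.
    # The random start is what makes each clustering "unstable" (assumption).
    centers = [list(c) for c in rng.sample(X, 2)]
    for _ in range(iters):
        groups = ([], [])
        for x in X:
            groups[_closer_to_second(x, centers, feats)].append(x)
        for k in (0, 1):
            if groups[k]:  # keep the old center if a group is empty
                centers[k] = [sum(col) / len(groups[k]) for col in zip(*groups[k])]
    return centers

def build_tree(X, y, rng, n_feats, depth=0, max_depth=10):
    counts = Counter(y)
    # Stop at pure nodes, single-point nodes, or the depth limit.
    if len(counts) == 1 or len(X) < 2 or depth >= max_depth:
        return {"leaf": counts}
    feats = rng.sample(range(len(X[0])), n_feats)
    centers = _two_means(X, feats, rng)
    side = [_closer_to_second(x, centers, feats) for x in X]
    if len(set(side)) < 2:  # degenerate split: every point on one side
        return {"leaf": counts}
    children = []
    for k in (0, 1):
        Xk = [x for x, s in zip(X, side) if s == k]
        yk = [t for t, s in zip(y, side) if s == k]
        children.append(build_tree(Xk, yk, rng, n_feats, depth + 1, max_depth))
    return {"feats": feats, "centers": centers, "children": children}

def tree_predict(node, x):
    # Route x to the nearer cluster center at each level until a leaf.
    while "leaf" not in node:
        node = node["children"][_closer_to_second(x, node["centers"], node["feats"])]
    return node["leaf"]

def forest_fit(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))  # sqrt heuristic (assumption)
    return [build_tree(X, y, rng, n_feats) for _ in range(n_trees)]

def forest_predict(forest, x):
    # Sum each tree's leaf class proportions and return the top label.
    votes = Counter()
    for tree in forest:
        leaf = tree_predict(tree, x)
        total = sum(leaf.values())
        for label, n in leaf.items():
            votes[label] += n / total
    return votes.most_common(1)[0][0]
```

In this sketch, `forest_fit(X, y)` takes a list of equal-length numeric feature lists and a list of labels, and `forest_predict(forest, x)` returns the label with the largest summed leaf proportion. Diversity comes only from the random initial centers and random feature subsets; the paper may obtain it differently.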


Author information



Corresponding author

Correspondence to Kyaw Kyaw Htike.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Htike, K.K. Forests of unstable hierarchical clusters for pattern classification. Soft Comput 22, 1711–1718 (2018).


Keywords

  • Forest
  • Classifier
  • Binary classification
  • Ensemble method
  • Hierarchical clustering