Soft Computing, Volume 22, Issue 5, pp 1711–1718

Forests of unstable hierarchical clusters for pattern classification

  • Kyaw Kyaw Htike
Methodologies and Application


Classification of patterns is a key ability shared by intelligent systems. One of the crucial components of a pattern classification pipeline is the classifier. Many classifiers have been proposed in the literature, and it has recently been shown that ensembles of decision trees tend to perform well and to generalize well to unseen test data. In this paper, we propose a novel ensemble classifier that consists of a diverse group of hierarchical clusterings of the data. The proposed algorithm is fast to train, fully automatic, and outperforms existing decision tree ensemble techniques and other state-of-the-art classifiers. We empirically show the effectiveness of the algorithm by evaluating it on four publicly available datasets.
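
The abstract does not spell out how the hierarchical clusterings are built or combined, so the following is only an illustrative sketch of the general idea, not the author's algorithm. It assumes that each hierarchy is grown on a bootstrap sample by recursively bisecting the data around two randomly drawn centres, that growth stops at label-pure nodes, and that the ensemble predicts by majority vote. All function names (`build_tree`, `train_forest`, `predict`) and parameters (`n_trees`, `seed`) are hypothetical.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, labels, rng):
    """Recursively bisect the data around two randomly chosen centres."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return {"label": majority}  # label-pure node becomes a leaf
    # ASSUMPTION: random centres are what makes each clustering "unstable";
    # a different draw yields a different hierarchy, hence ensemble diversity.
    i, j = rng.sample(range(len(points)), 2)
    c0, c1 = points[i], points[j]
    go_left = [sq_dist(p, c0) <= sq_dist(p, c1) for p in points]
    if all(go_left):
        return {"label": majority}  # identical centres: cannot split further
    left = [k for k, g in enumerate(go_left) if g]
    right = [k for k, g in enumerate(go_left) if not g]
    return {
        "c0": c0,
        "c1": c1,
        "left": build_tree([points[k] for k in left],
                           [labels[k] for k in left], rng),
        "right": build_tree([points[k] for k in right],
                            [labels[k] for k in right], rng),
    }


def predict_tree(node, x):
    """Route x to the nearer centre at each level until a leaf is reached."""
    while "label" not in node:
        nearer_c0 = sq_dist(x, node["c0"]) <= sq_dist(x, node["c1"])
        node = node["left"] if nearer_c0 else node["right"]
    return node["label"]


def train_forest(points, labels, n_trees=25, seed=0):
    """Grow n_trees hierarchies, each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(points)), k=len(points))
        forest.append(build_tree([points[k] for k in idx],
                                 [labels[k] for k in idx], rng))
    return forest


def predict(forest, x):
    """Majority vote over the per-tree predictions."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]
```

Under these assumptions, `train_forest(points, labels)` takes a list of feature tuples and a list of class labels, and `predict(forest, x)` returns the most-voted label for a new feature tuple `x`. Training involves no iterative optimisation, which is one plausible reading of "fast to train", but the paper's actual construction may differ.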


Forest · Classifier · Binary classification · Ensemble method · Hierarchical Clustering


Compliance with ethical standards

Conflict of interest

The author declares that he has no conflict of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. School of Information Technology, UCSI University, Kuala Lumpur, Malaysia
