
Rank Forest: Systematic Attribute Sub-spacing in Decision Forest

  • Zaheer Babar
  • Md Zahidul Islam
  • Sameen Mansha
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 845)

Abstract

Decision trees are well-known classification algorithms that are also valued for their capacity for knowledge discovery. Two major shortcomings of decision trees have been pointed out in the literature: (1) instability, and (2) high computational cost. These problems have been addressed to some extent by ensemble learning techniques such as Random Forest. Unlike a decision tree, where the whole attribute space of a dataset is searched to find the best test attribute for a node, Random Forest first selects a random subspace of attributes from which the test attribute for the node is then identified. Because the subspace is chosen at random, it may consist largely or entirely of poor-quality attributes, resulting in an individual tree with low accuracy. In this paper we therefore propose a probabilistic selection of attributes (instead of a purely random selection), where the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take a similar approach. While we use mutual information as the measure of attribute quality, those papers used the information gain ratio and a t-test as the measure. The proposed technique has been evaluated on nine different datasets and shows stable performance in terms of accuracy (both ensemble accuracy and individual tree accuracy) and efficiency.
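The core idea described above can be sketched in a few lines: score each attribute by its mutual information with the class, normalise the scores into a probability distribution, and draw the node's attribute subspace from that distribution instead of uniformly at random. The sketch below is illustrative only and is not the authors' implementation; the function name select_subspace and the use of scikit-learn's mutual_info_classif are assumptions for the example.

    # Minimal sketch of probability-proportional attribute subspace selection,
    # using mutual information as the attribute-quality measure.
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    def select_subspace(X, y, subspace_size, seed=None):
        """Draw a subspace of attribute indices with probability proportional
        to each attribute's mutual information with the class attribute."""
        rng = np.random.default_rng(seed)
        mi = mutual_info_classif(X, y)              # quality score per attribute
        total = mi.sum()
        # Fall back to uniform selection if all scores are zero.
        probs = mi / total if total > 0 else np.full(len(mi), 1.0 / len(mi))
        # Sample without replacement, weighting better attributes more heavily
        # (contrast with Random Forest's uniform random subspace).
        return rng.choice(X.shape[1], size=subspace_size, replace=False, p=probs)

In a forest built this way, each tree (or each node, depending on the design) would call such a routine to obtain its candidate attributes before choosing the best split, so high-quality attributes appear in subspaces more often while low-quality ones are not excluded entirely.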

Keywords

Ensemble learning · Decision forest · Reduced attribute subspace · Random Forest


Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Institute of Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
  2. School of Computing and Mathematics, Charles Sturt University, Bathurst, Australia
  3. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
