A fine-grained Random Forests using class decomposition: an application to medical diagnosis

  • Predictive Analytics Using Machine Learning
  • Published in: Neural Computing and Applications

Abstract

Class decomposition describes the process of segmenting each class into a number of homogeneous subclasses. This can be naturally achieved through clustering. Utilising class decomposition can provide a number of benefits to supervised learning, especially ensembles. It can be a computationally efficient way to produce a linearly separable data set without the feature engineering required by techniques like support vector machines and deep learning. For ensembles, the decomposition is a natural way to increase diversity, a key factor in the success of ensemble classifiers. In this paper, we propose adopting class decomposition in Random Forests, a state-of-the-art ensemble learning method. Medical data for patient diagnosis may greatly benefit from this technique, as the same disease can present with a diverse range of symptoms. We have experimentally validated the proposed method on a number of data sets, mainly from the medical domain. Results reported in this paper clearly show that our method significantly improves the accuracy of Random Forests.
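
The approach described above lends itself to a compact sketch: cluster the examples of each class, relabel them with subclass labels, train a Random Forest on the decomposed labels, and map predictions back to the original classes. The Python code below is a minimal illustration of this idea, assuming scikit-learn; the choice of k-means, the per-class granularity k, the helper names (decompose_classes, fit_decomposed_rf, predict_original), and the probability-summing aggregation are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of class decomposition for Random Forests, assuming
# scikit-learn. KMeans, the granularity k, and the aggregation rule
# are illustrative choices, not the paper's exact configuration.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier


def decompose_classes(X, y, k=2, random_state=0):
    """Split each class into up to k subclasses via k-means.

    Returns the subclass labels and a map from each subclass label
    back to its original class.
    """
    X, y = np.asarray(X), np.asarray(y)
    sub_y = np.empty(len(y), dtype=int)
    sub_to_class = {}
    next_label = 0
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        n_clusters = min(k, len(idx))  # guard against very small classes
        clusters = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=random_state).fit_predict(X[idx])
        for cl in range(n_clusters):
            sub_to_class[next_label + cl] = c
        sub_y[idx] = clusters + next_label
        next_label += n_clusters
    return sub_y, sub_to_class


def fit_decomposed_rf(X, y, k=2, **rf_kwargs):
    """Train a Random Forest on the decomposed (subclass) labels."""
    sub_y, sub_to_class = decompose_classes(X, y, k=k)
    rf = RandomForestClassifier(**rf_kwargs).fit(X, sub_y)
    return rf, sub_to_class


def predict_original(rf, sub_to_class, X):
    """Predict in the original label space by summing the predicted
    probabilities of the subclasses belonging to each class."""
    proba = rf.predict_proba(np.asarray(X))
    classes = sorted(set(sub_to_class.values()))
    agg = np.zeros((proba.shape[0], len(classes)))
    for j, sub in enumerate(rf.classes_):
        agg[:, classes.index(sub_to_class[sub])] += proba[:, j]
    return np.array(classes)[agg.argmax(axis=1)]
```

For example, `rf, mapping = fit_decomposed_rf(X_train, y_train, k=3, n_estimators=500)` followed by `predict_original(rf, mapping, X_test)` yields predictions in the original label space. Summing subclass probabilities, rather than taking a hard vote over subclass predictions, keeps the aggregation well behaved when a class is split into unevenly sized subclasses.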

Author information

Corresponding author

Correspondence to Mohamed Medhat Gaber.

About this article

Cite this article

Elyan, E., Gaber, M.M. A fine-grained Random Forests using class decomposition: an application to medical diagnosis. Neural Comput & Applic 27, 2279–2288 (2016). https://doi.org/10.1007/s00521-015-2064-z
