A fine-grained Random Forests using class decomposition: an application to medical diagnosis

Elyan, Eyad; Gaber, Mohamed Medhat

doi:10.1007/s00521-015-2064-z


Predictive Analytics Using Machine Learning
Published: 22 September 2015

Volume 27, pages 2279–2288, (2016)

Neural Computing and Applications



Abstract

Class decomposition describes the process of segmenting each class into a number of homogeneous subclasses. This can be naturally achieved through clustering. Utilising class decomposition can provide a number of benefits to supervised learning, especially ensembles. It can be a computationally efficient way to provide a linearly separable data set without the need for the feature engineering required by techniques like support vector machines and deep learning. For ensembles, the decomposition is a natural way to increase diversity, a key factor for the success of ensemble classifiers. In this paper, we propose applying class decomposition to Random Forests, a state-of-the-art ensemble learning method. Medical data for patient diagnosis may greatly benefit from this technique, as the same disease can have a diverse range of symptoms. We have experimentally validated our proposed method on a number of data sets that are mainly related to the medical domain. The results reported in this paper show clearly that our method significantly improves the accuracy of Random Forests.
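The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn is available, uses k-means as the clustering step, fixes the number of subclasses per class at a hypothetical `k=2`, and runs on scikit-learn's bundled breast cancer data set purely as a convenient medical example. The helper name `decompose_classes` is likewise made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def decompose_classes(X, y, k=2, seed=0):
    """Split every class into k subclasses with k-means (k is an assumption).

    Returns the subclass label of each sample and a lookup array that maps
    each subclass label back to its parent class.
    """
    sub_labels = np.empty(len(y), dtype=int)
    parent_of = []  # parent_of[s] is the original class of subclass s
    for cls in np.unique(y):
        mask = y == cls
        clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X[mask])
        # Offset cluster ids so subclass labels are unique across classes.
        sub_labels[mask] = clusters + len(parent_of)
        parent_of.extend([cls] * k)
    return sub_labels, np.array(parent_of)


# Illustrative data set only; not necessarily one used in the paper.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 1. Decompose the training classes into subclasses.
sub_train, parent_of = decompose_classes(X_train, y_train, k=2)

# 2. Train the forest on the finer-grained subclass labels.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, sub_train)

# 3. Predict subclasses, then map each one back to its original class.
y_pred = parent_of[forest.predict(X_test)]
accuracy = float(np.mean(y_pred == y_test))
print(f"accuracy with class decomposition: {accuracy:.3f}")
```

Clustering is applied to the training split only, so no test information leaks into the subclass labels. In practice the number of subclasses would be tuned per data set rather than fixed.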



Author information

Authors and Affiliations

School of Computing Science and Digital Media, Robert Gordon University, Garthdee Road, Aberdeen, AB10 7GJ, United Kingdom
Eyad Elyan & Mohamed Medhat Gaber


Corresponding author

Correspondence to Mohamed Medhat Gaber.


About this article

Cite this article

Elyan, E., Gaber, M.M. A fine-grained Random Forests using class decomposition: an application to medical diagnosis. Neural Comput & Applic 27, 2279–2288 (2016). https://doi.org/10.1007/s00521-015-2064-z


Received: 09 February 2015
Accepted: 08 September 2015
Published: 22 September 2015
Issue Date: November 2016
DOI: https://doi.org/10.1007/s00521-015-2064-z
