Balanced training of a hybrid ensemble method for imbalanced datasets: a case of emergency department readmission prediction

Artetxe, Arkaitz; Graña, Manuel; Beristain, Andoni; Ríos, Sebastián

doi:10.1007/s00521-017-3242-y

Original Article
Published: 14 October 2017

Volume 32, pages 5735–5744, (2020)
Neural Computing and Applications

Arkaitz Artetxe¹,
Manuel Graña ORCID: orcid.org/0000-0001-7373-4097²,
Andoni Beristain¹ &
…
Sebastián Ríos³

766 Accesses
15 Citations

Abstract

Imbalanced datasets are a recurrent issue in health-care data processing. Most of the literature deals with small academic datasets, so results often do not extrapolate to large real-life datasets or have little real-life validity. When generating minority class samples by interpolation is meaningless, undersampling the majority class becomes necessary to reach acceptable results. Ensembles of classifiers offer the advantage of diversity among their members, which may allow adaptation to the imbalanced distribution. In this paper, we present a pipeline method combining random undersampling with bootstrap aggregation (bagging) for a hybrid ensemble of extreme learning machines and decision trees, whose diversity improves adaptation to the imbalanced class distribution. The approach is demonstrated on a realistic, highly imbalanced dataset of emergency department patients from a Chilean hospital, with the goal of predicting patient readmission. Computational experiments show that our approach outperforms other well-known classification algorithms.
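
The idea described in the abstract (balanced bootstrap samples feeding a mixed ensemble of extreme learning machines and decision trees) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the pipeline's details, so the function names (`balanced_bootstrap`, `fit_hybrid_ensemble`, `predict_majority`), the alternation between member types, the majority vote, the hidden layer size, and the convention that label 1 is the minority class are all assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ELM:
    """Minimal extreme learning machine: random hidden layer, least-squares output."""

    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden weights are random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Encode labels as -1/+1 and solve for output weights by pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ (2.0 * y - 1.0)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)


def balanced_bootstrap(X, y, rng):
    """Bootstrap the minority class (assumed label 1) and draw as many majority samples."""
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n = len(minority)
    idx = np.concatenate([
        rng.choice(minority, size=n, replace=True),
        rng.choice(majority, size=n, replace=True),  # random undersampling
    ])
    return X[idx], y[idx]


def fit_hybrid_ensemble(X, y, n_members=10, seed=0):
    """Train each member on its own balanced bootstrap sample."""
    rng = np.random.default_rng(seed)
    members = []
    for i in range(n_members):
        Xb, yb = balanced_bootstrap(X, y, rng)
        # Assumption: alternate ELMs and trees to obtain a hybrid ensemble.
        if i % 2 == 0:
            model = ELM(seed=seed + i)
        else:
            model = DecisionTreeClassifier(random_state=seed + i)
        members.append(model.fit(Xb, yb))
    return members


def predict_majority(members, X):
    """Combine member predictions by unweighted majority vote (ties go to class 1)."""
    votes = np.mean([m.predict(X) for m in members], axis=0)
    return (votes >= 0.5).astype(int)


if __name__ == "__main__":
    # Synthetic 95:5 imbalanced data, used only to exercise the sketch.
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(0, 1, (950, 5)), rng.normal(2, 1, (50, 5))])
    y = np.array([0] * 950 + [1] * 50)
    members = fit_hybrid_ensemble(X, y)
    pred = predict_majority(members, X)
    print("minority recall on training data:", pred[y == 1].mean())
```

Because every member sees the same number of samples from each class, no single classifier is trained on the skewed distribution, while the different bootstrap draws and the two model families supply the diversity the abstract refers to.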

Author information

Authors and Affiliations

Vicomtech-IK4 Research Centre, Mikeletegi Pasealekua 57, 20009, San Sebastián, Spain
Arkaitz Artetxe & Andoni Beristain
Computation Intelligence Group, Basque University (UPV/EHU), P. Manuel Lardizabal 1, 20018, San Sebastián, Spain
Manuel Graña
CEINE, Universidad de Chile, Av. República 701, Santiago, Región Metropolitana, Chile
Sebastián Ríos

Corresponding author

Correspondence to Manuel Graña.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Artetxe, A., Graña, M., Beristain, A. et al. Balanced training of a hybrid ensemble method for imbalanced datasets: a case of emergency department readmission prediction. Neural Comput & Applic 32, 5735–5744 (2020). https://doi.org/10.1007/s00521-017-3242-y

Received: 27 March 2017
Accepted: 04 October 2017
Published: 14 October 2017
Issue Date: May 2020
DOI: https://doi.org/10.1007/s00521-017-3242-y
