HBagging-MCDM: an ensemble classifier combined with multiple criteria decision making for rectal cancer survival prediction

Zhang, Fengyu; Li, Xihua

doi:10.1007/s10479-023-05642-6


Original Research
Published: 28 October 2023

Volume 335, pages 469–490 (2024)

Annals of Operations Research


Abstract

As a main type of colorectal cancer, rectal cancer carries a high risk and a high mortality rate. Accurately predicting patient survivability is therefore important for making better decisions on medical treatment and for preparing for medical expenses. In recent years, many scholars have used machine learning approaches to study the survivability of common cancers such as lung cancer. Building on this line of work, this research proposes a heterogeneous ensemble classification model to predict the survivability of rectal cancer patients. The model employs four different types of classifiers as component classifiers. It uses the Bagging algorithm to generate the example sets on which those component classifiers are trained. In the proposed model, heterogeneity improves the diversity of the component classifiers, while Bagging lowers the variance and enhances the stability of the model. Finally, a fuzzy multiple criteria decision making method, fuzzy TOPSIS, is employed to fuse the results of the component classifiers. We evaluated the proposed model on a dataset of rectal cancer patient records extracted from the Surveillance, Epidemiology, and End Results (SEER) database. The proposed model achieves a significant improvement on four standard metrics: accuracy, specificity, sensitivity, and area under the receiver operating characteristic curve. It does so compared with the single component classifiers and with state-of-the-art ensemble classification models such as Random Forest and Gradient Boosting Tree. Experiments also show that fusing the component classifiers with fuzzy TOPSIS is superior to voting and to simple weighted averaging. The proposed model outperforms the other techniques in rectal cancer survival prediction. It can therefore help improve the prognosis of rectal cancer patients and assist clinicians in developing better treatment plans.
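The abstract does not state how fuzzy TOPSIS is formulated to fuse the component classifiers' outputs. The Python sketch below shows one plausible way to do it, built on Chen's (2000) fuzzy TOPSIS. It is not the authors' method, and it rests on four assumptions:

* **Criteria and alternatives.** Each component classifier is a benefit criterion, and each class label is an alternative.
* **Fuzzified ratings.** A classifier's crisp class probability is widened into a triangular fuzzy number by a hypothetical `spread` parameter.
* **Classifier weights.** Each classifier's importance is supplied as a triangular fuzzy number. It could, for example, be derived from validation accuracy.
* **Ideal solutions.** Distances are measured to the fuzzy positive ideal (1, 1, 1) and the fuzzy negative ideal (0, 0, 0).

The names `prob_to_tfn`, `vertex_distance` and `fuzzy_topsis_fuse` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

TFN = Tuple[float, float, float]  # triangular fuzzy number (l, m, u)


def prob_to_tfn(p: float, spread: float = 0.1) -> TFN:
    """Hypothetical fuzzification: widen a crisp probability into a triangle clipped to [0, 1]."""
    return (max(0.0, p - spread), p, min(1.0, p + spread))


def vertex_distance(a: TFN, b: TFN) -> float:
    """Vertex distance between two triangular fuzzy numbers, as used in Chen's fuzzy TOPSIS."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3.0)


def fuzzy_topsis_fuse(class_probs: Sequence[Sequence[float]],
                      weights: Sequence[TFN],
                      spread: float = 0.1) -> List[float]:
    """Return one closeness coefficient per class; the largest one is the fused prediction.

    class_probs[k][j]: probability that (hypothetical) component classifier k assigns to class j.
    weights[k]: assumed fuzzy importance of classifier k, treated as a benefit criterion.
    """
    n_classes = len(class_probs[0])
    # Fuzzy decision matrix: one row per classifier (criterion), one entry per class (alternative).
    ratings = [[prob_to_tfn(p, spread) for p in row] for row in class_probs]
    # Benefit-criterion normalization: divide by the largest upper bound within each criterion.
    c_star = [max(r[2] for r in row) or 1.0 for row in ratings]
    fpis, fnis = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)  # fuzzy positive / negative ideal solutions
    closeness = []
    for j in range(n_classes):
        d_pos = d_neg = 0.0
        for k, row in enumerate(ratings):
            l, m, u = (x / c_star[k] for x in row[j])
            wl, wm, wu = weights[k]
            v = (l * wl, m * wm, u * wu)  # weighted normalized fuzzy rating
            d_pos += vertex_distance(v, fpis)
            d_neg += vertex_distance(v, fnis)
        # Closeness coefficient: relative distance from the negative ideal.
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness
```

As a usage example, consider three hypothetical classifiers that output `[0.3, 0.7]`, `[0.45, 0.55]` and `[0.6, 0.4]` for the classes (not survived, survived). Give them the fuzzy weights `(0.7, 0.8, 0.9)`, `(0.5, 0.6, 0.7)` and `(0.3, 0.4, 0.5)`. Then `fuzzy_topsis_fuse` assigns the higher closeness coefficient to "survived". Unlike majority voting, this weighting lets a confident, highly weighted classifier outweigh a dissenting, lower-weighted one.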


Data availability

The data used in this paper are extracted from the Surveillance, Epidemiology, and End Results (SEER) database, a public cancer records database in the United States that is accessible to every legitimate researcher.

Code availability

For privacy reasons, the code has not been made publicly available.


Funding

This research was funded by the National Natural Science Foundation of China (71971223).

Author information

Authors and Affiliations

School of Business, Central South University, Changsha, 410083, China
Fengyu Zhang & Xihua Li


Contributions

XL: study design and development, literature collection, continued research support, and final paper review. FZ: study design and development, programming, data collection and analysis, and final paper review.

Corresponding author

Correspondence to Xihua Li.

Ethics declarations

Conflict of interest

The authors (Xihua Li, Fengyu Zhang) do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Zhang, F., Li, X. HBagging-MCDM: an ensemble classifier combined with multiple criteria decision making for rectal cancer survival prediction. Ann Oper Res 335, 469–490 (2024). https://doi.org/10.1007/s10479-023-05642-6


Received: 21 March 2022
Accepted: 04 October 2023
Published: 28 October 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s10479-023-05642-6

