A robust innovative pipeline-based machine learning framework for predicting COVID-19 in Mexican patients

  • ORIGINAL ARTICLE
International Journal of System Assurance Engineering and Management

Abstract

The emergence of COVID-19 in Wuhan, China, in late 2019 led to a global health crisis that has claimed many lives worldwide. A thorough understanding of the available COVID-19 datasets can help healthcare professionals identify cases at an early stage. This study presents an innovative pipeline-based framework for predicting survival and mortality in patients with COVID-19 using the Mexican COVID-19 patient dataset (COVID-19-MPD). Preprocessing plays a pivotal role in ensuring that the framework delivers high-quality outcomes. Within the framework, we deploy various machine learning models with optimized hyperparameters. Keeping the experimental conditions and the dataset fixed, we conduct multiple experiments that combine diverse preprocessing techniques and models in order to maximize the area under the receiver operating characteristic curve (AUC) for COVID-19 prediction. Given the considerable dimensionality of the dataset, feature selection is crucial for identifying the factors that influence COVID-19 mortality or survival. We employ dimensionality reduction methods, such as principal component analysis and independent component analysis, alongside feature selection techniques such as maximum relevance minimum redundancy and permutation feature importance. Identifying the features that most affect patient outcomes can help experts manage the disease by improving treatment efficacy and control measures. Across the experiments, the proposed framework achieves its best results with standardized data and the k-nearest neighbor algorithm applied to four components, attaining an AUC of 100%. Given its effectiveness in COVID-19 prediction, this framework has the potential for integration into medical decision support systems.
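The kind of pipeline the abstract describes (standardization, reduction to four components, a k-nearest neighbor classifier, and AUC as the selection criterion) might be sketched as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic, and the choice of PCA as the reduction step, the hyperparameter grid, the fold count, and all variable names are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a patient table (assumption: NOT the COVID-19-MPD data).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=6, random_state=0
)

# Hold out a test set so the reported AUC is not computed on training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chaining the steps in a Pipeline means the scaler and PCA are re-fitted
# on the training part of every cross-validation fold, avoiding leakage.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),      # standardize each feature
        ("reduce", PCA(n_components=4)),  # keep four components
        ("knn", KNeighborsClassifier()),  # k-nearest neighbor classifier
    ]
)

# Hypothetical grid: the values of k are illustrative, not taken from the paper.
param_grid = {"knn__n_neighbors": [3, 5, 11, 21]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Select the hyperparameters that maximize cross-validated AUC.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

# AUC is computed from predicted probabilities of the positive class.
test_scores = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)

print("best k:", search.best_params_["knn__n_neighbors"])
print("held-out AUC:", round(test_auc, 3))
```

Swapping `PCA` for `FastICA` from `sklearn.decomposition` would give the independent component analysis variant. Scoring on a held-out split is a design choice of this sketch: it makes a near-perfect AUC easier to interpret, since it cannot then be attributed to evaluating on the training data.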


Data availability

The data used in this study come from a publicly available dataset, accessible online at: https://www.gob.mx/salud/documentos/datos-abiertos-152127


Acknowledgements

We express our gratitude to Dr. Mitra Esmaeili Azad, MD, of Shahid Beheshti University of Medical Sciences, for her help with the medical aspects of this research.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Contributions

Karlo Abnoosian and Rahman Farnoosh conceived the method. Karlo Abnoosian developed the algorithm and performed the simulations. Karlo Abnoosian and Rahman Farnoosh analysed the results and wrote the paper. All the authors have read and approved the final manuscript.

Corresponding author

Correspondence to Rahman Farnoosh.

Ethics declarations

Conflict of interest

The authors declare no competing interests. On behalf of all authors, the corresponding author states that there is no conflict of interest.

Ethics approval

This article is exempt and does not require ethics approval.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent to publish

The authors affirm that human research participants provided informed consent for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Farnoosh, R., Abnoosian, K. A robust innovative pipeline-based machine learning framework for predicting COVID-19 in Mexican patients. Int J Syst Assur Eng Manag (2024). https://doi.org/10.1007/s13198-024-02354-3

  • DOI: https://doi.org/10.1007/s13198-024-02354-3
