Multi-objective Evolutionary Discretization of Gene Expression Profiles: Application to COVID-19 Severity Prediction

  • Conference paper
Applications of Evolutionary Computation (EvoApplications 2023)

Abstract

Machine learning models can use patients' gene expression data to efficiently predict the severity of symptoms for several diseases. Medical experts, however, still need to understand the reasoning behind the predictions before trusting them. In their day-to-day practice, physicians prefer gene expression profiles, which consist of a discretized subset of the gene expression data: in these profiles, genes are typically reported as either over-expressed or under-expressed, using discretization thresholds computed on data from a healthy control group. A discretized profile allows medical experts to categorize patients at a glance. Building on previous work on the automatic discretization of patient profiles, we present a novel approach that frames the problem as a multi-objective optimization task with two competing goals. On the one hand, after discretization, the medical expert would prefer to have as few distinct profiles as possible, so that patients can be classified intuitively; on the other hand, the loss of information has to be minimized. Loss of information can be estimated using the performance of a classifier trained on the discretized gene expression levels. We apply NSGA-II, a widely used evolutionary multi-objective algorithm, to the discretization of a dataset of COVID-19 patients who developed either mild or severe symptoms. The results show not only that the solutions found by the approach dominate traditional discretization based on statistical analysis and are more generally valid than those obtained through single-objective optimization, but also that the candidate Pareto-optimal solutions preserve the sense-making that practitioners find necessary to trust the results.
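The two objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pair of thresholds per gene, the three-level coding (-1, 0, +1), the toy data, and all function names are assumptions made for illustration. In particular, the information-loss objective uses a simple majority vote per profile on the training data as a stand-in for the classifier performance mentioned in the abstract, and NSGA-II itself is not implemented; only the Pareto-dominance relation it relies on is shown.

```python
from collections import Counter, defaultdict


def discretize(sample, thresholds):
    """Map each expression value to -1 (under), 0 (normal) or +1 (over).

    `thresholds` holds one hypothetical (low, high) pair per gene.
    """
    profile = []
    for value, (low, high) in zip(sample, thresholds):
        if value < low:
            profile.append(-1)
        elif value > high:
            profile.append(1)
        else:
            profile.append(0)
    return tuple(profile)


def evaluate(samples, labels, thresholds):
    """Return (number of distinct profiles, error rate); both are minimized."""
    profiles = [discretize(s, thresholds) for s in samples]
    # Objective 1: how many different profiles an expert would have to read.
    n_profiles = len(set(profiles))
    # Objective 2 (stand-in for information loss): every profile predicts the
    # majority label of the samples sharing it; count the resulting mistakes.
    by_profile = defaultdict(Counter)
    for profile, label in zip(profiles, labels):
        by_profile[profile][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_profile.values())
    return n_profiles, 1.0 - correct / len(labels)


def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep only the objective vectors that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Toy data: six patients, two genes (values are invented for illustration).
samples = [(1.0, 5.0), (1.2, 5.5), (3.0, 5.2), (3.1, 9.0), (0.9, 9.1), (3.2, 8.8)]
labels = ["mild", "mild", "mild", "severe", "severe", "severe"]

candidates = [
    [(2.0, 2.5), (7.0, 7.5)],    # both genes discretized
    [(0.0, 10.0), (7.0, 7.5)],   # gene 1 always "normal"
    [(0.0, 10.0), (0.0, 10.0)],  # everything collapses into one profile
]
scores = [evaluate(samples, labels, t) for t in candidates]
front = pareto_front(scores)
```

On this toy data, the second candidate separates the two classes with fewer profiles than the first, so the first is dominated, while the third trades accuracy for a single profile and stays on the front. A multi-objective algorithm such as NSGA-II would search the threshold space for such non-dominated trade-offs instead of enumerating three hand-picked candidates.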

Notes

  1. https://pythonhosted.org/inspyred/

  2. https://www.ncbi.nlm.nih.gov/geo/

Author information

Corresponding author

Correspondence to David Rojas-Velazquez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Rojas-Velazquez, D., Tonda, A., Rodriguez-Guerra, I., Kraneveld, A.D., Lopez-Rincon, A. (2023). Multi-objective Evolutionary Discretization of Gene Expression Profiles: Application to COVID-19 Severity Prediction. In: Correia, J., Smith, S., Qaddoura, R. (eds) Applications of Evolutionary Computation. EvoApplications 2023. Lecture Notes in Computer Science, vol 13989. Springer, Cham. https://doi.org/10.1007/978-3-031-30229-9_45

  • DOI: https://doi.org/10.1007/978-3-031-30229-9_45

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30228-2

  • Online ISBN: 978-3-031-30229-9

  • eBook Packages: Computer Science, Computer Science (R0)
