Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data

Part of the Lecture Notes in Computer Science book series (LNBI, volume 13483)

Abstract

In recent years, during the COVID-19 pandemic, almost the entire research community has focused on the challenges that constantly arise. From a computational and mathematical perspective, several experimental studies must deal with datasets of ultra-high volume and ultra-high dimensionality. An indicative example is DNA sequencing technologies, which offer a more realistic picture of human diseases at the molecular biology level but produce data with high complexity and ultra-high dimensionality. Dimensionality reduction techniques are the first choice for addressing this complexity, since they reveal the hidden structure of the data in the original multidimensional space. Such techniques can also improve the efficiency of machine learning tasks such as classification and clustering. In this direction, we study the behavior of seven well-known and cutting-edge dimensionality reduction techniques tailored for RNA-sequencing data. In addition, we propose an extension of the Random projection and Geodesic distance t-Stochastic Neighbor Embedding (RGt-SNE) algorithm, a recent improvement of t-Stochastic Neighbor Embedding (t-SNE), in which we suggest a new distance criterion for the kernel matrix construction. Our results show the potential of the proposed algorithm and, at the same time, highlight the complexity of the COVID-19 data, which are not separable and therefore pose a significant challenge for the machine learning field.
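The abstract names two ingredients of RGt-SNE, random projection and geodesic distance, without giving details on this page. The following is only a minimal sketch of those two generic ideas, not the authors' algorithm and not their proposed distance criterion. It assumes a Gaussian projection matrix, a symmetric k-nearest-neighbour graph with Euclidean edge weights, and shortest paths computed with Dijkstra's algorithm. All function names and parameter values are hypothetical and chosen for illustration.

```python
import heapq
import math
import random


def random_projection(X, k, seed=0):
    """Project the rows of X (n x d) to k dimensions (assumed Gaussian matrix)."""
    rng = random.Random(seed)
    d = len(X[0])
    # Entries drawn from N(0, 1/k) so squared norms are preserved in expectation.
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(k)] for _ in range(d)]
    return [[sum(row[i] * R[i][j] for i in range(d)) for j in range(k)] for row in X]


def knn_graph(Y, n_neighbors):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(Y)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(Y[i], Y[j]), j) for j in range(n) if j != i)
        for w, j in dists[:n_neighbors]:
            graph[i][j] = w
            graph[j][i] = w
    return graph


def geodesic_distances(graph):
    """All-pairs shortest-path lengths over the graph (Dijkstra from each node)."""
    n = len(graph)
    D = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        D[src][src] = 0.0
        heap = [(0.0, src)]
        while heap:
            dist, u = heapq.heappop(heap)
            if dist > D[src][u]:
                continue  # stale heap entry
            for v, w in graph[u].items():
                nd = dist + w
                if nd < D[src][v]:
                    D[src][v] = nd
                    heapq.heappush(heap, (nd, v))
    return D


# Toy data standing in for an expression matrix: 20 samples, 50 features.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
Y = random_projection(X, k=10)
D = geodesic_distances(knn_graph(Y, n_neighbors=5))
```

Under these assumptions, `D` is a pairwise distance matrix that could be handed to an embedding method that accepts precomputed distances. Pairs in disconnected parts of the graph keep the value `math.inf` and would need separate handling.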

Keywords

  • Dimensionality reduction
  • Single-cell RNA-sequencing
  • High-dimensional COVID-19 data

Acknowledgements

This project has received funding from the Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT), under grant agreement No. 1901.

Author information

Corresponding author

Correspondence to Ioannis L. Dallas.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Dallas, I.L., Vrahatis, A.G., Tasoulis, S.K., Plagianakos, V.P. (2022). Recent Dimensionality Reduction Techniques for High-Dimensional COVID-19 Data. In: Chicco, D., et al. Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2021. Lecture Notes in Computer Science, vol 13483. Springer, Cham. https://doi.org/10.1007/978-3-031-20837-9_18

  • DOI: https://doi.org/10.1007/978-3-031-20837-9_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20836-2

  • Online ISBN: 978-3-031-20837-9

  • eBook Packages: Computer Science, Computer Science (R0)