Dimensionality Reduction of Single-Cell RNA-Seq Data

  • Protocol
RNA Bioinformatics

Part of the book series: Methods in Molecular Biology (MIMB, volume 2284)

Abstract

Dimensionality reduction is a crucial step in essentially every single-cell RNA-sequencing (scRNA-seq) analysis. In this chapter, we describe the typical dimensionality reduction workflow used for scRNA-seq datasets, highlighting the roles of principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) in this setting. We particularly emphasize efficient computation; the software implementations used in this chapter can scale to datasets with millions of cells.
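
As a rough illustration of the first stage of such a workflow, the sketch below computes principal component scores from a cells-by-genes count matrix using NumPy. The normalization (counts per 10,000 followed by log1p), the function name `pca_scores`, and the simulated Poisson counts are assumptions made for this example, not details taken from the chapter. It uses a full SVD for brevity, which would not scale to millions of cells; a truncated or randomized solver would be substituted in practice, and the resulting scores would then typically be passed to t-SNE or UMAP for visualization.

```python
import numpy as np


def pca_scores(counts, n_components=50):
    """Return principal component scores for a cells x genes count matrix.

    Hypothetical helper for illustration only.
    """
    # Assumed preprocessing: scale each cell to 10,000 counts, then log1p.
    size = counts.sum(axis=1, keepdims=True)
    norm = np.log1p(counts / size * 1e4)

    # Center each gene so that the SVD of the matrix corresponds to PCA.
    centered = norm - norm.mean(axis=0)

    # Thin SVD; singular values are returned in descending order.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)

    # Scores are the projections of the cells onto the leading principal axes.
    return u[:, :n_components] * s[:n_components]


# Simulated data: 200 cells, 500 genes (purely synthetic counts).
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(200, 500)).astype(float)
scores = pca_scores(counts, n_components=10)
```

Here `scores` has one row per cell and one column per retained component, ordered by decreasing variance.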

Acknowledgments

Many thanks to Dmitry Kobak for helpful comments on a draft of this chapter.

Author information

Corresponding author

Correspondence to George C. Linderman.

Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Linderman, G.C. (2021). Dimensionality Reduction of Single-Cell RNA-Seq Data. In: Picardi, E. (eds) RNA Bioinformatics. Methods in Molecular Biology, vol 2284. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1307-8_18

  • DOI: https://doi.org/10.1007/978-1-0716-1307-8_18

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1306-1

  • Online ISBN: 978-1-0716-1307-8

  • eBook Packages: Springer Protocols
