Dimensionality Reduction of Single-Cell RNA-Seq Data

Linderman, George C.

doi:10.1007/978-1-0716-1307-8_18

Part of the book series: Methods in Molecular Biology (MIMB, volume 2284)

Abstract

Dimensionality reduction is a crucial step in essentially every single-cell RNA-sequencing (scRNA-seq) analysis. In this chapter, we describe the typical dimensionality reduction workflow used for scRNA-seq datasets, highlighting the roles of principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP) in this setting. We place particular emphasis on efficient computation: the software implementations used in this chapter can scale to datasets with millions of cells.

Acknowledgments

Many thanks to Dmitry Kobak for helpful comments on a draft of this chapter.

Author information

Authors and Affiliations

Department of Applied Mathematics, Yale University, New Haven, CT, USA
George C. Linderman

Corresponding author

Correspondence to George C. Linderman.

Editor information

Editors and Affiliations

Dipartimento Di Bioscienze Biotecnologie E Biofarmaceutica, Università degli Studi di Bari Aldo Moro, Bari, Italy
Ernesto Picardi

Cite this protocol

Linderman, G.C. (2021). Dimensionality Reduction of Single-Cell RNA-Seq Data. In: Picardi, E. (ed.) RNA Bioinformatics. Methods in Molecular Biology, vol 2284. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1307-8_18

DOI: https://doi.org/10.1007/978-1-0716-1307-8_18
Published: 10 April 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1306-1
Online ISBN: 978-1-0716-1307-8
eBook Packages: Springer Protocols
