Abstract
Dimensionality reduction is a crucial step in essentially every single-cell RNA-sequencing (scRNA-seq) analysis. In this chapter, we describe the typical dimensionality reduction workflow that is used for scRNA-seq datasets, specifically highlighting the roles of principal component analysis, t-distributed stochastic neighborhood embedding, and uniform manifold approximation and projection in this setting. We particularly emphasize efficient computation; the software implementations used in this chapter can scale to datasets with millions of cells.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Svensson V, da Veiga Beltrame E, Pachter L (2020) A curated database reveals trends in single-cell transcriptomics. Database 2020
Svensson V, Vento-Tormo R, Teichmann SA (2018) Exponential scaling of single-cell RNA-seq in the past decade. Nat Protoc 13(4):599–604
Bellman RE (1961) Adaptive control processes: a guided tour. Princeton University Press, Princeton, N.J
Jolliffe IT (1986) Principal component analysis and factor analysis. In: Principal component analysis. Springer, New York, pp 115–128
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9:2579–2605
Becht E, McInnes L, Healy J, Dutertre C-A, Kwok IW, Ng LG, Ginhoux F, Newell EW (2019) Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37(1):38
Hrvatin S, Hochbaum DR, Nagy MA, Cicconet M, Robertson K, Cheadle L, Zilionis R, Ratner A, Borges-Monroy R, Klein AM (2018) Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex. Nat Neurosci 21(1):120
Larsen RM (1998) Lanczos bidiagonalization with partial reorthogonalization. DAIMI Rep Ser 27(537)
Halko N, Martinsson P-G, Tropp JA (2011) Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 53(2):217–288
Baglama J, Reichel L, Lewis B (2017) irlba: Fast truncated singular value decomposition and principal components analysis for large dense and sparse matrices. R package version 2 (1)
Erichson NB, et al. (2019) Randomized matrix decompositions using R. J Stat Softw 89(1):1–48
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12(Oct):2825–2830
Van Der Maaten L (2014) Accelerating t-SNE using tree-based algorithms. J Mach Learn Res 15(1):3221–3245
Policar PG, Strazar M, Zupan B (2019) openTSNE: a modular Python library for t-SNE dimensionality reduction and embedding. BioRxiv:731877
Chan DM, Rao R, Huang F, Canny JF (2019) GPU accelerated t-distributed stochastic neighbor embedding. J Parallel Distrib Comput 131:1–13
McInnes L, Healy J, Melville J (2018) Umap: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:180203426
Butler A, Hoffman P, Smibert P, Papalexi E, Satija R (2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36(5):411–420
Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck IIIWM, Hao Y, Stoeckius M, Smibert P, Satija R (2019) Comprehensive integration of single-cell data. Cell 177(7):1888–1902.e1821
Wolf FA, Angerer P, Theis FJ (2018) SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19(1):15
Luecken MD, Theis FJ (2019) Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 15(6)
Horn JL (1965) A rationale and test for the number of factors in factor analysis. Psychometrika 30(2):179–185
Chung NC, Storey JD (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics 31(4):545–554
Kobak D, Berens P (2019) The art of using t-SNE for single-cell transcriptomics. Nat Commun 10(1):1–14
Kobak D, Linderman GC (2019) UMAP does not preserve global structure any better than t-SNE when using the same initialization. bioRxiv
Moon KR, Stanley JS III, Burkhardt D, van Dijk D, Wolf G, Krishnaswamy S (2018) Manifold learning-based methods for analyzing single-cell RNA-sequencing data. Curr Opin Syst Biol 7:36–46
Sun S, Zhu J, Ma Y, Zhou X (2019) Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 20(1):269
Çakır B, Prete M, Huang N, van Dongen S, Pir P, Kiselev VY (2020) Comparison of visualization tools for single-cell RNAseq data. NAR Genomics and Bioinformatics 2(3):lqaa052. https://doi.org/10.1093/nargab/lqaa052
Townes FW, Hicks SC, Aryee MJ, Irizarry RA (2019) Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 20(1):1–16
Li H, Linderman GC, Szlam A, Stanton KP, Kluger Y, Tygert M (2017) Algorithm 971: an implementation of a randomized algorithm for principal component analysis. ACM Trans Math Softw 43(3):1–14
Acknowledgments
Many thanks to Dmitry Kobak for helpful comments on a draft of this chapter.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Linderman, G.C. (2021). Dimensionality Reduction of Single-Cell RNA-Seq Data. In: Picardi, E. (eds) RNA Bioinformatics. Methods in Molecular Biology, vol 2284. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1307-8_18
Download citation
DOI: https://doi.org/10.1007/978-1-0716-1307-8_18
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1306-1
Online ISBN: 978-1-0716-1307-8
eBook Packages: Springer Protocols