Abstract
Normalization is an important step in the analysis of single-cell RNA-seq data. While no single method outperforms all others in all datasets, the choice of normalization can have a profound impact on the results. Data-driven metrics can be used to rank normalization methods and select the best performers. Here, we show how to use R/Bioconductor to calculate normalization factors, apply them to compute normalized data, and compare several normalization approaches. Finally, we briefly show how to perform downstream analysis steps on the normalized data.
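The two steps named in the abstract, calculating normalization factors and applying them to the counts, can be illustrated with the simplest scaling approach, library-size normalization. This is a minimal sketch in plain Python and not the chapter's R/Bioconductor code; the function names and the toy count matrix are hypothetical and chosen only for illustration.

```python
import math


def library_size_factors(counts):
    """Per-cell scaling factors: each cell's total count divided by the mean total.

    `counts` is a list of gene rows, each a list of per-cell counts
    (hypothetical layout: genes in rows, cells in columns).
    """
    n_cells = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    if any(t == 0 for t in totals):
        raise ValueError("cells with zero total counts should be filtered first")
    mean_total = sum(totals) / n_cells
    return [t / mean_total for t in totals]


def apply_factors(counts, factors):
    """Divide every count by the scaling factor of its cell."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def log_transform(values, pseudocount=1.0):
    """log2(x + pseudocount), a common transform applied after scaling."""
    return [[math.log2(x + pseudocount) for x in row] for row in values]


# Toy data (assumption): 2 genes x 3 cells, where the cells differ only in depth.
toy_counts = [
    [10, 20, 30],
    [40, 80, 120],
]

factors = library_size_factors(toy_counts)
normalized = apply_factors(toy_counts, factors)
log_normalized = log_transform(normalized)
```

In this toy matrix the three cells differ only in sequencing depth, so after scaling each gene has the same value in every cell. Real data rarely behave this cleanly, which is one reason several normalization approaches are worth comparing.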
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Risso, D. (2021). Normalization of Single-Cell RNA-Seq Data. In: Picardi, E. (eds) RNA Bioinformatics. Methods in Molecular Biology, vol 2284. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1307-8_17
DOI: https://doi.org/10.1007/978-1-0716-1307-8_17
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1306-1
Online ISBN: 978-1-0716-1307-8
eBook Packages: Springer Protocols