Abstract
High-throughput sequencing (HTS) has revolutionized researchers’ ability to study the human transcriptome, particularly in cancer. HTS technology has recently advanced to the point where individual cells can be sequenced (i.e., “single-cell sequencing”). Before single-cell sequencing, HTS was performed on RNA extracted from a tissue sample consisting of multiple cell types (i.e., “bulk sequencing”). In this chapter, we review the bioinformatics and statistical methods used in the processing, quality control, and analysis of data from bulk and single-cell RNA sequencing experiments. We also discuss how these methods are being used to study tumor heterogeneity.
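The practical difference between bulk and single-cell sequencing can be illustrated with a toy sketch. It assumes an idealized bulk library that is simply the gene-by-gene sum of per-cell read counts, and it ignores capture efficiency, amplification bias, and library-size normalization. The counts and the helper names `pseudo_bulk` and `fraction_expressing` are hypothetical. Two samples with very different cellular makeup produce the same aggregate profile, so only the per-cell view reveals the heterogeneity.

```python
def pseudo_bulk(cells):
    """Sum read counts gene by gene across cells (idealized bulk library)."""
    return [sum(gene_counts) for gene_counts in zip(*cells)]


def fraction_expressing(cells, gene_index):
    """Fraction of cells with a nonzero count for one gene."""
    return sum(1 for cell in cells if cell[gene_index] > 0) / len(cells)


# Hypothetical counts for two genes (gene_A, gene_B); each inner list is one cell.
homogeneous = [[45, 16], [45, 16]]   # two cells sharing one intermediate profile
heterogeneous = [[90, 0], [0, 32]]   # two distinct cell types

# A bulk-style measurement cannot tell the two samples apart...
print(pseudo_bulk(homogeneous))    # [90, 32]
print(pseudo_bulk(heterogeneous))  # [90, 32]

# ...whereas per-cell data shows that gene_B is expressed in only half
# of the cells in the heterogeneous sample.
print(fraction_expressing(homogeneous, 1))    # 1.0
print(fraction_expressing(heterogeneous, 1))  # 0.5
```

In real data the mixture is rarely this clean. However, the same masking effect is why bulk profiles from heterogeneous tumors are often deconvolved computationally, while single-cell data can resolve the subpopulations directly.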
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Yu, X., Abbas-Aghababazadeh, F., Chen, Y.A., Fridley, B.L. (2021). Statistical and Bioinformatics Analysis of Data from Bulk and Single-Cell RNA Sequencing Experiments. In: Markowitz, J. (eds) Translational Bioinformatics for Therapeutic Development. Methods in Molecular Biology, vol 2194. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0849-4_9
DOI: https://doi.org/10.1007/978-1-0716-0849-4_9
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-0848-7
Online ISBN: 978-1-0716-0849-4
eBook Packages: Springer Protocols