Abstract
Recent technological developments in single-cell RNA-Seq (scRNA-Seq) enable us to assay the transcriptomes of up to a million single cells in parallel. However, analyzing such large datasets presents a major challenge. During the last decade, a wide variety of strategies have been proposed, covering different steps of the analysis. Here, we introduce a selection of computational tools to provide an overview of a generic analysis pipeline.
The first step of every scRNA-Seq experiment is proper study design. It does not require sophisticated experimental or informatics skills, but it is nonetheless arguably the most important step. The quality of the resulting data depends critically on careful planning of the experiment, including the selection of the technology best suited to the biological question of interest and a well-considered study design that minimizes the influence of confounding factors. Once the experiment has been conducted, the raw sequencing data need to be processed to extract the gene expression information for each cell. This task comprises quality assessment of the sequenced reads, alignment against a reference genome, demultiplexing of the cell barcodes, and quantification of the reads or transcripts per gene. Like any other transcriptomics technology, single-cell mRNA-Seq requires data normalization to ensure sample-to-sample (here, cell-to-cell) comparability, as well as careful consideration of confounding factors.
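To make the demultiplexing, quantification, and normalization steps more concrete, the following minimal Python sketch assumes a UMI-based protocol in which reads have already been aligned and annotated, so that each read is reduced to a hypothetical (cell barcode, UMI, gene) triple, and a whitelist of valid cell barcodes is available. It counts each unique molecule once per cell and gene, then applies a simple library-size normalization followed by a log(1 + x) transform. The function names, the toy reads, and the scale factor of 10,000 are illustrative assumptions, not the authors' method. Dedicated tools such as UMI-tools additionally model sequencing errors in UMIs, which this sketch ignores.

```python
import math
from collections import defaultdict


def count_umis(reads, valid_barcodes):
    """Demultiplex reads by cell barcode and count unique UMIs per gene.

    Reads with a barcode outside the whitelist are discarded. Reads sharing
    the same (barcode, UMI, gene) are treated as PCR copies of one molecule
    and counted once. Sequencing errors in barcodes/UMIs are NOT corrected.
    """
    seen = set()
    counts = defaultdict(lambda: defaultdict(int))
    for barcode, umi, gene in reads:
        if barcode not in valid_barcodes:
            continue  # not a recognized cell: drop the read
        molecule = (barcode, umi, gene)
        if molecule in seen:
            continue  # duplicate of an already counted molecule
        seen.add(molecule)
        counts[barcode][gene] += 1
    return counts


def log_normalize(counts, scale=10_000):
    """Scale each cell to the same total (library size), then apply log1p."""
    normalized = {}
    for barcode, genes in counts.items():
        total = sum(genes.values())
        normalized[barcode] = {
            gene: math.log1p(n / total * scale) for gene, n in genes.items()
        }
    return normalized


# Hypothetical toy reads: two whitelisted cells and one unknown barcode.
reads = [
    ("AAAC", "UMI1", "GeneA"),
    ("AAAC", "UMI1", "GeneA"),  # PCR duplicate, counted once
    ("AAAC", "UMI2", "GeneA"),
    ("AAAC", "UMI3", "GeneB"),
    ("TTTG", "UMI1", "GeneB"),
    ("GGGG", "UMI9", "GeneB"),  # barcode not in whitelist, discarded
]
counts = count_umis(reads, {"AAAC", "TTTG"})
normalized = log_normalize(counts)
print({cell: dict(genes) for cell, genes in counts.items()})
# {'AAAC': {'GeneA': 2, 'GeneB': 1}, 'TTTG': {'GeneB': 1}}
```

Dividing by each cell's total count removes differences in sequencing depth between cells; the log transform then dampens the influence of very highly expressed genes. For protocols without UMIs, read counts per gene would take the place of molecule counts.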
Once gene expression values have been extracted from the reads and normalized, the researcher faces a difficult choice among a plethora of analysis approaches for investigating diverse aspects of single-cell transcriptomes, such as dimensionality reduction and clustering to explore cellular heterogeneity, or trajectory analysis to model differentiation processes.
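As an illustration of dimensionality reduction followed by clustering, the sketch below projects normalized cells onto the first principal component, computed by power iteration, and then splits the resulting scores into two groups with a one-dimensional k-means. It assumes a small dense matrix of normalized values (cells in rows, genes in columns); the toy matrix and all function names are hypothetical. Practical pipelines typically retain many principal components and commonly use graph-based community detection (for example, Louvain) rather than k-means, so this is only a sketch of the idea.

```python
import math


def center_columns(matrix):
    """Subtract each gene's mean (column mean) from every cell."""
    n_cells = len(matrix)
    means = [sum(col) / n_cells for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_principal_component(centered, n_iter=100):
    """Leading eigenvector of X^T X via power iteration (X = centered data)."""
    n_genes = len(centered[0])
    v = [1.0] * n_genes  # arbitrary start; must not be orthogonal to PC1
    for _ in range(n_iter):
        xv = [sum(x * w for x, w in zip(row, v)) for row in centered]  # X v
        w = [sum(row[j] * s for row, s in zip(centered, xv))  # X^T (X v)
             for j in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in the data
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, k=2, n_iter=50):
    """Plain k-means on scalar scores, centroids initialized evenly (k >= 2)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each score to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(x - centroids[c]))
                  for x in values]
        # Move each centroid to the mean of its assigned scores.
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


# Hypothetical log-normalized expression: 6 cells x 3 genes, two groups.
cells = [
    [5.0, 0.1, 0.2], [4.8, 0.0, 0.3], [5.2, 0.2, 0.1],
    [0.1, 4.9, 5.1], [0.3, 5.0, 4.8], [0.2, 5.1, 5.0],
]
centered = center_columns(cells)
pc1 = first_principal_component(centered)
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
labels = kmeans_1d(scores)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Power iteration is used here only to keep the example free of dependencies; with real data, an SVD routine from a numerical library would be the usual choice.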
In this chapter, we summarize the steps outlined above for conducting single-cell RNA-Seq analyses and present a selection of existing tools.
Key words
- Single-cell
- mRNA-Seq
- Data analysis
- Guidelines
Acknowledgments
The authors would like to acknowledge Prof. Dr. med. Joachim L. Schultze for support and advice during the writing process. Moreover, the authors Paweł Biernat and Matthias Becker are supported by a grant from the Federal Ministry for Economic Affairs and Energy (BMWi Project FASTGenomics). The work of Jonas Schulte-Schrepping receives funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 733100 (SYSCID). The DFG graduate program 2168/1 (Bonn and Melbourne International Research and Training Group—Bo&MeRanG) supports Patrick Günther.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Baßler, K., Günther, P., Schulte-Schrepping, J., Becker, M., Biernat, P. (2019). A Bioinformatic Toolkit for Single-Cell mRNA Analysis. In: Proserpio, V. (eds) Single Cell Methods. Methods in Molecular Biology, vol 1979. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9240-9_26
DOI: https://doi.org/10.1007/978-1-4939-9240-9_26
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9239-3
Online ISBN: 978-1-4939-9240-9
eBook Packages: Springer Protocols