A Bioinformatic Toolkit for Single-Cell mRNA Analysis

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1979)

Abstract

The recent technological developments in the field of single-cell RNA-Seq enable us to assay the transcriptome of up to a million single cells in parallel. However, the analyses of such big datasets present a major challenge. During the last decade, a wide variety of strategies have been proposed covering different steps of the analysis. Here, we introduce a selection of computational tools to provide an overview of a generic analysis pipeline.

The first step of every scRNA-Seq experiment is proper study design, which does not require sophisticated experimental or informatics skills but is nonetheless arguably the most important step. The quality of the resulting data depends directly on careful planning of the experiment, including the selection of the most suitable technology for the biological question of interest and a well-elaborated study design that minimizes the influence of confounding factors. Once the experiment has been conducted, the raw sequencing data need to be processed to extract the gene expression information for each cell. This task comprises quality assessment of the sequenced reads, alignment against a reference genome, demultiplexing of the cell barcodes, and quantification of the reads/transcripts per gene. Like any other transcriptomics technology, single-cell mRNA-Seq requires data normalization to ensure sample-to-sample (here, cell-to-cell) comparability, as well as the consideration of confounding factors.
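As a rough illustration of what cell-to-cell normalization can look like, the following minimal sketch scales every cell of a cells-by-genes count matrix to a common total and log-transforms the result. It assumes simple library-size normalization, which is only one of several possible strategies and is not necessarily the one recommended in the chapter; the function name `normalize_counts`, the `scale` parameter, and the toy matrix are hypothetical.

```python
import numpy as np

def normalize_counts(counts, scale=10_000):
    """Library-size normalization (an assumed, simple strategy).

    Each cell (row) is scaled to the same total count `scale`
    and then log-transformed with log(1 + x).
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * scale)

# Toy matrix: 3 cells x 4 genes. Cell 2 has the same composition as
# cell 1 but was sequenced ten times deeper; cell 3 is empty.
toy_counts = np.array([
    [10, 0, 5, 5],
    [100, 0, 50, 50],
    [0, 0, 0, 0],
])
normalized = normalize_counts(toy_counts)
```

After normalization, the first two toy cells have identical profiles, because they differ only in sequencing depth and not in relative expression.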

Once gene expression values have been extracted from the reads and normalized, the researcher faces the difficult choice among a plethora of analysis approaches that investigate diverse aspects of the single-cell transcriptomes, such as dimensionality reduction and clustering to explore cellular heterogeneity or trajectory analysis to model differentiation processes.
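To make the idea of dimensionality reduction followed by clustering more concrete, here is a minimal sketch using only NumPy. It assumes principal component analysis computed via singular value decomposition and a basic k-means with a deterministic farthest-point initialization; these are illustrative choices, not the specific tools discussed in the chapter, and all function names and the simulated data are hypothetical.

```python
import numpy as np

def pca(expr, n_components=2):
    """Project cells (rows) onto the first principal components via SVD."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def kmeans(points, k, n_iter=50):
    """Basic k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from all centers chosen so far.
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[dists.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Simulated normalized expression: two groups of 20 cells x 5 genes
# with clearly different mean expression (purely synthetic data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 5))
group_b = rng.normal(loc=5.0, scale=0.5, size=(20, 5))
expr = np.vstack([group_a, group_b])

embedding = pca(expr, n_components=2)
labels = kmeans(embedding, k=2)
```

On real data the number of clusters is unknown and the groups are rarely this well separated, which is one reason dedicated single-cell methods exist.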

In this chapter, we summarize the abovementioned steps for conducting single-cell RNA-Seq analyses and present a selection of existing tools.

Key words

Single-cell mRNA-Seq, Data analysis, Guidelines

Notes

Acknowledgments

The authors would like to acknowledge Prof. Dr. med. Joachim L. Schultze for support and advice during the writing process. Moreover, the authors Paweł Biernat and Matthias Becker are supported by a grant from the Federal Ministry for Economic Affairs and Energy (BMWi Project FASTGenomics). The work of Jonas Schulte-Schrepping receives funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 733100 (SYSCID). The DFG graduate program 2168/1 (Bonn and Melbourne International Research and Training Group—Bo&MeRanG) supports Patrick Günther.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department for Genomics and Immunoregulation, Life and Medical Sciences Institute (LIMES), University of Bonn, Bonn, Germany
  2. Platform for Single Cell Genomics and Epigenomics, German Center for Neurodegenerative Diseases (DZNE), University of Bonn, Bonn, Germany