Identification of Cell Types from Single-Cell Transcriptomic Data

Part of the Methods in Molecular Biology book series (MIMB, volume 1935)


Advances in single-cell RNA-sequencing (scRNA-seq) have made it possible to profile genome-wide expression in single cells at low cost and high throughput. There is substantial ongoing effort to use scRNA-seq measurements to identify the “cell types” that make up a complex tissue, akin to taxonomizing species in ecology. Cell type classification from scRNA-seq data relies on computational tools rooted in dimensionality reduction and clustering, together with statistical analysis to identify the molecular signatures unique to each type. As datasets continue to grow in size and complexity, computational challenges abound, and analytical methods must be scalable, flexible, and robust. Moreover, careful attention must be paid to the experimental biases and statistical challenges that are unique to these measurements in order to avoid artifacts. This chapter introduces these topics in the context of cell-type identification and outlines an instructive, step-by-step example of a bioinformatic pipeline for researchers entering this field.
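The three ingredients named above (dimensionality reduction, clustering, and identification of molecular signatures) can be illustrated on a toy example. The following is only a minimal sketch, assuming NumPy is available; it is not the chapter's pipeline. The synthetic count matrix, the function names, the scale factor of 10,000, the choice of PCA followed by k-means, and the use of mean log-expression differences in place of a formal statistical test are all illustrative assumptions.

```python
import numpy as np


def normalize_log(counts, scale=1e4):
    """Scale each cell (row) to the same total count, then log-transform."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * scale)


def pca(x, n_components):
    """Project cells onto the top principal components using an SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(x, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centers = [x[0]]
    while len(centers) < k:
        # Squared distance of every cell to its nearest chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def top_markers(x, labels, cluster, n_top=5):
    """Rank genes by mean log-expression inside the cluster minus outside."""
    inside = x[labels == cluster].mean(axis=0)
    outside = x[labels != cluster].mean(axis=0)
    return np.argsort(outside - inside)[:n_top]


# Synthetic data (an assumption): 40 cells x 20 genes, two hypothetical cell
# types, each with five highly expressed marker genes.
rng = np.random.default_rng(0)
rates = np.full((40, 20), 20.0)
rates[:20, :5] = 200.0      # type A markers: genes 0-4
rates[20:, 5:10] = 200.0    # type B markers: genes 5-9
counts = rng.poisson(rates).astype(float)

log_expr = normalize_log(counts)
embedding = pca(log_expr, n_components=2)
labels = kmeans(embedding, k=2)
markers = top_markers(log_expr, labels, cluster=labels[0])
```

Real datasets are far larger and noisier than this toy matrix, which is where the scalability and robustness concerns raised above come into play.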

Key words

Single-cell RNA-sequencing, Transcriptomic classification, Cell-type identification, Cell taxonomy, Clustering, Unsupervised machine learning, Cross-species comparison of cell-types



K. S. would like to acknowledge support from NIH 1K99EY028625-01, the Klarman Cell Observatory, and the laboratory of Dr. Aviv Regev at the Broad Institute. We would like to gratefully acknowledge critical feedback from Drs. Inbal Benhar and Jose Ordovas-Montanes.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, USA
  2. Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, USA
  3. Columbia University Medical Center, New York, USA
