Data Analysis in Single-Cell Transcriptome Sequencing

  • Shan Gao
Part of the Methods in Molecular Biology book series (MIMB, volume 1754)


Single-cell transcriptome sequencing, often referred to as single-cell RNA sequencing (scRNA-seq), measures gene expression at the single-cell level and resolves cellular differences at a higher resolution than bulk RNA-seq. With more detailed and accurate information, scRNA-seq will greatly advance the understanding of cell functions, disease progression, and treatment response. Although scRNA-seq experimental protocols have improved rapidly, many challenges in scRNA-seq data analysis remain to be overcome. In this chapter, we introduce and discuss the current state of research on scRNA-seq data normalization and cluster analysis, the two most important challenges in scRNA-seq data analysis. In particular, we present a protocol to discover and validate cancer stem cells (CSCs) using scRNA-seq. We also offer suggestions to help researchers rationally design their scRNA-seq experiments and data analyses in future studies.

Key words

scRNA-seq; Single-cell transcriptome sequencing; Normalization; Cluster analysis



I am equally grateful for help from the following people: Professor Wenjun Bu, Professor Lin Liu, Ph.D. student Hua Wang, and Master’s students Yu Sun and Deshui Yu from the College of Life Sciences, Nankai University; Professor Jishou Ruan and Ph.D. student Zhenfeng Wu from the School of Mathematical Sciences, Nankai University; and Associate Professor Weixiang Liu from Shenzhen University.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. College of Life Sciences, Nankai University, Tianjin, People’s Republic of China
  2. Institute of Statistics, Nankai University, Tianjin, People’s Republic of China
