Statistics in Biosciences, Volume 9, Issue 1, pp 200–216

SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-Cell RNA sequencing Data

  • Courtney Schiffman
  • Christina Lin
  • Funan Shi
  • Luonan Chen
  • Lydia Sohn
  • Haiyan Huang


One goal of single-cell RNA sequencing (scRNA seq) is to expose possible heterogeneity within cell populations due to meaningful biological variation. Examining cell-to-cell heterogeneity and, further, identifying subpopulations of cells from scRNA seq data are of common interest in life science research. A key component of successfully identifying cell subpopulations (that is, clustering cells) is the (dis)similarity measure used to group the cells. In this paper, we introduce a novel measure, named SIDEseq, to assess cell-to-cell similarity using scRNA seq data. SIDEseq first identifies a list of putative differentially expressed (DE) genes for each pair of cells. SIDEseq then integrates the information from all the DE gene lists (corresponding to all pairs of cells) to build a similarity measure between two cells. SIDEseq can be used with any clustering algorithm that requires a (dis)similarity matrix. This new measure incorporates information from all cells when evaluating the similarity between any two cells, a characteristic not commonly found in existing (dis)similarity measures. This property is advantageous for two reasons: (a) borrowing information from cells of different subpopulations allows pairwise cell relationships to be investigated from a global perspective, and (b) information from other cells of the same subpopulation could help to ensure a robust relationship assessment. We applied SIDEseq to a newly generated human ovarian cancer scRNA seq dataset, a public human embryo scRNA seq dataset, and several simulated datasets. The clustering results suggest that the SIDEseq measure is capable of uncovering important relationships between cells, and that it performs as well as or better than several popular (dis)similarity measures on these datasets.
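
The two steps described above (pairwise DE gene lists, then integration across all cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the DE score (absolute difference scaled by the square root of the summed counts), the fixed list size `top_n`, the Jaccard overlap used for integration, and the function names `de_gene_set` and `sideseq_like_similarity` are all assumptions chosen for illustration only.

```python
from itertools import combinations
from math import sqrt


def de_gene_set(expr_a, expr_b, top_n):
    """Indices of up to top_n genes that differ most between two cells.

    Assumed score (illustrative only): |a - b| / sqrt(a + b).
    Genes with a score of zero are never reported as DE.
    """
    scored = []
    for gene, (a, b) in enumerate(zip(expr_a, expr_b)):
        denom = sqrt(a + b)
        score = abs(a - b) / denom if denom > 0 else 0.0
        if score > 0:
            scored.append((score, gene))
    # Largest score first; ties are broken by gene index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return frozenset(gene for _, gene in scored[:top_n])


def sideseq_like_similarity(cells, top_n):
    """Similarity matrix built from shared DE genes (hypothetical variant).

    For cells i and j, compare the DE list of (i, k) with the DE list of
    (j, k) for every other cell k, and average the Jaccard overlaps.
    """
    n = len(cells)
    de = {}
    for i, j in combinations(range(n), 2):
        de[(i, j)] = de[(j, i)] = de_gene_set(cells[i], cells[j], top_n)

    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        others = [k for k in range(n) if k != i and k != j]
        total = 0.0
        for k in others:
            a, b = de[(i, k)], de[(j, k)]
            union = a | b
            # Two empty lists agree completely, so count them as overlap 1.
            total += len(a & b) / len(union) if union else 1.0
        sim[i][j] = sim[j][i] = total / len(others) if others else 0.0
    return sim


# Toy counts: cells 0-1 express genes 0-1, cells 2-3 express genes 2-3.
cells = [
    [10, 10, 0, 0, 5, 5],
    [12, 9, 0, 1, 5, 5],
    [0, 0, 10, 10, 5, 5],
    [1, 0, 11, 9, 5, 5],
]
sim = sideseq_like_similarity(cells, top_n=2)
```

Because every entry of `sim` depends on the DE lists each cell has with all the other cells, the matrix reflects the global view the abstract describes. Converting it to a dissimilarity (for example `1 - sim[i][j]`) would let it feed any clustering routine that accepts a precomputed matrix.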


Keywords: single-cell RNA sequencing (scRNA seq); subpopulation identification; single-cell clustering; similarity measure; ovarian cancer; EMT inducers (Thrombin, TGFB-1)



We thank Sandrine Dudoit and Davide Risso for their help with the Remove Unwanted Variation (RUV) normalization methods. This work is partially supported by NIH U01 HG007031, NSF DMS-11-60319, NIH 5R21CA182375-01A1, Bakar’s Fellows Program, the Strategic Priority Research Program of the Chinese Academy of Sciences [XDB13040700], and the National Natural Science Foundation of China [91529303, 61134013, 91439103].

Supplementary material

Supplementary material 1: 12561_2017_9194_MOESM1_ESM.docx (docx, 3241 KB)



Copyright information

© International Chinese Statistical Association 2017

Authors and Affiliations

  • Courtney Schiffman (1)
  • Christina Lin (2)
  • Funan Shi (3)
  • Luonan Chen (4)
  • Lydia Sohn (5)
  • Haiyan Huang (3, corresponding author)

  1. Department of Biostatistics, UC Berkeley, Berkeley, USA
  2. Department of Chemical Biology and Department of Molecular and Cellular Biology, UC Berkeley, Berkeley, USA
  3. Department of Statistics, UC Berkeley, Berkeley, USA
  4. Key Laboratory of Systems Biology, Innovation Center for Cell Signaling Network, Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
  5. Department of Mechanical Engineering, UC Berkeley, Berkeley, USA
