SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-Cell RNA sequencing Data
One goal of single-cell RNA sequencing (scRNA seq) is to expose possible heterogeneity within cell populations due to meaningful biological variation. Examining cell-to-cell heterogeneity and, further, identifying subpopulations of cells based on scRNA seq data have been of common interest in life science research. A key component of successfully identifying cell subpopulations (or clustering cells) is the (dis)similarity measure used to group the cells. In this paper, we introduce a novel measure, named SIDEseq, to assess cell-to-cell similarity using scRNA seq data. SIDEseq first identifies a list of putative differentially expressed (DE) genes for each pair of cells. SIDEseq then integrates the information from all the DE gene lists (corresponding to all pairs of cells) to build a similarity measure between two cells. SIDEseq can be used with any clustering algorithm that requires a (dis)similarity matrix. This new measure incorporates information from all cells when evaluating the similarity between any two cells, a characteristic not commonly found in existing (dis)similarity measures. This property is advantageous for two reasons: (a) borrowing information from cells of different subpopulations allows for the investigation of pairwise cell relationships from a global perspective, and (b) information from other cells of the same subpopulation could help to ensure a robust relationship assessment. We applied SIDEseq to a newly generated human ovarian cancer scRNA seq dataset, a public human embryo scRNA seq dataset, and several simulated datasets. The clustering results suggest that the SIDEseq measure is capable of uncovering important relationships between cells, and that it performs at least as well as several popular (dis)similarity measures on these datasets.
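The two steps described above (pairwise DE gene lists, then integration across all cells) can be illustrated with a minimal sketch. The abstract does not state which DE statistic is used or how the lists are combined, so the choices below are assumptions made purely for illustration: the DE list for a pair of cells is taken to be the `n_top` genes with the largest absolute expression difference, and the similarity of cells i and j is taken to be the average Jaccard overlap between the lists for (i, k) and (j, k) over all other cells k. The function names `top_de_genes` and `shared_de_similarity` are hypothetical and do not come from the paper or any package.

```python
import numpy as np


def top_de_genes(expr, i, j, n_top):
    """Assumed DE rule: the n_top genes with the largest |expr_i - expr_j|."""
    diff = np.abs(expr[:, i] - expr[:, j])
    return frozenset(np.argsort(diff)[-n_top:].tolist())


def shared_de_similarity(expr, n_top=20):
    """Similarity from shared DE genes (illustrative sketch, not the paper's formula).

    expr: genes x cells array of (log-scale) expression values, >= 3 cells.
    Returns a symmetric cells x cells matrix with entries in [0, 1].
    """
    n_cells = expr.shape[1]

    # Step 1: one putative DE gene list per pair of cells.
    de = {}
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            genes = top_de_genes(expr, i, j, n_top)
            de[(i, j)] = genes
            de[(j, i)] = genes

    # Step 2: compare how cells i and j each differ from every other cell k.
    sim = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            others = [k for k in range(n_cells) if k not in (i, j)]
            overlaps = [
                len(de[(i, k)] & de[(j, k)]) / len(de[(i, k)] | de[(j, k)])
                for k in others
            ]
            sim[i, j] = sim[j, i] = float(np.mean(overlaps))

    np.fill_diagonal(sim, 1.0)
    return sim


# Toy data (made up): 200 genes, 6 cells; cells 3-5 have 20 genes shifted upward.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.1, size=(200, 6))
expr[:20, 3:] += 5.0

sim = shared_de_similarity(expr, n_top=20)
dissim = 1.0 - sim  # could be passed to a clustering method that takes a dissimilarity matrix
```

In this toy setting, two cells from the same group differ from any cell of the other group in the same genes, so their lists overlap heavily and their similarity is high. This is the sense in which every other cell contributes information to each pairwise comparison.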
Keywords: single-cell RNA sequencing (scRNA seq); subpopulation identification; single-cell clustering; similarity measure; ovarian cancer; EMT inducers (Thrombin, TGFB-1)
Thanks to Sandrine Dudoit and Davide Risso for their help with the remove unwanted variation normalization methods. This work is partially supported by NIH U01 HG007031, NSF DMS-11-60319, NIH 5R21CA182375-01A1, Bakar’s Fellows Program, Strategic Priority Research Program of the Chinese Academy of Sciences [XDB13040700], and the National Natural Science Foundation of China [91529303, 61134013, 91439103].