Abstract
Clustering single-cell RNA sequencing (scRNA-seq) data makes it possible to discover cell subtypes, which helps in understanding and analyzing disease processes. Determining edge weights is an essential component of graph-based clustering methods. Although several graph-based clustering algorithms have been proposed for scRNA-seq data, they are generally built on k-nearest neighbor (KNN) and shared nearest neighbor (SNN) graphs and do not consider the structural information of the graph. To improve clustering accuracy, we present a novel single-cell clustering method, structural shared nearest neighbor-Louvain (SSNN-Louvain), which integrates the structural information of the graph with module detection. In SSNN-Louvain, the weight of an edge is defined from the distance between a node and its shared nearest neighbors, together with the ratio of the number of shared nearest neighbors to the number of nearest neighbors; this ratio brings the structural information of the graph into the weight. A modified Louvain community detection algorithm is then proposed and applied to identify modules in the graph, where each community represents a cell subtype. The proposed method combines the advantages of the SNN graph and community detection, and the only parameter that needs tuning is the number of neighbors. To test the performance of SSNN-Louvain, we compare it on 16 real datasets with five existing methods: nonnegative matrix factorization, single-cell interpretation via multi-kernel learning, SNN-Cliq, Seurat and PhenoGraph. The experimental results show that our approach achieves the best average performance across these datasets.
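The abstract does not state the exact edge-weight formula, so the following is only a minimal sketch of the general idea: build a KNN list per cell, find the shared nearest neighbors of each pair, and combine the ratio of shared neighbors to k with a distance term. The function names `knn` and `ssnn_weights` and the specific combination `ratio / (1 + mean distance)` are assumptions made for illustration and are not the definition used in SSNN-Louvain.

```python
import math


def knn(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def ssnn_weights(points, k):
    """Hypothetical structural SNN edge weights (illustrative, not the paper's formula).

    An edge (i, j) exists only if i and j share at least one nearest neighbor.
    Its weight grows with the shared-neighbor ratio and shrinks with the mean
    distance from both endpoints to their shared neighbors.
    """
    nn = knn(points, k)
    weights = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = nn[i] & nn[j]
            if not shared:
                continue
            # Structural term: number of shared neighbors relative to k.
            ratio = len(shared) / k
            # Distance term: mean distance from i and j to their shared neighbors.
            total = sum(
                math.dist(points[i], points[s]) + math.dist(points[j], points[s])
                for s in shared
            )
            mean_d = total / (2 * len(shared))
            weights[(i, j)] = ratio / (1.0 + mean_d)  # assumed combination
    return weights


# Toy data: two well-separated groups of three "cells" in a 2-D expression space.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = ssnn_weights(cells, k=2)
```

On this toy input, edges appear only between cells of the same group, because cells from different groups share no nearest neighbors. The resulting weighted graph could then be passed to a community detection routine; the modified Louvain step described above is not reproduced here.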
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 61762087, 61702555, 61662028), Hunan Provincial Science and Technology Program (No. 2018WK4001), Guangxi Natural Science Foundation (No. 2018JJA170175), 111 Project (No. B18059) and Project of Yulin Normal University (No. 2017YJKY21). This paper was accepted by the CBC2019 conference, and we thank the committee of CBC2019 for their recommendation of this article to Interdisciplinary Sciences: Computational Life Sciences.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Zhu, X., Zhang, J., Xu, Y. et al. Single-Cell Clustering Based on Shared Nearest Neighbor and Graph Partitioning. Interdiscip Sci Comput Life Sci 12, 117–130 (2020). https://doi.org/10.1007/s12539-019-00357-4
DOI: https://doi.org/10.1007/s12539-019-00357-4