
Single-Cell Clustering Based on Shared Nearest Neighbor and Graph Partitioning

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Clustering single-cell RNA sequencing (scRNA-seq) data makes it possible to discover cell subtypes, which helps in understanding and analyzing disease processes. Determining edge weights is an essential component of graph-based clustering methods. Although several graph-based clustering algorithms for scRNA-seq data have been proposed, they generally rely on k-nearest neighbor (KNN) and shared nearest neighbor (SNN) graphs without considering the structural information of the graph. Here, to improve clustering accuracy, we present a novel method for single-cell clustering, called structural shared nearest neighbor-Louvain (SSNN-Louvain), which integrates the structural information of the graph with module detection. In SSNN-Louvain, the weight of an edge is defined from the distance between a node and its shared nearest neighbors, together with the ratio of the number of shared nearest neighbors to the number of nearest neighbors, which brings the structural information of the graph into the weights. A modified Louvain community detection algorithm is then proposed and applied to identify modules in the graph; each community represents a subtype of cells. The proposed method combines the advantages of the SNN graph and community detection and requires tuning no parameter other than the number of neighbors. To test the performance of SSNN-Louvain, we compare it with five existing methods (nonnegative matrix factorization, single-cell interpretation via multi-kernel learning, SNN-Cliq, Seurat and PhenoGraph) on 16 real datasets. The experimental results show that our approach achieves the best average performance across these datasets.
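The general idea of weighting edges by shared nearest neighbors and then scoring a partition can be illustrated with a minimal Python sketch. This is not the SSNN-Louvain implementation: the abstract does not give the exact weight formula, so the sketch assumes a simplified weight equal to the number of shared nearest neighbors divided by k and leaves out the distance term. It also does not reproduce the modified Louvain algorithm; it only computes Newman modularity, the quantity that the standard Louvain algorithm optimizes greedily. The function names `knn_lists`, `snn_ratio_weights` and `modularity` are hypothetical.

```python
import math


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def snn_ratio_weights(points, k):
    """Assumed simplified weight: |N(i) & N(j)| / k for every pair (i, j).

    Pairs with no shared neighbor get no edge. The distance-based part of
    the weight described in the abstract is NOT modeled here.
    """
    nbrs = knn_lists(points, k)
    weights = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared > 0:
                weights[(i, j)] = shared / k
    return weights


def modularity(weights, labels):
    """Newman modularity Q of a partition of a weighted undirected graph."""
    total = sum(weights.values())  # total edge weight m
    # Weight of edges whose two endpoints share a community label.
    within = sum(w for (i, j), w in weights.items() if labels[i] == labels[j])
    # Summed node strength (weighted degree) per community.
    comm_strength = {}
    for (i, j), w in weights.items():
        comm_strength[labels[i]] = comm_strength.get(labels[i], 0.0) + w
        comm_strength[labels[j]] = comm_strength.get(labels[j], 0.0) + w
    expected = sum((s / (2 * total)) ** 2 for s in comm_strength.values())
    return within / total - expected


# Toy data: two well-separated groups of four "cells" in two dimensions.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
weights = snn_ratio_weights(points, k=3)
print(len(weights), modularity(weights, [0, 0, 0, 0, 1, 1, 1, 1]))
```

In this toy case the only tunable quantity is `k`, which mirrors the abstract's statement that the number of neighbors is the single parameter. A community detection routine would search for the labeling that maximizes the modularity score instead of being handed the labels.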

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61762087, 61702555, 61662028), Hunan Provincial Science and Technology Program (No. 2018WK4001), Guangxi Natural Science Foundation (No. 2018JJA170175), 111 Project (No. B18059) and Project of Yulin Normal University (No. 2017YJKY21). This paper was accepted by the CBC2019 conference, and we thank the committee of CBC2019 for their recommendation of this article to Interdisciplinary Sciences: Computational Life Sciences.

Author information

Corresponding authors

Correspondence to Xiaoqing Peng or Hong-Dong Li.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (DOC 85 kb)

About this article

Cite this article

Zhu, X., Zhang, J., Xu, Y. et al. Single-Cell Clustering Based on Shared Nearest Neighbor and Graph Partitioning. Interdiscip Sci Comput Life Sci 12, 117–130 (2020). https://doi.org/10.1007/s12539-019-00357-4

  • DOI: https://doi.org/10.1007/s12539-019-00357-4
