
Applications of Community Detection Algorithms to Large Biological Datasets

  • Protocol
Deep Sequencing Data Analysis

Part of the book series: Methods in Molecular Biology (MIMB, volume 2243)

Abstract

Recent advances in data acquisition technologies in biology have created major challenges in mining relevant information from large datasets. For example, single-cell RNA sequencing technologies produce expression and sequence information from tens of thousands of cells in a single experiment. A common task in analyzing biological data is to cluster samples or features (e.g., genes) into groups that share common characteristics. Finding an optimal clustering is, in general, an NP-hard problem, for which numerous heuristic algorithms have been developed. However, in many cases the clusters produced by these algorithms do not reflect biological reality. To overcome this, a Networks Based Clustering (NBC) approach was recently proposed, in which the samples or genes in the dataset are first mapped to a network, and community detection (CD) algorithms are then used to identify clusters of nodes.
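The two NBC steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' toolkit. Samples are compared with a correlation-based distance (1 minus the Pearson correlation). Each sample is connected to its k nearest neighbors to form an undirected k-nearest-neighbor (kNN) graph. Communities are then found with label propagation, a simple community detection algorithm chosen here only for brevity. The helper names (`pearson_distance`, `knn_graph`, `label_propagation`), the parameter `k`, and the toy expression profiles are all hypothetical.

```python
import math
import random
from collections import Counter


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # constant profile: treat as uncorrelated
    return 1.0 - cov / (sx * sy)


def knn_graph(samples, k):
    """Step 1: map samples to an undirected kNN graph stored as adjacency sets."""
    n = len(samples)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        # Distances from sample i to every other sample, closest first
        dists = sorted(
            (pearson_distance(samples[i], samples[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            # Add the edge in both directions so the graph is undirected
            adj[i].add(j)
            adj[j].add(i)
    return adj


def label_propagation(adj, seed=0, max_iter=100):
    """Step 2: find communities by repeated majority voting over neighbor labels."""
    rng = random.Random(seed)          # fixed seed makes the result reproducible
    labels = {v: v for v in adj}       # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)             # asynchronous updates in random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue               # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            # Keep the current label if it is already among the most frequent
            if labels[v] not in candidates:
                labels[v] = rng.choice(candidates)
                changed = True
        if not changed:
            break
    return labels


# Toy data: two groups of "cells" with opposite expression patterns over 4 genes
group_a = [[5, 1, 0, 2], [6, 1, 0, 3], [5, 2, 1, 2]]
group_b = [[0, 4, 6, 1], [1, 5, 7, 0], [0, 4, 7, 1]]
adj = knn_graph(group_a + group_b, k=2)
labels = label_propagation(adj)
print(labels)
```

With k=2 each toy cell links only to the two cells of its own group, so the graph splits into two triangles and label propagation assigns one label per group. On datasets with tens of thousands of cells, these quadratic-time loops would normally be replaced by an efficient nearest-neighbor search and a faster, modularity-based community detection algorithm.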

Here, we created an open and flexible Python-based toolkit for NBC that makes network construction and community detection easy and accessible. We then tested the applicability of NBC for identifying clusters of cells or genes in previously published large-scale single-cell and bulk RNA-seq datasets.
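Many community detection algorithms, including the Louvain method, search for a partition that maximizes the Newman-Girvan modularity Q. Modularity compares the fraction of edges that fall inside communities with the fraction expected if edges were placed at random while preserving node degrees. For an unweighted, undirected graph with m edges it can be written as Q = Σ_c [L_c/m - (d_c/2m)²]. Here L_c is the number of edges inside community c and d_c is the total degree of its nodes. The sketch below computes Q for a graph stored as adjacency sets, so that alternative partitions can be compared. The function name `modularity` and the toy graph are illustrative and are not taken from the toolkit.

```python
from collections import defaultdict


def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    adj    -- dict mapping node -> set of neighbors (symmetric, no self-loops)
    labels -- dict mapping node -> community label
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m: each edge counted twice
    if two_m == 0:
        return 0.0
    intra = defaultdict(int)   # 2 * L_c: internal edges seen from both endpoints
    degree = defaultdict(int)  # d_c: total degree of the nodes in community c
    for v, nbrs in adj.items():
        degree[labels[v]] += len(nbrs)
        for u in nbrs:
            if labels[u] == labels[v]:
                intra[labels[v]] += 1
    # Q = sum_c [L_c / m - (d_c / 2m)^2], with L_c / m == intra[c] / two_m
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


# Two triangles joined by a single bridging edge (2-3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {v: "A" for v in adj}
q_split = modularity(adj, split)    # 5/14, about 0.357
q_merged = modularity(adj, merged)  # 0.0: a single community has no modularity gain
print(round(q_split, 4), round(q_merged, 4))  # prints 0.3571 0.0
```

Splitting the graph at the bridge scores higher than lumping everything together, which is the kind of comparison a modularity-maximizing algorithm makes repeatedly while moving nodes between communities.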

We show that NBC can be used to accurately and efficiently analyze large-scale datasets of RNA sequencing experiments.
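One common way to quantify how accurately clusters recover known groups (for instance, annotated cell types) is the adjusted Rand index (ARI). The ARI equals 1 for identical partitions and is close to 0 for random ones. Using ARI here is an assumption made for illustration; it is not necessarily the measure used in the chapter. The sketch below implements the pair-counting definition in plain Python. The function name and the toy labels are hypothetical. In practice, scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert and Arabie) between two flat clusterings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    # Pair counts from the contingency table n_ij and its marginals a_i, b_j
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total = comb(len(labels_true), 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both clusterings are trivial and identical
        # (all singletons or one single cluster); define ARI = 1 here
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical annotated cell types versus two candidate clusterings
cell_types = ["T", "T", "T", "B", "B", "B"]
perfect = [1, 1, 1, 0, 0, 0]      # same grouping, different label names
imperfect = [1, 1, 0, 0, 0, 0]    # one T cell assigned to the B cluster
ari_perfect = adjusted_rand_index(cell_types, perfect)
ari_imperfect = adjusted_rand_index(cell_types, imperfect)
print(ari_perfect, round(ari_imperfect, 3))
```

Because ARI depends only on which items are grouped together, renaming cluster labels does not change the score. A single misassigned cell lowers it substantially on this tiny example.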



Author information

Correspondence to Gur Yaari or Tomer Kalisky.


Copyright information

© 2021 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Kanter, I., Yaari, G., Kalisky, T. (2021). Applications of Community Detection Algorithms to Large Biological Datasets. In: Shomron, N. (eds) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 2243. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1103-6_3


  • DOI: https://doi.org/10.1007/978-1-0716-1103-6_3


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1102-9

  • Online ISBN: 978-1-0716-1103-6

  • eBook Packages: Springer Protocols
