Using MCL to Extract Clusters from Networks

Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 804)

Abstract

MCL is a general-purpose cluster algorithm for both weighted and unweighted networks. The algorithm utilises network topology as well as edge weights, is highly scalable, and has been applied in a wide variety of bioinformatic methods. In this chapter, we give protocols and case studies for clustering networks derived from protein sequence similarities and from gene expression profile correlations, respectively.
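
As an illustration of how topology and edge weights both enter the computation, the following is a minimal pure-Python sketch of the Markov cluster process as it is commonly described: a column-stochastic matrix is alternately expanded (squared) and inflated (entries raised to a power, columns rescaled) until it stops changing. It is not the chapter's protocol and not the `mcl` program itself. The function names, the self-loop weight of 1.0, the default inflation of 2.0, and the 1e-6 membership threshold are assumptions made for this sketch, and the dense-matrix approach is only suitable for toy networks.

```python
def normalize_columns(m):
    """Rescale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / s if s > 0 else 0.0
    return out


def expand(m):
    """Expansion: square the matrix, i.e. take two random-walk steps."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise each entry to the power r, then rescale columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(edges, n, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected network given as (node_a, node_b, weight) triples.

    Nodes are integers 0..n-1. Returns a sorted list of sorted clusters.
    """
    m = [[0.0] * n for _ in range(n)]
    for a, b, w in edges:
        m[a][b] = w
        m[b][a] = w
    for i in range(n):
        m[i][i] = 1.0  # assumed self-loop weight for this sketch
    m = normalize_columns(m)

    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break

    # Rows that keep non-zero entries act as attractors; the columns
    # with mass in such a row form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy network: two triangles joined by one weak edge (weight 0.1).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
print(mcl(edges, 6))
```

Raising the `inflation` argument generally yields finer-grained clusterings, and lowering it yields coarser ones. For real protein similarity or expression correlation networks, a sparse implementation is needed.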

Key words

Network clustering; Cluster analysis; Protein sequence similarity; Gene expression profiles

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
