Comparative Genomics and Evolutionary Modularity of Prokaryotes

  • Cedoljub Bundalovic-Torma
  • John Parkinson
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 883)


The soaring number of high-quality genomic sequences has ushered in the era of post-genomic research, in which our understanding of organisms has shifted dramatically towards defining the function of genes within their larger biological contexts. As a result, novel high-throughput experimental technologies are increasingly employed to uncover physical and functional associations of genes and proteins in complex biological processes. Through the construction and analysis of physical, genetic and metabolic networks for model organisms such as Escherichia coli, researchers have deduced organizational principles of the genome, including modularity, which has important implications for understanding prokaryotic evolution and adaptation to novel lifestyles.
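One widely used way to quantify modularity in an interaction network, though not necessarily the measure used in this chapter, is Newman-Girvan modularity Q: the fraction of edges that fall inside modules minus the fraction expected if edges were placed at random while preserving node degrees. The sketch below assumes an undirected, unweighted network and a module assignment that has already been computed (for example by a clustering algorithm); the function name `modularity` and the gene labels g1 to g6 are hypothetical placeholders.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a module label.
    Computes Q = sum_c [ L_c / m - (d_c / (2m))**2 ], where m is the number
    of edges, L_c counts edges inside module c and d_c is the summed degree
    of the nodes in module c.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # L_c: edges with both ends in module c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in module c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy network: two tightly linked triples of genes joined by one edge.
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
         ("g4", "g5"), ("g5", "g6"), ("g4", "g6"),
         ("g3", "g4")]
modules = {"g1": "A", "g2": "A", "g3": "A", "g4": "B", "g5": "B", "g6": "B"}
print(round(modularity(edges, modules), 4))  # 0.3571
```

Placing every gene in a single module gives Q = 0, so a clearly positive value indicates that the chosen modules are more densely connected internally than chance would predict.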


Keywords: Comparative genomics, Genomic context, High-throughput interaction screening, Network biology, Modularity, Prokaryotic evolution


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Molecular Structure and Function, The Peter Gilgan Centre for Research and Learning, Hospital for Sick Children, Toronto, Canada