Escherichia Coli — Functional and Evolutionary Implications of Genome Scale Computer-Aided Protein Sequence Analysis

  • Eugene V. Koonin
  • Roman L. Tatusov
  • Kenneth E. Rudd
Part of the Stadler Genetics Symposia Series book series (SGSS)


Complete sequencing of model genomes has recently become a reality. Hundreds of viral and more than 20 organellar genome sequences are currently available (Bork et al., 1994). The genomes of several bacteria, Archaea, and yeast are expected to be completed within 2–3 years. The ultimate value of genome projects is not to establish complete and accurate nucleotide sequences per se, but rather to use the sequence to deduce how the genome determines all cellular functions. Eventually, it should be possible to trace the whole pathway from the nucleotide sequence to the phenotype of an organism, which could be restated as the “first principles” of cellular structure and function. Numerous biochemical and genetic experiments will be indispensable for achieving this ambitious goal. Nonetheless, computer analysis of the amino acid sequences encoded in the genome is a necessary and complementary approach that allows one to systematically predict protein functions and derive possible evolutionary relationships. One cannot help but note that computer-assisted sequence analysis, even though it generally lacks the precision that is, at least in principle, achievable in laboratory experiments, is much less labor-intensive and costly. In fact, at this time, only computer methods allow one to analyze the gene products encoded in a complete genome simultaneously and consistently, and to obtain meaningful, readily comparable results for each of them in a relatively short time.


Keywords: Paralogous Gene; Coli Protein; Functional Prediction; Related Bacterium; Repetitive Domain



References

  1. Alefounder, P. R., Abell, C., and Battersby, A. R., 1988, The sequence of hemC, hemD and two additional E. coli genes, Nucleic Acids Res. 16: 9871.
  2. Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J., 1990, Basic local alignment search tool, J. Mol. Biol. 215: 403.
  3. Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C., 1994, Issues in searching molecular sequence databases, Nature Genet. 6: 119.
  4. Bennett, V., 1992, Ankyrins. Adaptors between diverse plasma membrane proteins and the cytoplasm, J. Biol. Chem. 267: 8703.
  5. Bork, P., 1993, Hundreds of ankyrin-like repeats in functionally diverse proteins: mobile modules that cross phyla horizontally, Proteins: Struct. Funct. Genet. 17: 363.
  6. Bork, P., Ouzounis, C., Sander, C., Scharf, M., Schneider, R., and Sonnhammer, E., 1992, Comprehensive sequence analysis of the 182 ORFs of yeast chromosome III, Protein Sci. 1: 1677.
  7. Bork, P., Ouzounis, C., and Sander, C., 1994, From genome sequences to protein function, Curr. Opin. Struct. Biol. 4: 393.
  8. Bork, P., Ouzounis, C., Casari, G., Schneider, R., Sander, C., Dolan, M., Gilbert, W., and Gillevet, P. M., 1995, Exploring the Mycoplasma capricolum genome: A small bacterium reveals its physiology, Mol. Microbiol., in press.
  9. Bourne, H. R., Sanders, D. A., and McCormick, F., 1991, The GTPase superfamily: conserved structure and molecular mechanism, Nature 349: 117.
  10. Bredt, D. S., Hwang, P. M., Glatt, C. E., Lowenstein, C., Reed, R. R., and Snyder, S. H., 1991, Cloned and expressed nitric oxide synthase structurally resembles cytochrome P-450 reductase, Nature 351: 714.
  11. Bryant, P. J. and Woods, D. F., 1992, A major palmitoylated membrane protein of human erythrocytes shows homology to yeast guanylate kinase and to the product of a Drosophila tumor suppressor gene, Cell 68: 621.
  12. Cho, K. O., Hunt, C. A., and Kennedy, M. B., 1992, The rat brain postsynaptic density fraction contains a homolog of the Drosophila discs-large tumor suppressor protein, Neuron 9: 929.
  13. Chothia, C., 1992, Proteins. One thousand families for the molecular biologist, Nature 357: 543.
  14. Crouzet, J., Levy-Schil, S., Cameron, B., Cauchois, L., Rigault, S., Rouyez, M. C., Blanche, F., Debussche, L., and Thibaut, D., 1991, Nucleotide sequence and genetic analysis of a 13.1-kilobase-pair Pseudomonas denitrificans DNA fragment containing five cob genes and identification of structural genes encoding Cob(I)alamin adenosyltransferase, cobyric acid synthase, and bifunctional cobinamide kinase-cobinamide phosphate guanylyltransferase, J. Bacteriol. 173: 6074.
  15. Cussac, V., Ferrero, R. L., and Labigne, A., 1992, Expression of Helicobacter pylori urease genes in Escherichia coli grown under nitrogen-limiting conditions, J. Bacteriol. 174: 2466.
  16. Daniels, D., Plunkett, G., Burland, V., and Blattner, F. R., 1992, Analysis of the Escherichia coli genome: DNA sequence of the region from 84.5 to 86.5 minutes, Science 257: 771.
  17. Dever, T. E., Glynias, M. J., and Merrick, W. C., 1987, GTP-binding domain: three consensus sequence elements with distinct spacing, Proc. Natl. Acad. Sci. USA 84: 1814.
  18. Dicker, I. B. and Seetharam, S., 1992, What is known about the structure and function of the Escherichia coli protein FirA, Mol. Microbiol. 6: 817.
  19. Doolittle, R. F., ed., 1990, “Molecular Evolution”, Meth. Enzymol. 183.
  20. Drlica, K., 1987, The nucleoid, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology”, Neidhardt, F., Ingraham, J. L., Low, K. B., Magasanik, B., Schaechter, M., and Umbarger, H. E., eds., American Society for Microbiology, Washington, DC, p. 91.
  21. Fishel, R., Lescoe, M. K., Rao, M. R., Copeland, N. G., Jenkins, N. A., Garber, J., Kane, M., and Kolodner, R., 1993, The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer, Cell 75: 1027.
  22. Fitch, W. M., 1970, Distinguishing homologous from analogous proteins, Syst. Zool. 19: 99.
  23. Goebl, M. and Yanagida, M., 1991, The TPR snap helix: a novel protein repeat motif from mitosis to transcription, Trends Biochem. Sci. 16: 173.
  24. Gray, M. W., 1989, The evolutionary origin of organelles, Trends Genet. 5: 294.
  25. Green, P., 1994, Ancient conserved regions in gene sequences, Curr. Opin. Struct. Biol. 4: 404.
  26. Green, P., Lipman, D. J., Hillier, L., Waterston, R., States, D. J., and Claverie, J. M., 1993, Ancient conserved regions in new gene sequences and the protein databases, Science 259: 1711.
  27. Henikoff, S. and Henikoff, J., 1992, Amino acid substitution matrices from protein blocks, Proc. Natl. Acad. Sci. USA 89: 10915.
  28. Henikoff, S. and Henikoff, J., 1993, Performance evaluation of amino acid substitution matrices, Proteins: Struct. Funct. Genet. 17: 49.
  29. Hussain, H., Grove, J., Griffiths, L., Busby, S., and Cole, J., 1994, A seven-gene operon essential for formate-dependent nitrite reduction to ammonia by enteric bacteria, Mol. Microbiol. 12: 153.
  30. Kiino, D. R., Singer, M. S., and Rothman-Denes, L. B., 1993, Two overlapping genes encoding membrane proteins required for bacteriophage N4 adsorption, J. Bacteriol. 175: 7081.
  31. Klingensmith, J., Nusse, R., and Perrimon, N., 1994, The Drosophila segment polarity gene dishevelled encodes a novel protein required for response to the wingless signal, Genes Dev. 8: 118.
  32. Knoll, A. H., 1992, The early evolution of eukaryotes: a geological perspective, Science 256: 622.
  33. Koonin, E. V., 1994, Prediction of an rRNA methyltransferase domain in human tumor-specific nucleolar protein P120, Nucleic Acids Res. 22: 2476.
  34. Koonin, E. V., 1995, Multidomain organization of eukaryotic guanine nucleotide exchange translation initiation factor eIF-2B subunits revealed by analysis of conserved sequence motifs, Protein Sci., in press.
  35. Koonin, E. V., Bork, P., and Sander, C., 1994, Yeast chromosome III: new gene functions, EMBO J. 13: 493.
  36. Koonin, E. V., Woods, D. F., and Bryant, P. J., 1992, Dlg-R proteins: guanylate kinase homologues with an aberrant NTP-binding site, Nature Genet. 2: 256.
  37. Koonin, E. V. and Rudd, K. E., 1993, SpoU protein of Escherichia coli belongs to a new family of putative rRNA methylases, Nucleic Acids Res. 21: 5519.
  38. Kunisawa, T. and Otsuka, J., 1988, Periodic distribution of homologous genes or gene segments on the Escherichia coli K12 genome, Protein Seq. Data Anal. 1: 263.
  39. Labedan, B. and Riley, M., 1995, Widespread protein sequence similarities: origins of Escherichia coli genes, J. Bacteriol. 177: 1585.
  40. Leach, F. S., Nicolaides, N. C., Papadopoulos, N., Liu, B., Jen, J., Parsons, R., Peltomaki, P., Sistonen, P., Aaltonen, L. A., Nystrom-Lahti, M. et al., 1993, Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer, Cell 75: 1215.
  41. Lee, M. H., Mulrooney, S. B., Renner, M. J., Markowicz, Y., and Hausinger, R. P., 1992, Klebsiella aerogenes urease gene cluster: sequence of ureD and demonstration that four accessory genes (ureD, ureE, ureF, and ureG) are involved in nickel metallocenter biosynthesis, J. Bacteriol. 174: 4324.
  42. Lipinska, B., Zylicz, M., and Georgopoulos, C., 1990, The HtrA (DegP) protein, essential for Escherichia coli survival at high temperatures, is an endopeptidase, J. Bacteriol. 172: 1791.
  43. Lue, R. A., Marfatia, S. M., Branton, D., and Chishti, A. H., 1994, Cloning and characterization of hdlg: the human homologue of the Drosophila discs large tumor suppressor binds to protein 4.1, Proc. Natl. Acad. Sci. USA 91: 9818.
  44. Maier, T., Jacobi, A., Sauter, M., and Bock, A., 1993, Purification of Rhizobium leguminosarum HypB, a nickel-binding protein required for hydrogenase synthesis, J. Bacteriol. 175: 630.
  45. Maniloff, J., McElhaney, R. N., Finch, L. R., and Baseman, J. B. (eds.), 1992, “Mycoplasmas - molecular biology and pathogenesis”, American Society for Microbiology, Washington, DC.
  46. Michaely, P. and Bennett, V., 1992, The ANK repeat: A ubiquitous motif involved in macromolecular recognition, Trends Cell Biol. 2: 127.
  47. Neuwald, A. F. and Green, P., 1994, Detecting patterns in protein sequences, J. Mol. Biol. 239: 698.
  48. Neidhardt, F., Ingraham, J. L., Low, K. B., Magasanik, B., Schaechter, M., and Umbarger, H. E. (eds.), Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology, American Society for Microbiology, Washington, DC.
  49. Olsen, G. J., Woese, C. R., and Overbeek, R., 1994, The winds of (evolutionary) change: breathing new life into microbiology, J. Bacteriol. 176: 1.
  50. Palmer, J. D., 1985, Comparative organization of chloroplast genomes, Annu. Rev. Genet. 19: 325.
  51. Papadopoulos, N., Nicolaides, N. C., Wei, Y. F., Ruben, S. M., Carter, K. C., Rosen, C. A., Haseltine, W. A., Fleischmann, R. D., Fraser, C. M., Adams, M. D. et al., 1994, Mutation of a mutL homolog in hereditary colon cancer, Science 263: 1625.
  52. Pepperberg, D. R., Okajima, T. L., Wiggert, B., Ripps, H., Crouch, R. K., and Chader, G. J., 1993, Interphotoreceptor retinoid-binding protein (IRBP). Molecular biology and physiological role in the visual cycle of rhodopsin, Mol. Neurobiol. 7: 61.
  53. Qian, Y. Q., Billeter, M., Otting, G., Muller, M., Gehring, W. J., and Wuthrich, K., 1989, The structure of the Antennapedia homeodomain determined by NMR spectroscopy in solution: comparison with prokaryotic repressors, Cell 59: 573.
  54. Riley, M., 1993, Functions of the gene products of Escherichia coli, Microbiol. Rev. 57: 862.
  55. Riley, M. and Anilionis, A., 1978, Evolution of the bacterial genome, Annu. Rev. Microbiol. 32: 519.
  56. Riley, M. and Krawiec, S., 1987, Genome organization, in “Escherichia coli and Salmonella typhimurium. Cellular and Molecular Biology”, Neidhardt, F., Ingraham, J. L., Low, K. B., Magasanik, B., Schaechter, M., and Umbarger, H. E., eds., American Society for Microbiology, Washington, DC, p. 967.
  57. Rudd, K. E., 1993, Maps, genes, sequences, and computers: an Escherichia coli case study, ASM News 59: 335.
  58. Ruff, P., Speicher, D. W., and Husain-Chishti, A., 1991, Molecular identification of a major palmitoylated erythrocyte membrane protein containing the src homology 3 motif, Proc. Natl. Acad. Sci. USA 88: 6595.
  59. Saier, M. H., Jr., 1994, Computer-aided analyses of transport protein sequences: gleaning evidence concerning function, structure, biogenesis, and evolution, Microbiol. Rev. 58: 71.
  60. Saier, M. H., Jr. and Reizer, J., 1994, The bacterial phosphotransferase system: new frontiers 30 years later, Mol. Microbiol. 13: 755.
  61. Saraste, M., Sibbald, P. R., and Wittinghofer, A., 1990, The P-loop - a common motif in ATP- and GTP-binding proteins, Trends Biochem. Sci. 15: 430.
  62. Savakis, C. and Doelz, R., 1993, Contamination of cDNA sequences in databases, Science 259: 1677.
  63. Schuler, G. D., Altschul, S. F., and Lipman, D. J., 1991, A workbench for multiple alignment construction and analysis, Proteins: Struct. Funct. Genet. 9: 180.
  64. Service, R. F., 1994, Stalking the start of colon cancer, Science 263: 1559.
  65. Sikorski, R. S., Boguski, M. S., Goebl, M., and Hieter, P., 1990, A repeating amino acid motif in CDC23 defines a family of proteins and a new relationship among genes required for mitosis and RNA synthesis, Cell 60: 307.
  66. Sikorski, R. S., Michaud, W. A., Wootton, J. C., Boguski, M. S., Connelly, C., and Hieter, P., 1991, TPR proteins as essential components of the yeast cell cycle, Cold Spring Harb. Symp. Quant. Biol. 56: 663.
  67. Silber, K. R., Keiler, K. C., and Sauer, R. T., 1992, Tsp: a tail-specific protease that selectively degrades proteins with nonpolar C termini, Proc. Natl. Acad. Sci. USA 89: 295.
  68. Sirum-Connolly, K. and Mason, T. L., 1993, Functional requirement of a site-specific ribose methylation in ribosomal RNA, Science 262: 1886.
  69. Tatusov, R. L., Altschul, S. F., and Koonin, E. V., 1994, Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks, Proc. Natl. Acad. Sci. USA 91: 12091.
  70. Treisman, J., Harris, E., Wilson, D., and Desplan, C., 1992, The homeodomain: a new face for the helix-turn-helix, Bioessays 14: 145.
  71. Vaara, M., 1992, Eight bacterial proteins, including UDP-N-acetylglucosamine acyltransferase (LpxA) and three other transferases of Escherichia coli, consist of a six-residue periodicity theme, FEMS Microbiol. Lett. 76: 249.
  72. Vuorio, R., Harkonen, T., Tolvanen, M., and Vaara, M., 1994, The novel hexapeptide motif found in the acyltransferases LpxA and LpxD of lipid A biosynthesis is conserved in various bacteria, FEBS Lett. 337: 289.
  73. Walker, J. E., Saraste, M., Runswick, M. J., and Gay, N. J., 1982, Distantly related sequences in the α- and β-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold, EMBO J. 1: 945.
  74. Woods, D. F. and Bryant, P. J., 1991, The discs-large tumor suppressor gene of Drosophila encodes a guanylate kinase homolog localized at septate junctions, Cell 66: 451.
  75. Wootton, J. C. and Federhen, S., 1993, Statistics of local complexity in amino acid sequences and sequence databases, Comput. Chem. 17: 149.
  76. Wootton, J. C., 1994a, Non-globular domains in protein sequences: automated segmentation using complexity measures, Comput. Chem. 18: 269.
  77. Wootton, J. C., 1994b, Sequences with ‘unusual’ amino acid composition, Curr. Opin. Struct. Biol. 4: 413.
  78. Zipkas, D. and Riley, M., 1975, Proposal concerning mechanism of evolution of the genome of Escherichia coli, Proc. Natl. Acad. Sci. USA 72: 4660.
  79. Zipkas, D., Solomon, D., and Riley, M., 1978, Relationship between gene function and gene location, J. Mol. Evol. 11: 47.

Copyright information

© Springer Science+Business Media New York 1996

Authors and Affiliations

  • Eugene V. Koonin (1)
  • Roman L. Tatusov (1)
  • Kenneth E. Rudd (1)
  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
