Journal of Molecular Evolution, Volume 57, Issue 2, pp 149–158

Analysis of Sequence Periodicity in E. coli Proteins: Empirical Investigation of the “Duplication and Divergence” Theory of Protein Evolution



Periodicity was quantified in 4289 confirmed and putative Escherichia coli K12 protein sequences, using a simple chi-square technique previously shown to reveal triplet (period-3) periodicity in coding DNA. Periodicities were calculated for periods n = 2 to n = 50 in nine different alphabetic representations of the proteins. Compared with a randomly generated proteome of the same composition, the E. coli proteome does not contain a significant excess of periodic proteins. However, after Bonferroni correction, 60 proteins appear significantly periodic in at least one alphabetic representation at p < 0.01, and 30 at p < 0.001. These are compared with significantly periodic proteins of solved three-dimensional structure, detected by an identical analysis of the sequences in a protein structure database. It is concluded that there is no evidence for the proteome-wide quasi-periodicity predicted by the “duplication and divergence” model of protein evolution, and that the major periodicity detected is a consequence of the repetitive tendencies within α-helices. Not all sequence periodicities can be explained by observable secondary structure, however: where sequence periodicity can be compared with solved structure, there is often no structural regularity that would offer an obvious explanation in terms of natural selection on protein function.
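The abstract does not spell out the chi-square procedure. The following is a minimal sketch, assuming the statistic is a symbol-by-phase contingency test: residue i is assigned to phase i mod n, and independence between symbol and phase is tested with (k − 1)(n − 1) degrees of freedom for k observed symbols. The two-letter H/P alphabet, the helper names (`reduce_alphabet`, `chi2_sf`, `periodicity_chi2`, `scan_periods`), the heptad test sequence, and the choice to apply the Bonferroni correction only across the periods scanned for one sequence are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating a chi-square periodicity scan;
# not the authors' code.

# Hypothetical two-letter reduction (H = hydrophobic, P = polar), chosen only
# for illustration; it is not one of the paper's nine alphabets.
HYDROPHOBIC = set("AVILMFWC")


def reduce_alphabet(protein: str) -> str:
    """Map a one-letter amino acid sequence onto the H/P alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in protein.upper())


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability for integer df (stdlib only).

    With y = x/2, start from Q(1, y) = exp(-y) (even df) or
    Q(1/2, y) = erfc(sqrt(y)) (odd df), then apply the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) until a = df/2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    a, q = (1.0, math.exp(-y)) if df % 2 == 0 else (0.5, math.erfc(math.sqrt(y)))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def periodicity_chi2(seq: str, n: int) -> tuple[float, int]:
    """Chi-square test of independence between symbol and phase (i mod n)."""
    table = Counter((s, i % n) for i, s in enumerate(seq))  # observed cells
    rows = Counter(seq)                                     # symbol totals
    cols = Counter(i % n for i in range(len(seq)))          # phase totals
    total = len(seq)
    chi2 = 0.0
    for s in rows:
        for p in cols:
            expected = rows[s] * cols[p] / total
            chi2 += (table[(s, p)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return chi2, df


def scan_periods(seq: str, periods=range(2, 51), alpha: float = 0.01):
    """Test each period; flag those passing a Bonferroni-corrected threshold.

    Assumption: the correction family is just the periods tested here.
    Returns tuples (n, chi2, df, p, significant).
    """
    periods = [n for n in periods if n <= len(seq)]
    threshold = alpha / len(periods)
    results = []
    for n in periods:
        chi2, df = periodicity_chi2(seq, n)
        p = chi2_sf(chi2, df) if df > 0 else 1.0
        results.append((n, chi2, df, p, p < threshold))
    return results
```

As a usage example, scanning a hypothetical heptad repeat such as `reduce_alphabet("LKELEKK" * 10)` should give its smallest p-value at period 7, with multiples of 7 also scoring highly because every phase column stays pure. A compositional control in the spirit of the randomized proteome could be approximated by shuffling each sequence (for example with `random.shuffle` on a list of residues) and rescanning; that control design is likewise an assumption.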


Keywords: Periodicity, Proteome, E. coli, Gene duplication



Copyright information

© Springer-Verlag New York Inc. 2003

Authors and Affiliations

  1. Drug Design, RiboTargets Ltd., Granta Park, Cambridge CB1 6GB, UK
  2. Rowett Research Institute, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, UK
