Journal of Molecular Evolution, Volume 57, Issue 2, pp 149-158


Analysis of Sequence Periodicity in E. coli Proteins: Empirical Investigation of the “Duplication and Divergence” Theory of Protein Evolution

  • Derek Gatherer (corresponding author), Drug Design, RiboTargets Ltd., Granta Park, Cambridge CB1 6GB
  • Neil R. McEwan, Rowett Research Institute, Greenburn Road, Bucksburn, Aberdeen AB21 9SB



Periodicity was quantified in 4289 confirmed and putative Escherichia coli K12 protein sequences, using a simple chi-square technique previously shown to reveal triplet periodicity in coding DNA. Periodicities were calculated for periods n = 2 to n = 50 in nine different alphabetic representations of the proteins. Compared with a randomly generated proteome of the same composition, the E. coli proteome does not contain a significant excess of periodic proteins. However, after Bonferroni correction, 60 proteins appear significantly periodic in at least one alphabetic representation at p < 0.01, and 30 at p < 0.001. These are compared with significantly periodic proteins of solved three-dimensional structure, detected by an identical analysis of the sequences in a protein structure database. It is concluded that there is no evidence for the proteome-wide quasi-periodicity predicted by the “duplication and divergence” model of protein evolution, and that the major periodicity detected is a consequence of the repetitive tendencies within α-helices. However, not all sequence periodicities can be explained by observable secondary structure: where sequence periodicity can be compared with solved structure, there is often no structural regularity that would provide an obvious explanation in terms of natural selection on protein function.
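The abstract does not spell out the chi-square statistic. One plausible reading, assumed here rather than taken from the paper, is a test of independence between phase (position i modulo n) and residue: a sequence is periodic at period n if the residue counts in the n phases depart from what overall composition predicts. The sketch below scans n = 2 to 50 and estimates a p-value against shuffles of the same sequence, which preserve composition; this shuffle null may differ from how the paper's random proteome was generated. All function names are hypothetical, and the two-letter hydrophobic/polar alphabet is an illustrative assumption, not one of the paper's nine representations.

```python
import random
from collections import Counter

# Hypothetical hydrophobic (H) / polar (P) reduced alphabet, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")


def to_hp(seq):
    """Map a protein sequence onto the hypothetical two-letter H/P alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in seq)


def phase_chi_square(seq, n):
    """Chi-square statistic and degrees of freedom for phase-by-symbol association at period n."""
    counts = [Counter() for _ in range(n)]  # counts[p][a]: symbol a seen at phase p
    for i, a in enumerate(seq):
        counts[i % n][a] += 1
    symbol_totals = Counter(seq)
    total = len(seq)
    chi2 = 0.0
    for phase_counts in counts:
        phase_total = sum(phase_counts.values())
        for a, a_total in symbol_totals.items():
            # Expected count if residue identity were independent of phase.
            expected = phase_total * a_total / total
            if expected > 0:
                chi2 += (phase_counts[a] - expected) ** 2 / expected
    df = (n - 1) * (len(symbol_totals) - 1)
    return chi2, df


def scan_periods(seq, min_period=2, max_period=50):
    """Statistic for every period in [min_period, max_period]."""
    return {n: phase_chi_square(seq, n) for n in range(min_period, max_period + 1)}


def shuffled_p_value(seq, n, n_shuffles=999, rng=None):
    """Empirical p-value: fraction of composition-preserving shuffles scoring at least as high."""
    rng = rng or random.Random(0)
    observed, _ = phase_chi_square(seq, n)
    residues = list(seq)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if phase_chi_square("".join(residues), n)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    seq = "AEELLKK" * 10  # toy sequence with a strict period of 7
    stats = scan_periods(seq)
    print("period 7:", stats[7], "period 3:", stats[3])
    p = shuffled_p_value(seq, 7, n_shuffles=199)
    # Assumes the correction spans 49 periods x 9 alphabets = 441 tests.
    print("raw p:", p, "corrected p:", bonferroni(p, 49 * 9))
```

Two design points follow from this sketch. First, a strictly periodic sequence scores the same raw statistic at every multiple of its period (here 7, 14, 21, ...), so periods should be compared through p-values with their own (n − 1)(k − 1) degrees of freedom rather than through the raw statistic. Second, with 199 shuffles the smallest attainable p-value is 1/200 = 0.005, which becomes 1.0 after a 441-test Bonferroni correction; reaching a corrected p < 0.01 would need roughly 44,000 shuffles, or an analytic chi-square tail probability instead of the empirical one.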


Keywords: Periodicity, Proteome, E. coli, Gene duplication