Volume 115, Issue 1, pp 93–103

Genome size and the accumulation of simple sequence repeats: implications of new data from genome sequencing projects

  • John M. Hancock


The relationship between the level of repetitiveness in genomic sequences and genome size has been re-investigated using the rapidly growing database of complete eubacterial and archaeal genome sequences, combined with the fragmentary but now large body of data from eukaryotic genomes. Relative simplicity factors (RSFs), which measure the repetitiveness of sequences, were calculated, and significantly simple motifs (SSMs), which identify the kinds of sequences that are repeated, were identified. A previously reported correlation between genome size and repetitiveness was confirmed, but it was shown that the higher RSFs seen in eukaryotic genomes also reflect a generally higher level of repetitiveness that is independent of genome size differences. Differences in genome size are responsible for about 10% of the variance in RSF seen between species. The spectrum of SSMs seen within a genome differed markedly within the eubacteria but less so in eukaryotes and, particularly, in archaea. Species with SSM spectra that differ from the norm tend also to have high RSFs for their genome size and to be pathogens that make use of repetitive sequences to avoid host defence responses. Some of the variance in repetitiveness seen in other species may therefore also reflect the action of selection, although other forces, such as variation in the effectiveness of mechanisms for regulating slippage errors of replication, may also be important.
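To make the idea of a relative simplicity factor concrete, the following is a minimal sketch of a repetitiveness score expressed relative to shuffled sequences of the same base composition. It is not the SIMPLE34 program used in the paper: the function names, the motif length `k`, the `window` size, and the number of shuffles are all assumptions chosen for illustration.

```python
import random


def simplicity_score(seq, k=3, window=32):
    """Toy score: for each k-mer, count exact recurrences that start
    within `window` bases downstream (illustrative assumption only)."""
    score = 0
    n = len(seq) - k + 1  # number of k-mer start positions
    for i in range(n):
        motif = seq[i:i + k]
        for j in range(i + 1, min(i + window + 1, n)):
            if seq[j:j + k] == motif:
                score += 1
    return score


def relative_simplicity(seq, shuffles=20, seed=0, k=3, window=32):
    """Ratio of the observed score to the mean score of shuffled
    sequences with the same base composition (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = simplicity_score(seq, k, window)
    bases = list(seq)
    total = 0
    for _ in range(shuffles):
        rng.shuffle(bases)
        total += simplicity_score("".join(bases), k, window)
    mean_random = total / shuffles
    return observed / mean_random if mean_random else float("inf")


if __name__ == "__main__":
    # A pure trinucleotide repeat should score well above its shuffled controls.
    print(relative_simplicity("CAG" * 20))
```

A ratio near 1 would indicate repetitiveness no greater than expected from base composition alone, while larger values indicate clustering of repeated motifs. Normalising against shuffled sequences is what lets scores be compared between genomes that differ in base composition.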

C-value, genome size, microsatellites, relative simplicity factor, sequence repetitiveness, simple sequences





Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • John M. Hancock
    1. Department of Computer Science, Royal Holloway University of London, UK
