There appear to be conserved constraints on the distribution of nucleotide sequences in cellular genomes

Journal of Molecular Evolution

Summary

The data from a genomic library can be sorted into the frequencies of every possible tetranucleotide in the sequence. This tabulation, a short sequence distribution, contains the frequency of occurrence of the 256 tetranucleotides and thus seems to serve as a vehicle for averaging sequence information. Two such distributions can be readily compared by correlation. Reported here are correlations (Spearman r_s) of the distributions from all of the genomic libraries in GenBank 44.0 with sizes equal to or larger than that of Salmonella typhimurium, except for the data for mouse and humans. All of the organisms examined showed highly significant correlations between the two DNA strands (not the complementarity expected from base pairing). Of 155 comparisons between libraries, 132 showed significant correlations at the 99% confidence level. Application of the correlation coefficients as a similarity matrix clustered most organisms in a phenogram in a pattern consistent with other hypotheses. This suggests a highly conserved pattern underlying all other genetic information in cellular DNA and affecting both DNA strands, perhaps caused by interaction with conserved factors necessary for DNA packaging.
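The tabulation and comparison described in the summary can be illustrated with a minimal sketch. It is not the author's program: the function names are hypothetical, overlapping windows are assumed, windows containing characters other than A, C, G, T are skipped by assumption, and the second strand is assumed to be read as the reverse complement. The rank correlation is computed as a Pearson correlation on tie-averaged ranks, which is one standard way to obtain Spearman's coefficient.

```python
from itertools import product

BASES = "ACGT"
# All 4**4 = 256 tetranucleotides in a fixed order, so two
# distributions can be compared position by position.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def tetranucleotide_counts(seq):
    """Count overlapping tetranucleotides; windows with non-ACGT characters are skipped (assumption)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if window in counts:
            counts[window] += 1
    return [counts[t] for t in TETRAMERS]


def reverse_complement(seq):
    """Return the opposite strand read 5' to 3' (assumed strand convention)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant distribution")
    return cov / (vx * vy) ** 0.5
```

Under these assumptions, a strand-versus-strand comparison for a sequence `s` would be `spearman(tetranucleotide_counts(s), tetranucleotide_counts(reverse_complement(s)))`, and a library-versus-library comparison would pass the counts of two different sequence collections.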

About this article

Cite this article

Rogerson, A.C. There appear to be conserved constraints on the distribution of nucleotide sequences in cellular genomes. J Mol Evol 32, 24–30 (1991). https://doi.org/10.1007/BF02099925

  • DOI: https://doi.org/10.1007/BF02099925