Abstract
The stochastic complexity of a data base of 365 protein-coding regions is analysed. When the primary sequence is modeled as a spatially homogeneous Markov source, the fit to observed codon preference is very poor. The situation improves substantially when a non-homogeneous model is used. Some implications for the estimation of species phylogeny and substitution rates are discussed.
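The contrast drawn in the abstract between a homogeneous and a non-homogeneous Markov source can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes a first-order chain on the four bases, takes "non-homogeneous" to mean one transition matrix per codon position, and uses a penalized score of the form -log-likelihood + (k/2) log n as a rough stand-in for stochastic complexity. The function names `log_likelihood` and `penalized_score` are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def log_likelihood(seq, period):
    """Maximized log-likelihood of a first-order Markov chain on bases.

    period=1: one transition matrix shared by every site (homogeneous).
    period=3: a separate transition matrix for each codon position
              (an assumed, simple form of non-homogeneity).
    """
    pair_counts = Counter()  # keyed by (phase, previous base, current base)
    row_counts = Counter()   # keyed by (phase, previous base)
    for i in range(1, len(seq)):
        phase = i % period
        pair_counts[(phase, seq[i - 1], seq[i])] += 1
        row_counts[(phase, seq[i - 1])] += 1
    # Plug in the maximum-likelihood estimates n_ab / n_a for each phase.
    return sum(n * math.log(n / row_counts[(phase, prev)])
               for (phase, prev, _cur), n in pair_counts.items())

def penalized_score(seq, period):
    """-log-likelihood + (k/2) log n; smaller is better (assumed criterion)."""
    k = period * len(BASES) * (len(BASES) - 1)  # free transition probabilities
    n = len(seq) - 1                            # number of observed transitions
    return -log_likelihood(seq, period) + 0.5 * k * math.log(n)

if __name__ == "__main__":
    toy = "AAC" * 200  # artificial sequence with strict period-3 structure
    for period in (1, 3):
        print(period, log_likelihood(toy, period), penalized_score(toy, period))
```

Because the position-specific model contains the homogeneous one as a special case, its maximized likelihood can never be lower; the penalty term is what decides whether the extra parameters are justified for a given sequence length.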
Cite this article
Tavaré, S., Song, B. Codon preference and primary sequence structure in protein-coding regions. Bulletin of Mathematical Biology 51, 95–115 (1989). https://doi.org/10.1007/BF02458838
DOI: https://doi.org/10.1007/BF02458838