Codon preference and primary sequence structure in protein-coding regions

Bulletin of Mathematical Biology

Abstract

The stochastic complexity of a data base of 365 protein-coding regions is analysed. When the primary sequence is modeled as a spatially homogeneous Markov source, the fit to observed codon preference is very poor. The situation improves substantially when a non-homogeneous model is used. Some implications for the estimation of species phylogeny and substitution rates are discussed.
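
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not state the order of the Markov model, the form of the non-homogeneity, or the exact complexity measure. The sketch assumes a first-order chain over the four bases, takes "non-homogeneous" to mean one transition matrix per codon position (period 3), and uses the common approximation of stochastic complexity as negative maximised log-likelihood plus (k/2) log n. The function names `markov_log_likelihood` and `description_length` are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def markov_log_likelihood(seq, period=1):
    """Maximised log-likelihood of a first-order Markov chain on seq.

    period=1: homogeneous chain, a single transition matrix.
    period=3: assumed non-homogeneous form, one transition matrix
              per codon position of the base being emitted.
    """
    # Count transitions (previous base -> current base) per phase.
    counts = Counter()
    for i in range(1, len(seq)):
        counts[(i % period, seq[i - 1], seq[i])] += 1

    # Row totals give the denominators of the maximum-likelihood estimates.
    row_totals = Counter()
    for (phase, prev, _), c in counts.items():
        row_totals[(phase, prev)] += c

    return sum(
        c * math.log(c / row_totals[(phase, prev)])
        for (phase, prev, _), c in counts.items()
    )


def description_length(seq, period=1):
    """Approximate stochastic complexity: -loglik + (k/2) * log(n).

    k is the number of free transition probabilities and n the number
    of observed transitions. This penalty is an assumption of the
    sketch, not a formula taken from the article.
    """
    n = len(seq) - 1
    k = period * len(BASES) * (len(BASES) - 1)
    return -markov_log_likelihood(seq, period) + 0.5 * k * math.log(n)


if __name__ == "__main__":
    # Toy sequence with a strict period-3 pattern (illustrative only).
    toy = "AAC" * 200
    print("homogeneous:", description_length(toy, period=1))
    print("period-3:   ", description_length(toy, period=3))
```

A smaller description length indicates the better trade-off between fit and model size. Because the period-3 model contains the homogeneous one as a special case, its maximised log-likelihood is never lower; the penalty term decides whether the extra parameters are justified.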



Cite this article

Tavaré, S., Song, B. Codon preference and primary sequence structure in protein-coding regions. Bltn Mathcal Biology 51, 95–115 (1989). https://doi.org/10.1007/BF02458838

  • DOI: https://doi.org/10.1007/BF02458838
