Skip to main content
Log in

Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding

  • Published:
Journal of Molecular Evolution Aims and scope Submit manuscript

Summary

Sixty-four eucaryotic nuclear DNA sequences, half of them coding and half noncoding, have been examined as expressions of first-, second-, or third-order Markov chains. Standard statistical tests found that most of the sequences required at least second-order Markov chains for their representation, and some required chains of third order. For all 64 sequences the observed one-step second-order transition count matrices were effective in predicting the two-step transition count matrices, and 56 of 64 were effective in predicting the three-step transition count matrices. The departure from random expectation of the observed first- and second-order transition count matrices meant that a considerable sample of eucaryotic nuclear DNA sequences, both protein coding and noncoding, have significant local structure over subsequences of three to five contiguous bases, and that this structure occurs throughout the total length of the sequence. These results suggested that present DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Almagor H (1983) A Markov chain analysis of DNA sequences. J Theor Biol 104:633–645

    PubMed  Google Scholar 

  • Altenburger W, Neumaier PS, Steinmetz M, Zachau HG (1981) DNA sequence of the constant region of the mouse immunoglobulin kappa chain. Nucleic Acids Res 9:971–981

    PubMed  Google Scholar 

  • Anderson TW, Goodman LA (1957) Statistical inference about Markov chains. Ann Math Stat 28:89–109

    Google Scholar 

  • Baralle FE, Shoulders CC, Proudfoot NJ (1980a) The primary structure of the human epsilon-globin gene. Cell 21:621–626

    PubMed  Google Scholar 

  • Baralle FE, Shoulders CC, Goodbourn S, Jeffreys A, Proudfoot NJ (1980b) The 5′ flanking region of human epsilon-globin gene. Nucleic Acids Res 8:4393–4404

    PubMed  Google Scholar 

  • Bell GI, Pictet RL, Rutter WJ, Cordell B, Tischer E, Goodman HM (1980a) Sequence of the human insulin gene. Nature 284:26–32

    PubMed  Google Scholar 

  • Bell GI, Pictet R, Rutter WJ (1980b) Analysis of the regions flanking the human insulin gene and sequence of an Alu family member. Nucleic Acids Res 8:4091–4109

    PubMed  Google Scholar 

  • Bird AP (1980) DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res 8:1499–1504

    PubMed  Google Scholar 

  • Blaisdell BE (1983a) A prevalent persistent nonrandomness that distinguishes coding and noncoding eucaryotic nuclear DNA sequences. J Mol Evol 19:122–133

    PubMed  Google Scholar 

  • Blaisdell BE (1983b) Choice of base at silent codon site 3 is not selectively neutral in eucaryotic structural genes: It maintains excess short runs of weak and strong hydrogen bonding bases. J Mol Evol 19:226–236

    PubMed  Google Scholar 

  • Chang ACY, Cochet M, Cohen SN (1980) Structural organization of human genomic DNA encoding the propiomelanocortin peptide. Proc Natl Acad Sci USA 77:4890–4894

    PubMed  Google Scholar 

  • Coulondre C, Miller JH, Farabaugh PJ, Gilbert W (1978) Molecular basis of base substitution hotspots inEscherichia coli. Nature 274:775–780

    PubMed  Google Scholar 

  • Elton RA (1975) Doublet frequencies in sequenced nucleic acids. J Mol Evol 4:323–346

    PubMed  Google Scholar 

  • Erickson JW, Altman G (1979) A search for patterns in the nucleotide sequence of the MS2 genome. J Math Biol 7:219–230

    Google Scholar 

  • Gatlin L (1972) Information theory and the living system. Columbia University Press, New York

    Google Scholar 

  • Goeddel DV, Yelverlon E, Ullrich A, Heyneker HL, Miozzari G, Holmes W, Seeburg PH, Dull T, May L, Stebbins N, Crea R, Maeda S, McCandliss R, Sloma A, Tabor JM, Gross M, Familetti PC, Pestka S (1980) Human leukocyte interferon produced byE. coli is biologically active. Nature 287:411–416

    PubMed  Google Scholar 

  • Gubbins EJ, Maurer RA, Lagrimini M, Erwin CR, Donelson JE (1980) Structure of the rat prolactin gene. J Biol Chem 255:8655–8662

    PubMed  Google Scholar 

  • Hieter PA, Max EE, Seidman JG, Maizel JV, Leder P (1980) Cloned human and mouse kappa immunoglobulin constant and J region genes conserve homology in functional segments. Cell 22:197–207

    PubMed  Google Scholar 

  • Holland JP, Holland MJ (1979) The primary structure of a glyceraldehyde-3-phosphate dehydrogenase gene fromSaccharomyces cerevisiae. J Biol Chem 254:9839–9845

    PubMed  Google Scholar 

  • Josse J, Kaiser AD, Kornberg A (1961) Enzymatic synthesis of deoxyribonucleic acid. VIII. Frequencies of nearest neighbor base sequences in deoxyribonucleic acid. J Biol Chem 236:864–875

    PubMed  Google Scholar 

  • Jukes TH (1978) Codons and nearest neighbor nucleotide pairs in mammalian messenger RNA. J Mol Evol 11:121–127

    PubMed  Google Scholar 

  • Konkel DA, Maizel JV, Leder P (1979) The evolution and sequence comparison of two recently diverged mouse chromosome beta-globin genes. Cell 18:865–873

    PubMed  Google Scholar 

  • Kullback S, Kupperman M, Ku HH (1962) Tests for contingency tables and Markov chains. Technometrics 4:573–608

    Google Scholar 

  • Lawn RM, Efstratiadis A, O'Connell C, Maniatis T (1980) The nucleotide sequence of the human beta-globin gene. Cell 21:647–651

    PubMed  Google Scholar 

  • Lawn RM, Adelman J, Franke AE, Houck M, Cross M, Najarian R, Coeddel OV (1981) Human fibroblast interferon gene lacks introns. Nucleic Acids Res 9:1045–1052

    PubMed  Google Scholar 

  • Lipman DJ, Wilbur WJ (1983) Contextual constraints on synonymous codon choice. J Mol Biol 163:363–376

    PubMed  Google Scholar 

  • Lomedico P, Rosenthal N, Efstratiadis A, Gilbert W, Kolodner R, Tizard R (1979) The structure and evolution of the two nonallelic rat preproinsulin genes. Cell 18:545–558

    PubMed  Google Scholar 

  • Ng R, Abelson J (1980) Isolation and sequence of the gene for actin inSaccharomyces cerevisiae. Proc Natl Acad Sci USA 77:3912–3916

    PubMed  Google Scholar 

  • Nishioka Y, Leder P (1979) The complete sequence of a chromosomal mouse alpha globin gene reveals elements conserved throughout vertebrate evolution. Cell 18:875–882

    PubMed  Google Scholar 

  • Nishioka Y, Leder PJ (1980) Organization and complete sequence of identical embryonic and plasmacytoma kappa V-region genes. J Biol Chem 255:3691–3694

    PubMed  Google Scholar 

  • Nussinov R (1980) Some rules in the ordering of nucleotides in the DNA. Nucleic Acids Res 8:4545–4562

    PubMed  Google Scholar 

  • Nussinov R (1981) The universal dinucleotide asymmetry rules in DNA and amino acid codon choice. J Mol Evol 17:237–244

    PubMed  Google Scholar 

  • Ohno S, Epplen JT (1983) The primitive code and repeats of base oligomers as the primordial protein-encoding sequence. Proc Natl Acad Sci USA 80:3391–3395

    PubMed  Google Scholar 

  • Perder F, Efstratiadis A, Lomedico P, Gilbert W, Kolodner R, Dodgson J (1980) The evolution of genes: the chicken preproinsulin gene. Cell 20:555–566

    PubMed  Google Scholar 

  • Proudfoot NJ, Maniatis T (1980) The structure of a human alpha globin pseudogene and its relationship to alpha globin gene duplication. Cell 21:537–544

    PubMed  Google Scholar 

  • Richards RJ, Shine J, Ullrich A, Wells JRE, Goodman HM (1979) Molecular cloning and sequence analysis of adult chicken beta globin cDNA. Nucleic Acids Res 7:1137–1146.

    PubMed  Google Scholar 

  • Robertson MA, Staden R, Tanaka Y, Catterall JF, O'Malley BW, Brownlee CG (1979) Sequence of three introns of the chick ovalbumin gene. Nature 278:370–372

    PubMed  Google Scholar 

  • Sakano H, Maki R, Kurosawa Y, Roeder W, Tonegawa S (1980) Two types of somatic recombination are necessary for the generation of complete immunoglobulin heavy chain genes. Nature 286:676–683

    PubMed  Google Scholar 

  • Salser W (1977) Globin messenger—RNA sequences—analysis of base-pairing and evolutionary implications. Cold Spring Harbor Symp Quant Biol 42:985–1103

    Google Scholar 

  • Slightom JL, Blechl AE, Smithies O (1980) Human fetal G-gamma and A-gamma globin genes: Complete nucleotide sequences suggest that DNA can be exchanged between these duplicated genes. Cell 21:627–638

    PubMed  Google Scholar 

  • Spritz RA, De Riel JK, Forget BG, Weissman SM (1980) Complete nucleotide sequence of the human delta-globin gene. Cell 21:639–646

    PubMed  Google Scholar 

  • Sun SM, Slightom JL, Hall TC (1981) Intervening sequences in a plant gene: comparison of the partial sequence of cDNA and genomic DNA of French bean phaseolin. Nature 289:37–41

    Google Scholar 

  • Sures I, Lowry J, Kedes LH (1978) The DNA sequence of sea urchin (S. purpuratus) H2A, H2B and H3 histone coding and spacer regions. Cell 15:1033–1044

    PubMed  Google Scholar 

  • Swartz MN, Trautner TA, Kornberg A (1962) Enzymatic synthesis of deoxyribonucleic acid. XI. Further studies on nearest neighbor base sequences in deoxyribonucleic acids. J Biol Chem 237:1961–1967

    PubMed  Google Scholar 

  • Takahashi N, Kataoka T, Honjo T (1980) Nucleotide sequences of class-switch recombination region of the mouse immunoglobulin gamma 2b-chain gene. Gene 11:117–127

    PubMed  Google Scholar 

  • Tschumper G, Carbon J (1980) Sequence of a yeast fragment containing a chromosomal replicator and the TRPI gene. Gene 10:157–166

    PubMed  Google Scholar 

  • Ullrich A, Dull RJ, Gray A, Brosius J, Sures I (1980) Genetic variation in the human insulin gene. Science 209:612–615

    PubMed  Google Scholar 

  • van Ooyen A, van den Berg J, Mantei N, Weissmann C (1979) Comparison of total sequence of a cloned rabbit beta-globin gene and its flanking regions with a homologous mouse sequence. Science 206:337–344

    PubMed  Google Scholar 

  • Young RA, Hagenbuchle O, Schibler U (1981) A single mouse alpha-amylase gene specifies two different tissue-specific mRNAs. Cell 23:451–458

    PubMed  Google Scholar 

  • Zuckerkandl E (1975) The appearance of new structures and functions in proteins during evolution. J Mol Evol 7:1–57

    PubMed  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Blaisdell, B.E. Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding. J Mol Evol 21, 278–288 (1985). https://doi.org/10.1007/BF02102360

Download citation

  • Received:

  • Revised:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF02102360

Key words

Navigation