Journal of Molecular Evolution, Volume 21, Issue 3, pp 278–288

Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding

  • B. Edwin Blaisdell


Sixty-four eucaryotic nuclear DNA sequences, half of them coding and half noncoding, have been examined as expressions of first-, second-, or third-order Markov chains. Standard statistical tests found that most of the sequences required at least second-order Markov chains for their representation, and some required chains of third order. For all 64 sequences the observed one-step second-order transition count matrices were effective in predicting the two-step transition count matrices, and 56 of 64 were effective in predicting the three-step transition count matrices. The departure from random expectation of the observed first- and second-order transition count matrices meant that a considerable sample of eucaryotic nuclear DNA sequences, both protein-coding and noncoding, has significant local structure over subsequences of three to five contiguous bases, and that this structure occurs throughout the total length of the sequence. These results suggested that present DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.
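The abstract refers to "standard statistical tests" for Markov chain order without naming them. As an illustration only, the sketch below counts k-th order transitions in a base sequence and computes a likelihood-ratio (G) statistic comparing a chain of order k−1 against one of order k. The choice of this particular statistic, the function names `transition_counts` and `order_test_statistic`, and the four-letter alphabet constant are assumptions made for the example and are not taken from the paper.

```python
import math
from collections import Counter

ALPHABET_SIZE = 4  # A, C, G, T (assumed alphabet for this sketch)


def transition_counts(seq, order):
    """Count (context, next_base) pairs, where context is the `order` preceding bases."""
    counts = Counter()
    for i in range(order, len(seq)):
        counts[(seq[i - order:i], seq[i])] += 1
    return counts


def order_test_statistic(seq, k):
    """Return (G, degrees_of_freedom) for H0: order k-1 versus H1: order k, with k >= 1.

    G = 2 * sum n(c, x) * ln[ p(x | c) / p(x | c') ], where c is the k-base
    context and c' is c with its first (most distant) base dropped.
    """
    full = transition_counts(seq, k)
    ctx_full = Counter()      # n(c)
    reduced = Counter()       # n(c', x)
    ctx_reduced = Counter()   # n(c')
    for (ctx, nxt), n in full.items():
        ctx_full[ctx] += n
        reduced[(ctx[1:], nxt)] += n
        ctx_reduced[ctx[1:]] += n

    g = 0.0
    for (ctx, nxt), n in full.items():
        p_full = n / ctx_full[ctx]
        p_reduced = reduced[(ctx[1:], nxt)] / ctx_reduced[ctx[1:]]
        g += 2.0 * n * math.log(p_full / p_reduced)

    # Free parameters gained when moving from order k-1 to order k.
    dof = ALPHABET_SIZE ** (k - 1) * (ALPHABET_SIZE - 1) ** 2
    return g, dof


if __name__ == "__main__":
    toy = "ACGT" * 50  # toy sequence, not data from the paper
    for k in (1, 2):
        g, dof = order_test_statistic(toy, k)
        print(f"order {k - 1} vs {k}: G = {g:.2f}, dof = {dof}")
```

Under the null hypothesis G is approximately chi-square distributed with the returned degrees of freedom, so a large G relative to that distribution would favor the higher order. For the strictly periodic toy sequence, the first-order test gives a large G while the second-order test gives zero, since one preceding base already determines the next.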

Key words

Higher-order Markov chains; Prediction of following DNA bases





Copyright information

© Springer-Verlag 1985

Authors and Affiliations

  • B. Edwin Blaisdell, Linus Pauling Institute of Science and Medicine, Palo Alto, USA
