Markov chain analysis finds a significant influence of neighboring bases on the occurrence of a base in eucaryotic nuclear DNA sequences both protein-coding and noncoding
Sixty-four eucaryotic nuclear DNA sequences, half of them coding and half noncoding, have been examined as expressions of first-, second-, or third-order Markov chains. Standard statistical tests found that most of the sequences required at least second-order Markov chains for their representation, and some required chains of third order. For all 64 sequences the observed one-step second-order transition count matrices were effective in predicting the two-step transition count matrices, and 56 of 64 were effective in predicting the three-step transition count matrices. The departure from random expectation of the observed first- and second-order transition count matrices meant that a considerable sample of eucaryotic nuclear DNA sequences, both protein-coding and noncoding, have significant local structure over subsequences of three to five contiguous bases, and that this structure occurs throughout the total length of the sequence. These results suggested that present DNA sequences may have arisen from the duplication, concatenation, and gradual modification of very early short sequences.
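As a rough illustration of the kind of analysis described above, the following Python sketch tabulates transition counts of a chosen order and computes a likelihood-ratio (G) statistic comparing a chain of order k - 1 against one of order k. It is not the authors' procedure: the function names `transition_counts` and `order_test_statistic` are hypothetical, the four-letter alphabet is assumed, and the degrees of freedom are the nominal value that applies only when every context actually occurs in the sequence.

```python
from collections import Counter
from math import log


def transition_counts(seq, order):
    """Count how often each context of `order` bases is followed by each base."""
    counts = Counter()
    for i in range(order, len(seq)):
        context, nxt = seq[i - order:i], seq[i]
        counts[(context, nxt)] += 1
    return counts


def order_test_statistic(seq, order):
    """Return (G, dof) for H0: order `order - 1` versus H1: order `order`.

    Illustrative sketch only; large G relative to a chi-square distribution
    with `dof` degrees of freedom argues for the higher order.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    high = transition_counts(seq, order)

    # Collapse to the lower order by dropping the oldest base of each context,
    # so both models are fitted to exactly the same set of transitions.
    low = Counter()
    for (ctx, nxt), n in high.items():
        low[(ctx[1:], nxt)] += n

    # Row totals: how often each context was observed at all.
    high_tot, low_tot = Counter(), Counter()
    for (ctx, _), n in high.items():
        high_tot[ctx] += n
    for (ctx, _), n in low.items():
        low_tot[ctx] += n

    g = 0.0
    for (ctx, nxt), n in high.items():
        p_high = n / high_tot[ctx]
        p_low = low[(ctx[1:], nxt)] / low_tot[ctx[1:]]
        g += 2.0 * n * log(p_high / p_low)

    # Nominal degrees of freedom for a 4-letter alphabet: 4^(k-1) * (4-1)^2.
    dof = 4 ** (order - 1) * 3 * 3
    return g, dof
```

Calling `order_test_statistic(seq, 2)` on a sequence string over A, C, G, T gives the statistic for first-order versus second-order dependence; converting it to a p-value would need a chi-square tail function, which is left out here to keep the sketch dependency-free.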
Key words: Higher-order Markov chains; Prediction of following DNA bases