
Stochastic models for heterogeneous DNA sequences

Bulletin of Mathematical Biology

Abstract

The composition of naturally occurring DNA sequences is often strikingly heterogeneous. In this paper, the DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain. The model used is a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987). A smoothing algorithm is described which can be used to reconstruct the hidden process and produce graphic displays of the compositional structure of a sequence. The problem of parameter estimation is approached using likelihood methods, and an EM algorithm for approximating the maximum likelihood estimate is derived. The methods are applied to sequences from yeast mitochondrial DNA, human and mouse mitochondrial DNAs, a human X chromosomal fragment and the complete genome of bacteriophage lambda.
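To make the idea of reconstructing the hidden process concrete, the following is a minimal sketch of smoothing in a discrete-state, discrete-outcome hidden Markov model. It uses the standard scaled forward-backward recursions, which may differ in detail from the algorithm derived in the paper. The function name `smooth`, the two-state setup (an AT-rich and a GC-rich state) and all parameter values are illustrative assumptions, not estimates from the article.

```python
from typing import List, Sequence

BASES = "ACGT"  # column order assumed for the emission table


def smooth(seq: str,
           init: Sequence[float],
           trans: Sequence[Sequence[float]],
           emit: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return P(hidden state at position t | whole sequence) for every t."""
    obs = [BASES.index(b) for b in seq]
    n, k = len(obs), len(init)

    # Forward pass: filtered state probabilities, rescaled at each
    # position so that long sequences do not underflow.
    fwd, scale = [], []
    prev: List[float] = []
    for t, o in enumerate(obs):
        if t == 0:
            pred = list(init)
        else:
            pred = [sum(prev[i] * trans[i][j] for i in range(k))
                    for j in range(k)]
        unnorm = [pred[j] * emit[j][o] for j in range(k)]
        c = sum(unnorm)
        prev = [u / c for u in unnorm]
        fwd.append(prev)
        scale.append(c)

    # Backward pass, using the same scaling constants.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        o_next = obs[t + 1]
        bwd[t] = [
            sum(trans[i][j] * emit[j][o_next] * bwd[t + 1][j]
                for j in range(k)) / scale[t + 1]
            for i in range(k)
        ]

    # Smoothed posteriors: combine both passes and renormalise.
    post = []
    for t in range(n):
        w = [fwd[t][i] * bwd[t][i] for i in range(k)]
        z = sum(w)
        post.append([x / z for x in w])
    return post


# Hypothetical parameters: state 0 is AT-rich, state 1 is GC-rich.
init = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.1, 0.9]]
emit = [[0.4, 0.1, 0.1, 0.4],   # P(A), P(C), P(G), P(T) in state 0
        [0.1, 0.4, 0.4, 0.1]]   # same order in state 1
seq = "ATATTAATGCGCGGCC"
post = smooth(seq, init, trans, emit)
```

Plotting the first column of `post` against sequence position would give one possible graphic display of compositional structure. In an EM scheme, posteriors of this kind would supply the expected state occupancies needed to re-estimate the parameters.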


Literature

  • Anderson, S., A. T. Bankier, B. G. Barrell, M. H. L. de Bruijn, A. R. Coulson, J. Drouin, I. C. Eperon, D. P. Nierlich, B. A. Roe, F. Sanger, P. H. Schreier, A. J. H. Smith, R. Staden and I. G. Young. 1981. “Sequence and Organization of the Human Mitochondrial Genome.” Nature 290, 457–464.

  • Baum, L. E., T. Petrie, G. Soules and N. Weiss. 1970. “A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains.” Ann. Math. Statist. 41, 164–171.

  • Becker, R. A. and J. M. Chambers. 1984. S: An Interactive Environment for Data Analysis. Belmont, CA: Wadsworth.

  • Bernardi, G. and G. Bernardi. 1986. “Compositional Constraints and Genome Evolution.” J. Molec. Evol. 24, 1–11.

  • Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, G. Cuny, M. Meunier-Rotival and F. Rodier. 1985. “The Mosaic Genome of Warm-Blooded Vertebrates.” Science 228, 953–957.

  • Bibb, M. J., R. A. Van Etten, C. T. Wright, M. W. Walberg and D. A. Clayton. 1981. “Sequence and Gene Organization of Mouse Mitochondrial DNA.” Cell 26, 167–180.

  • Blanc, H. and B. Dujon. 1980. “Replicator Regions of the Yeast Mitochondrial DNA Responsible for Suppressiveness.” Proc. Natn. Acad. Sci. U.S.A. 77, 3942–3946.

  • de Zamaroczy, M. and G. Bernardi. 1986. “The Primary Structure of the Mitochondrial Genome of Saccharomyces cerevisiae: A Review.” Gene 47, 155–177.

  • Elton, R. A. 1974. “Theoretical Models for Heterogeneity of Base Composition in DNA.” J. Theor. Biol. 45, 533–553.

  • Dempster, A. P., N. M. Laird and D. B. Rubin. 1977. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” J. R. Statist. Soc. B 39, 1–22.

  • Fangman, W. L. and B. Dujon. 1984. “Yeast Mitochondrial Genomes Consisting of Only AT Base Pairs Replicate and Exhibit Suppressiveness.” Proc. Natn. Acad. Sci. U.S.A. 81, 7156–7160.

  • Goursot, R., M. Mangin and G. Bernardi. 1982. “Surrogate Origins of Replication in the Mitochondrial Genomes of ori⁰ Petite Mutants of Yeast.” EMBO J. 1, 705–711.

  • Hinkley, D. V. 1970. “Inference About the Change Point in a Sequence of Random Variables.” Biometrika 57, 1–17.

  • Kitagawa, G. 1987. “Non-Gaussian State-Space Modeling of Nonstationary Time Series.” J. Am. Statist. Assoc. 82, 1032–1041.

  • Ott, G. 1967. “Compact Encoding of Stationary Markov Sources.” IEEE Trans. Inf. Theor. IT-13, 82–86.

  • Riley, D. E., R. Reeves and S. M. Gartler. 1986. “Xrep, a Plasmid-Stimulating X Chromosomal Sequence Bearing Similarities to the BK Virus Replication Origin and Viral Enhancers.” Nucl. Acids Res. 14, 9407–9423.

  • Sanger, F., A. R. Coulson, G. F. Hong, D. F. Hill and G. B. Petersen. 1982. “Nucleotide Sequence of Bacteriophage λ DNA.” J. Molec. Biol. 162, 729–773.

  • Schwarz, G. 1978. “Estimating the Dimension of a Model.” Ann. Statist. 6, 461–464.

  • Skalka, A., E. Burgi and A. D. Hershey. 1968. “Segmental Distribution of Nucleotides in the DNA of Bacteriophage Lambda.” J. Molec. Biol. 34, 1–16.

  • Smith, A. F. M. 1975. “A Bayesian Approach to Inference About a Change Point in a Sequence of Random Variables.” Biometrika 62, 407–416.

  • Staden, R. 1984. “Graphic Methods to Determine the Function of Nucleic Acid Sequences.” Nucl. Acids Res. 12, 521–538.

  • Sueoka, N. 1959. “A Statistical Analysis of Deoxyribonucleic Acid Distribution in Density Gradient Centrifugation.” Proc. Natn. Acad. Sci. U.S.A. 45, 1480–1490.


Cite this article

Churchill, G.A. Stochastic models for heterogeneous DNA sequences. Bltn Mathcal Biology 51, 79–94 (1989). https://doi.org/10.1007/BF02458837


  • DOI: https://doi.org/10.1007/BF02458837
