Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains

Guédon, Yann; d'Aubenton-Carafa, Yves; Thermes, Claude

doi:10.1007/s00285-005-0358-y

Published: 07 February 2006

Journal of Mathematical Biology, Volume 52, pages 343–372 (2006)

Abstract

The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about the possible grouping of nucleotides makes it possible to define dedicated sub-classes of Markov chains. We therefore address the problem of formulating lumpability hypotheses for a Markov chain. In the classical approach to lumpability, this problem is formulated as the search for an appropriate state space, smaller than the original one, such that the lumped chain defined on it retains the Markov property. We propose a different perspective on lumpability in which the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes provide parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes in terms of the original transition probability matrix are derived. We present several model selection methods, based either on hypothesis testing or on penalized log-likelihood criteria, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, lumped processes highlight differences between intronic sequences and sequences from the untranslated regions (UTRs) of genes.
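
For comparison with the proposed approach, the classical notion of lumpability can be checked directly on a transition matrix. Under Kemeny and Snell's definition, a chain is (strongly) lumpable with respect to a partition when, for every pair of blocks, all states of the source block send the same total probability into the target block; that common value is then the lumped transition probability. The Python sketch below assumes a hypothetical first-order matrix over A, C, G, T and two illustrative groupings (purines/pyrimidines and weak/strong base pairing). It covers only the classical criterion, not the two-level lumped processes introduced in the article, and the function names are hypothetical.

```python
"""Classical (strong) lumpability check for a nucleotide Markov chain.

Sketch under stated assumptions: the matrix P, both partitions and the
function names are hypothetical illustrations, not taken from the article.
"""
from typing import List, Optional, Sequence

NUCLEOTIDES = "ACGT"  # state order used for the rows/columns of P


def block_mass(row: Sequence[float], block: Sequence[int]) -> float:
    """Probability of moving from one state into any state of `block`."""
    return sum(row[j] for j in block)


def lump(P: List[List[float]], partition: List[List[int]],
         tol: float = 1e-9) -> Optional[List[List[float]]]:
    """Return the lumped transition matrix, or None if P is not lumpable.

    P is lumpable w.r.t. the partition iff, for every pair of blocks
    (source, target), block_mass(P[k], target) is equal for all k in source.
    """
    lumped = []
    for source in partition:
        row = []
        for target in partition:
            masses = [block_mass(P[k], target) for k in source]
            if max(masses) - min(masses) > tol:
                return None  # states of `source` disagree: not lumpable
            row.append(masses[0])  # common value = lumped transition probability
        lumped.append(row)
    return lumped


# Hypothetical first-order transition matrix (rows: from A, C, G, T).
P = [
    [0.30, 0.20, 0.30, 0.20],
    [0.10, 0.40, 0.20, 0.30],
    [0.20, 0.10, 0.40, 0.30],
    [0.25, 0.35, 0.05, 0.35],
]
purine_pyrimidine = [[0, 2], [1, 3]]  # {A, G} / {C, T}
weak_strong = [[0, 3], [1, 2]]        # {A, T} / {C, G}

print(lump(P, purine_pyrimidine))  # approximately [[0.6, 0.4], [0.3, 0.7]]
print(lump(P, weak_strong))        # None
```

Because every row of P sums to 1, the rows of the lumped matrix also sum to 1; the tolerance `tol` only absorbs floating-point rounding and its value is an arbitrary choice. With matrices estimated from real sequences the equalities essentially never hold exactly, which is one reason to rely on hypothesis testing or penalized criteria instead.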

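The penalized log-likelihood criteria mentioned in the abstract can be illustrated on a simpler, standard problem: choosing the order of a Markov chain for a DNA sequence. The Python sketch below is a generic illustration, not the authors' model selection procedure for lumped processes; its function names and toy sequences are hypothetical. It fits chains of order 0 to `max_order` by maximum likelihood (transition counts normalized per context) and scores each with AIC, -2 log L + 2p, or BIC, -2 log L + p log n, where an order-k chain on four nucleotides has p = 3 * 4^k free parameters.

```python
"""AIC/BIC order selection for a Markov chain on the DNA alphabet.

Generic sketch, not the article's method; function names are hypothetical.
"""
import math
from collections import Counter

ALPHABET_SIZE = 4  # A, C, G, T


def log_likelihood(seq: str, order: int, max_order: int) -> float:
    """Maximized log-likelihood of an order-`order` chain fitted to `seq`.

    Positions 0..max_order-1 are used only as conditioning context, so every
    candidate order is evaluated on the same len(seq) - max_order symbols.
    """
    pair_counts: Counter = Counter()     # N(context, next symbol)
    context_counts: Counter = Counter()  # N(context)
    for t in range(max_order, len(seq)):
        context = seq[t - order:t]
        pair_counts[(context, seq[t])] += 1
        context_counts[context] += 1
    # The MLE of P(next = a | context) is N(context, a) / N(context).
    return sum(n * math.log(n / context_counts[ctx])
               for (ctx, _), n in pair_counts.items())


def n_free_parameters(order: int) -> int:
    """Each of the 4**order contexts has 3 free transition probabilities."""
    return (ALPHABET_SIZE - 1) * ALPHABET_SIZE ** order


def select_order(seq: str, max_order: int, criterion: str = "bic") -> int:
    """Return the order in 0..max_order minimizing AIC or BIC."""
    if criterion not in ("aic", "bic"):
        raise ValueError("criterion must be 'aic' or 'bic'")
    n_obs = len(seq) - max_order
    penalty = 2.0 if criterion == "aic" else math.log(n_obs)
    scores = {
        k: -2.0 * log_likelihood(seq, k, max_order) + penalty * n_free_parameters(k)
        for k in range(max_order + 1)
    }
    return min(scores, key=scores.get)


# Toy periodic sequence: each nucleotide fully determines the next one.
toy = "ACGT" * 200
print(select_order(toy, max_order=3))                   # 1
print(select_order(toy, max_order=3, criterion="aic"))  # 1
```

The first `max_order` symbols serve only as context so that every candidate is scored on the same n observations; otherwise the likelihoods of different orders would cover different numbers of symbols. Because the BIC penalty grows with log n, BIC tends to select smaller models than AIC on long sequences.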

Author information

Authors and Affiliations

Unité Mixte de Recherche CIRAD/CNRS/INRA/IRD/Université Montpellier II, Botanique et Bioinformatique de l'Architecture des Plantes, TA 40/PS2, 34398, Montpellier Cedex 5, France
Yann Guédon
Centre de Génétique Moléculaire, CNRS, 91198, Gif-sur-Yvette Cedex, France
Yves d'Aubenton-Carafa & Claude Thermes

Corresponding author

Correspondence to Yves d'Aubenton-Carafa.

About this article

Cite this article

Guédon, Y., d'Aubenton-Carafa, Y. & Thermes, C. Analysing grouping of nucleotides in DNA sequences using lumped processes constructed from Markov chains. J. Math. Biol. 52, 343–372 (2006). https://doi.org/10.1007/s00285-005-0358-y

Received: 04 October 2004
Revised: 24 May 2005
Published: 07 February 2006
Issue Date: March 2006
DOI: https://doi.org/10.1007/s00285-005-0358-y