Finite Markov Chain Embedding for the Exact Distribution of Patterns in a Set of Random Sequences

  • Juliette Martin
  • Leslie Regad
  • Anne-Claude Camproux
  • Grégory Nuel
Chapter
Part of the Statistics for Industry and Technology book series (SIT)

Abstract:

Patterns with “unusual” frequencies are candidates for new functional patterns. They are usually identified by modeling the sequence with a homogeneous m-order Markov model (m ≥ 1), which allows p-values to be computed. For practical reasons, the model is often assumed to be stationary. This approximation can produce artifacts, especially when a large set of short sequences is considered. In this work, an exact method that accounts for both the nonstationarity and the fragmentary structure of the sequences is applied to a simulated and a real set of sequences. The results illustrate that pattern statistics can be very sensitive to the stationarity assumption.
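
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an order-1 Markov model, a single word as the pattern, and overlapping occurrences; all function names, the two-letter alphabet, the transition probabilities, and the observed count are hypothetical. The chain is embedded with a pattern-matching automaton so that the exact count distribution follows from a forward recursion, and independent sequences are combined by convolution.

```python
def kmp_automaton(pattern, alphabet):
    """delta[q][a] = length of the longest prefix of `pattern`
    that is a suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[q][a] = k
    return delta


def count_distribution(pattern, alphabet, start, trans, length):
    """Exact distribution of the number of (overlapping) occurrences of
    `pattern` in one sequence of `length` >= 1 letters.
    start[a] is the first-letter law, trans[a][b] = P(next = b | current = a)."""
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)
    # Embedded state: (last letter, automaton state, occurrences so far).
    dist = {}
    for a in alphabet:
        q = delta[0][a]
        key = (a, q, int(q == m))
        dist[key] = dist.get(key, 0.0) + start[a]
    for _ in range(length - 1):
        new = {}
        for (a, q, c), p in dist.items():
            for b in alphabet:
                q2 = delta[q][b]
                key = (b, q2, c + int(q2 == m))
                new[key] = new.get(key, 0.0) + p * trans[a][b]
        dist = new
    out = [0.0] * (max(length - m + 1, 0) + 1)
    for (_, _, c), p in dist.items():
        out[c] += p
    return out


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            r[i + j] += x * y
    return r


def set_distribution(pattern, alphabet, start, trans, lengths):
    """Exact count distribution over a set of independent sequences."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(pattern, alphabet, start, trans, n))
    return total


def stationary(alphabet, trans, iters=1000):
    """Stationary law by power iteration (assumes an ergodic chain)."""
    pi = {a: 1.0 / len(alphabet) for a in alphabet}
    for _ in range(iters):
        pi = {b: sum(pi[a] * trans[a][b] for a in alphabet) for b in alphabet}
    return pi


def upper_tail(dist, k):
    """P(count >= k), the p-value for over-representation."""
    return sum(dist[k:])


if __name__ == "__main__":
    alphabet = "AB"                                   # hypothetical alphabet
    trans = {"A": {"A": 0.9, "B": 0.1},               # hypothetical parameters
             "B": {"A": 0.5, "B": 0.5}}
    lengths = [5] * 50                                # many short sequences
    observed = 160                                    # hypothetical observed count
    starts = {"stationary": stationary(alphabet, trans),
              "all sequences start with B": {"A": 0.0, "B": 1.0}}
    for label, start in starts.items():
        dist = set_distribution("AA", alphabet, start, trans, lengths)
        print(label, upper_tail(dist, observed))
```

Running the script prints the upper-tail p-value of the same observed count under a stationary start and under a start law that differs from the stationary one. With many short sequences the first letters make up a large share of the data, which is why the two p-values can differ.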

Keywords and phrases:

stationary distribution, pattern Markov chain, biological patterns, finite Markov chain embedding

Copyright information

© Birkhäuser Boston 2010

Authors and Affiliations

  • Juliette Martin (1, 2, 3, 4)
  • Leslie Regad (2, 5)
  • Anne-Claude Camproux (2, 5)
  • Grégory Nuel (6, 7)

  1. Unité Mathématique Informatique et génome UR1077, INRA, Jouy-en-Josas, France
  2. Equipe de Bioinformatique Génomique et Moléculaire, INSERM UMR-S726/Université, Paris, France
  3. Université de Lyon, Lyon, France
  4. Institut de Biologie et Chimie des Protéines, Université Lyon 1; IFR 128, CNRS, UMR 5086, IBCP, Lyon, France
  5. MTI, Inserm UMR-S 973; Université Denis Diderot Paris 7, Paris, France
  6. CNRS, Paris, France
  7. MAP5 UMR CNRS 8145, Laboratory of Applied Mathematics, Department of Mathematics and Computer Science, Université Paris Descartes, Paris, France