
Finite Markov Chain Embedding for the Exact Distribution of Patterns in a Set of Random Sequences

  • Chapter

Part of the book series: Statistics for Industry and Technology (SIT)

Abstract:

Patterns with “unusual” frequencies are candidates for new functional patterns. They are usually identified by modeling the sequence with a homogeneous Markov model of order m (m ≥ 1), which allows p-values to be computed. For practical reasons, the model is often assumed to be stationary. This approximation can produce artifacts, especially when a large set of short sequences is considered. In this work, an exact method that accounts for both the nonstationarity and the fragmentary structure of the sequences is applied to a simulated and a real set of sequences. The results show that pattern statistics can be very sensitive to the stationarity assumption.
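The idea behind an exact, nonstationary computation can be illustrated with a small finite Markov chain embedding (FMCI) sketch. This is not the authors' implementation. It assumes an order-1 Markov model over a small alphabet, a single pattern, overlapping occurrences, and independent sequences, and it uses a plain string-matching (KMP-style) automaton rather than an optimized one. Each sequence starts from an explicit initial law `mu` instead of the stationary law, and the per-sequence count laws are convolved over the whole set, which is how nonstationarity and fragmentation enter the calculation. All function names (`kmp_automaton`, `count_distribution`, `set_distribution`, `p_value`, `stationary`) and all numbers in the demo are hypothetical.

```python
"""Minimal FMCI sketch: exact law of a pattern count in a set of Markov sequences.

Assumptions (not from the chapter): order-1 Markov model, a single pattern,
overlapping occurrences, independent sequences. All names are hypothetical.
"""


def kmp_automaton(pattern, alphabet):
    """delta[q][a] = length of the longest prefix of `pattern` that is a suffix
    of pattern[:q] + a (standard string-matching automaton)."""
    m = len(pattern)
    delta = [{} for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, len(s))
            while k > 0 and not s.endswith(pattern[:k]):
                k -= 1
            delta[q][a] = k
    return delta


def count_distribution(n, pattern, alphabet, mu, P, cmax):
    """Exact law of the number of occurrences of `pattern` in one sequence of
    length n >= 1 with initial law mu and transition matrix P.
    Entry cmax of the result holds P(count >= cmax)."""
    delta = kmp_automaton(pattern, alphabet)
    m = len(pattern)
    # Embedded chain state: (automaton state, last letter, capped count).
    # The last letter is kept because the order-1 model conditions on it.
    dist = {}
    for a in alphabet:
        q = delta[0][a]
        key = (q, a, min(int(q == m), cmax))
        dist[key] = dist.get(key, 0.0) + mu[a]
    for _ in range(n - 1):
        new = {}
        for (q, a, c), p in dist.items():
            for b in alphabet:
                q2 = delta[q][b]
                key = (q2, b, min(c + (q2 == m), cmax))
                new[key] = new.get(key, 0.0) + p * P[a][b]
        dist = new
    law = [0.0] * (cmax + 1)
    for (_, _, c), p in dist.items():
        law[c] += p
    return law


def convolve(f, g, cmax):
    """Law of the sum of two independent capped counts (still capped at cmax)."""
    h = [0.0] * (cmax + 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[min(i + j, cmax)] += fi * gj
    return h


def set_distribution(lengths, pattern, alphabet, mu, P, cmax):
    """Total count over independent sequences of the given lengths; every
    sequence restarts from mu, which is how fragmentation enters the model."""
    law = [1.0] + [0.0] * cmax
    for n in lengths:
        law = convolve(law, count_distribution(n, pattern, alphabet, mu, P, cmax), cmax)
    return law


def p_value(law, observed):
    """P(N >= observed); exact as long as observed <= cmax."""
    return sum(law[observed:])


def stationary(alphabet, P, iters=1000):
    """Stationary law by power iteration (assumes an ergodic chain)."""
    pi = {a: 1.0 / len(alphabet) for a in alphabet}
    for _ in range(iters):
        pi = {b: sum(pi[a] * P[a][b] for a in alphabet) for b in alphabet}
    return pi


if __name__ == "__main__":
    alphabet = "ab"
    P = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}  # made-up model
    lengths = [6] * 50              # hypothetical: 50 short fragments
    observed = 40                   # hypothetical observed count of "ba"
    start_b = {"a": 0.0, "b": 1.0}  # hypothetical: every fragment starts with "b"
    for label, mu in [("stationary", stationary(alphabet, P)),
                      ("non-stationary", start_b)]:
        law = set_distribution(lengths, "ba", alphabet, mu, P, cmax=observed)
        print(label, p_value(law, observed))
```

With these made-up parameters the expected number of "ba" occurrences is about 21 under the stationary start but about 55 when every fragment starts with "b", so the same observed count of 40 should look strongly over-represented under the stationary assumption and unremarkable under the nonstationary one. Capping counts at `cmax = observed` keeps the state space small while leaving the tail probability exact.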

Juliette Martin and Leslie Regad contributed equally. Anne-Claude Camproux is the corresponding author.



Author information


Corresponding author

Correspondence to Juliette Martin.


Copyright information

© 2010 Birkhäuser Boston

About this chapter

Cite this chapter

Martin, J., Regad, L., Camproux, AC., Nuel, G. (2010). Finite Markov Chain Embedding for the Exact Distribution of Patterns in a Set of Random Sequences. In: Skiadas, C. (ed.) Advances in Data Analysis. Statistics for Industry and Technology. Birkhäuser Boston. https://doi.org/10.1007/978-0-8176-4799-5_16
