Abstract:
Patterns with “unusual” frequencies are candidates for new functional patterns. They are usually identified by modelling the sequence with a homogeneous m-order Markov model (m ≥ 1), which allows p-values to be computed. For practical reasons, stationarity of the model is often assumed. This approximation can produce artifacts, especially when a large set of short sequences is considered. In this work, an exact method that accounts for both the nonstationarity and the fragmentary structure of the sequences is applied to a simulated set and a real set of sequences. The results illustrate that pattern statistics can be very sensitive to the stationarity assumption.
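To make the idea in the abstract concrete, the following is a minimal sketch of how an exact count distribution might be obtained by embedding a pattern automaton in a Markov chain. It is not the authors' implementation: it assumes an order-1 model only, uses a plain dynamic program instead of an optimized embedding, and every function name, the two-letter alphabet, the transition probabilities, and the starting distribution are hypothetical values chosen for illustration. Sequences in the set are treated as independent, so a pattern cannot span two sequences and the per-sequence distributions are combined by convolution.

```python
def kmp_automaton(pattern, alphabet):
    """delta[q][a] = length of the longest pattern prefix matched after reading a in state q."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, len(s))
            while k > 0 and not s.endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def count_distribution(pattern, alphabet, start, trans, n):
    """Exact P(N = k) for overlapping occurrences in one sequence of length n >= 1.

    start[a] is the law of the first letter (not necessarily stationary);
    trans[a][b] is the order-1 transition probability from a to b.
    """
    delta = kmp_automaton(pattern, alphabet)
    m = len(pattern)
    cur = {}  # (last letter, automaton state, count so far) -> probability
    for a in alphabet:
        q = delta[0][a]
        key = (a, q, int(q == m))
        cur[key] = cur.get(key, 0.0) + start[a]
    for _ in range(n - 1):
        nxt = {}
        for (a, q, c), p in cur.items():
            for b in alphabet:
                q2 = delta[q][b]
                key = (b, q2, c + int(q2 == m))
                nxt[key] = nxt.get(key, 0.0) + p * trans[a][b]
        cur = nxt
    dist = [0.0] * (max(0, n - m + 1) + 1)
    for (_, _, c), p in cur.items():
        dist[c] += p
    return dist


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def set_distribution(pattern, alphabet, start, trans, lengths):
    """Exact distribution of the total count over a set of independent sequences."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(pattern, alphabet, start, trans, n))
    return total


def stationary(alphabet, trans, iters=1000):
    """Stationary distribution by power iteration (assumes an ergodic chain)."""
    pi = {a: 1.0 / len(alphabet) for a in alphabet}
    for _ in range(iters):
        pi = {b: sum(pi[a] * trans[a][b] for a in alphabet) for b in alphabet}
    return pi


def upper_p_value(dist, observed):
    """P(N >= observed) under the given exact distribution."""
    return sum(dist[observed:])


# Hypothetical example: 20 short sequences of length 5, pattern "AA".
alphabet = "AB"
trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}
mu0 = {"A": 0.1, "B": 0.9}            # assumed nonstationary start
pi = stationary(alphabet, trans)      # start used under the stationarity assumption
lengths = [5] * 20
observed = 40                         # made-up observed total count
p_nonstat = upper_p_value(set_distribution("AA", alphabet, mu0, trans, lengths), observed)
p_stat = upper_p_value(set_distribution("AA", alphabet, pi, trans, lengths), observed)
print(p_nonstat, p_stat)
```

In this toy setting the only difference between the two p-values is the law of the first letter of each sequence. Because every short sequence restarts from that law, the discrepancy accumulates over the set, which is the kind of sensitivity the abstract describes.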
Juliette Martin and Leslie Regad contributed equally. Anne-Claude Camproux is the corresponding author.
Copyright information
© 2010 Birkhäuser Boston
About this chapter
Cite this chapter
Martin, J., Regad, L., Camproux, AC., Nuel, G. (2010). Finite Markov Chain Embedding for the Exact Distribution of Patterns in a Set of Random Sequences. In: Skiadas, C. (eds) Advances in Data Analysis. Statistics for Industry and Technology. Birkhäuser Boston. https://doi.org/10.1007/978-0-8176-4799-5_16
DOI: https://doi.org/10.1007/978-0-8176-4799-5_16
Publisher Name: Birkhäuser Boston
Print ISBN: 978-0-8176-4798-8
Online ISBN: 978-0-8176-4799-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)