Finite Markov Chain Embedding for the Exact Distribution of Patterns in a Set of Random Sequences
Patterns with “unusual” frequencies are candidates for new functional patterns. They are usually identified by modelling the sequence with a homogeneous order-m Markov model (m ≥ 1), under which p-values can be computed. For practical reasons, the model is often assumed to be stationary. This approximation can produce artifacts, especially when a large set of short sequences is considered. In this work, an exact method that accounts for both nonstationarity and the fragmentary structure of the sequences is applied to a simulated and a real set of sequences. The results illustrate that pattern statistics can be very sensitive to the stationarity assumption.
Keywords and phrases: stationary distribution, pattern Markov chain, biological patterns, finite Markov chain embedding
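To make the idea in the abstract concrete, the following is a minimal sketch of a finite Markov chain embedding for the exact distribution of a pattern count. It is not the authors' implementation. It assumes an order-1 model, a single word counted with overlaps, and independent sequences whose counts are combined by convolution. All function names (`build_automaton`, `count_distribution`, `stationary`, `convolve`, `upper_tail`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_automaton(word, alphabet):
    """delta[q][a]: length of the longest prefix of `word` that is a
    suffix of word[:q] + a (a standard string-matching automaton)."""
    m = len(word)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            s = word[:q] + a
            k = min(m, len(s))
            while k > 0 and word[:k] != s[len(s) - k:]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def count_distribution(word, alphabet, init, trans, n):
    """Exact distribution of the number of (overlapping) occurrences of
    `word` in X_1..X_n, where X_1 ~ init and X_{i+1} | X_i ~ trans[X_i].
    The embedded state is (last letter, automaton state, count so far)."""
    delta = build_automaton(word, alphabet)
    m = len(word)
    cur = defaultdict(float)
    for a in alphabet:
        if init[a] > 0:
            q = delta[0][a]
            cur[(a, q, int(q == m))] += init[a]
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for (a, q, k), p in cur.items():
            for b in alphabet:
                if trans[a][b] > 0:
                    q2 = delta[q][b]
                    nxt[(b, q2, k + int(q2 == m))] += p * trans[a][b]
        cur = nxt
    dist = defaultdict(float)
    for (_, _, k), p in cur.items():
        dist[k] += p
    return dict(dist)


def stationary(trans, alphabet, iters=500):
    """Stationary distribution by power iteration (assumes the chain is
    irreducible and aperiodic so that the iteration converges)."""
    pi = {a: 1.0 / len(alphabet) for a in alphabet}
    for _ in range(iters):
        pi = {b: sum(pi[a] * trans[a][b] for a in alphabet) for b in alphabet}
    return pi


def convolve(d1, d2):
    """Distribution of the sum of two independent counts."""
    out = defaultdict(float)
    for k1, p1 in d1.items():
        for k2, p2 in d2.items():
            out[k1 + k2] += p1 * p2
    return dict(out)


def upper_tail(dist, k_obs):
    """p-value for over-representation: P(N >= k_obs)."""
    return sum(p for k, p in dist.items() if k >= k_obs)


if __name__ == "__main__":
    alphabet = "ab"
    trans = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}
    start_b = {"a": 0.0, "b": 1.0}          # nonstationary start (illustrative)
    pi = stationary(trans, alphabet)        # stationary start
    total_ns, total_st = {0: 1.0}, {0: 1.0}
    for _ in range(20):                     # 20 short sequences of length 5
        total_ns = convolve(total_ns, count_distribution("aa", alphabet, start_b, trans, 5))
        total_st = convolve(total_st, count_distribution("aa", alphabet, pi, trans, 5))
    print(upper_tail(total_ns, 50), upper_tail(total_st, 50))
```

The usage block compares the same tail probability under a fixed starting letter and under the stationary start. With many short sequences, the first few positions of each sequence make up a large share of the data, which is why the two p-values can differ noticeably. The transition matrix, sequence lengths, and threshold are made-up illustrative values, not data from the study.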