Abstract
Let \(\mathbb{A} = (A_i)_{1\leq i\leq n}\) be a sequence of letters taken from a finite alphabet \(\Theta\). Let \(s : \Theta \rightarrow \mathbb{Z}\) be a scoring function and \(\mathbb{X} = (X_i)_{1\leq i\leq n}\) the corresponding score sequence, where \(X_i = s(A_i)\). The local score is defined as \(H_n=\max_{1\leq i\leq j\leq n}\sum_{k=i}^{j}X_k\). We provide the exact distribution of the local score of random sequences under several models. We first consider a Markov model on the score sequence \(\mathbb{X}\), and then a Markov model on the letter sequence \(\mathbb{A}\). The exact P-values of the local score obtained under the two models are compared on several datasets. They are also compared with previous results obtained under the independent model.
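The Python sketch below is illustrative only and is not claimed to be the method of the paper. `local_score` computes \(H_n\) directly from the definition with a single pass (best non-empty segment sum ending at each position). `local_score_tail` evaluates \(P(H_n \geq a)\) for an integer threshold \(a \geq 1\) when \(\mathbb{A}\) is a Markov chain, using the fact that, with the Lindley process \(U_0 = 0\), \(U_k = \max(0, U_{k-1} + X_k)\), one has \(H_n \geq a\) if and only if \(\max_k U_k \geq a\). It propagates the joint law of \((U_k, A_k)\) restricted to \(U_k < a\), a finite Markov chain imbedding, at a cost of order \(n \, a \, |\Theta|^2\). The function names and the dict-based input format are hypothetical. The Markov model on scores is the special case where the letters are the score values and \(s\) is the identity; the independent model is the case where every row of the transition matrix equals the initial distribution.

```python
from fractions import Fraction


def local_score(scores):
    """H_n = max over 1 <= i <= j <= n of X_i + ... + X_j (non-empty segments)."""
    if not scores:
        raise ValueError("need at least one score")
    best = cur = scores[0]
    for x in scores[1:]:
        cur = max(x, cur + x)   # best segment sum ending at this position
        best = max(best, cur)
    return best


def local_score_tail(a, n, init, trans, score):
    """P(H_n >= a) for an integer a >= 1 when (A_k) is a Markov chain.

    Hypothetical input format (an assumption of this sketch):
      init:  dict letter -> P(A_1 = letter)
      trans: dict letter -> dict letter -> P(A_{k+1} = . | A_k = .)
      score: dict letter -> integer score s(letter)
    """
    if a < 1 or n < 1:
        raise ValueError("need a >= 1 and n >= 1")
    # dist[(u, x)] = P(U_k = u, A_k = x, and U_l < a for all l <= k)
    dist = {}
    for x, p in init.items():
        u = max(0, score[x])
        if u < a:
            dist[(u, x)] = dist.get((u, x), 0) + p
    for _ in range(n - 1):
        new = {}
        for (u, x), p in dist.items():
            for y, q in trans[x].items():
                v = max(0, u + score[y])   # Lindley step
                if v < a:                  # mass reaching a is dropped
                    new[(v, y)] = new.get((v, y), 0) + p * q
        dist = new
    # the mass that never reached level a is P(H_n < a)
    return 1 - sum(dist.values())


if __name__ == "__main__":
    half = Fraction(1, 2)
    init = {"a": half, "b": half}
    iid = {"a": {"a": half, "b": half}, "b": {"a": half, "b": half}}
    score = {"a": 1, "b": -1}
    print(local_score([1, -2, 3]))                    # 3
    print(local_score_tail(2, 2, init, iid, score))   # 1/4
```

Using `Fraction` probabilities keeps the result exact; plain floats also work with the same code.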
About this article
Hassenforder, C., Mercier, S. Exact Distribution of the Local Score for Markovian Sequences. AISM 59, 741–755 (2007). https://doi.org/10.1007/s10463-006-0064-6