Exact Distribution of the Local Score for Markovian Sequences

Annals of the Institute of Statistical Mathematics

Abstract

Let \(\mathbb{A} = (A_i)_{1\leq i\leq n}\) be a sequence of letters taken from a finite alphabet \(\Theta\). Let \(s : \Theta \rightarrow \mathbb{Z}\) be a scoring function and \(\mathbb{X} = (X_i)_{1\leq i\leq n}\) the corresponding score sequence, where \(X_i = s(A_i)\). The local score is defined as \(H_n=\max_{1\leq i\leq j\leq n}\sum_{k=i}^{j}X_k\). We provide the exact distribution of the local score of random sequences under several models. We first consider a Markov model on the score sequence \(\mathbb{X}\), and then a Markov model on the letter sequence \(\mathbb{A}\). The exact P-values of the local score obtained with the two models are compared on several datasets. They are also compared with previous results obtained under the independent model.
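
To make the definition of \(H_n\) concrete, the following is a minimal sketch of how the local score of one observed sequence could be computed. It is not taken from the article: the function names `local_score` and `local_score_brute_force`, the three-letter alphabet and the scoring function `s` are all hypothetical and chosen for illustration only. The linear-time version uses the standard maximum-sum-segment recursion, and the brute-force version applies the definition literally, over non-empty segments, as a cross-check.

```python
def local_score(scores):
    """Local score H_n = max over 1 <= i <= j <= n of X_i + ... + X_j.

    Linear-time recursion: `running` is the best sum of a segment that
    ends at the current position; `best` is the best sum seen so far.
    Segments are non-empty, as in the definition above.
    """
    if not scores:
        raise ValueError("the score sequence must be non-empty")
    running = best = scores[0]
    for x in scores[1:]:
        # Either extend the best segment ending at the previous position,
        # or start a new segment at the current position.
        running = max(x, running + x)
        best = max(best, running)
    return best


def local_score_brute_force(scores):
    """Direct O(n^2) evaluation of the definition, for checking only."""
    n = len(scores)
    return max(sum(scores[i:j + 1]) for i in range(n) for j in range(i, n))


# Hypothetical alphabet and scoring function s (illustration only).
s = {"a": 2, "b": -1, "c": -3}
letters = "abacab"                    # the letter sequence A
scores = [s[ch] for ch in letters]    # the score sequence X, X_i = s(A_i)

print(local_score(scores))            # 3, reached by the segment "aba"
```

Both functions only evaluate \(H_n\) on a given sequence; they say nothing about its distribution under the Markov models discussed in the abstract.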

Author information

Corresponding author

Correspondence to Sabine Mercier.

About this article

Cite this article

Hassenforder, C., Mercier, S. Exact Distribution of the Local Score for Markovian Sequences. AISM 59, 741–755 (2007). https://doi.org/10.1007/s10463-006-0064-6

  • DOI: https://doi.org/10.1007/s10463-006-0064-6