Abstract
We study the sequence alignment problem and its independent version, the discrete Hammersley process with an exploration penalty. We obtain rigorous upper bounds for the number of optimality regions in both models near the soft edge. At zero penalty the independent model becomes an exactly solvable model and we identify cases for which the law of the last passage time converges to a Tracy-Widom law.
References
Aluru, S.: Handbook of computational molecular biology. Chapman & Hall/CRC Computer and Information Science Series. Chapman & Hall/CRC, Boca Raton (2006)
Amsalu, S., Matzinger, H., Vachkovskaia, M.: Thermodynamical approach to the longest common subsequence problem. J. Stat. Phys. 131(6), 1103–1120 (2008)
Apostol, T. M.: Introduction to analytic number theory, 5th ed. Undergraduate Texts in Mathematics. Springer (1995)
Baryshnikov, Y.: GUEs and queues. Probab. Theory Relat. Fields 119, 256–274 (2001)
Basdevant, A.-L., Enriquez, N., Gerin, L., Gouéré, J.-B.: Discrete Hammersley’s lines with sources and sinks. ALEA Lat. Am. J. Probab. Math. Stat. 13, 33–52 (2016)
Bergroth, L., Hakonen, H., Raita, T.: A survey of longest common subsequence algorithms. SPIRE 00, 39–48 (2000)
Bodineau, T., Martin, J.: A universality property for last-passage percolation close to the axis. Electron. Commun. Probab. 10(11), 105–112 (2005)
Chvátal, V., Sankoff, D.: Longest common subsequences of two random sequences. J. Appl. Probab. 12(2), 306–315 (1975)
Cramér, H.: Sur un nouveau théorème-limite de la théorie des probabilités. Actualités Sci. Industr. 736, 5–23 (1938)
Dewey, C. N., Huggins, P. M., Woods, K., Sturmfels, B., Pachter, L.: Parametric alignment of Drosophila genomes. PLoS Comput. Biol. 2(6), e73 (2006)
Fernández-Baca, D., Seppäläinen, T., Slutzki, G.: Bounds for parametric sequence comparison. Discrete Appl. Math. 118, 181–198 (2002)
Fernández-Baca, D., Venkatachalam, B.: Parametric sequence alignment. CRC Press Computer and Information Science Series. Chapman and Hall (2006)
Georgiou, N.: Soft edge results for longest increasing paths on the planar lattice. Electron. J. Probab. 15, 1–13 (2010)
Georgiou, N., Rassoul-Agha, F., Seppäläinen, T.: Variational formulas and cocycle solutions for directed polymer and percolation models. Commun. Math. Phys. 346(2), 741–779 (2016)
Georgiou, N., Rassoul-Agha, F., Seppäläinen, T.: Stationary cocycles and Busemann functions for the corner growth model. Probab. Theory Relat. Fields. https://doi.org/10.1007/s00440-016-0729-x (2016)
Georgiou, N., Rassoul-Agha, F., Seppäläinen, T.: Geodesics and the competition interface for the corner growth model. Probab. Theory Relat. Fields. https://doi.org/10.1007/s00440-016-0734-0 (2016)
Glynn, P. W., Whitt, W.: Departures from many queues in series. Ann. Appl. Probab. 1(4), 546–572 (1991)
Gong, R., Houdré, C., Lember, J.: Lower bounds on the generalized central moments of the optimal alignments score of random sequences. arXiv:1506.06067 (2015)
Gusfield, D., Balasubramanian, K., Naor, D.: Parametric optimization of sequence alignment. Algorithmica 12(4-5), 312–326 (1994)
Hammersley, J. M.: A few seedlings of research. In: Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability (University of California, Berkeley, Calif., 1970/1971), Vol. I: Theory of statistics, pp. 345–394. University of California Press, Berkeley (1972)
Henikoff, S., Henikoff, J. G.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89(22), 10915–10919 (1992)
Hirschberg, D. S.: A linear space algorithm for computing maximal common subsequences. Commun. ACM 18(6), 341–343 (1975)
Houdré, C., Matzinger, H.: Closeness to the diagonal for longest common subsequences in random words. Electron. Commun. Probab. 21(36), 1–19 (2016)
Hower, V., Heitsch, C. E.: Parametric analysis of RNA branching configurations. Bull. Math. Biol. 73(4), 754–776 (2011)
Kiwi, M., Loebl, M., Matoušek, J.: Expected length of the longest common subsequence for large alphabets. Adv. Math. 197(2), 480–498 (2005)
Komlós, J., Major, P., Tusnády, G.: An approximation of partial sums of independent RV’s, and the sample DF. II. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 34, 33–58 (1976)
Lember, J., Matzinger, H.: Standard deviation of the longest common subsequence. Ann. Probab. 37(3), 1192–1235 (2009)
Lember, J., Matzinger, H., Vollmer, A.: Optimal alignments of longest common subsequences and their path properties. Bernoulli 20(3), 1292–1343 (2014)
Maier, D.: The complexity of some problems on subsequences and supersequences. J. ACM 25(2), 322–336 (1978)
Malaspinas, A. S., Eriksson, N., Huggins, P. M.: Parametric analysis of alignment and phylogenetic uncertainty. Bull. Math. Biol. 73(4), 795–810 (2011)
Martin, J. B.: Limiting shape for directed percolation models. Ann. Probab. 32(4), 2908–2937 (2004)
Masek, W. J., Paterson, M. S.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18–31 (1980)
Myers, E. W., Miller, W.: Optimal alignments in linear space. Comput. Appl. Biosci. 4(1), 11–17 (1988)
Needleman, S. B., Wunsch, C. D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
Ng, P. C., Henikoff, S.: Predicting deleterious amino acid substitutions. Genome Res. 11(5), 863–874 (2001)
O’Connell, N., Yor, M.: Brownian analogues of Burke’s theorem. Stoch. Proc. Appl. 96, 285–304 (2001)
Pachter, L., Sturmfels, B.: Parametric inference for biological sequence analysis. Proc. Natl. Acad. Sci. U.S.A. 101(46), 16138–16143 (2004)
Pachter, L., Sturmfels, B.: Algebraic statistics for computational biology. Cambridge University Press, New York (2005)
Priezzhev, V. B., Schütz, G. M.: Exact solution of the Bernoulli matching model of sequence alignment. J. Stat. Mech. Theor. Exp. 2008(09), P09007 (2008)
Seppäläinen, T.: Increasing sequences of independent points on the planar lattice. Ann. Appl. Probab. 7(4), 886–898 (1997)
Seppäläinen, T.: A scaling limit for queues in series. Ann. Appl. Probab. 7(4), 855–872 (1997)
Smith, T. F., Waterman, M. S.: Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981)
Tracy, C. A., Widom, H.: Level-spacing distributions and the Airy kernel. Commun. Math. Phys. 159(1), 151–174 (1994)
Vingron, M., Waterman, M. S.: Sequence alignment and penalty choice: review of concepts, case studies and implications. J. Mol. Biol. 235(1), 1–12 (1994)
Vinzant, C.: Lower bounds for optimal alignments of binary sequences. Discrete Appl. Math. 157, 3341–3346 (2009)
Xia, X.: Bioinformatics and the cell: Modern computational approaches in genomics, proteomics and transcriptomics. Springer, Berlin (2007)
Acknowledgments
We thank the anonymous referees for their helpful comments, which have led to a much improved version of the article. We also thank Lior Pachter for pointing out the reference [45] to us.
Additional information
NG was partially supported by the University of Sussex Strategic Development Fund (SDF) and by the EPSRC First Grant EP/P021409/1: The flat edge in last passage percolation. JO was partially supported by an ISM-CRM fellowship and a Concordia Horizon fellowship.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Georgiou, N., Ortmann, J. Optimality Regions and Fluctuations for Bernoulli Last Passage Models. Math Phys Anal Geom 21, 22 (2018). https://doi.org/10.1007/s11040-018-9276-2
DOI: https://doi.org/10.1007/s11040-018-9276-2
Keywords
- Soft edge
- Edge results
- Optimality regions
- Sequence alignment
- Discrete Hammersley process
- Longest common subsequence
- Bernoulli increasing paths
- Tracy-Widom distribution
- Last passage time
- Corner growth models
- Flat edge