Non-normal Limiting Distribution for Optimal Alignment Scores of Strings in Binary Alphabets

Journal of Statistical Physics

Abstract

We consider two independent binary i.i.d. random strings X and Y of equal length n and their optimal alignments under symmetric scoring functions only. We decompose the space of scoring functions into five components. Two of these components add a part to the optimal score which does not depend on the alignment and which is asymptotically normal. We show that when we restrict the number of gaps sufficiently and add them into only one sequence, the alignment score can be decomposed into a part which is normal and has order \(O(\sqrt{n})\) and a part which is of smaller order and tends to a Tracy–Widom distribution. Adding gaps into only one sequence is equivalent to aligning a string with its descendants when only mutations and deletions occur. For testing relatedness of strings, the normal part is irrelevant: since it does not depend on the alignment, it can be safely removed from the test statistic.
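To make the notion of an optimal alignment with gaps in only one sequence concrete, the following is a minimal sketch of a dynamic program for that restricted problem. It is an illustration under stated assumptions, not the construction used in the paper: the function names, the example scoring function `match_score`, the per-gap penalty, and the choice to align a shorter string `y` against a longer string `x` are all hypothetical and do not come from the article.

```python
from typing import Callable, Sequence


def match_score(a: str, b: str) -> float:
    """Hypothetical symmetric scoring function: +1 for equal letters, -1 otherwise."""
    return 1.0 if a == b else -1.0


def one_sided_alignment_score(
    x: Sequence[str],
    y: Sequence[str],
    score: Callable[[str, str], float],
    gap_penalty: float,
) -> float:
    """Best alignment score when gaps may be inserted into y only.

    Every letter of x is aligned either with a letter of y (in order)
    or with a gap, so y must not be longer than x. This models a
    descendant y obtained from x by mutations and deletions (assumed setup).
    """
    n, m = len(x), len(y)
    if m > n:
        raise ValueError("y must not be longer than x when gaps go into y only")

    neg_inf = float("-inf")
    # best[j] = best score for aligning x[:i] with y[:j]; starts at i = 0.
    best = [neg_inf] * (m + 1)
    best[0] = 0.0

    for i in range(1, n + 1):
        new = [neg_inf] * (m + 1)
        # x[i-1] faces a gap while no letter of y has been used yet.
        new[0] = best[0] + gap_penalty
        for j in range(1, min(i, m) + 1):
            aligned = best[j - 1] + score(x[i - 1], y[j - 1])  # x[i-1] with y[j-1]
            gapped = best[j] + gap_penalty                     # x[i-1] with a gap
            new[j] = max(aligned, gapped)
        best = new

    return best[m]
```

For instance, with `x = "0110"`, `y = "010"`, and a gap penalty of `-0.5`, the best choice places the single gap opposite the second letter of `x`, which gives three matches and one gap. The running time is proportional to `len(x) * len(y)`, and only one row of the table is kept in memory.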

Acknowledgements

We thank the reviewers for their comments, which improved the paper.

Author information

Corresponding author

Correspondence to Ionel Popescu.

About this article

Cite this article

Duan, J.T., Matzinger, H. & Popescu, I. Non-normal Limiting Distribution for Optimal Alignment Scores of Strings in Binary Alphabets. J Stat Phys 168, 1056–1084 (2017). https://doi.org/10.1007/s10955-017-1835-6

  • DOI: https://doi.org/10.1007/s10955-017-1835-6
