On Suboptimal LCS-alignments for Independent Bernoulli Sequences with Asymmetric Distributions

Methodology and Computing in Applied Probability

Abstract

Let X = X_1 ... X_n and Y = Y_1 ... Y_n be two binary sequences of length n. A common subsequence of X and Y is any subsequence of X that is at the same time a subsequence of Y. A common subsequence of maximal length is called a longest common subsequence (LCS) of X and Y. The LCS is a common tool for measuring the closeness of X and Y. In this note, we consider the case when X and Y are both i.i.d. Bernoulli sequences with the parameters ϵ and 1 − ϵ, respectively. Hence, typically the sequences consist of long and short blocks of different colors. This suggests the so-called block-by-block alignment, where the short blocks in one sequence are matched to the long blocks of the same color in the other sequence. Such an alignment is not necessarily an LCS, but it is computationally easy to obtain and, therefore, of practical interest. We investigate the asymptotic properties of several block-by-block type alignments. The paper ends with a simulation study, where the block-by-block type alignments are compared with the LCS.
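To make the objects in the abstract concrete, the following minimal Python sketch computes the LCS length with the textbook dynamic program, draws two i.i.d. Bernoulli sequences with parameters ϵ and 1 − ϵ, and splits a sequence into its blocks (maximal runs of one color). The function names `lcs_length`, `bernoulli_sequence` and `blocks`, and the values of `n`, `eps` and the seed, are illustrative assumptions and do not come from the paper. The paper's block-by-block alignments themselves are not reproduced here.

```python
import random
from itertools import groupby


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y.

    Standard O(len(x) * len(y)) dynamic program, keeping only one row.
    """
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                # Extend a common subsequence by the matching symbol.
                curr.append(prev[j - 1] + 1)
            else:
                # Skip one symbol in either x or y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def bernoulli_sequence(n, p, rng):
    """n i.i.d. binary symbols, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def blocks(seq):
    """Run-length decomposition: list of (color, block length) pairs."""
    return [(color, len(list(run))) for color, run in groupby(seq)]


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed, chosen only for repeatability
    n, eps = 200, 0.1           # illustrative values, not from the paper
    x = bernoulli_sequence(n, eps, rng)        # mostly 0s, short blocks of 1s
    y = bernoulli_sequence(n, 1 - eps, rng)    # mostly 1s, short blocks of 0s
    print("first blocks of X:", blocks(x)[:5])
    print("first blocks of Y:", blocks(y)[:5])
    print("LCS length:", lcs_length(x, y))
```

With a small ϵ, the block decomposition of X shows long blocks of 0s separated by short blocks of 1s, and Y shows the opposite pattern, which is the structure that block-by-block alignments exploit.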

Author information

Corresponding author

Correspondence to Jüri Lember.

Additional information

J. Lember is partially supported by Estonian Science Foundation Grant nr. 7553 and SFB 701 of Bielefeld University.

M. Toots is partially supported by Estonian Science Foundation Grant nr. 7553.

About this article

Cite this article

Barder, S., Lember, J., Matzinger, H. et al. On Suboptimal LCS-alignments for Independent Bernoulli Sequences with Asymmetric Distributions. Methodol Comput Appl Probab 14, 357–382 (2012). https://doi.org/10.1007/s11009-010-9206-7

  • DOI: https://doi.org/10.1007/s11009-010-9206-7
