Abstract
Let X = X_1 ... X_n and Y = Y_1 ... Y_n be two binary sequences of length n. A common subsequence of X and Y is any subsequence of X that is also a subsequence of Y. A common subsequence of maximal length is called a longest common subsequence (LCS) of X and Y. The LCS is a common tool for measuring the closeness of X and Y. In this note, we consider the case when X and Y are both i.i.d. Bernoulli sequences with the parameters ϵ and 1 − ϵ, respectively. Hence, typically the sequences consist of long and short blocks of different colors. This suggests the so-called block-by-block alignment, where the short blocks in one sequence are matched to the long blocks of the same color in the other sequence. Such an alignment is not necessarily an LCS, but it is computationally easy to obtain and, therefore, of practical interest. We investigate the asymptotic properties of several block-by-block type alignments. The paper ends with a simulation study in which the block-by-block type alignments are compared with the LCS.
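The setting described in the abstract can be illustrated with a minimal sketch. It samples two i.i.d. Bernoulli sequences with parameters ϵ and 1 − ϵ, lists their blocks (runs of equal symbols), and computes the LCS length with the textbook dynamic programme. This is not the authors' code and it does not implement their block-by-block alignments. The function names `bernoulli_sequence`, `lcs_length` and `blocks`, as well as the values n = 200 and ϵ = 0.1, are assumptions chosen only for illustration.

```python
import random
from itertools import groupby


def bernoulli_sequence(n, p, rng):
    """Return n i.i.d. bits, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y.

    Standard dynamic programme using O(len(x) * len(y)) time and
    O(len(y)) memory (only two rows of the table are kept).
    """
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def blocks(seq):
    """Run-length encode seq as (symbol, block_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(seq)]


if __name__ == "__main__":
    rng = random.Random(0)   # fixed seed, illustrative only
    n, eps = 200, 0.1        # hypothetical values, not taken from the paper
    x = bernoulli_sequence(n, eps, rng)       # mostly zeros
    y = bernoulli_sequence(n, 1 - eps, rng)   # mostly ones
    print("first blocks of X:", blocks(x)[:6])
    print("first blocks of Y:", blocks(y)[:6])
    print("LCS length:", lcs_length(x, y))
```

For small ϵ, the printed blocks should show long runs of one color interrupted by short runs of the other, which is the structure that the block-by-block alignments are meant to exploit. The quadratic cost of `lcs_length` is the reason a cheaper alignment is attractive for long sequences.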
Additional information
J. Lember is partially supported by Estonian Science Foundation Grant nr. 7553 and SFB 701 of Bielefeld University.
M. Toots is partially supported by Estonian Science Foundation Grant nr. 7553.
Cite this article
Barder, S., Lember, J., Matzinger, H. et al. On Suboptimal LCS-alignments for Independent Bernoulli Sequences with Asymmetric Distributions. Methodol Comput Appl Probab 14, 357–382 (2012). https://doi.org/10.1007/s11009-010-9206-7
DOI: https://doi.org/10.1007/s11009-010-9206-7