# A Linear-Space Algorithm for the Substring Constrained Alignment Problem

## Abstract

We propose a quadratic-time, linear-space algorithm for the following constrained string alignment problem under a string similarity metric with affine gap penalties. The input is a pair of strings to be aligned and a pattern given as a string. Define an occurrence of the pattern in a string as a minimal substring of that string that is most similar to the pattern. The output is a highest-scoring alignment of the pair of strings that matches an occurrence of the pattern in one string with an occurrence of the pattern in the other, where the score of the alignment excludes the similarity between the matched occurrences. This problem may arise when we know that each string has exactly one meaningful occurrence of the pattern and want to determine a putative pair of such occurrences based on homology of the strings.
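As background for the scoring model only, the following is a minimal sketch of the classic affine-gap global alignment recurrence, evaluated row by row so that only linear space is kept. It is not the algorithm proposed here and ignores the pattern constraint entirely. The function name `affine_gap_score`, the scoring parameters, and the convention that a gap of length k costs `gap_open + k * gap_extend` are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Score of a best global alignment of a and b with affine gap penalties.

    Illustrative assumption: a gap of length k costs gap_open + k * gap_extend.
    Only the previous DP row is kept, so space is O(len(b)); time is
    O(len(a) * len(b)). Returns the score only, not the alignment itself.
    """
    n = len(b)
    # Row 0: the empty prefix of a against prefixes of b (one leading gap).
    h_prev = [0] + [-(gap_open + j * gap_extend) for j in range(1, n + 1)]
    # f_prev[j]: best score in the previous row ending with a gap in b.
    f_prev = [NEG_INF] * (n + 1)

    for i in range(1, len(a) + 1):
        # Column 0: a prefix of a against the empty prefix of b.
        h_cur = [-(gap_open + i * gap_extend)] + [NEG_INF] * n
        f_cur = [NEG_INF] * (n + 1)
        e = NEG_INF  # best score in the current row ending with a gap in a
        for j in range(1, n + 1):
            # Extend a horizontal gap or open a new one.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            # Extend a vertical gap or open a new one.
            f_cur[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(h_prev[j - 1] + sub, e, f_cur[j])
        h_prev, f_prev = h_cur, f_cur

    return h_prev[n]
```

With these illustrative parameters, aligning `"AAAA"` with `"AA"` scores two matches plus one gap of length 2. That is cheaper than two separate gaps of length 1, which is the behavior affine penalties are meant to capture. Recovering the alignment itself in linear space requires an additional divide-and-conquer step, which this sketch omits.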
