CPM 2006: Combinatorial Pattern Matching pp 153-164

# Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms

• Sven Rahmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4009)

## Abstract

We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. (1) Number of distinct subsequences: Given a sequence s ∈Σ n and a nonnegative integer kn, how many distinct subsequences of length k does s contain? A previous result by Chase states that this number is maximized by choosing s as a repeated permutation of the alphabet. This has applications in DNA microarray production. (2) Number of ρ -restricted ρ -generated sequences: Given s ∈Σ n and integers k ≥1 and ρ≥1, how many distinct sequences in Σ k contain no single nucleotide repeat longer than ρ and can be written as $$s_1^{r_1}\dots s_n^{r_n}$$ with 0≤r i ρ for all i? For ρ= ∞, the question becomes how many length-k sequences match the regular expression s 1 * s 2 * ...s n *. These considerations allow a detailed analysis of a new DNA sequencing technology (“454 sequencing”). (3) Exact length distribution of the longest increasing subsequence: Given Σ= {1,...,K} and an integer n ≥1, determine the number of sequences in Σ n whose longest strictly increasing subsequence has length k, where 0 ≤kK. This has applications to significance computations for chaining algorithms.

## Keywords

Arithmetic Operation Regular Expression Deposition Sequence Alphabet Size Motif Occurrence
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

## References

1. 1.
Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057), 376–380 (2005), Corrigendum in Nature 439(7075), 502 (2006)Google Scholar
2. 2.
Aldous, D., Diaconis, P.: Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society 36(4), 413–432 (1999)
3. 3.
Niedermeier, R.: Invitation to Fixed Parameter Algorithms. Oxford University Press, Oxford (2006)
4. 4.
Rahmann, S.: The shortest common supersequence problem in a microarray production setting. In: Proceedings of the 2nd European Conference in Computational Biology (ECCB 2003), pp. ii156–ii161, vol. 19(suppl. 2) of Bioinformatics (2003)Google Scholar
5. 5.
Chase, P.: Subsequence numbers and logarithmic concavity. Discrete Math. 16, 123–140 (1976)
6. 6.
Skiena, S.S.: The Algorithm Design Manual. Springer, Heidelberg (1997)