# Subsequence Combinatorics and Applications to Microarray Production, DNA Sequencing and Chaining Algorithms

## Abstract

We investigate combinatorial enumeration problems related to subsequences of strings; in contrast to substrings, subsequences need not be contiguous. For a finite alphabet Σ, the following three problems are solved. **(1) Number of distinct subsequences**: Given a sequence \(s \in \Sigma^n\) and a nonnegative integer *k* ≤ *n*, how many distinct subsequences of length *k* does *s* contain? A previous result by Chase states that this number is maximized by choosing *s* as a repeated permutation of the alphabet. This has applications in DNA microarray production. **(2) Number of *ρ*-restricted *ρ*-generated sequences**: Given \(s \in \Sigma^n\) and integers *k* ≥ 1 and *ρ* ≥ 1, how many distinct sequences in \(\Sigma^k\) contain no single nucleotide repeat longer than *ρ* and can be written as \(s_1^{r_1}\dots s_n^{r_n}\) with \(0 \le r_i \le \rho\) for all *i*? For *ρ* = ∞, the question becomes how many length-*k* sequences match the regular expression \(s_1^{*} s_2^{*} \dots s_n^{*}\). These considerations allow a detailed analysis of a new DNA sequencing technology (“454 sequencing”). **(3) Exact length distribution of the longest increasing subsequence**: Given Σ = {1, ..., *K*} and an integer *n* ≥ 1, determine the number of sequences in \(\Sigma^n\) whose longest strictly increasing subsequence has length *k*, where 0 ≤ *k* ≤ *K*. This has applications to significance computations for chaining algorithms.
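To make problems (1) and (3) concrete, the following Python sketch counts distinct length-*k* subsequences with a textbook dynamic program and tabulates the longest-increasing-subsequence length distribution by exhaustive enumeration. It is an illustration only, not necessarily the method developed in the paper; the function names `count_distinct_subsequences` and `lis_length_distribution` are hypothetical, and the enumeration is feasible only for very small *n* and *K*.

```python
from bisect import bisect_left
from itertools import product


def count_distinct_subsequences(s, k):
    """Number of distinct subsequences of length k contained in s."""
    n = len(s)
    # dp[i][j]: number of distinct length-j subsequences of the prefix s[:i]
    dp = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = 1  # the empty subsequence
    last = {}  # last[c]: 1-based position of the previous occurrence of c
    for i in range(1, n + 1):
        c = s[i - 1]
        for j in range(1, k + 1):
            # Either skip s[i-1], or append it to a length-(j-1) subsequence.
            dp[i][j] = dp[i - 1][j] + dp[i - 1][j - 1]
            if c in last:
                # Remove subsequences already counted at the previous c.
                dp[i][j] -= dp[last[c] - 1][j - 1]
        last[c] = i
    return dp[n][k]


def lis_length(seq):
    """Length of the longest strictly increasing subsequence of seq."""
    tails = []  # tails[m]: smallest last element of an increasing run of length m+1
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def lis_length_distribution(K, n):
    """counts[k]: number of sequences in {1..K}^n whose LIS has length k.

    Exhaustive enumeration over all K**n sequences (assumption: small inputs).
    """
    counts = [0] * (K + 1)
    for seq in product(range(1, K + 1), repeat=n):
        counts[lis_length(seq)] += 1
    return counts
```

For example, `count_distinct_subsequences("ACGT", 2)` counts the 6 distinct pairs obtainable from four distinct letters, and `lis_length_distribution(2, 2)` gives `[0, 3, 1]`: of the four sequences over {1, 2} of length 2, only (1, 2) has a strictly increasing subsequence of length 2.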

## Keywords

Arithmetic Operation · Regular Expression · Deposition Sequence · Alphabet Size · Motif Occurrence

## References

1. Margulies, M., et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 437(7057), 376–380 (2005), Corrigendum in Nature 439(7075), 502 (2006)
2. Aldous, D., Diaconis, P.: Longest increasing subsequences: From patience sorting to the Baik-Deift-Johansson theorem. Bulletin of the American Mathematical Society 36(4), 413–432 (1999)
3. Niedermeier, R.: Invitation to Fixed Parameter Algorithms. Oxford University Press, Oxford (2006)
4. Rahmann, S.: The shortest common supersequence problem in a microarray production setting. In: Proceedings of the 2nd European Conference in Computational Biology (ECCB 2003), pp. ii156–ii161, vol. 19(suppl. 2) of Bioinformatics (2003)
5. Chase, P.: Subsequence numbers and logarithmic concavity. Discrete Math. 16, 123–140 (1976)
6. Skiena, S.S.: The Algorithm Design Manual. Springer, Heidelberg (1997)