Parameterized Intractability of Distinguishing Substring Selection
A central question in computational biology is the design of genetic markers to distinguish between two given sets of (DNA) sequences. This question is formalized as the NP-complete Distinguishing Substring Selection problem (DSSS for short), which asks, given a set of "good" strings and a set of "bad" strings, for a solution string which is, with respect to the Hamming metric, "away" from the good strings and "close" to the bad strings. More precisely, given integers d_g, d_b, and L, we ask for a length-L string s such that s has Hamming distance at least d_g to every length-L substring of the good strings and such that every bad string has a length-L substring with Hamming distance at most d_b to s. Studying the parameterized complexity of DSSS, we show that, already for a binary alphabet, DSSS is W[1]-hard with respect to its natural parameters. This, in particular, implies that a recently given polynomial-time approximation scheme (PTAS) by Deng et al. cannot be replaced by a so-called efficient polynomial-time approximation scheme (EPTAS) unless an unlikely collapse in parameterized complexity theory occurs. This is seemingly the first computational biology problem for which such a border between PTAS (which exists) and EPTAS (which is unlikely to exist) could be established. By way of contrast, for a special case of DSSS, we present an exact fixed-parameter algorithm solving the problem efficiently. In this way we also exhibit a sharp border between fixed-parameter tractability and intractability results.
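To make the problem definition concrete, the following minimal Python sketch checks whether a candidate string satisfies the two DSSS conditions and, for tiny instances, searches all length-L strings exhaustively. It is an illustration of the definition only, not the fixed-parameter algorithm of the paper; the function names (`hamming`, `windows`, `is_distinguishing`, `brute_force_dsss`) and the default binary alphabet are assumptions introduced here.

```python
from itertools import product
from typing import List, Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Hamming distance between two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def windows(t: str, L: int) -> List[str]:
    """All length-L substrings of t (empty list if t is shorter than L)."""
    return [t[i:i + L] for i in range(len(t) - L + 1)]


def is_distinguishing(s: str, good: Sequence[str], bad: Sequence[str],
                      d_g: int, d_b: int) -> bool:
    """Check the two DSSS conditions for a candidate string s."""
    L = len(s)
    # s must be at distance >= d_g from EVERY length-L substring of every good string.
    far_from_good = all(hamming(s, w) >= d_g
                        for g in good for w in windows(g, L))
    # Every bad string must contain SOME length-L substring at distance <= d_b from s.
    close_to_bad = all(any(hamming(s, w) <= d_b for w in windows(b, L))
                       for b in bad)
    return far_from_good and close_to_bad


def brute_force_dsss(good: Sequence[str], bad: Sequence[str], L: int,
                     d_g: int, d_b: int, alphabet: str = "01") -> Optional[str]:
    """Exhaustive search over all |alphabet|^L candidates (illustration only)."""
    for chars in product(alphabet, repeat=L):
        s = "".join(chars)
        if is_distinguishing(s, good, bad, d_g, d_b):
            return s
    return None
```

For example, with good strings `["0000"]`, bad strings `["0110", "1111"]`, L = 2, d_g = 1, and d_b = 0, the candidate `"11"` satisfies both conditions while `"00"` does not. The exhaustive search takes time exponential in L, which is consistent with the problem being NP-complete and is only practical for very small inputs.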
Keywords: Parameterized Complexity, Close String, Vertex Cover, Solution String, Distance Parameter