Abstract
Many problems in bioinformatics ask for a string that approximately represents a collection of given strings. We study generalizations of these problems in which some input strings may be classified as outliers. In the Close to Most Strings problem, we are given a set S of same-length strings and a parameter d, and must find a string x that maximizes the number of “non-outliers”, that is, strings of S within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of the input strings such that the number of positions at which the selected strings are not all identical is at most k; we show it has no PTAS unless P=NP. We also observe that Closest to k Strings has no efficient PTAS (EPTAS) unless the parameterized complexity hierarchy collapses. In sum, outliers help model problems that arise when working with biological data, but we show that finding even an approximate solution is computationally difficult.
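To make the two objective functions concrete, here is a minimal brute-force sketch in Python. It is not an algorithm from the paper: the function names (`hamming`, `close_to_most_strings`, `bad_columns`, `most_strings_few_bad_columns`) and the explicit `alphabet` argument are assumptions introduced for illustration, and the exhaustive search takes exponential time, so it is only usable on toy inputs.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two same-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def close_to_most_strings(strings, d, alphabet):
    """Exhaustively search for a center x maximizing |{s : hamming(x, s) <= d}|.

    Illustrative only: tries every string over `alphabet` (an assumed input).
    """
    length = len(strings[0])
    best_center, best_count = None, -1
    for candidate in product(alphabet, repeat=length):
        center = "".join(candidate)
        count = sum(hamming(center, s) <= d for s in strings)
        if count > best_count:
            best_center, best_count = center, count
    return best_center, best_count


def bad_columns(strings):
    """Count positions at which the given strings are not all identical."""
    return sum(len(set(column)) > 1 for column in zip(*strings))


def most_strings_few_bad_columns(strings, k):
    """Exhaustively find a largest subset with at most k bad columns."""
    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []


if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "ACCT", "TTTT"]
    print(close_to_most_strings(sample, d=1, alphabet="ACGT"))
    print(most_strings_few_bad_columns(sample, k=1))
```

In the sample input, the string TTTT plays the role of an outlier: no center within distance 1 of it can also be within distance 1 of any of the other three strings.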
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boucher, C., Landau, G.M., Levy, A., Pritchard, D., Weimann, O. (2012). On Approximating String Selection Problems with Outliers. In: Kärkkäinen, J., Stoye, J. (eds) Combinatorial Pattern Matching. CPM 2012. Lecture Notes in Computer Science, vol 7354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31265-6_34
DOI: https://doi.org/10.1007/978-3-642-31265-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31264-9
Online ISBN: 978-3-642-31265-6
eBook Packages: Computer Science (R0)