On Approximating String Selection Problems with Outliers

  • Conference paper
Combinatorial Pattern Matching (CPM 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7354)

Abstract

Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of “non-outliers” within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe Closest to k Strings has no efficient PTAS (EPTAS) unless the parameterized complexity hierarchy collapses. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult.
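The two problem definitions in the abstract can be made concrete with a small brute-force sketch. This is only an illustration of the definitions on toy inputs, not the authors' method: the function names, the explicit `alphabet` parameter, and the exhaustive search strategy are all assumptions introduced here, and both searches take exponential time.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions where two same-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def close_to_most_strings(strings, d, alphabet):
    """Close to Most Strings by exhaustive search (hypothetical helper).

    Tries every candidate x over `alphabet` and returns the first x that
    maximizes the number of input strings within Hamming distance d,
    together with that count.
    """
    length = len(strings[0])
    best_x, best_count = None, -1
    for candidate in product(alphabet, repeat=length):
        x = "".join(candidate)
        count = sum(hamming(x, s) <= d for s in strings)
        if count > best_count:
            best_x, best_count = x, count
    return best_x, best_count


def most_strings_few_bad_columns(strings, k):
    """Most Strings with Few Bad Columns by exhaustive search (hypothetical helper).

    Returns a largest subset of the input in which at most k columns
    (positions) contain more than one distinct character.
    """
    def bad_columns(subset):
        return sum(len(set(column)) > 1 for column in zip(*subset))

    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "TTTT"]
    print(close_to_most_strings(toy, d=1, alphabet="AT"))
    print(most_strings_few_bad_columns(toy, k=1))
```

On the toy input, "TTTT" acts as the outlier in both problems: no string is within distance 1 of both "AAAA" and "TTTT", and any subset containing "TTTT" alongside another string has at least three non-identical columns.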

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boucher, C., Landau, G.M., Levy, A., Pritchard, D., Weimann, O. (2012). On Approximating String Selection Problems with Outliers. In: Kärkkäinen, J., Stoye, J. (eds) Combinatorial Pattern Matching. CPM 2012. Lecture Notes in Computer Science, vol 7354. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31265-6_34

  • DOI: https://doi.org/10.1007/978-3-642-31265-6_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31264-9

  • Online ISBN: 978-3-642-31265-6

  • eBook Packages: Computer Science, Computer Science (R0)
