On Approximating String Selection Problems with Outliers

  • Christina Boucher
  • Gad M. Landau
  • Avivit Levy
  • David Pritchard
  • Oren Weimann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)


Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of “non-outliers” within Hamming distance d of x. We prove that this problem has no polynomial-time approximation scheme (PTAS) unless NP has randomized polynomial-time algorithms, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe Closest to k Strings has no efficient PTAS (EPTAS) unless the parameterized complexity hierarchy collapses. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult.
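The two objectives defined above can be made concrete with a small brute-force sketch. This is only an illustration of the problem statements, not an algorithm or proof technique from the paper: all function names are hypothetical, the search is exponential, and "non-identical positions" is read here as columns in which the selected strings do not all agree.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def close_count(x, strings, d):
    """Close to Most Strings objective: inputs within Hamming distance d of x."""
    return sum(hamming(x, s) <= d for s in strings)


def brute_force_close_to_most(strings, d, alphabet):
    """Try every candidate center x (exponential in the string length)."""
    length = len(strings[0])
    best, best_count = None, -1
    for candidate in product(alphabet, repeat=length):
        x = "".join(candidate)
        count = close_count(x, strings, d)
        if count > best_count:  # keep the first center that attains the maximum
            best, best_count = x, count
    return best, best_count


def bad_columns(strings):
    """Number of positions at which the given strings are not all identical
    (assumed reading of "non-identical positions")."""
    return sum(len(set(column)) > 1 for column in zip(*strings))


def brute_force_few_bad_columns(strings, k):
    """Largest subset with at most k bad columns (exponential in len(strings))."""
    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []
```

For example, `brute_force_close_to_most(["0000", "0001", "0011", "1111"], 1, "01")` searches all 16 binary strings of length 4 for a center. The inapproximability results in the abstract concern polynomial-time approximation, so exhaustive search of this kind is useful only for checking tiny instances.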





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Christina Boucher (1)
  • Gad M. Landau (2, 3)
  • Avivit Levy (4, 5)
  • David Pritchard (6)
  • Oren Weimann (2)

  1. Department of Computer Science, University of California, San Diego, USA
  2. Department of Computer Science, University of Haifa, Haifa, Israel
  3. Polytechnic Institute of NYU, Brooklyn, USA
  4. Shenkar College for Engineering and Design, Ramat-Gan, Israel
  5. CRI, University of Haifa, Haifa, Israel
  6. CEMC, University of Waterloo, Canada
