Approximation Algorithms for Hamming Clustering Problems

  • Leszek Gąsieniec
  • Jesper Jansson
  • Andrzej Lingas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1848)

Abstract

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
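
To make the two objectives concrete, the following minimal Python sketch evaluates the HRC cost of a candidate set of centers and the HDC cost of a candidate partition. It is not taken from the paper; the function names hamming, radius_cost and diameter_cost and the sample strings are illustrative assumptions, and binary strings are represented as Python str values of equal length.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius_cost(strings, centers):
    """HRC objective: largest distance from a string to its closest center."""
    return max(min(hamming(s, c) for c in centers) for s in strings)


def diameter_cost(groups):
    """HDC objective: largest Hamming diameter among the groups.

    A group with fewer than two strings has diameter 0.
    """
    return max(
        max((hamming(a, b) for a, b in combinations(g, 2)), default=0)
        for g in groups
    )


# Hypothetical sample instance with k = 4, n = 4, p = 2.
S = ["0000", "0011", "1100", "1111"]
print(radius_cost(S, ["0001", "1101"]))
print(diameter_cost([["0000", "0011"], ["1100", "1111"]]))
```

The p-radius ϱ is the minimum of radius_cost over all choices of p centers, and the p-diameter is the minimum of diameter_cost over all partitions into p groups; the sketch only scores one given candidate and does not search for the optimum.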

First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L_1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any ε > 0 it is NP-hard to split S into at most p k^{1/7-ε} clusters whose Hamming diameter does not exceed the p-diameter. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2^{O(pϱ/ε)} k^{O(p/ε)} n^2-time (1 + ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ϱ = O(log(k + n)). Finally, we show how to find in O((n/ε + kn log n + k^2 log n)(2^ϱ k)^{2/ε}) time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)ϱ, for any constant 0 < ε < 1.
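
The factor-two bound mentioned above rests on farthest-point clustering. The following Python sketch shows one plausible way to apply that idea to binary strings; it is an illustration under stated assumptions, not the authors' implementation. The name farthest_point_centers, the choice of the first input string as the initial center, and the sample instance are all hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_point_centers(strings, p):
    """Pick at most p centers from `strings` by farthest-point traversal.

    Returns (centers, radius), where radius is the largest distance from
    any input string to its closest chosen center.
    """
    if not strings or p < 1:
        raise ValueError("need at least one string and p >= 1")
    centers = [strings[0]]  # assumption: the first center is arbitrary
    # dist[i] = distance from strings[i] to its closest center so far
    dist = [hamming(s, centers[0]) for s in strings]
    while len(centers) < min(p, len(strings)):
        # The next center is the string farthest from all current centers.
        i = max(range(len(strings)), key=dist.__getitem__)
        if dist[i] == 0:
            break  # every string already coincides with some center
        centers.append(strings[i])
        dist = [min(d, hamming(s, strings[i])) for d, s in zip(dist, strings)]
    return centers, max(dist)


# Hypothetical sample instance with k = 4, n = 4, p = 2.
S = ["0000", "0001", "1110", "1111"]
print(farthest_point_centers(S, 2))
```

Each round scans all k strings at a cost of O(n) per Hamming distance, and there are at most p rounds, which matches the O(pkn) running time quoted in the abstract. The centers are restricted to strings of S, which is why the result is only guaranteed to be within a factor of two of the optimum rather than exact.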

Keywords

Approximation Algorithm, Polynomial Time, Planar Graph, Vertex Cover, Binary String
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. M. Bellare, O. Goldreich, and M. Sudan. Free Bits, PCPs, and Non-Approximability-Towards Tight Results. SIAM Journal on Computing 27(3), 1998, pp. 804–915.
  2. M. Chrobak and T.H. Payne. A linear-time algorithm for drawing a planar graph on a grid. Information Processing Letters 54, 1995, pp. 241–246.
  3. T. Feder and D. Greene. Optimal Algorithms for Approximate Clustering. Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC'88), 1988, pp. 434–444.
  4. M. Frances and A. Litman. On Covering Problems of Codes. Theory of Computing Systems 30, 1997, pp. 113–119.
  5. L. Gąsieniec, J. Jansson, and A. Lingas. Efficient Approximation Algorithms for the Hamming Center Problem. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'99), 1999, pp. S905–S906.
  6. T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 1985, pp. 293–306.
  7. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
  8. D.S. Hochbaum (editor). Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston, 1997.
  9. D.S. Hochbaum and D.B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research 10(2), 1985, pp. 180–184.
  10. D.S. Hochbaum and D.B. Shmoys. A Unified Approach to Approximation Algorithms for Bottleneck Problems. Journal of the Association for Computing Machinery 33(3), 1986, pp. 533–550.
  11. B. Kolman, R. Busby, and S. Ross. Discrete Mathematical Structures [3rd ed.]. Prentice Hall, New Jersey, 1996.
  12. J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing String Selection Problems. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'99), 1999, pp. 633–642.
  13. M. Li, B. Ma, and L. Wang. Finding Similar Regions in Many Strings. Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC'99), 1999, pp. 473–482.
  14. C. Papadimitriou. On the Complexity of Integer Programming. Journal of the ACM 28(4), 1981, pp. 765–768.
  15. S. Vishwanathan. An O(log* n) Approximation Algorithm for the Asymmetric p-Center Problem. Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA'96), 1996, pp. 1–5.

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Leszek Gąsieniec (1)
  • Jesper Jansson (2)
  • Andrzej Lingas (2)
  1. Dept. of Computer Science, University of Liverpool, UK
  2. Dept. of Computer Science, Lund University, Lund, Sweden
