Approximation Algorithms for Hamming Clustering Problems
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
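The two objectives defined above can be made concrete with a small sketch. The function names (`hamming`, `radius_cost`, `diameter_cost`) are illustrative and do not come from the paper. The code only evaluates a given candidate solution; it does not search for an optimal one.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius_cost(strings, centers):
    """HRC objective: max over S of the distance to the closest center."""
    return max(min(hamming(s, c) for c in centers) for s in strings)


def diameter_cost(groups):
    """HDC objective: largest pairwise distance inside any single group."""
    return max(
        (hamming(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0,  # every group is a singleton (or empty)
    )


S = ["0000", "0011", "1111", "1100"]
# Centers need not belong to S; these two are chosen by hand.
print(radius_cost(S, ["0001", "1101"]))
print(diameter_cost([["0000", "0011"], ["1111", "1100"]]))
```

In this toy instance the hand-picked centers lie outside S, which reflects that HRC allows any binary strings of length n as centers.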
First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the $L_1$ metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any ε > 0 it is NP-hard to split S into at most $p k^{1/7-\varepsilon}$ clusters whose Hamming diameter does not exceed the p-diameter. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm, HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a $2^{O(p\varrho/\varepsilon)} k^{O(p/\varepsilon)} n^2$-time (1 + ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ϱ = O(log(k + n)). Finally, we show how to find in $O((n/\varepsilon + kn \log n + k^2 \log n)(2^{\varrho} k)^{2/\varepsilon})$ time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)ϱ, for any constant 0 < ε < 1.
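As an illustration of the factor-two result mentioned above, the following is a minimal sketch of farthest-point clustering applied to binary strings. It assumes the standard form of Gonzalez' algorithm with centers drawn from S itself; the paper's exact adaptation is not given in the abstract, and the function names are hypothetical. Each round computes k Hamming distances of cost O(n), so p rounds take O(pkn) time.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def farthest_point_centers(strings, p):
    """Pick up to p centers from `strings` by the farthest-point rule.

    Returns (centers, radius), where radius is the largest distance
    from any input string to its closest chosen center.
    """
    if not strings or p < 1:
        raise ValueError("need at least one string and p >= 1")
    centers = [strings[0]]  # arbitrary first center
    # dist[i] = distance from strings[i] to its closest center so far
    dist = [hamming(s, centers[0]) for s in strings]
    while len(centers) < min(p, len(strings)):
        i = max(range(len(strings)), key=dist.__getitem__)
        if dist[i] == 0:  # every string already coincides with a center
            break
        centers.append(strings[i])
        dist = [min(d, hamming(s, strings[i])) for d, s in zip(dist, strings)]
    return centers, max(dist)


centers, radius = farthest_point_centers(["0000", "0001", "1111", "1110"], 2)
print(centers, radius)
```

Assigning each string to its closest returned center yields the groups for HDC. Restricting centers to members of S is what keeps the procedure simple; better centers may exist elsewhere among the binary strings of length n, which is consistent with a guarantee of only a factor of two.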
Keywords: Approximation Algorithm, Polynomial Time, Planar Graph, Vertex Cover, Binary String