# Approximation Algorithms for Hamming Clustering Problems

## Abstract

We study Hamming versions of two classical clustering problems. The *Hamming radius p-clustering* problem (HRC) for a set *S* of *k* binary strings, each of length *n*, is to find *p* binary strings of length *n* that minimize the maximum Hamming distance between a string in *S* and the closest of the *p* strings; this minimum value is termed the *p-radius of S* and is denoted by *ϱ*. The related *Hamming diameter p-clustering* problem (HDC) is to split *S* into *p* groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the *p-diameter of S*.
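The two objectives can be written out formally as follows. This is a sketch based only on the definitions above: *d_H* denotes Hamming distance, and the symbol *δ* for the *p*-diameter is an assumption introduced here for illustration (the abstract names only *ϱ*). The last line records the standard relation between the two values: choosing any member of each group as its center gives *ϱ* ≤ *δ*, and the triangle inequality applied to strings sharing a nearest center gives *δ* ≤ 2*ϱ*.

```latex
% p-radius: best choice of p centers among all binary strings of length n
\[
  \varrho \;=\; \min_{c_1,\dots,c_p \in \{0,1\}^n}\;
                \max_{s \in S}\; \min_{1 \le j \le p} d_H(s, c_j)
\]
% p-diameter: best split of S into p groups (delta is a label assumed here)
\[
  \delta \;=\; \min_{S = S_1 \cup \dots \cup S_p}\;
               \max_{1 \le j \le p}\; \max_{s,t \in S_j} d_H(s, t)
\]
% relation between the two optima
\[
  \varrho \;\le\; \delta \;\le\; 2\varrho
\]
```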

First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever *k* and *p* are constant. We also observe that HDC admits straightforward polynomial-time solutions when *k* = *O*(log *n*) or *p* = 2. Next, by reduction from the corresponding geometric *p*-clustering problems in the plane under the *L*_{1} metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P=NP. We also prove that for any *ε* > 0 it is NP-hard to split *S* into at most *pk*^{1/7−*ε*} clusters whose Hamming diameter does not exceed the *p*-diameter. Furthermore, we note that by adapting Gonzalez’ farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time *O*(*pkn*). Next, we describe a 2^{*O*(*pϱ*/*ε*)} *k*^{*O*(*p*/*ε*)} *n*^{2}-time (1 + *ε*)-approximation algorithm for HRC. In particular, it runs in polynomial time when *p* = *O*(1) and *ϱ* = *O*(log(*k* + *n*)). Finally, we show how to find in *O*((*n*/*ε* + *kn* log *n* + *k*^{2} log *n*)(2^{*ϱ*} *k*)^{2/*ε*}) time a set *L* of *O*(*p* log *k*) strings of length *n* such that for each string in *S* there is at least one string in *L* within distance (1 + *ε*)*ϱ*, for any constant 0 < *ε* < 1.
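The farthest-point strategy mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the general Gonzalez-style greedy under Hamming distance, not the paper's own adaptation, whose details the abstract does not give. The function names `hamming` and `farthest_point_centers` are hypothetical, and strings are assumed to be equal-length Python `str` values over `0` and `1`. Each of the at most *p* rounds scans all *k* strings at *O*(*n*) cost per distance, which matches the stated *O*(*pkn*) bound.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def farthest_point_centers(strings: List[str], p: int) -> Tuple[List[str], int]:
    """Greedy farthest-point selection of at most p centers from `strings`.

    Returns the chosen centers and the largest distance from any input
    string to its nearest chosen center.
    """
    if not strings or p < 1:
        raise ValueError("need a non-empty input and p >= 1")

    # Start from an arbitrary string (here: the first one).
    centers = [strings[0]]
    # dist[i] = distance from strings[i] to its nearest center so far.
    dist = [hamming(s, centers[0]) for s in strings]

    while len(centers) < min(p, len(strings)):
        # Pick the string currently farthest from all chosen centers.
        i = max(range(len(strings)), key=dist.__getitem__)
        if dist[i] == 0:
            break  # every string already coincides with some center
        centers.append(strings[i])
        # Update nearest-center distances with the new center only.
        dist = [min(d, hamming(s, strings[i])) for d, s in zip(dist, strings)]

    return centers, max(dist)


if __name__ == "__main__":
    S = ["0000", "0001", "1111", "1110", "0011"]
    centers, radius = farthest_point_centers(S, 2)
    print(centers, radius)
```

On this small input the greedy returns radius 2, while the centers `0001` and `1111` achieve radius 1, so the result is within the factor-two guarantee but not optimal. Assigning each string to its nearest returned center yields a grouping that can serve as a candidate solution for the diameter version.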

## Keywords

Approximation Algorithm, Polynomial Time, Planar Graph, Vertex Cover, Binary String

## References

1. M. Bellare, O. Goldreich, and M. Sudan. Free Bits, PCPs, and Non-Approximability - Towards Tight Results. *SIAM Journal on Computing* 27(3), 1998, pp. 804–915.
2. M. Chrobak and T.H. Payne. A linear-time algorithm for drawing a planar graph on a grid. *Information Processing Letters* 54, 1995, pp. 241–246.
3. T. Feder and D. Greene. Optimal Algorithms for Approximate Clustering. *Proceedings of the 20th Annual ACM Symposium on Theory of Computing* (STOC’88), 1988, pp. 434–444.
4. M. Frances and A. Litman. On Covering Problems of Codes. *Theory of Computing Systems* 30, 1997, pp. 113–119.
5. L. Gąsieniec, J. Jansson, and A. Lingas. Efficient Approximation Algorithms for the Hamming Center Problem. *Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms* (SODA’99), 1999, pp. S905–S906.
6. T. Gonzalez. Clustering to minimize the maximum intercluster distance. *Theoretical Computer Science* 38, 1985, pp. 293–306.
7. D. Gusfield. *Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology*. Cambridge University Press, 1997.
8. D.S. Hochbaum (editor). *Approximation Algorithms for NP-Hard Problems*. PWS Publishing Company, Boston, 1997.
9. D.S. Hochbaum and D.B. Shmoys. A best possible heuristic for the k-center problem. *Mathematics of Operations Research* 10(2), 1985, pp. 180–184.
10. D.S. Hochbaum and D.B. Shmoys. A Unified Approach to Approximation Algorithms for Bottleneck Problems. *Journal of the Association for Computing Machinery* 33(3), 1986, pp. 533–550.
11. B. Kolman, R. Busby, and S. Ross. *Discrete Mathematical Structures* [3rd ed.]. Prentice Hall, New Jersey, 1996.
12. J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing String Selection Problems. *Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms* (SODA’99), 1999, pp. 633–642.
13. M. Li, B. Ma, and L. Wang. Finding Similar Regions in Many Strings. *Proceedings of the 31st Annual ACM Symposium on Theory of Computing* (STOC’99), 1999, pp. 473–482.
14. C. Papadimitriou. On the Complexity of Integer Programming. *Journal of the ACM* 28(4), 1981, pp. 765–768.
15. S. Vishwanathan. An *O*(log\* *n*) Approximation Algorithm for the Asymmetric *p*-Center Problem. *Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms* (SODA’96), 1996, pp. 1–5.