Abstract
We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
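To make the two objectives concrete, here is a small Python sketch. The function names (`hamming`, `radius_cost`, `diameter_cost`, `p_radius_bruteforce`, `p_diameter_bruteforce`) are hypothetical helpers introduced for illustration, and strings are assumed to be Python `str` objects over the characters `0` and `1`. The brute-force solvers are exponential. They only pin down the definitions of the p-radius ϱ and the p-diameter on toy inputs and are not the algorithms of the paper.

```python
from itertools import combinations, product
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius_cost(S: Sequence[str], centers: Sequence[str]) -> int:
    """HRC objective: largest distance from a string in S to its closest center."""
    return max(min(hamming(s, c) for c in centers) for s in S)


def diameter_cost(groups: Sequence[Sequence[str]]) -> int:
    """HDC objective: largest Hamming diameter over all groups (empty groups count as 0)."""
    return max((hamming(a, b) for g in groups for a in g for b in g), default=0)


def p_radius_bruteforce(S: Sequence[str], p: int) -> int:
    """Exact p-radius by trying every set of p distinct centers from {0,1}^n.

    Exponential in n; only meant to make the definition concrete on toy inputs
    (requires p <= 2**n).
    """
    n = len(S[0])
    universe = ["".join(bits) for bits in product("01", repeat=n)]
    return min(radius_cost(S, cs) for cs in combinations(universe, p))


def p_diameter_bruteforce(S: Sequence[str], p: int) -> int:
    """Exact p-diameter by trying all p**k assignments of strings to groups."""
    best = None
    for labels in product(range(p), repeat=len(S)):
        groups = [[s for s, lab in zip(S, labels) if lab == j] for j in range(p)]
        cost = diameter_cost(groups)
        best = cost if best is None else min(best, cost)
    return best


S = ["000", "001", "110", "111"]
print(p_radius_bruteforce(S, 2), p_diameter_bruteforce(S, 2))  # 1 1
```

On this toy set both values equal 1. In general ϱ is at most the p-diameter (use any member of a group as its center), and the p-diameter is at most 2ϱ (group the strings by their nearest center).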
First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the $L_1$ metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any $\varepsilon > 0$ it is NP-hard to split S into at most $pk^{1/7-\varepsilon}$ clusters whose Hamming diameter does not exceed the p-diameter. Furthermore, we note that by adapting Gonzalez’ farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a $2^{O(p\varrho/\varepsilon)} k^{O(p/\varepsilon)} n^2$-time $(1+\varepsilon)$-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ϱ = O(log(k + n)). Finally, we show how to find in $O((n/\varepsilon + kn\log n + k^2\log n)(2^{\varrho} k)^{2/\varepsilon})$ time a set L of $O(p\log k)$ strings of length n such that for each string in S there is at least one string in L within distance $(1+\varepsilon)\varrho$, for any constant $0 < \varepsilon < 1$.
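The abstract states only that HDC is straightforward for p = 2. One standard route, assumed here rather than taken from the paper, works as follows. The 2-diameter is at most d exactly when the conflict graph, which joins two strings whenever their Hamming distance exceeds d, is bipartite, since its two colour classes can then serve as the groups. Trying the O(k^2) candidate values of d in increasing order gives a polynomial-time exact method. The names `conflict_colouring` and `two_diameter` are hypothetical.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(x != y for x, y in zip(a, b))


def conflict_colouring(dist: List[List[int]], d: int) -> Optional[List[int]]:
    """2-colour the graph joining strings at distance > d; None if it has an odd cycle."""
    k = len(dist)
    colour = [-1] * k
    for start in range(k):
        if colour[start] != -1:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:  # breadth-first search over one connected component
            u = queue.popleft()
            for v in range(k):
                if dist[u][v] <= d:
                    continue  # v may share a group with u
                if colour[v] == -1:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None  # odd cycle: no 2-partition of diameter <= d
    return colour


def two_diameter(S: Sequence[str]) -> Tuple[int, List[List[str]]]:
    """Exact 2-diameter of S together with an optimal split into two groups."""
    k = len(S)
    dist = [[hamming(S[i], S[j]) for j in range(k)] for i in range(k)]
    # The optimum is 0 or some pairwise distance, so only these values are tried.
    candidates = sorted({0} | {dist[i][j] for i in range(k) for j in range(i + 1, k)})
    for d in candidates:  # feasibility is monotone in d: the first success is optimal
        colour = conflict_colouring(dist, d)
        if colour is not None:
            return d, [[s for s, c in zip(S, colour) if c == side] for side in (0, 1)]
    raise AssertionError("unreachable: the largest candidate gives an edgeless graph")


d, groups = two_diameter(["0000", "0001", "1110", "1111", "0011"])
print(d, groups)  # 2 [['0000', '0001', '0011'], ['1110', '1111']]
```

Here d = 1 fails because `0000`, `0011` and `1110` are pairwise more than one bit apart and form an odd cycle in the conflict graph.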
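The factor-two bound attributed to Gonzalez [6] rests on farthest-point clustering: start from an arbitrary string, repeatedly promote the string farthest from the current centers, and finally assign every string to its nearest center. The sketch below is our own rendering of that textbook procedure for Hamming strings, not the authors' code, and `farthest_point_clustering` is a hypothetical name. Each of the p rounds updates k distances between length-n strings, and the final assignment also costs O(pkn), so the total is O(pkn).

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_point_clustering(
    S: Sequence[str], p: int
) -> Tuple[List[str], int, List[List[str]]]:
    """Gonzalez-style greedy: returns (centers, radius, groups).

    The radius is at most 2 * p-radius, and each group has diameter at most
    2 * radius <= 2 * p-diameter.
    """
    if not S or p < 1:
        raise ValueError("need a non-empty S and p >= 1")
    centers = [S[0]]                      # arbitrary first center
    near = [hamming(s, S[0]) for s in S]  # distance from each string to its closest center
    for _ in range(1, min(p, len(S))):
        far = max(range(len(S)), key=near.__getitem__)  # farthest string so far
        centers.append(S[far])
        for i, s in enumerate(S):         # O(kn) update for the new center
            near[i] = min(near[i], hamming(s, S[far]))
    radius = max(near)
    groups: List[List[str]] = [[] for _ in centers]
    for s in S:                           # assign each string to its nearest center
        j = min(range(len(centers)), key=lambda c: hamming(s, centers[c]))
        groups[j].append(s)
    return centers, radius, groups


centers, radius, groups = farthest_point_clustering(["000", "001", "110", "111"], 2)
print(centers, radius, groups)
# ['000', '111'] 1 [['000', '001'], ['110', '111']]
```

Because the first center is arbitrary, different starting strings can yield different solutions, all with the same factor-two guarantee.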
References
[1] M. Bellare, O. Goldreich, and M. Sudan. Free Bits, PCPs, and Non-Approximability-Towards Tight Results. SIAM Journal on Computing 27(3), 1998, pp. 804–915.
[2] M. Chrobak and T.H. Payne. A linear-time algorithm for drawing a planar graph on a grid. Information Processing Letters 54, 1995, pp. 241–246.
[3] T. Feder and D. Greene. Optimal Algorithms for Approximate Clustering. Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC’88), 1988, pp. 434–444.
[4] M. Frances and A. Litman. On Covering Problems of Codes. Theory of Computing Systems 30, 1997, pp. 113–119.
[5] L. Gąsieniec, J. Jansson, and A. Lingas. Efficient Approximation Algorithms for the Hamming Center Problem. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99), 1999, pp. S905–S906.
[6] T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 1985, pp. 293–306.
[7] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
[8] D.S. Hochbaum (editor). Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston, 1997.
[9] D.S. Hochbaum and D.B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research 10(2), 1985, pp. 180–184.
[10] D.S. Hochbaum and D.B. Shmoys. A Unified Approach to Approximation Algorithms for Bottleneck Problems. Journal of the Association for Computing Machinery 33(3), 1986, pp. 533–550.
[11] B. Kolman, R. Busby, and S. Ross. Discrete Mathematical Structures [3rd ed.]. Prentice Hall, New Jersey, 1996.
[12] J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing String Selection Problems. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99), 1999, pp. 633–642.
[13] M. Li, B. Ma, and L. Wang. Finding Similar Regions in Many Strings. Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC’99), 1999, pp. 473–482.
[14] C. Papadimitriou. On the Complexity of Integer Programming. Journal of the ACM 28(4), 1981, pp. 765–768.
[15] S. Vishwanathan. An O(log* n) Approximation Algorithm for the Asymmetric p-Center Problem. Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’96), 1996, pp. 1–5.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gąsieniec, L., Jansson, J., Lingas, A. (2000). Approximation Algorithms for Hamming Clustering Problems. In: Giancarlo, R., Sankoff, D. (eds) Combinatorial Pattern Matching. CPM 2000. Lecture Notes in Computer Science, vol 1848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45123-4_11
DOI: https://doi.org/10.1007/3-540-45123-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67633-1
Online ISBN: 978-3-540-45123-5
eBook Packages: Springer Book Archive