Approximation Algorithms for Hamming Clustering Problems

  • Conference paper
Combinatorial Pattern Matching (CPM 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1848)

Abstract

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ. The related Hamming diameter p-clustering problem (HDC) is to split S into p groups so that the maximum of the Hamming group diameters is minimized; this latter value is called the p-diameter of S.
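The two objectives defined above can be made concrete with a small sketch. It is only an illustration under stated assumptions: binary strings are represented as Python `str` values of equal length, and the helper names `hamming`, `radius_cost`, and `diameter_cost` are hypothetical, not taken from the paper. The functions evaluate a given candidate solution; they do not search for an optimal one.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def radius_cost(S: Sequence[str], centers: Sequence[str]) -> int:
    """HRC objective: largest distance from a string in S to its closest center."""
    return max(min(hamming(s, c) for c in centers) for s in S)


def diameter_cost(groups: Sequence[Sequence[str]]) -> int:
    """HDC objective: largest Hamming diameter over all groups of a partition."""
    return max(
        max((hamming(a, b) for a, b in combinations(g, 2)), default=0)
        for g in groups
    )


S = ["0000", "0001", "1111", "1110"]
print(radius_cost(S, ["0000", "1111"]))        # cost of one candidate pair of centers
print(diameter_cost([S[:2], S[2:]]))           # cost of one candidate 2-partition
```

The p-radius ϱ is the minimum of `radius_cost` over all choices of p center strings of length n, and the p-diameter is the minimum of `diameter_cost` over all partitions of S into p groups.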

First, we provide an integer programming formulation of HRC which yields exact solutions in polynomial time whenever k and p are constant. We also observe that HDC admits straightforward polynomial-time solutions when k = O(log n) or p = 2. Next, by reduction from the corresponding geometric p-clustering problems in the plane under the L_1 metric, we show that neither HRC nor HDC can be approximated within any constant factor smaller than two unless P = NP. We also prove that for any ε > 0 it is NP-hard to split S into at most p·k^(1/7−ε) clusters whose Hamming diameter does not exceed the p-diameter. Furthermore, we note that by adapting Gonzalez' farthest-point clustering algorithm [6], HRC and HDC can be approximated within a factor of two in time O(pkn). Next, we describe a 2^O(pϱ/ε) · k^O(p/ε) · n^2-time (1 + ε)-approximation algorithm for HRC. In particular, it runs in polynomial time when p = O(1) and ϱ = O(log(k + n)). Finally, we show how to find in O((n/ε + kn log n + k^2 log n)(2^ϱ k)^(2/ε)) time a set L of O(p log k) strings of length n such that for each string in S there is at least one string in L within distance (1 + ε)ϱ, for any constant 0 < ε < 1.
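The factor-two approximation mentioned above is based on farthest-point clustering. The following is a minimal sketch of the standard farthest-point heuristic applied to Hamming distance, not the authors' exact adaptation, which the abstract does not spell out. It assumes that centers are chosen from S itself and that strings are equal-length Python `str` values; the function name `farthest_point_clustering` is hypothetical. Each of the at most p rounds scans all k strings at O(n) cost per comparison, which matches the stated O(pkn) bound.

```python
from typing import List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def farthest_point_clustering(
    S: Sequence[str], p: int
) -> Tuple[List[str], List[int], int]:
    """Pick up to p centers from S greedily; return (centers, owner, radius).

    owner[i] is the index (into centers) of the center closest to S[i];
    radius is the largest distance from any string to its closest center.
    """
    if not S or p < 1:
        raise ValueError("need a non-empty S and p >= 1")
    center_idx = [0]                          # arbitrary first center
    dist = [hamming(s, S[0]) for s in S]      # distance to closest center so far
    owner = [0] * len(S)
    while len(center_idx) < min(p, len(S)):
        far = max(range(len(S)), key=dist.__getitem__)
        if dist[far] == 0:                    # every string already is a center
            break
        center_idx.append(far)
        for i, s in enumerate(S):             # one O(kn) pass per new center
            d = hamming(s, S[far])
            if d < dist[i]:
                dist[i] = d
                owner[i] = len(center_idx) - 1
    return [S[c] for c in center_idx], owner, max(dist)


centers, owner, radius = farthest_point_clustering(
    ["0000", "0001", "1111", "1110"], p=2
)
print(centers, owner, radius)
```

The usual argument for the factor of two: if the string farthest from the p chosen centers lies at distance r, then it and the p centers form p + 1 strings that are pairwise at least r apart, so two of them must share a cluster in any optimal solution, which bounds r by twice the p-radius and by the p-diameter.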

References

  1. M. Bellare, O. Goldreich, and M. Sudan. Free Bits, PCPs, and Non-Approximability-Towards Tight Results. SIAM Journal on Computing 27(3), 1998, pp. 804–915.

  2. M. Chrobak and T.H. Payne. A linear-time algorithm for drawing a planar graph on a grid. Information Processing Letters 54, 1995, pp. 241–246.

  3. T. Feder and D. Greene. Optimal Algorithms for Approximate Clustering. Proceedings of the 20th Annual ACM Symposium on Theory of Computing (STOC’88), 1988, pp. 434–444.

  4. M. Frances and A. Litman. On Covering Problems of Codes. Theory of Computing Systems 30, 1997, pp. 113–119.

  5. L. Gąsieniec, J. Jansson, and A. Lingas. Efficient Approximation Algorithms for the Hamming Center Problem. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99), 1999, pp. S905–S906.

  6. T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38, 1985, pp. 293–306.

  7. D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.

  8. D.S. Hochbaum (editor). Approximation Algorithms for NP-Hard Problems. PWS Publishing Company, Boston, 1997.

  9. D.S. Hochbaum and D.B. Shmoys. A best possible heuristic for the k-center problem. Mathematics of Operations Research 10(2), 1985, pp. 180–184.

  10. D.S. Hochbaum and D.B. Shmoys. A Unified Approach to Approximation Algorithms for Bottleneck Problems. Journal of the Association for Computing Machinery 33(3), 1986, pp. 533–550.

  11. B. Kolman, R. Busby, and S. Ross. Discrete Mathematical Structures [3rd ed.]. Prentice Hall, New Jersey, 1996.

  12. J.K. Lanctot, M. Li, B. Ma, S. Wang, and L. Zhang. Distinguishing String Selection Problems. Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’99), 1999, pp. 633–642.

  13. M. Li, B. Ma, and L. Wang. Finding Similar Regions in Many Strings. Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC’99), 1999, pp. 473–482.

  14. C. Papadimitriou. On the Complexity of Integer Programming. Journal of the ACM 28(4), 1981, pp. 765–768.

  15. S. Vishwanathan. An O(log* n) Approximation Algorithm for the Asymmetric p-Center Problem. Proceedings of the 7th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’96), 1996, pp. 1–5.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gąsieniec, L., Jansson, J., Lingas, A. (2000). Approximation Algorithms for Hamming Clustering Problems. In: Giancarlo, R., Sankoff, D. (eds) Combinatorial Pattern Matching. CPM 2000. Lecture Notes in Computer Science, vol 1848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45123-4_11

  • DOI: https://doi.org/10.1007/3-540-45123-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67633-1

  • Online ISBN: 978-3-540-45123-5

  • eBook Packages: Springer Book Archive
