Min Sum Clustering with Penalties

  • Refael Hassin
  • Einat Or
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3669)


Traditionally, clustering problems are investigated under the assumption that all objects must be clustered. A shortcoming of this formulation is that a few distant objects, called outliers, may exert a disproportionately strong influence over the solution. In this work we investigate the k-min-sum clustering problem while addressing outliers in a meaningful way.

Given a complete graph \(G = (V,E)\), a weight function \(w : E \rightarrow \mathbb{N}_0\) on its edges, and a penalty function \(p : V \rightarrow \mathbb{N}_0\) on its nodes, the penalized k-min-sum problem is the problem of finding a partition of \(V\) into \(k+1\) sets, \(\{S_1,\ldots,S_{k+1}\}\), minimizing \(\sum_{i=1}^{k} w(S_i) + p(S_{k+1})\), where for \(S \subseteq V\), \(w(S) = \sum_{e=\{i,j\} \subset S} w_e\) and \(p(S) = \sum_{i \in S} p_i\).
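To make the objective concrete, the following is a minimal Python sketch that evaluates the penalized cost of a given partition and finds an optimum by exhaustive enumeration on a tiny instance. It is an illustration of the problem definition only, not the primal-dual or approximation algorithms of the paper. The function names (`cluster_weight`, `penalized_cost`, `brute_force`), the toy instance, and the choice to allow empty sets in the partition are all assumptions made for this example.

```python
from itertools import combinations, product


def cluster_weight(cluster, w):
    """w(S): sum of edge weights over all unordered pairs inside the cluster."""
    return sum(w[frozenset(pair)] for pair in combinations(sorted(cluster), 2))


def penalized_cost(clusters, outliers, w, p):
    """Objective: sum over the k clusters of w(S_i), plus p(S_{k+1}) for the outlier set."""
    return sum(cluster_weight(s, w) for s in clusters) + sum(p[v] for v in outliers)


def brute_force(nodes, w, p, k):
    """Enumerate every labelling of nodes with 0..k (label k = outlier set).

    Exponential in the number of nodes; only meant for toy instances.
    Assumption: empty clusters and an empty outlier set are allowed.
    """
    best_cost, best = None, None
    for labels in product(range(k + 1), repeat=len(nodes)):
        clusters = [{v for v, lab in zip(nodes, labels) if lab == i} for i in range(k)]
        outliers = {v for v, lab in zip(nodes, labels) if lab == k}
        cost = penalized_cost(clusters, outliers, w, p)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, (clusters, outliers)
    return best_cost, best


# Hypothetical toy instance: a, b, c are mutually close; d is far from all of them.
nodes = ["a", "b", "c", "d"]
edge_weights = {
    ("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
    ("a", "d"): 10, ("b", "d"): 10, ("c", "d"): 10,
}
w = {frozenset(e): weight for e, weight in edge_weights.items()}
p = {v: 5 for v in nodes}  # uniform penalty for leaving a node unclustered

cost, (clusters, outliers) = brute_force(nodes, w, p, k=1)
```

On this instance, clustering all four nodes costs 33, whereas paying the penalty for `d` and clustering the remaining three nodes costs 3 + 5 = 8, which shows how the penalty term keeps a distant object from dominating the solution.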

We offer an efficient 2-approximation to the penalized 1-min-sum problem using a primal-dual algorithm. We prove that the penalized 1-min-sum problem is NP-hard even if w is a metric and present a randomized approximation scheme for it. For the metric penalized k-min-sum problem we offer a 2-approximation.


Keywords: Complete Graph; Exhaustive Search; Cluster Problem; Facility Location Problem; Maximal Solution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Refael Hassin (1)
  • Einat Or (1)
  1. Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv, Israel
