ESA 2005: Algorithms – ESA 2005 pp 167-178

Min Sum Clustering with Penalties

• Refael Hassin
• Einat Or
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3669)

Abstract

Traditionally, clustering problems are investigated under the assumption that all objects must be clustered. A shortcoming of this formulation is that a few distant objects, called outliers, may exert a disproportionately strong influence over the solution. In this work we investigate the k-min-sum clustering problem while addressing outliers in a meaningful way.

Given a complete graph $$G = (V,E)$$, a weight function $$w : E \rightarrow \mathbb{N}_0$$ on its edges, and a penalty function $$p : V \rightarrow \mathbb{N}_0$$ on its nodes, the penalized k-min-sum problem is the problem of finding a partition of V into k+1 sets, $$\{S_1,\ldots,S_{k+1}\}$$, minimizing $$\sum_{i=1}^{k} w(S_i) + p(S_{k+1})$$, where for $$S \subseteq V$$, $$w(S) = \sum_{e=\{i,j\} \subset S} w_e$$ and $$p(S) = \sum_{i \in S} p_i$$.
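To make the objective concrete, the following minimal sketch evaluates it for a given partition. The function name `penalized_cost`, the dictionary-based input format, and the 4-node instance are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def penalized_cost(weights, penalties, clusters, outliers):
    """Evaluate sum_i w(S_i) + p(S_{k+1}) for one partition.

    weights:   dict mapping frozenset({i, j}) -> nonnegative integer edge weight
    penalties: dict mapping node -> nonnegative integer penalty
    clusters:  list of the k sets S_1, ..., S_k
    outliers:  the set S_{k+1} of nodes left unclustered
    """
    # w(S) sums the weights of all edges with both endpoints inside S.
    cluster_cost = sum(
        weights[frozenset(pair)]
        for cluster in clusters
        for pair in combinations(sorted(cluster), 2)
    )
    # p(S) sums the penalties of the nodes in S.
    penalty_cost = sum(penalties[v] for v in outliers)
    return cluster_cost + penalty_cost


# Hypothetical instance: nodes 0, 1, 2 are close together, node 3 is far away.
weights = {frozenset(e): w for e, w in {
    (0, 1): 1, (0, 2): 2, (1, 2): 1,
    (0, 3): 9, (1, 3): 8, (2, 3): 9,
}.items()}
penalties = {0: 5, 1: 5, 2: 5, 3: 3}

cost_all = penalized_cost(weights, penalties, [{0, 1, 2, 3}], set())
cost_drop = penalized_cost(weights, penalties, [{0, 1, 2}], {3})
```

In this made-up instance, paying the penalty for node 3 is far cheaper than clustering it, which is the outlier effect the abstract describes.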

We offer an efficient 2-approximation to the penalized 1-min-sum problem using a primal-dual algorithm. We prove that the penalized 1-min-sum problem is NP-hard even if w is a metric and present a randomized approximation scheme for it. For the metric penalized k-min-sum problem we offer a 2-approximation.
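For very small instances, the penalized 1-min-sum optimum can be found by exhaustive search, which gives a reference value against which an approximate solution could be compared. The sketch below is such a brute-force baseline; it is not the paper's primal-dual algorithm, it runs in exponential time, and the function name and the example instance are assumptions made for illustration.

```python
from itertools import combinations


def brute_force_penalized_1_min_sum(nodes, weights, penalties):
    """Try every subset as the single cluster; all other nodes pay their penalty.

    Exponential in len(nodes); intended only as a baseline on tiny inputs.
    """
    best_cost, best_cluster = None, None
    for size in range(len(nodes) + 1):
        for cluster in combinations(nodes, size):
            inside = set(cluster)
            # Edge weights inside the cluster.
            cost = sum(weights[frozenset(p)] for p in combinations(cluster, 2))
            # Penalties for every node left outside the cluster.
            cost += sum(penalties[v] for v in nodes if v not in inside)
            if best_cost is None or cost < best_cost:
                best_cost, best_cluster = cost, inside
    return best_cost, best_cluster


# Hypothetical instance: nodes 0, 1, 2 are close together, node 3 is far away.
nodes = [0, 1, 2, 3]
weights = {frozenset(e): w for e, w in {
    (0, 1): 1, (0, 2): 2, (1, 2): 1,
    (0, 3): 9, (1, 3): 8, (2, 3): 9,
}.items()}
penalties = {0: 5, 1: 5, 2: 5, 3: 3}

best_cost, best_cluster = brute_force_penalized_1_min_sum(nodes, weights, penalties)
```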

Keywords

Complete Graph; Exhaustive Search; Cluster Problem; Facility Location Problem; Maximal Solution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
