# Improved Algorithms for the Random Cluster Graph Model

## Abstract

The following probabilistic process models the generation of noisy clustering data: clusters correspond to disjoint sets of vertices in a graph. Every two vertices from the same set are connected by an edge with probability *p*, and every two vertices from different sets are connected by an edge with probability *r* < *p*. The goal of the clustering problem is to reconstruct the clusters from the graph. We give algorithms that solve this problem with high probability. Compared to previous studies, our algorithms have lower time complexity and apply over a wider range of parameters. In particular, our algorithms can handle *O*(√*n* / log *n*) clusters in an *n*-vertex graph, while all previous algorithms require the number of clusters to be constant.
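The generative process in the abstract can be made concrete with a short sampler. The sketch below is only an illustration of the model, not the paper's reconstruction algorithm; the function name `sample_cluster_graph`, its parameters, and the choice to represent the graph as a set of vertex pairs are all assumptions introduced here.

```python
import random
from itertools import combinations


def sample_cluster_graph(cluster_sizes, p, r, seed=None):
    """Sample a graph from the random cluster model (illustrative sketch).

    cluster_sizes: sizes of the disjoint planted clusters (hypothetical input format)
    p: edge probability for two vertices in the same cluster
    r: edge probability for two vertices in different clusters, with r < p
    Returns (labels, edges): labels[v] is the cluster index of vertex v,
    and edges is a set of pairs (u, v) with u < v.
    """
    if not 0.0 <= r < p <= 1.0:
        raise ValueError("expected 0 <= r < p <= 1")
    rng = random.Random(seed)

    # Vertices are numbered consecutively, cluster by cluster.
    labels = [c for c, size in enumerate(cluster_sizes) for _ in range(size)]
    n = len(labels)

    edges = set()
    # Each unordered pair is decided independently of all others.
    for u, v in combinations(range(n), 2):
        prob = p if labels[u] == labels[v] else r
        if rng.random() < prob:
            edges.add((u, v))
    return labels, edges
```

A clustering algorithm for this model would receive only `edges` and would have to recover `labels` up to a renaming of the clusters. In the degenerate case `p = 1.0`, `r = 0.0` the sampled graph is a disjoint union of cliques, which makes a convenient sanity check.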
