Automatic Subspace Clustering of High Dimensional Data
 Rakesh Agrawal,
 Johannes Gehrke,
 Dimitrios Gunopulos,
 Prabhakar Raghavan
Data mining applications place special requirements on clustering algorithms, including: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
Data Mining and Knowledge Discovery
Volume 11, Issue 1, pp 5-33
 2005-07-01
 10.1007/s10618-005-1396-1
 1384-5810
 1573-756X
 Kluwer Academic Publishers
 subspace clustering
 clustering
 dimensionality reduction
 Rakesh Agrawal (1)
 Johannes Gehrke (2)
 Dimitrios Gunopulos (3)
 Prabhakar Raghavan (4)
 1. IBM Almaden Research Center, 650 Harry Road, San Jose, CA, 95120
 2. Computer Science Department, Cornell University, Ithaca, NY
 3. Department of Computer Science and Eng., University of California Riverside, Riverside, CA, 92521
 4. Verity, Inc., Germany