Constraint-Based Clustering in Large Databases

  • Anthony K. H. Tung
  • Jiawei Han
  • Laks V.S. Lakshmanan
  • Raymond T. Ng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1973)

Abstract

Constrained clustering — finding clusters that satisfy user-specified constraints — is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. A scalable constraint-clustering algorithm is developed in this study which starts by finding an initial solution that satisfies user-specified constraints and then refines the solution by performing confined object movements under constraints. Our algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how our algorithm can scale up for large data sets using the heuristic of micro-cluster sharing. By experiments, we show the effectiveness and efficiency of the heuristics.
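The two-step idea described in the abstract (first build a solution that satisfies the constraints, then refine it by moving objects only when the constraints stay satisfied) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a hypothetical minimum-cluster-size constraint (`min_size`), uses a simple round-robin assignment as the feasible start, and does not model pivot movement, deadlock resolution, or micro-cluster sharing. The function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid(points: Sequence[Point]) -> Point:
    """Mean of a non-empty collection of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def constrained_refine(points: Sequence[Point], k: int, min_size: int,
                       max_rounds: int = 100) -> List[int]:
    """Assign each point to one of k clusters, keeping every cluster at
    or above min_size (an assumed constraint, chosen for illustration)."""
    n = len(points)
    if min_size < 1 or n < k * min_size:
        raise ValueError("no assignment can satisfy the size constraint")

    # Step 1: feasible initial solution. Round-robin gives every cluster
    # at least floor(n / k) >= min_size members.
    labels = [i % k for i in range(n)]
    sizes = [labels.count(c) for c in range(k)]

    # Step 2: refinement. A point moves to its nearest centroid only if
    # the cluster it leaves would still satisfy the constraint.
    for _ in range(max_rounds):
        centers = [centroid([p for p, l in zip(points, labels) if l == c])
                   for c in range(k)]
        moved = False
        for i, p in enumerate(points):
            src = labels[i]
            dst = min(range(k), key=lambda c: math.dist(p, centers[c]))
            if dst != src and sizes[src] - 1 >= min_size:
                labels[i] = dst
                sizes[src] -= 1
                sizes[dst] += 1
                moved = True
        if not moved:
            break
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(constrained_refine(data, k=2, min_size=2))
```

In this sketch a move that would shrink a cluster below `min_size` is simply skipped, so some points can stay in a cluster that is not their nearest one. Handling such blocked moves well is the kind of problem the abstract attributes to the deadlock-resolution phase.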

Keywords

Movement Path, Spatial Data Mining, Deadlock Resolution, Pivot Movement
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Anthony K. H. Tung (1)
  • Jiawei Han (1)
  • Laks V.S. Lakshmanan (2)
  • Raymond T. Ng (3)
  1. Simon Fraser University, Canada
  2. IIT Bombay & Concordia U
  3. University of British Columbia, Canada
