Abstract
Constrained clustering, that is, finding clusters that satisfy user-specified constraints, is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and show that traditional clustering algorithms (e.g., k-means) cannot handle it. We develop a scalable constrained clustering algorithm that first finds an initial solution satisfying the user-specified constraints and then refines that solution by moving objects between clusters only in ways the constraints permit. The algorithm consists of two phases: pivot movement and deadlock resolution. For both phases, we show that finding the optimal solution is NP-hard. We then propose several heuristics and show how the algorithm scales to large data sets using the heuristic of micro-cluster sharing. Experiments demonstrate the effectiveness and efficiency of the heuristics.
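To make the two-step idea concrete (start from a feasible solution, then refine it with constraint-preserving moves), here is a minimal sketch in Python. It is not the algorithm of the paper: the abstract does not state the constraint type, so the sketch assumes a simple minimum-cluster-size constraint, and it does not model pivot movement, deadlock resolution, or micro-cluster sharing. The function name `constrained_refine` and the parameter `min_size` are hypothetical, chosen only for illustration.

```python
import math


def _centroids(points, assign, k):
    """Mean of the points currently assigned to each of the k clusters."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, assign):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Every cluster keeps at least min_size >= 1 objects, so counts[c] > 0.
    return [tuple(s / counts[c] for s in sums[c]) for c in range(k)]


def constrained_refine(points, k, min_size, max_iters=100):
    """Assumed constraint (not from the paper): each cluster has >= min_size objects."""
    n = len(points)
    if min_size < 1 or k * min_size > n:
        raise ValueError("no feasible solution under the assumed constraint")

    # Step 1: initial feasible solution. Round-robin assignment gives every
    # cluster at least floor(n / k) >= min_size objects.
    assign = [i % k for i in range(n)]
    sizes = [assign.count(c) for c in range(k)]

    # Step 2: refinement. An object moves to a strictly nearer centroid only
    # if its current cluster stays at or above min_size after the move.
    for _ in range(max_iters):
        centers = _centroids(points, assign, k)
        moved = False
        for i, p in enumerate(points):
            cur = assign[i]
            best = min(range(k), key=lambda c: math.dist(p, centers[c]))
            nearer = math.dist(p, centers[best]) < math.dist(p, centers[cur])
            if best != cur and nearer and sizes[cur] > min_size:
                assign[i] = best
                sizes[cur] -= 1
                sizes[best] += 1
                moved = True
        if not moved:
            break
    return assign
```

Unconstrained k-means applied to five nearby points and one distant outlier with k = 2 would typically leave the outlier alone in its own cluster. In this sketch the move that would shrink a cluster below `min_size` is simply refused, so every intermediate solution remains feasible.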
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tung, A.K.H., Han, J., Lakshmanan, L.V., Ng, R.T. (2001). Constraint-Based Clustering in Large Databases. In: Van den Bussche, J., Vianu, V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44503-X_26
DOI: https://doi.org/10.1007/3-540-44503-X_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41456-8
Online ISBN: 978-3-540-44503-6
eBook Packages: Springer Book Archive