PAKDD 2006: Advances in Knowledge Discovery and Data Mining, pp. 179-188
Parallel Density-Based Clustering of Complex Objects
Abstract
In many scientific, engineering, or multimedia applications, complex distance functions are used to measure similarity accurately. Often, simpler lower-bounding distance functions also exist that can be computed much more efficiently. In this paper, we show how these simple distance functions can be used to parallelize the density-based clustering algorithm DBSCAN. First, the data is partitioned based on an enumeration calculated by the hierarchical clustering algorithm OPTICS, so that similar objects have adjacent enumeration values. We use the fact that clustering based on lower-bounding distance values conservatively approximates the exact clustering. By integrating the multi-step query processing paradigm directly into the clustering algorithms, the clustering on the slaves can be carried out very efficiently. Finally, we show that the result sets computed by the individual slaves can be merged effectively and efficiently into a global result by means of cluster connectivity graphs. An experimental evaluation on real-world test data sets demonstrates the benefits of our approach.
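The role of a lower-bounding distance can be illustrated with a generic filter-and-refine range query, the kind of neighborhood query DBSCAN issues for each object. The following is only a minimal sketch under stated assumptions, not the multi-step algorithm of the paper: the names `range_query`, `lower_bound_dist`, and `exact_dist` are hypothetical, Euclidean distance stands in for an expensive exact distance, and the absolute difference of the first coordinate stands in for a cheap lower bound.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def exact_dist(a: Point, b: Point) -> float:
    # Stand-in for an expensive exact distance (assumption: Euclidean).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lower_bound_dist(a: Point, b: Point) -> float:
    # Cheap filter distance (assumption): the first-coordinate difference
    # never exceeds the Euclidean distance, so it is a valid lower bound.
    return abs(a[0] - b[0])


def range_query(
    db: Sequence[Point],
    q: Point,
    eps: float,
    lb: Callable[[Point, Point], float] = lower_bound_dist,
    exact: Callable[[Point, Point], float] = exact_dist,
) -> Tuple[List[int], int]:
    """Return indices of objects within eps of q and the number of
    exact distance computations that were actually performed."""
    hits: List[int] = []
    refinements = 0
    for i, obj in enumerate(db):
        # Filter step: if the lower bound already exceeds eps, the exact
        # distance must exceed eps as well, so the object is discarded.
        if lb(q, obj) > eps:
            continue
        # Refinement step: only surviving candidates pay for the exact distance.
        refinements += 1
        if exact(q, obj) <= eps:
            hits.append(i)
    return hits, refinements


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.5, 0.0), (0.5, 3.0), (5.0, 0.0)]
    print(range_query(data, (0.0, 0.0), 1.0))  # prints ([0, 1], 3)
```

Because the filter never discards a true neighbor, the candidate set is a superset of the exact result, which is the sense in which a clustering based on lower-bounding distances is conservative. In this toy run, one of the four exact distance computations is avoided.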