Abstract
Clustering, i.e., the identification of regions of similar objects in a multi-dimensional data set, is a standard method of data analytics with a large variety of applications. For high-dimensional data, subspace clustering can be used to find clusters within a subset of the data dimensions and thereby alleviate the curse of dimensionality.
In this paper we focus on the MAFIA subspace clustering algorithm and on accelerating it with GPUs. We first present a number of algorithmic changes and estimate their effect on the computational complexity of the algorithm. These changes reduce that complexity and accelerate the sequential version by 1–2 orders of magnitude on practical datasets while producing exactly the same output. We then present the GPU version of the algorithm, which for typical datasets provides a further 1–2 orders of magnitude speedup over a single CPU core, or about an order of magnitude over a typical multi-core CPU. We believe that our faster implementation widens the applicability of MAFIA and subspace clustering.
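The idea of finding clusters in a subset of dimensions can be illustrated with a minimal sketch. It is not the MAFIA algorithm and not the authors' implementation: it assumes a fixed uniform grid, a single value range shared by all dimensions, and a plain count threshold, and it enumerates every subspace by brute force. The function name `dense_cells` and its parameters are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import combinations

def dense_cells(points, dims, bins, lo, hi, min_count):
    """Return grid cells in the subspace `dims` that hold at least `min_count` points.

    Illustrative assumption: every dimension shares the range [lo, hi]
    and is split into `bins` equal-width intervals.
    """
    width = (hi - lo) / bins
    counts = Counter()
    for p in points:
        # Project the point onto the chosen dimensions and bin each coordinate.
        cell = tuple(min(int((p[d] - lo) / width), bins - 1) for d in dims)
        counts[cell] += 1
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Toy data: four points lie close together in dimensions 0 and 1,
# while their values in dimension 2 are spread over the whole range.
points = [
    (0.11, 0.12, 0.05), (0.12, 0.14, 0.35), (0.13, 0.11, 0.65), (0.14, 0.13, 0.95),
    (0.81, 0.52, 0.15), (0.45, 0.91, 0.55),
]

# Brute-force scan of every non-empty subset of the three dimensions.
result = {
    dims: dense_cells(points, dims, bins=4, lo=0.0, hi=1.0, min_count=3)
    for k in (1, 2, 3)
    for dims in combinations(range(3), k)
}

for dims, cells in result.items():
    print(dims, cells)
```

On this toy data a dense cell appears in the subspace of dimensions 0 and 1 but not in the full three-dimensional space, where the spread in dimension 2 scatters the same points across different cells. The brute-force scan over all subspaces grows exponentially with the number of dimensions, which is one reason the computational cost of subspace clustering matters in practice.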
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Adinetz, A., Kraus, J., Meinke, J., Pleiter, D. (2013). GPUMAFIA: Efficient Subspace Clustering with MAFIA on GPUs. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_83
DOI: https://doi.org/10.1007/978-3-642-40047-6_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science, Computer Science (R0)