GPUMAFIA: Efficient Subspace Clustering with MAFIA on GPUs

Adinetz, Andrew; Kraus, Jiri; Meinke, Jan; Pleiter, Dirk

doi:10.1007/978-3-642-40047-6_83


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8097)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

Clustering, i.e., the identification of regions of similar objects in a multi-dimensional data set, is a standard method of data analytics with a large variety of applications. For high-dimensional data, subspace clustering can be used to find clusters among a certain subset of data point dimensions and alleviate the curse of dimensionality.

In this paper we focus on the MAFIA subspace clustering algorithm and on using GPUs to accelerate it. We first present a number of algorithmic changes and estimate their effect on the algorithm's computational complexity. These changes improve the complexity and accelerate the sequential version by 1–2 orders of magnitude on practical datasets while producing exactly the same output. We then present the GPU version of the algorithm, which for typical datasets provides a further 1–2 orders of magnitude speedup over a single CPU core, or about an order of magnitude over a typical multi-core CPU. We believe that our faster implementation widens the applicability of MAFIA and subspace clustering.
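To make the idea of clusters that exist only in a subset of dimensions concrete, the following is a minimal Python sketch of grid-based, bottom-up subspace search. It is not the authors' implementation and it is not MAFIA itself: it assumes a uniform grid (MAFIA uses adaptive grids), coordinates scaled to [0, 1), and a density threshold of `alpha` times the count expected under a uniform distribution. The names `dense_units`, `subspace_search`, `bins`, `alpha` and `max_dim` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dense_units(points, dims, bins, alpha):
    """Return {cell: count} for grid cells in subspace `dims` that hold at
    least `alpha` times the count expected under a uniform distribution."""
    counts = {}
    for p in points:
        # Coordinates are assumed to lie in [0, 1); clamp to the last bin.
        cell = tuple(min(int(p[d] * bins), bins - 1) for d in dims)
        counts[cell] = counts.get(cell, 0) + 1
    threshold = alpha * len(points) / bins ** len(dims)
    return {c: n for c, n in counts.items() if n >= threshold}


def subspace_search(points, bins=10, alpha=2.0, max_dim=2):
    """Bottom-up search: a k-dimensional subspace is examined only if all
    of its (k-1)-dimensional projections contained dense units."""
    result = {}
    current = []
    for d in range(len(points[0])):
        units = dense_units(points, (d,), bins, alpha)
        if units:
            result[(d,)] = units
            current.append((d,))
    for k in range(2, max_dim + 1):
        previous = set(current)
        active_dims = sorted({d for s in current for d in s})
        current = []
        for cand in combinations(active_dims, k):
            if all(sub in previous for sub in combinations(cand, k - 1)):
                units = dense_units(points, cand, bins, alpha)
                if units:
                    result[cand] = units
                    current.append(cand)
    return result


# Synthetic data (an assumption for this sketch): a regular background
# lattice plus 300 points concentrated in dimensions 0 and 1 only.
background = [((i + 0.5) / 10, (j + 0.5) / 10, (k + 0.5) / 10)
              for i in range(10) for j in range(10) for k in range(10)]
cluster = [(0.25, 0.75, (m % 10 + 0.5) / 10) for m in range(300)]
points = background + cluster

result = subspace_search(points)
for subspace, units in result.items():
    print(subspace, units)
```

On this synthetic data the search reports dense cells in dimension 0, in dimension 1, and in the two-dimensional subspace (0, 1), and nothing involving dimension 2, where the cluster points are spread evenly. The pruning step is the reason bottom-up methods of this family avoid enumerating every subspace: a subspace is counted only when all of its lower-dimensional projections were already dense.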



Author information

Authors and Affiliations

JSC, Forschungszentrum Jülich, 52425, Jülich, Germany
Andrew Adinetz, Jan Meinke & Dirk Pleiter
Research Computing Center, Lomonosov Moscow State University, Russia
Andrew Adinetz
NVIDIA GmbH, Germany
Jiri Kraus

Editor information

Editors and Affiliations

German Research School for Simulation Sciences, RWTH Aachen, Schinkelstr. 2a, 52062, Aachen, Germany
Felix Wolf
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Station 22, 52425, Jülich, Germany
Bernd Mohr
Center for Computing and Communication, RWTH Aachen, Seffenter Weg 23, 52074, Aachen, Germany
Dieter an Mey

About this paper

Cite this paper

Adinetz, A., Kraus, J., Meinke, J., Pleiter, D. (2013). GPUMAFIA: Efficient Subspace Clustering with MAFIA on GPUs. In: Wolf, F., Mohr, B., an Mey, D. (eds) Euro-Par 2013 Parallel Processing. Euro-Par 2013. Lecture Notes in Computer Science, vol 8097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40047-6_83

DOI: https://doi.org/10.1007/978-3-642-40047-6_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40046-9
Online ISBN: 978-3-642-40047-6
eBook Packages: Computer Science, Computer Science (R0)
