Using a Multitasking GPU Environment for Content-Based Similarity Measures of Big Data

Tarakji, Ayman; Hassani, Marwan; Lankes, Stefan; Seidl, Thomas

doi:10.1007/978-3-642-39640-3_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7975)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

Performance and efficiency have recently become key requirements of computer architectures. Modern computers use Graphics Processing Units (GPUs) to run data mining algorithms as well as other general-purpose computations. In this paper, different parallelization methods are analyzed and compared in order to understand their applicability. From multi-threading on shared memory to NVIDIA’s GPU accelerators, this work discusses the parallelization of data mining algorithms with regard to performance and efficiency. Performance is compared on both many-core systems and GPU accelerators using a distance measure algorithm and a relatively large data set. We optimize the way GPUs are used in heterogeneous systems to make them more suitable for big data mining applications with heavy distance calculations. Moreover, we focus on achieving higher utilization of GPU resources and better reuse of data. Our GPU implementation of the content-based similarity algorithm SQFD outperforms CPU counterparts by up to 50× and multi-threaded CPU implementations by up to 15×.
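
To make the "heavy distance calculations" concrete, the following is a minimal sequential sketch of the Signature Quadratic Form Distance (SQFD) as it is commonly defined in the literature. It is not the implementation evaluated in the paper. The function names, the Gaussian similarity function, and the `alpha` parameter are assumptions chosen for illustration. A signature is taken to be a list of (centroid, weight) pairs.

```python
import math


def gaussian_similarity(a, b, alpha=1.0):
    """Assumed similarity function: exp(-alpha * squared Euclidean distance)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-alpha * sq_dist)


def sqfd(sig_q, sig_o, alpha=1.0):
    """SQFD between two signatures, each a list of (centroid, weight) pairs."""
    # Concatenate both signatures; weights of the second one are negated.
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]

    # Quadratic form w * A * w^T, with A[i][j] computed on the fly.
    total = 0.0
    for i, ci in enumerate(centroids):
        for j, cj in enumerate(centroids):
            total += weights[i] * weights[j] * gaussian_similarity(ci, cj, alpha)

    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


if __name__ == "__main__":
    query = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
    other = [((0.2, 0.1), 0.7), ((2.0, 2.0), 0.3)]
    print(sqfd(query, other))
```

The nested loop grows quadratically with the combined signature size, and each query/object pair can be evaluated independently of all others, which is the kind of workload that lends itself to multi-threaded and GPU parallelization.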

Author information

Authors and Affiliations

Chair for Operating Systems, RWTH Aachen University, Aachen, Germany
Ayman Tarakji & Stefan Lankes
Data Management and Data Exploration Group, RWTH Aachen University, Germany
Marwan Hassani & Thomas Seidl

Editor information

Editors and Affiliations

L-I.S.U.T. - D.A.P.I.t. Facoltà Ingegneria, Università degli Studi della Basilicata, Viale dell’Ateneo Lucano, 10, 85100, Potenza, Italy
Beniamino Murgante
Covenant University, Canaanland OTA, Nigeria
Sanjay Misra
Dipartimento di Scienze e Tecnologie per l’Agricoltura, le Foreste, la Natura e l’Energia, Università degli Studi della Tuscia, Via S. Camillo de Lellis, snc, 01100, Viterbo, Italy
Maurizio Carlini
Dipartimento di Scienze dell’Ingegneria Civile e dell’Architettura, Politecnico di Bari, Via Orabona, 4, 70125, Bari, Italy
Carmelo M. Torre
International University VNU-HCM, Quarter 6, Linh Trung, Thu Duc, Ho Chi Minh City, Vietnam
Hong-Quang Nguyen
School of Business Systems, Monash University, 3800, Clayton, VIC, Australia
David Taniar
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, 813-8503, Higashi-ku, Fukuoka, Japan
Bernady O. Apduhan
Department of Mathematics and Computer Science, University of Perugia, Via Vanvitelli, 1, 06123, Perugia, Italy
Osvaldo Gervasi

About this paper

Cite this paper

Tarakji, A., Hassani, M., Lankes, S., Seidl, T. (2013). Using a Multitasking GPU Environment for Content-Based Similarity Measures of Big Data. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2013. ICCSA 2013. Lecture Notes in Computer Science, vol 7975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39640-3_13

DOI: https://doi.org/10.1007/978-3-642-39640-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39639-7
Online ISBN: 978-3-642-39640-3
eBook Packages: Computer Science, Computer Science (R0)
