Using a Multitasking GPU Environment for Content-Based Similarity Measures of Big Data

  • Ayman Tarakji
  • Marwan Hassani
  • Stefan Lankes
  • Thomas Seidl
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7975)


Performance and efficiency have recently become key requirements of computer architectures. Modern computers use Graphics Processing Units (GPUs) to run data mining algorithms as well as other general-purpose computations. In this paper, different parallelization methods are analyzed and compared in order to understand their applicability. Ranging from multi-threading on shared memory to NVIDIA's GPU accelerators, this work discusses the parallelization of data mining algorithms with respect to performance and efficiency. Performance is compared on both many-core systems and GPU accelerators for a distance measure algorithm applied to a relatively large data set. We optimize the way GPUs are used in heterogeneous systems to make them more suitable for big data mining applications with heavy distance calculations. Moreover, we focus on achieving higher utilization of GPU resources and better reuse of data. Our GPU implementation of the content-based similarity algorithm SQFD (Signature Quadratic Form Distance) outperforms its CPU counterparts by up to 50× and multi-threaded CPU implementations by up to 15×.
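The abstract names SQFD without defining it. As commonly defined in the literature (Beecks, Uysal and Seidl, CIVR 2010), the SQFD between two feature signatures, each a set of centroids with weights, is sqrt((w_q | -w_o) · A · (w_q | -w_o)^T), where A holds pairwise similarities of all centroids of both signatures. The following is a minimal sequential sketch of that definition, not the authors' GPU implementation. The Gaussian similarity function, its parameter `alpha`, and the names `sqfd` and `query_distances` are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# A feature signature: list of (centroid, weight) pairs; centroids are points in feature space.
Signature = List[Tuple[Sequence[float], float]]


def gaussian_similarity(a: Sequence[float], b: Sequence[float], alpha: float) -> float:
    """Similarity exp(-alpha * ||a - b||^2). The Gaussian form and alpha are assumptions."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-alpha * sq)


def sqfd(q: Signature, o: Signature, alpha: float = 1.0) -> float:
    """Signature Quadratic Form Distance: sqrt((w_q | -w_o) A (w_q | -w_o)^T), where
    A[i][j] is the similarity between the i-th and j-th concatenated centroids."""
    centroids = [c for c, _ in q] + [c for c, _ in o]
    weights = [w for _, w in q] + [-w for _, w in o]  # object weights enter negated
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


def query_distances(query: Signature, database: List[Signature], alpha: float = 1.0) -> List[float]:
    """Hypothetical helper: distance from one query to every database object.
    Each entry is computed independently of the others."""
    return [sqfd(query, obj, alpha) for obj in database]
```

For two single-centroid signatures with unit weights at positions 0 and 1 and `alpha = 1`, the sketch returns sqrt(2 - 2e^-1) ≈ 1.1244. Each query-to-object distance is independent, and each one is itself a dense sum over all centroid pairs, so the work can be split across CPU threads or GPU threads at either level; this is one reason such distance computations suit many-core and GPU hardware.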


Keywords: GPGPU · Similarity Measures · Data Mining · Heterogeneous Parallel Systems


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ayman Tarakji (1)
  • Marwan Hassani (2)
  • Stefan Lankes (1)
  • Thomas Seidl (2)
  1. Chair for Operating Systems, RWTH Aachen University, Aachen, Germany
  2. Data Management and Data Exploration Group, RWTH Aachen University, Germany
