Multimedia Tools and Applications, Volume 64, Issue 2, pp 475–489

A high performance parallel DCT with OpenCL on heterogeneous computing environment



Desktop PCs offer a significant opportunity to speed up multimedia data processing by exploiting task- and data-parallelism on multi-core CPUs and many-core GPUs. This paper presents a high-performance parallel implementation of the 2D DCT in this heterogeneous computing environment. Intel TBB (Threading Building Blocks) and OpenCL (Open Computing Language) are used for task- and data-parallelism, respectively. The simulation results show that the parallel DCT implementations far outperform the serial ones in processing speed. In particular, the OpenCL implementation shows a linear speedup as the number of 2D data sets increases, which is characteristic of SIMD execution.
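The data-parallel structure described above can be illustrated with a minimal sketch. It is not the authors' TBB or OpenCL code. It assumes an 8×8 block size and an orthonormal DCT-II computed by row-column decomposition, neither of which is stated in the abstract, and all function names are hypothetical. The point it shows is that each block is transformed independently, so the blocks can be handed to workers in any order.

```python
import math
from concurrent.futures import ThreadPoolExecutor

N = 8  # block size: an assumption, the abstract does not state one


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C; the 1D DCT of a vector x is C * x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


C = dct_matrix(N)
CT = transpose(C)


def dct2(block):
    """2D DCT of one N x N block via row-column decomposition: C * X * C^T."""
    return matmul(matmul(C, block), CT)


def dct2_serial(blocks):
    """Reference path: transform the blocks one after another."""
    return [dct2(b) for b in blocks]


def dct2_parallel(blocks, workers=4):
    """Data-parallel path: blocks are independent, so map them onto workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(dct2, blocks))
```

In standard CPython the thread pool only mirrors the structure of the computation and is not expected to give a real speedup for this pure-Python arithmetic. In an OpenCL version, one would typically map each block or each output coefficient to a work-group or work-item instead.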


Keywords: OpenCL, Multi-core, Many-core, DCT, Heterogeneous computing, Multimedia



This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (KRF 2011-0027264).



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Science, Namseoul University, Cheonan, South Korea
  2. Graduate School of Information Security, Korea University, Seoul, South Korea
