The Journal of Supercomputing, Volume 74, Issue 6, pp 2255–2275

Real-time parallel image processing applications on multicore CPUs with OpenMP and GPGPU with CUDA

  • Semra Aydin
  • Refik Samet
  • Omer Faruk Bay


This paper presents real-time image processing applications built on multicore and multiprocessing technologies. To this end, parallel image segmentation was performed on many images covering the entire surface of the same metallic, cylindrical moving objects. Experiments on a multicore CPU with OpenMP showed that increasing the chunk size reduced the execution time by a factor of approximately four compared with serial computing. The same experiments were run on GPGPU using four techniques: (1) single image transmission with single pixel processing; (2) single image transmission with multiple pixel processing; (3) multiple image transmission with single pixel processing; and (4) multiple image transmission with multiple pixel processing. All four techniques were implemented on GeForce, Tesla K20 and Tesla K40 GPUs. Experiments on GPU with CUDA showed that speedup increases with the number of cores. Tesla K40 gave the best results, with speedups over serial computing of 35 and 12 (first technique), 36 and 13 (second technique), 54 and 16 (third technique), and 71 and 17 (fourth technique); in each pair, the first figure excludes data transmission time and the second includes it. Users are therefore advised to use a Tesla K40 GPU with multiple image transmission and multiple pixel processing to obtain maximum performance.
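As a rough illustration of the OpenMP experiment described above, the following minimal sketch applies threshold-based segmentation to a grayscale image, once serially and once with a chunked parallel loop. It is not the authors' implementation: the function names, the 8-bit pixel format, the binary 0/255 output, and the choice of `schedule(static, chunk)` are all assumptions made for the example, since the abstract only states that the chunk size was varied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Serial reference (hypothetical helper): pixels strictly above
 * `threshold` become 255 (foreground), all others become 0. */
void threshold_serial(const uint8_t *src, uint8_t *dst, size_t n,
                      uint8_t threshold)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (src[i] > threshold) ? 255 : 0;
}

/* OpenMP variant (hypothetical helper): the same per-pixel test, with
 * iterations handed to threads in blocks of `chunk` pixels.
 * Assumption: a static schedule; the abstract does not name one.
 * Built without OpenMP support, the pragma is ignored and the loop
 * simply runs serially with identical results. */
void threshold_omp(const uint8_t *src, uint8_t *dst, size_t n,
                   uint8_t threshold, int chunk)
{
    assert(chunk > 0); /* the chunk size must be a positive integer */

    #pragma omp parallel for schedule(static, chunk)
    for (long i = 0; i < (long)n; ++i)
        dst[i] = (src[i] > threshold) ? 255 : 0;
}
```

Each pixel is written independently, so the loop needs no synchronization, and the chunk size only changes how iterations are grouped per thread. To compare chunk sizes, one would compile with OpenMP enabled (for example `-fopenmp` with GCC), time `threshold_omp` for several values of `chunk`, and divide the `threshold_serial` time by each result.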


Keywords: Parallel computing; Real-time image processing; Image segmentation; Thresholding; Multicore programming; GPU programming



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Gazi University, Ankara, Turkey
  2. Ankara University, Ankara, Turkey
