Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution

  • Georgios Georgis
  • George Lentaris
  • Dionysios Reisis
Original Research Paper


Super-resolution (SR) techniques are a key element of imaging applications that require high-resolution reconstruction when, in the worst case, only a single low-resolution observation is available. SR techniques involve computationally demanding processes, and researchers are therefore focusing on accelerating SR performance. To improve SR performance, this paper builds on the characteristics of the L-SEABI SR method to introduce parallelization techniques for GPUs and FPGAs. The proposed techniques accelerate GPU reconstruction of ultra-high-definition content, running three times (3\(\times\)) faster than real time on mid-range and previous-generation devices and at least nine times (9\(\times\)) faster than real time on high-end GPUs. The FPGA design leads to a scalable architecture that runs four times (4\(\times\)) faster than real time on low-end Xilinx Virtex 5 devices and 69 times (69\(\times\)) faster than real time on the Virtex 2000T. Moreover, we confirm the benefits of the proposed acceleration techniques by applying them to a different category of image processing algorithms, namely window-based disparity functions: the proposed GPU technique improves on CPU performance by 14\(\times\) to 64\(\times\), while the proposed FPGA architecture provides 29\(\times\) acceleration.


Keywords: Real-time image processing; Graphics processing unit; Field-programmable gate array; Super-resolution; Comparative power-performance evaluation
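The abstract mentions window-based disparity functions without detailing them. As a generic illustration only, and not the authors' CPU, GPU, or FPGA implementation, the sketch below computes a winner-takes-all disparity map using a sum-of-absolute-differences (SAD) cost over a square window on a rectified image pair. The function name `sad_disparity`, the choice of SAD as the cost, and the border handling (unmatched border pixels left at 0) are assumptions made for clarity.

```python
def sad_disparity(left, right, max_disp, radius):
    """Winner-takes-all disparity map using a SAD window cost (illustrative sketch).

    left, right: rectified grayscale images given as lists of rows of ints.
    max_disp:    largest disparity (in pixels) that is searched.
    radius:      half-width of the square window, i.e. (2*radius+1)^2 pixels.
    Pixels whose window would leave the image are left at disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_cost, best_d = None, 0
            # Only test disparities whose shifted window stays inside the right image.
            for d in range(0, min(max_disp, x - radius) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    row_l = left[y + dy]
                    row_r = right[y + dy]
                    for dx in range(-radius, radius + 1):
                        cost += abs(row_l[x + dx] - row_r[x + dx - d])
                # Strict '<' keeps the smallest disparity on ties.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp
```

Each output pixel depends only on its own window, so the two outer loops are independent; this per-pixel data parallelism is the kind that GPU threads or pipelined FPGA datapaths typically exploit. The sketch itself is sequential and unoptimized.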


Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Georgios Georgis (1)
  • George Lentaris (1)
  • Dionysios Reisis (1)

  1. Electronics Laboratory, Department of Physics, National and Kapodistrian University of Athens, Athens, Greece
