Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

Super-resolution (SR) techniques are a key element of image applications that need high-resolution reconstruction when, in the worst case, only a single low-resolution observation is available. SR techniques are computationally demanding, so researchers are currently focusing on accelerating them. To improve SR performance, this paper builds on the characteristics of the L-SEABI SR method to introduce parallelization techniques for GPUs and FPGAs. On GPUs, the proposed techniques reconstruct ultra-high-definition content three times (3\(\times\)) faster than real time on mid-range and previous-generation devices and at least nine times (9\(\times\)) faster than real time on high-end GPUs. The FPGA design leads to a scalable architecture that performs four times (4\(\times\)) faster than real time on low-end Xilinx Virtex 5 devices and 69 times (69\(\times\)) faster than real time on the Virtex 2000t. Moreover, we confirm the benefits of the proposed acceleration techniques by applying them to a different category of image processing algorithms, window-based disparity functions, for which the proposed GPU technique improves on CPU performance by 14\(\times\) to 64\(\times\), while the proposed FPGA architecture provides 29\(\times\) acceleration.
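To illustrate what a window-based disparity function computes, the following is a minimal sequential sketch in Python. It is not the implementation evaluated in the paper: the sum of absolute differences (SAD) cost, the function name `sad_disparity`, and the parameters `max_disp` and `radius` are assumptions chosen for illustration, since the abstract does not name a specific cost function. The sketch shows that each output pixel depends only on a small window of the two input images, which is the kind of independent per-pixel work that GPU and FPGA designs can parallelize.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Window-based disparity using the sum of absolute differences (SAD).

    left, right: rectified grayscale images given as lists of equal-length rows.
    max_disp:    largest horizontal shift (in pixels) to test (assumed parameter).
    radius:      half-size of the square matching window (assumed parameter).
    Returns a disparity map with the same dimensions as the inputs.
    """
    height, width = len(left), len(left[0])
    disparity = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            best_cost, best_d = None, 0
            # A pixel at column x in the left image is searched at column x - d
            # in the right image, for every candidate disparity d.
            for d in range(min(max_disp, x) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), height - 1)  # clamp rows at the border
                    for dx in range(-radius, radius + 1):
                        xl = min(max(x + dx, 0), width - 1)      # clamp left column
                        xr = min(max(x + dx - d, 0), width - 1)  # clamp right column
                        cost += abs(left[yy][xl] - right[yy][xr])
                # Keep the candidate with the lowest window cost (first one on ties).
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

Calling `sad_disparity(left, right, max_disp=3)` on two small rectified images returns one integer disparity per pixel. The two outer loops over `y` and `x` carry no dependencies between iterations, so in a parallel design each pixel (or tile of pixels) could be assigned to its own thread or processing element.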

Notes

  1. As recovered on-line mainly using the http://octopart.com search engine (April 2016).

References

  1. Yang, J., Huang, T.: Digital Imaging and Computer Vision. CRC Press, Boca Raton (2010)

  2. Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast example-based super-resolution. In: International Conference on Computer Vision (ICCV 2013) (2013)

  3. Dong, C., Loy, C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision ECCV 2014, Volume 8692 of Lecture Notes in Computer Science, pp. 184–199. Springer, Berlin (2014)

  4. Dong, C., Loy, C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)

  5. Timofte, R., De Smet, V., Van Gool, L.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) Computer Vision—ACCV 2014, volume 9006 of Lecture Notes in Computer Science, pp. 111–126. Springer, Berlin (2015)

  6. Georgis, G., Lentaris, G., Reisis, D.: Reduced complexity superresolution for low-bitrate video compression. IEEE Trans. Circuits Syst. Video Technol. 26(2), 332–345 (2016)

  7. Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. ACM Queue Mag. 6(2), 40–53 (2008)

  8. Freedman, G., Fattal, R.: Image and video upscaling from local self-examples. ACM Trans. Graph. 30(2), 12:1–12:11 (2011)

  9. Zhu, Y., Zhang, Y., Yuille, A.L.: Single image super-resolution using deformable patches. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2917–2924 (2014)

  10. Alex, K.: CUDA Convolutional Neural Networks (2015)

  11. nVidia: NVIDIA CUDA Fast Fourier Transform library (cuFFT) (2015)

  12. Gallup, D., Frahm, J.-M., Stam, J.: CUDA stereo. In: nVidia GPU Technology Conference 2009 (2009)

  13. Yang, Q.: Hardware-efficient bilateral filtering for stereo matching. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 1026–1032 (2014)

  14. Kowalczuk, J., Psota, E.T., Perez, L.C.: Real-time stereo matching on CUDA using an iterative refinement method for adaptive support-weight correspondences. IEEE Trans. Circuits Syst. Video Technol. 23(1), 94–104 (2013)

  15. Kowalczuk, J., Psota, E.T., Perez, L.C.: Real-time temporal stereo matching using iterative adaptive support weights. In: 2013 IEEE International Conference on Electro/Information Technology (EIT), pp. 1–6 (2013)

  16. Bowen, O., Bouganis, C.: Real-time image super resolution using an FPGA. In: International Conference on Field Programmable Logic and Applications, 2008 (FPL 2008), pp. 89–94 (2008)

  17. Angelopoulou, M.E., Bouganis, C.-S., Cheung, P.Y.K., Constantinides, G.A.: Robust real-time super-resolution on FPGA and an application to video enhancement. ACM Trans. Reconfig. Technol. Syst. 2(4), 22–29 (2009)

  18. Sanada, Y., Ohira, T., Chikuda, S., Igarashi, M., Ikebe, M., Asai, T., Motomura, M.: FPGA implementation of single-image super-resolution based on frame-bufferless box filtering. J. Signal Process. 17(4), 111–114 (2013)

  19. Pérez, J., Magdaleno, E., Pérez, F., Rodríguez, M., Hernández, D., Corrales, J.: Super-resolution in plenoptic cameras using FPGAs. Sensors 14(5), 8669–8685 (2014)

  20. Okuhata, H., Imai, R., Ise, M., Omaki, R.Y., Nakamura, H., Hara, S., Shirakawa, I.: Implementation of dynamic-range enhancement and super-resolution algorithms for medical image processing. In: 2014 IEEE International Conference on Consumer Electronics (ICCE), pp. 181–184. IEEE (2014)

  21. Greisen, P., Heinzle, S., Gross, M., Burg, A.P.: An FPGA-based processing pipeline for high-definition stereo video. EURASIP J. Image Video Process. 1, 2011 (2011)

  22. Jin, S., Cho, J., Pham, X.D., Lee, K.M., Park, S.-K., Kim, M., Jeon, J.W.: FPGA design and implementation of a real-time stereo vision system. IEEE Trans. Circuits Syst. Video Technol. 20(1), 15–26 (2010)

  23. Werner, M., Stabernack, B., Riechert, C.: Hardware implementation of a full HD real-time disparity estimation algorithm. IEEE Trans. Consum. Electron. 60(1), 66–73 (2014)

  24. Che, S., Li, J., Sheaffer, J.W., Skadron, K., Lach, J.: Accelerating compute-intensive applications with GPUs and FPGAs. In: Symposium on Application Specific Processors, 2008 (SASP 2008), pp. 101–107 (2008)

  25. Yang, D., Sun, J., Lee, J., Liang, G., Jenkins, D.D., Peterson, G.D., Li, H.: Performance comparison of Cholesky decomposition on GPUs and FPGAs. In: Symposium on Application Accelerators in High Performance Computing (2010)

  26. Jones, D.H., Powell, A., Bouganis, C., Cheung, P.Y.K.: GPU versus FPGA for high productivity computing. In: 2010 International Conference on Field Programmable Logic and Applications (FPL), pp. 119–124 (2010)

  27. Kalarot, R., Morris, J.: Comparison of FPGA and GPU implementations of real-time stereo vision. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–15 (2010)

  28. Savarimuthu, T.R., Kjær-Nielsen, A., Sørensen, A.S.: Real-time medical video processing, enabled by hardware accelerated correlations. J. Real-Time Image Process. 6(3), 187–197 (2011)

  29. Pietron, M., Wielgosz, M., Zurek, D., Jamro, E., Wiatr, K.: Comparison of GPU and FPGA implementation of SVM algorithm for fast image segmentation. In: Architecture of Computing Systems ARCS 2013, Volume 7767 of Lecture Notes in Computer Science, pp. 292–302. Springer, Berlin (2013)

  30. Tomislav, M., Ivan, A., Željko, H.: CPU, GPU and FPGA implementations of mald: Ceramic tile surface defects detection algorithm. Automatika 55(1), 1920–1927 (2014)

  31. Gurumani, S.T., Cholakkal, H., Liang, Y., Rupnow, K., Chen, D.: High-level synthesis of multiple dependent CUDA kernels on FPGA. In: 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 305–312 (2013)

  32. Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)

  33. Dong, W., Zhang, D., Shi, G., Wu, X.: Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans. Image Process. 20(7), 1838–1857 (2011)

  34. Villena, S., Vega, M., Molina, R., Katsaggelos, A.K.: Bayesian super-resolution image reconstruction using an l1 prior. In: Proceedings of 6th International Symposium on Image and Signal Processing and Analysis, 2009 (ISPA 2009), pp. 152–157 (2009)

  35. Dong, W., Zhang, D., Shi, G., Wu, X.: Nonlocal back-projection for adaptive image enlargement. In: 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 349–352 (2009)

  36. Dong, W., Zhang, L., Lukac, R., Shi, G.: Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Trans. Image Process. 22(4), 1382–1394 (2013)

  37. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)

  38. Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21(12), 4695–4708 (2012)

  39. Levon, J.: Oprofile 1.0, a Statistical Profiler for Linux Systems (2015)

  40. nVidia: Parallel Thread Execution ISA (2015)

  41. Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming, 1st edn. Addison-Wesley, Reading (2010)

  42. Xu, C., Kirk, S.R., Jenkins, S.: Tiling for performance tuning on different models of GPUs. In: 2009 Second International Symposium on Information Science and Engineering (ISISE), pp. 500–504 (2009)

  43. Harris, M.: Optimizing Parallel Reduction in CUDA (2007)

  44. Eklund, A., Dufort, P.: GPU-Pro 5: Advanced Rendering Techniques—Non-separable 2D, 3D and 4D Filtering with CUDA, Chapter 5, 1st edn. CRC Press, Boca Raton (2014)

  45. Volkov, V.: Better Performance at Lower Occupancy (2010)

  46. Podlozhnyuk, V.: Image Convolution with CUDA (2012)

  47. nVidia: CUDA C Programming Guide (2015)

  48. NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110 (2012)

  49. Szydzik, T., Callico, G.M., Nunez, A.: Efficient FPGA implementation of a high-quality super-resolution algorithm with real-time performance. IEEE Trans. Consum. Electron. 57(2), 664–672 (2011)

  50. Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, Berlin (2010). ISBN: 978-1-84882-935-0

  51. Lentaris, G., Diamantopoulos, D., Siozios, K., Soudris, D., Rodrigálvarez, A.M.: Hardware implementation of stereo correspondence algorithm for the ExoMars mission. In: 2012 22nd International Conference on Field Programmable Logic and Applications (FPL), pp. 667–670. IEEE (2012)

  52. Scharstein, D., Pal, C.: Learning conditional random fields for stereo. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007 (CVPR’07), pp. 1–8 (2007)

  53. Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47(1–3), 7–42 (2002)

  54. Scharstein, D., Szeliski, R.: High-accuracy stereo depth maps using structured light. In: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, Volume 1, pp. I-195–I-202 (2003)

  55. Hosni, A., Rhemann, C., Bleyer, M., Gelautz, M.: Temporally consistent disparity and optical flow via efficient spatio-temporal filtering. In: Ho, Y.-S. (ed.) Advances in Image and Video Technology, volume 7087 of Lecture Notes in Computer Science, pp. 165–177. Springer, Berlin (2012)

  56. Kishonti Ltd.: Compubench, a Professional OpenCL and Renderscript Benchmark (2015)

Author information

Corresponding author

Correspondence to Dionysios Reisis.

Appendix

In this section, we provide the results of our entire super-resolution enhancement evaluation, in tabular (Table 9) and image form (Fig. 17). As Table 9 shows, when we employ our algorithms prior to the technique presented in [35], the MSSIM metric decreases, particularly at the \(352 \times 288\) resolution. For this particular resolution, the BRISQUE results show that L-SEAI can enhance ANR more effectively than both L-SEABI and SIL-SEABI.

Apart from the Cameraman image, Fig. 17 allows a subjective assessment of the output of [32, 36] when processing the \(176\times 144\) Carphone and the \(256\times 256\) Butterfly and Starfish images. According to the results, the aliasing reduction effects of SIL-SEABI when it is applied before NARM are also apparent in the Carphone and Butterfly images (Fig. 17i, j). Finally, notice that SIL-SEABI improves the contrast of all images upsampled by [32] (Fig. 17q–t).

Table 9 Per resolution objective comparison of state-of-the-art SR algorithms (scaling factor \(f=2\)) when using L-SEABI (a), SIL-SEABI (b) and L-SEAI (c) as their initial reconstruction phase against the parameters proposed by their authors
Fig. 17 Subjective comparison of [32, 36]: normal execution and enhanced with SIL-SEABI (\(f=2\))

About this article

Cite this article

Georgis, G., Lentaris, G. & Reisis, D. Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution. J Real-Time Image Proc 16, 1207–1234 (2019). https://doi.org/10.1007/s11554-016-0619-6

  • DOI: https://doi.org/10.1007/s11554-016-0619-6
