Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution

Georgis, Georgios; Lentaris, George; Reisis, Dionysios

doi:10.1007/s11554-016-0619-6

Original Research Paper
Published: 21 July 2016

Journal of Real-Time Image Processing, Volume 16, pages 1207–1234 (2019)

Abstract

Super-resolution (SR) techniques are a key element of imaging applications that require high-resolution reconstruction when, in the worst case, only a single low-resolution observation is available. SR techniques involve computationally demanding processes, and researchers are therefore focusing on accelerating SR. To improve SR performance, this paper builds on the characteristics of the L-SEABI SR method to introduce parallelization techniques for GPUs and FPGAs. The proposed techniques accelerate GPU reconstruction of ultra-high-definition content, running three times (3\(\times\)) faster than real time on mid-range and previous-generation devices and at least nine times (9\(\times\)) faster than real time on high-end GPUs. The FPGA design leads to a scalable architecture that runs four times (4\(\times\)) faster than real time on low-end Xilinx Virtex 5 devices and 69 times (69\(\times\)) faster than real time on the Virtex 2000t. Moreover, we confirm the benefits of the proposed acceleration techniques by applying them to a different category of image processing algorithms, window-based disparity functions, for which the proposed GPU technique improves on CPU performance by 14\(\times\) to 64\(\times\), while the proposed FPGA architecture provides 29\(\times\) acceleration.
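To make the term "window-based disparity function" concrete, the following is a minimal CPU sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the cost (sum of absolute differences), the winner-takes-all selection, the border clamping, and the names `sad_disparity`, `max_disp` and `radius` are all hypothetical choices that the abstract does not specify.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Winner-takes-all disparity map using a sum-of-absolute-differences window.

    Assumption: `left` and `right` are rectified grayscale images given as
    equally sized lists of rows. All names here are illustrative only.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_cost, best_d = None, 0
            # Candidate shifts never reach past the left image border.
            for d in range(min(max_disp, x) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy = min(max(y + dy, 0), h - 1)      # clamp row index
                        xl = min(max(x + dx, 0), w - 1)      # clamp left column
                        xr = min(max(x + dx - d, 0), w - 1)  # clamp shifted right column
                        cost += abs(left[yy][xl] - right[yy][xr])
                # Strict comparison keeps the smallest disparity on ties.
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp


# Synthetic pair: the left view is the right view shifted by 2 pixels.
w, h = 8, 4
right = [[x * x for x in range(w)] for _ in range(h)]
left = [[max(x - 2, 0) ** 2 for x in range(w)] for _ in range(h)]
print(sad_disparity(left, right, max_disp=4)[1][3:7])  # → [2, 2, 2, 2]
```

In this sketch every output pixel is computed independently of the others, which is the property that makes window-based functions of this kind natural candidates for parallel hardware.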

Notes

As retrieved online, mainly using the http://octopart.com search engine (April 2016).

References

Yang, J., Huang, T.: Digital Imaging and Computer Vision. CRC Press, Boca Raton (2010)
Timofte, R., De Smet, V., Van Gool, L.: Anchored neighborhood regression for fast example-based super-resolution. In: International Conference on Computer Vision (ICCV 2013) (2013)
Dong, C., Loy, C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision ECCV 2014, Volume 8692 of Lecture Notes in Computer Science, pp. 184–199. Springer, Berlin (2014)
Dong, C., Loy, C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016)
Timofte, R., De Smet, V., Van Gool, L.: A+: adjusted anchored neighborhood regression for fast super-resolution. In: Cremers, D., Reid, I., Saito, H., Yang, M.-H. (eds.) Computer Vision—ACCV 2014, volume 9006 of Lecture Notes in Computer Science, pp. 111–126. Springer, Berlin (2015)
Georgis, G., Lentaris, G., Reisis, D.: Reduced complexity superresolution for low-bitrate video compression. IEEE Trans. Circuits Syst. Video Technol. 26(2), 332–345 (2016)
Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. ACM Queue Mag. 6(2), 40–53 (2008)
Freedman, G., Fattal, R.: Image and video upscaling from local self-examples. ACM Trans. Graph. 30(2), 12:1–12:11 (2011)
Zhu, Y., Zhang, Y., Yuille, A.L.: Single image super-resolution using deformable patches. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2917–2924 (2014)
Alex, K.: CUDA Convolutional Neural Networks (2015)
nVidia: NVIDIA CUDA Fast Fourier Transform library (cuFFT) (2015)
Gallup, D., Frahm, J.-M., Stam, J.: CUDA stereo. In: nVidia GPU Technology Conference 2009 (2009)
Yang, Q.: Hardware-efficient bilateral filtering for stereo matching. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 1026–1032 (2014)
Kowalczuk, J., Psota, E.T., Perez, L.C.: Real-time stereo matching on cuda using an iterative refinement method for adaptive support-weight correspondences. IEEE Trans. Circuits Syst. Video Technol. 23(1), 94–104 (2013)
Kowalczuk, J., Psota, E.T., Perez, L.C.: Real-time temporal stereo matching using iterative adaptive support weights. In: 2013 IEEE International Conference on Electro/Information Technology (EIT), pp. 1–6 (2013)
Bowen, O., Bouganis, C.: Real-time image super resolution using an fpga. In: International Conference on Field Programmable Logic and Applications, 2008 (FPL 2008), pp. 89–94 (2008)
Angelopoulou, M.E., Bouganis, C.-S., Cheung, P.Y.K., Constantinides, G.A.: Robust real-time super-resolution on FPGA and an application to video enhancement. ACM Trans. Reconfig. Technol. Syst. 2(4), 22–29 (2009)
Sanada, Y., Ohira, T., Chikuda, S., Igarashi, M., Ikebe, M., Asai, T., Motomura, M.: FPGA implementation of single-image super-resolution based on frame-bufferless box filtering. J. Signal Process. 17(4), 111–114 (2013)
Pérez, J., Magdaleno, E., Pérez, F., Rodríguez, M., Hernández, D., Corrales, J.: Super-resolution in plenoptic cameras using fpgas. Sensors 14(5), 8669–8685 (2014)
Okuhata, H., Imai, R., Ise, M., Omaki, R.Y., Nakamura, H., Hara, S., Shirakawa, I.: Implementation of dynamic-range enhancement and super-resolution algorithms for medical image processing. In: 2014 IEEE International Conference on Consumer Electronics (ICCE), pp. 181–184. IEEE (2014)
Greisen, P., Heinzle, S., Gross, M., Burg, A.P.: An FPGA-based processing pipeline for high-definition stereo video. EURASIP J. Image Video Process. 1, 2011 (2011)
Jin, S., Cho, J., Pham, X.D., Lee, K.M., Park, S.-K., Kim, M., Jeon, J.W.: FPGA design and implementation of a real-time stereo vision system. IEEE Trans. Circuits Syst. Video Technol. 20(1), 15–26 (2010)
Werner, M., Stabernack, B., Riechert, C.: Hardware implementation of a full hd real-time disparity estimation algorithm. IEEE Trans. Consum. Electron. 60(1), 66–73 (2014)
Che, S., Li, J., Sheaffer, J.W., Skadron, K., Lach, J.: Accelerating compute-intensive applications with GPUs and FPGAs. In: Symposium on Application Specific Processors, 2008 (SASP 2008), pp. 101–107 (2008)
Yang, D., Sun, J., Lee, J., Liang, G., Jenkins, D.D., Peterson, G.D., Li, H.: Performance comparison of cholesky decomposition on GPUs and FPGAs. In: Symposium on Application Accelerators in High Performance Computing (2010)
Jones, D.H., Powell, A., Bouganis, C., Cheung, P.Y.K.: GPU versus FPGA for high productivity computing. In: 2010 International Conference on Field Programmable Logic and Applications (FPL), pp. 119–124 (2010)
Kalarot, R., Morris, J.: Comparison of FPGA and GPU implementations of real-time stereo vision. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 9–15 (2010)
Savarimuthu, T.R., Kjær-Nielsen, A., Sørensen, A.S.: Real-time medical video processing, enabled by hardware accelerated correlations. J. Real-Time Image Process. 6(3), 187–197 (2011)
Pietron, M., Wielgosz, M., Zurek, D., Jamro, E., Wiatr, K.: Comparison of GPU and FPGA implementation of SVM algorithm for fast image segmentation. In: Architecture of Computing Systems ARCS 2013, Volume 7767 of Lecture Notes in Computer Science, pp. 292–302. Springer, Berlin (2013)
Tomislav, M., Ivan, A., Željko, H.: CPU, GPU and FPGA implementations of mald: Ceramic tile surface defects detection algorithm. Automatika 55(1), 1920–1927 (2014)
Gurumani, S.T., Cholakkal, H., Liang, Y., Rupnow, K., Chen, D.: High-level synthesis of multiple dependent CUDA kernels on FPGA. In: 2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 305–312 (2013)
Yang, J., Wright, J., Huang, T.S., Ma, Y.: Image super-resolution via sparse representation. IEEE Trans. Image Process. 19(11), 2861–2873 (2010)
Dong, W., Zhang, D., Shi, G., Wu, X.: Image deblurring and super-resolution by adaptive sparse domain selection and adaptive regularization. IEEE Trans. Image Process. 20(7), 1838–1857 (2011)
Villena, S., Vega, M., Molina, R., Katsaggelos, A.K.: Bayesian super-resolution image reconstruction using an l1 prior. In: Proceedings of 6th International Symposium on Image and Signal Processing and Analysis, 2009 (ISPA 2009), pp. 152–157 (2009)
Dong, W., Zhang, D., Shi, G., Wu, X.: Nonlocal back-projection for adaptive image enlargement. In: 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 349–352 (2009)
Dong, W., Zhang, L., Lukac, R., Shi, G.: Sparse representation based image interpolation with nonlocal autoregressive modeling. IEEE Trans. Image Process. 22(4), 1382–1394 (2013)
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Mittal, A., Moorthy, A.K., Bovik, A.C.: No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 21(12), 4695–4708 (2012)
Levon, J.: Oprofile 1.0, a Statistical Profiler for Linux Systems (2015)
nVidia: Parallel Thread Execution ISA (2015)
Sanders, J., Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming, 1st edn. Addison-Wesley, Reading (2010)
Xu, C., Kirk, S.R., Jenkins, S.: Tiling for performance tuning on different models of GPUs. In: 2009 Second International Symposium on Information Science and Engineering (ISISE), pp. 500–504 (2009)
Harris, M.: Optimizing Parallel Reduction in CUDA (2007)
Eklund, A., Dufort, P.: GPU-Pro 5: Advanced Rendering Techniques—Non-separable 2D, 3D and 4D Filtering with CUDA, Chapter 5, 1st edn. CRC Press, Boca Raton (2014)
Volkov, V.: Better Performance at Lower Occupancy (2010)
Podlozhnyuk, V.: Image Convolution with CUDA (2012)
nVidia: CUDA C Programming Guide (2015)
NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110 (2012)
Szydzik, T., Callico, G.M., Nunez, A.: Efficient FPGA implementation of a high-quality super-resolution algorithm with real-time performance. IEEE Trans. Consum. Electron. 57(2), 664–672 (2011)
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer, Berlin (2010). ISBN: 978-1-84882-935-0
Lentaris, G., Diamantopoulos, D., Siozios, K., Soudris, D., Rodrigálvarez, A.M.: Hardware implementation of stereo correspondence algorithm for the exomars mission. In: 2012 22nd International Conference on Field Programmable Logic and Applications (FPL), pp. 667–670. IEEE (2012)
Scharstein, D., Pal, C.: Learning conditional random fields for stereo. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007 (CVPR’07), pp. 1–8 (2007)
Scharstein, D., Szeliski, R.: A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 47(1–3), 7–42 (2002)
Scharstein, D., Szeliski, R.: High-accuracy stereo depth maps using structured light. In: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003, Volume 1, pp. I-195–I-202 (2003)
Hosni, A., Rhemann, C., Bleyer, M., Gelautz, M.: Temporally consistent disparity and optical flow via efficient spatio-temporal filtering. In: Ho, Y.-S. (ed.) Advances in Image and Video Technology, volume 7087 of Lecture Notes in Computer Science, pp. 165–177. Springer, Berlin (2012)
Kishonti Ltd.: Compubench, a Professional OpenCL and Renderscript Benchmark (2015)

Author information

Authors and Affiliations

Electronics Laboratory Department of Physics, National and Kapodistrian University of Athens, Athens, Greece
Georgios Georgis, George Lentaris & Dionysios Reisis

Corresponding author

Correspondence to Dionysios Reisis.

Appendix

In this section, we provide the results of our entire super-resolution enhancement evaluation in tabular form (Table 9) and image form (Fig. 17). As Table 9 shows, when we employ our algorithms prior to the technique presented in [35], the MSSIM metric decreases, particularly at the \(352 \times 288\) resolution. For this particular resolution, the BRISQUE results show that L-SEAI can enhance ANR more effectively than both L-SEABI and SIL-SEABI.

In addition to the Cameraman image, Fig. 17 allows a subjective assessment of the output of [32, 36] when processing the \(176\times 144\) Carphone image and the \(256\times 256\) Butterfly and Starfish images. According to the results, the aliasing-reduction effects of SIL-SEABI when it is applied before NARM are also apparent in the Carphone and Butterfly images (Fig. 17i, j). Finally, notice that SIL-SEABI improves the contrast of all images upsampled by [32] (Fig. 17q–t).

Table 9 Per resolution objective comparison of state-of-the-art SR algorithms (scaling factor \(f=2\)) when using L-SEABI (a), SIL-SEABI (b) and L-SEAI (c) as their initial reconstruction phase against the parameters proposed by their authors

About this article

Cite this article

Georgis, G., Lentaris, G. & Reisis, D. Acceleration techniques and evaluation on multi-core CPU, GPU and FPGA for image processing and super-resolution. J Real-Time Image Proc 16, 1207–1234 (2019). https://doi.org/10.1007/s11554-016-0619-6

Received: 24 December 2015
Accepted: 05 July 2016
Published: 21 July 2016
Issue Date: 13 August 2019
DOI: https://doi.org/10.1007/s11554-016-0619-6
