Performance analysis of a novel GPU computation-to-core mapping scheme for robust facet image modeling

Park, Seung In; Cao, Yong; Watson, Layne T.; Quek, Francis

doi:10.1007/s11554-012-0272-7

Original Research Paper
Published: 25 September 2012

Volume 10, pages 485–500, (2015)

Journal of Real-Time Image Processing

Abstract

Modern graphics processing units (GPUs) are commodity data-parallel coprocessors capable of high-performance computation and data throughput. GPUs are well known to be ideal implementation platforms for image processing applications. However, the level of effort and expertise required to optimize application performance is still substantial. This paper investigates computation-to-core mapping strategies to probe the efficiency and scalability of the robust facet image modeling algorithm on GPUs. Our fine-grained computation-to-core mapping scheme achieves a significant performance gain over the standard pixel-wise mapping scheme. Through in-depth performance comparisons of the two mapping schemes, we analyze the impact of the level of parallelism on GPU computation and suggest two principles for optimizing future image processing applications on the GPU platform.

Author information

Authors and Affiliations

Department of Computer Science, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Seung In Park, Yong Cao, Layne T. Watson & Francis Quek
Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
Layne T. Watson

Corresponding author

Correspondence to Yong Cao.

About this article

Cite this article

Park, S.I., Cao, Y., Watson, L.T. et al. Performance analysis of a novel GPU computation-to-core mapping scheme for robust facet image modeling. J Real-Time Image Proc 10, 485–500 (2015). https://doi.org/10.1007/s11554-012-0272-7

Received: 04 November 2011
Accepted: 11 August 2012
Published: 25 September 2012
Issue Date: September 2015
DOI: https://doi.org/10.1007/s11554-012-0272-7
