Optimizations with CUDA: A Case Study on 3D Curve-Skeleton Extraction from Voxelized Models

  • Jesús Jiménez
  • Juan Ruiz de Miras
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 359)


In this paper, we show how we coded and optimized a complex case study that is not trivially parallelizable: a 3D curve-skeleton calculation algorithm. We use NVIDIA CUDA, which allows the programmer to easily code algorithms that execute in parallel on NVIDIA GPU devices. However, for algorithms with high data-sharing or data-dependence requirements, such as the curve-skeleton calculation, achieving acceptable speedups is not always trivial. We therefore detail, step by step, a comprehensive collection of optimizations to consider for this class of algorithms and, more generally, for any CUDA implementation. Two GPU architectures were used to test the effect of each optimization: the NVIDIA GT200 and the Fermi GF100. Although the first direct CUDA implementation of our algorithm ran slower than its CPU version, overall speedups of 19x (GT200) and 68x (Fermi GF100) were ultimately achieved.


Keywords: Curve-skeleton, 3D Thinning, CUDA, GPGPU, Optimizations, Fermi





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jesús Jiménez (1)
  • Juan Ruiz de Miras (1)
  1. Department of Computer Science, University of Jaén, Jaén, Spain
