On the Efficient Implementation of a Real-Time Kd-Tree Construction Algorithm

  • Byungjoon Chang
  • Woong Seo
  • Insung Ihm


The kd-tree is one of the most commonly used spatial data structures in graphics applications because of its reliably high acceleration performance. Several years ago, Zhou et al. devised an effective kd-tree construction algorithm that runs entirely on a GPU. In this chapter, we present improved GPU programming techniques for implementing the algorithm more efficiently on current GPUs. One of the major ideas is to reduce the number of necessary kernel functions by replacing the essential segmented-scan and reduction computations with simpler per-block atomic operations, thereby alleviating the overhead of multiple synchronous kernel calls. Combined with an efficient implementation of intrablock scan and reduction using recently introduced intrinsic functions, these changes markedly improve the performance of the kd-tree construction process. Through an example of real-time ray tracing for dynamic scenes of nontrivial complexity, we demonstrate that the proposed GPU techniques can be exploited effectively for various real-time applications.
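The following is a minimal CPU-side sketch of the general idea, not the chapter's implementation. It assumes that the intrinsic functions mentioned above are of the warp-vote and population-count kind (cf. the ballot-counting prefix sum of Skjellum et al.), and it emulates them in plain C++ so that the logic can be checked without a GPU. Each 32-element group stands in for one thread block (a single warp, for brevity). The names `warp_ballot`, `lane_exclusive_rank`, and `compact_flagged` are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int kWarpSize = 32;  // assumed group width (one warp)

// Emulates a warp-wide ballot: bit i of the result is set when lane i's flag is set.
inline std::uint32_t warp_ballot(const std::uint8_t* flags, int active_lanes) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < active_lanes; ++lane)
        if (flags[lane]) mask |= (std::uint32_t{1} << lane);
    return mask;
}

// Exclusive binary prefix sum for one lane: the number of set bits below `lane`.
inline int lane_exclusive_rank(std::uint32_t ballot, int lane) {
    const std::uint32_t below = ballot & ((std::uint32_t{1} << lane) - 1u);
    return static_cast<int>(std::bitset<32>(below).count());
}

// Compacts the flagged elements of `values`. Instead of a separate global scan
// pass, each group counts its own survivors and reserves a contiguous output
// range with a single atomic add on a shared cursor.
std::vector<int> compact_flagged(const std::vector<int>& values,
                                 const std::vector<std::uint8_t>& keep) {
    assert(values.size() == keep.size());
    const int n = static_cast<int>(values.size());
    std::vector<int> out(values.size());
    std::atomic<int> cursor{0};  // stands in for a global counter in device memory

    for (int base = 0; base < n; base += kWarpSize) {
        const int lanes = std::min(kWarpSize, n - base);
        const std::uint32_t ballot = warp_ballot(keep.data() + base, lanes);
        const int count = static_cast<int>(std::bitset<32>(ballot).count());

        // One atomic operation per group replaces the global scan step.
        const int offset = cursor.fetch_add(count);

        for (int lane = 0; lane < lanes; ++lane)
            if (keep[base + lane])
                out[offset + lane_exclusive_rank(ballot, lane)] = values[base + lane];
    }
    out.resize(cursor.load());
    return out;
}
```

On a GPU the groups run concurrently, so the order in which they obtain their output ranges is not deterministic; only the order within a group is preserved. In this sequential emulation the groups are processed in order, so the overall input order is preserved as well.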


Real-time ray tracing · Kd-tree construction · GPU computing · CUDA · Scan and reduction operations



This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MOE) (No. 2012R1A1A2008958).


  1. Wald, I., Havran, V.: On building fast kd-trees for ray tracing, and on doing that in O(N log N). In: Proceedings of the IEEE Symposium on Interactive Ray Tracing, pp. 61–69 (2006)
  2. Shevtsov, M., Soupikov, A.: Highly parallel fast kd-tree construction for interactive ray tracing of dynamic scenes. Comput. Graph. Forum (Proceedings of Eurographics) 26, 395–404 (2007)
  3. Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time KD-tree construction on graphics hardware. ACM Trans. Graph. 27, 1–11 (2008)
  4. Hou, Q., Sun, X., Zhou, K., Lauterbach, C., Manocha, D.: Memory-scalable GPU spatial hierarchy construction. IEEE Trans. Vis. Comput. Graph. 17, 466–474 (2011)
  5. Choi, B., Komuravelli, R., Lu, V., Sung, H., Bocchino, R., Adve, S., Hart, J.: Parallel SAH k-D tree construction. In: Proceedings of High-Performance Graphics (HPG’10), pp. 77–86 (2010)
  6. Wu, Z., Zhao, F., Liu, X.: SAH KD-tree construction on GPU. In: Proceedings of High-Performance Graphics (HPG’11), pp. 71–78 (2011)
  7. CUDPP Google Group: CUDA data parallel primitives library release 2.0 (2011). Accessed 1 June 2013
  8. Sengupta, S., Harris, M., Garland, M., Owens, J.: Efficient parallel scan algorithms for many-core GPUs. In: Scientific Computing with Multicore and Accelerators, Taylor & Francis, pp. 413–442 (2011)
  9. NVIDIA: CUDA C programming guide: design guide (PG-02829-001 v5.0) (2012)
  10. Skjellum, A., Whittaker, D., Bangalore, P.: Ballot counting for optimal binary prefix sum. Presented at the GPU Technology Conference 2010 (2010)
  11. Manku, G.: Fast bit counting routines (2002). Accessed 1 June 2013

Copyright information

© Springer Science+Business Media Singapore 2015

Authors and Affiliations

  1. Digital Media & Communications R&D Center, Samsung Electronics, Suwon-si, South Korea
  2. Department of Computer Science and Engineering, Sogang University, Seoul, South Korea
