On the Efficient Implementation of a Real-Time Kd-Tree Construction Algorithm
The kd-tree is one of the most commonly used spatial data structures in graphics applications because of its reliably high acceleration performance. Several years ago, Zhou et al. devised an effective kd-tree construction algorithm that runs entirely on a GPU. In this chapter, we present improved GPU programming techniques for implementing that algorithm more efficiently on current GPUs. One of the major ideas is to reduce the number of kernel functions required by replacing the essential segmented-scan and reduction computations with simpler per-block atomic operations, thereby alleviating the overhead of multiple synchronous kernel calls. Combined with an efficient implementation of intrablock scan and reduction that uses recently introduced intrinsic functions, these changes markedly improve the performance of the kd-tree construction process. Through an example of real-time ray tracing for dynamic scenes of nontrivial complexity, we demonstrate that the proposed GPU techniques can be exploited effectively in various real-time applications.
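To make the idea of per-block atomic operations more concrete, the following is a minimal CPU-side sketch in C++, not the chapter's actual implementation. It assumes that the "intrinsic functions" are a warp ballot and a population count (as references 10 and 11 suggest), and it emulates them with ordinary loops and `std::bitset`. The names `warp_ballot`, `lane_prefix`, and `compact_flagged` are hypothetical and chosen only for illustration. Each 32-element chunk computes its own binary prefix sum from a vote mask, then reserves its output range with a single atomic add on a shared counter, so no separate global scan pass is needed.

```cpp
#include <algorithm>
#include <atomic>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kWarpSize = 32;  // assumed chunk width, mirroring a GPU warp

// Emulates a warp vote: bit i of the result is set when lane i's flag is true.
// (Assumption: on a GPU, a ballot intrinsic would yield this mask directly.)
inline std::uint32_t warp_ballot(const bool* flags, int active_lanes) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < active_lanes; ++lane)
        if (flags[lane]) mask |= (std::uint32_t{1} << lane);
    return mask;
}

// Exclusive binary prefix sum for one lane: counts the set bits below `lane`.
inline int lane_prefix(std::uint32_t mask, int lane) {
    std::uint32_t below = mask & ((std::uint32_t{1} << lane) - 1u);
    return static_cast<int>(std::bitset<32>(below).count());
}

// Stream compaction: collects every index i for which flags[i] is true.
// Each chunk reserves its output range with one atomic add on a shared
// counter, which stands in for a global scan over all chunks.
std::vector<int> compact_flagged(const std::vector<bool>& flags) {
    std::vector<int> out(flags.size());
    std::atomic<int> counter{0};
    for (std::size_t base = 0; base < flags.size(); base += kWarpSize) {
        int active = static_cast<int>(
            std::min<std::size_t>(kWarpSize, flags.size() - base));
        bool lane_flags[kWarpSize] = {};
        for (int lane = 0; lane < active; ++lane)
            lane_flags[lane] = flags[base + lane];

        std::uint32_t mask = warp_ballot(lane_flags, active);
        int count = static_cast<int>(std::bitset<32>(mask).count());
        int offset = counter.fetch_add(count);  // reserve `count` output slots

        for (int lane = 0; lane < active; ++lane)
            if (lane_flags[lane])
                out[offset + lane_prefix(mask, lane)] =
                    static_cast<int>(base + lane);
    }
    out.resize(static_cast<std::size_t>(counter.load()));
    return out;
}
```

In this sequential emulation the chunks run in order, so the output is sorted by index. On a GPU, blocks would reach the atomic counter in no fixed order, so only the ordering within each chunk would be preserved; whether that matters depends on how the compacted data is consumed.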
Keywords: Real-time ray tracing, Kd-tree construction, GPU computing, CUDA, Scan and reduction operations
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MOE) (No. 2012R1A1A2008958).
- 1. Wald, I., Havran, V.: On building fast kd-trees for ray tracing, and on doing that in O(N log N). In: Proceedings of the IEEE Symposium on Interactive Ray Tracing, pp. 61–69 (2006)
- 2. Shevtsov, M., Soupikov, A.: Highly parallel fast Kd-tree construction for interactive ray tracing of dynamic scenes. Comput. Graph. Forum (Proceedings of Eurographics) 26, 395–404 (2007)
- 3. Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time KD-tree construction on graphics hardware. ACM Trans. Graph. 27, 1–11 (2008)
- 5. Choi, B., Komuravelli, R., Lu, V., Sung, H., Bocchino, R., Adve, S., Hart, J.: Parallel SAH k-D tree construction. In: Proceedings of High-Performance Graphics (HPG’10), pp. 77–86 (2010)
- 6. Wu, Z., Zhao, F., Liu, X.: SAH KD-tree construction on GPU. In: Proceedings of High-Performance Graphics (HPG’11), pp. 71–78 (2011)
- 7. CUDPP Google Group: CUDA data parallel primitives library release 2.0. http://code.google.com/p/cudpp/ (2011). Accessed 1 June 2013
- 8. Sengupta, S., Harris, M., Garland, M., Owens, J.: Efficient parallel scan algorithms for many-core GPUs. In: Scientific Computing with Multicore and Accelerators, Taylor & Francis, pp. 413–442 (2011)
- 9. NVIDIA: CUDA C programming guide: design guide (PG-02829-001 v5.0) (2012)
- 10. Skjellum, A., Whittaker, D., Bangalore, P.: Ballot counting for optimal binary prefix sum. Presented at the GPU Technology Conference 2010 (2010)
- 11. Manku, G.: Fast bit counting routines. http://cpptruths.googlecode.com/svn/trunk/c/bitcount.c (2002). Accessed 1 June 2013