International Journal of Parallel Programming, Volume 46, Issue 6, pp 1139–1156

Parallel kd-Tree Construction on the GPU with an Adaptive Split and Sort Strategy

  • David Wehr
  • Rafael Radkowski


We introduce a parallel kd-tree construction method for 3-dimensional points on a GPU which employs a sorting algorithm that maintains high parallelism throughout construction. Typically, large arrays in the upper levels of a kd-tree do not yield high performance when computing each node in one thread. Conversely, small arrays in the lower levels of the tree do not benefit from typical parallel sorts. To address these issues, the proposed sorting approach uses a modified parallel sort on the upper levels before switching to basic parallelization on the lower levels. Our work focuses on 3D point registration and our results indicate that a speed gain by a factor of 100 can be achieved in comparison to a naive parallel algorithm for a typical scene.
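The switch between the two sorting regimes described above can be illustrated with a small sequential sketch. This is not the authors' GPU implementation: the function name `build_kdtree`, the `switch_size` threshold, the median-split rule, and the round-robin axis choice are all assumptions made for illustration. A single global sort keyed by (segment, coordinate) stands in for the modified parallel sort on the upper levels, and a plain per-node sort stands in for the one-thread-per-node work on the lower levels.

```python
def build_kdtree(points, switch_size=32):
    """Arrange 3D points as a median-split kd-tree, level by level.

    Returns (order, nodes): `order` is a permutation of point indices and
    `nodes` lists (axis, start, median_pos, end) for every tree node, where
    order[start:end] holds the node's points and order[median_pos] splits them.
    `switch_size` is a hypothetical threshold, not a value from the paper.
    """
    n = len(points)
    order = list(range(n))
    nodes = []
    segments = [(0, n)] if n else []  # half-open ranges, one per node
    depth = 0
    while segments:
        axis = depth % 3  # assumed round-robin split axis
        large = [(a, b) for a, b in segments if b - a > switch_size]
        small = [(a, b) for a, b in segments if b - a <= switch_size]

        if large:
            # Upper levels: one sort over all large segments at once, keyed
            # by (segment id, coordinate). On a GPU this role would be played
            # by a data-parallel sort that keeps every thread busy.
            tagged = [(seg, points[order[pos]][axis], order[pos])
                      for seg, (a, b) in enumerate(large)
                      for pos in range(a, b)]
            tagged.sort(key=lambda t: (t[0], t[1]))
            slots = [pos for a, b in large for pos in range(a, b)]
            for pos, (_, _, idx) in zip(slots, tagged):
                order[pos] = idx

        # Lower levels: each small segment is sorted independently, which
        # corresponds to assigning one thread per node on a GPU.
        for a, b in small:
            order[a:b] = sorted(order[a:b], key=lambda i: points[i][axis])

        # Split every segment at its median and queue the children.
        next_segments = []
        for a, b in segments:
            m = (a + b) // 2
            nodes.append((axis, a, m, b))
            if m > a:
                next_segments.append((a, m))
            if b > m + 1:
                next_segments.append((m + 1, b))
        segments = next_segments
        depth += 1
    return order, nodes
```

Calling `build_kdtree(pts, switch_size=16)` on a list of `(x, y, z)` tuples returns the reordered indices together with the node ranges. In a real GPU version the threshold would presumably be tuned to the hardware, since it determines where the whole-array sort stops paying off.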


Keywords: GPU computing, kd-Tree construction, Parallel radix sort, 3D-point registration



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Virtual Reality Applications Center, Iowa State University, Ames, USA
