Parallel kd-Tree Construction on the GPU with an Adaptive Split and Sort Strategy

Wehr, David; Radkowski, Rafael

doi:10.1007/s10766-018-0571-0

Published: 16 April 2018

International Journal of Parallel Programming, Volume 46, pages 1139–1156 (2018)

Abstract

We introduce a parallel kd-tree construction method for 3-dimensional points on a GPU which employs a sorting algorithm that maintains high parallelism throughout construction. Typically, large arrays in the upper levels of a kd-tree do not yield high performance when computing each node in one thread. Conversely, small arrays in the lower levels of the tree do not benefit from typical parallel sorts. To address these issues, the proposed sorting approach uses a modified parallel sort on the upper levels before switching to basic parallelization on the lower levels. Our work focuses on 3D point registration and our results indicate that a speed gain by a factor of 100 can be achieved in comparison to a naive parallel algorithm for a typical scene.

Notes

https://nvlabs.github.io/cub.

References

Arya, S., Mount, D.M., Netanyahu, N.S., Silverman, R., Wu, A.Y.: An optimal algorithm for approximate nearest neighbor searching fixed dimensions. J. ACM 45(6), 891–923 (1998)
Atkinson, M.D., Sack, J.R., Santoro, N., Strothotte, T.: Min-max heaps and generalized priority queues. Commun. ACM 29(10), 996–1000 (1986)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Bentley, J.L.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980)
Friedman, J.H., Bentley, J.L., Finkel, R.A.: An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3(3), 209–226 (1977)
Garcia, V., Debreuve, E., Barlaud, M.: Fast k nearest neighbor search using GPU. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2008)
Garrett, T., Radkowski, R., Sheaffer, J.: GPU-accelerated descriptor extraction process for 3D registration in augmented reality. In: 23rd International Conference on Pattern Recognition, Cancun, Mexico (2016)
Ha, L., Kruger, L., Silva, C.: Fast four-way parallel radix sorting on GPUs. Comput. Graph. Forum 28(8), 2368–2378 (2009)
Harada, T., Howes, L.: Introduction to GPU radix sort. In: Heterogeneous Computing with OpenCL. Morgan Kaufmann (2011)
Harris, M.: Maxwell: The Most Advanced CUDA GPU Ever Made. NVIDIA, Santa Clara (2014)
Havran, V.: Heuristic ray shooting algorithms. Ph.D. thesis, Czech Technical University (2001)
Hu, L., Nooshabadi, S., Ahmadi, M.: Massively parallel KD-tree construction and nearest neighbor search algorithms. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 2752–2755 (2015)
Karras, T.: Maximizing parallelism in the construction of BVHs, Octrees, and k-d trees. In: Proceedings of the Fourth ACM SIGGRAPH / Eurographics Conference on High-Performance Graphics, EGGH-HPG’12, pp. 33–37 (2012)
Leite, P., Teixeira, J.M., Farias, T., Reis, B., Teichrieb, V., Kelner, J.: Nearest neighbor searches on the GPU. Int. J. Parallel Program. 40(3), 313–330 (2012)
Leite, P.J.S., Teixeira, J.M.X.N., de Farias, T.S.M.C., Teichrieb, V., Kelner, J.: Massively parallel nearest neighbor queries for dynamic point clouds on the GPU. In: 2009 21st International Symposium on Computer Architecture and High Performance Computing, pp. 19–25 (2009)
Merrill, D.G., Grimshaw, A.S.: Revisiting sorting for GPGPU stream architectures. In: Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT ’10, pp. 545–546 (2010)
Qiu, D., May, S., Nüchter, A.: GPU-accelerated nearest neighbor search for 3D registration. In: Proceedings of Computer Vision Systems: 7th International Conference on Computer Vision Systems, ICVS 2009, pp. 194–203. Springer, Berlin (2009)
Radkowski, R.: Object tracking with a range camera for augmented reality assembly assistance. J. Comput. Inf. Sci. Eng. 16(1), 1–8 (2016)
Radkowski, R., Garrett, T., Ingebrand, J., Wehr, D.: Trackingexpert—a versatile tracking toolbox for augmented reality. In: IDETC/CIE 2016, the ASME 2016 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, Charlotte, NC (2016)
Satish, N., Harris, M., Garland, M.: Designing efficient sorting algorithms for manycore GPUs. In: 2009 IEEE International Symposium on Parallel Distributed Processing, pp. 1–10 (2009)
Singh, D.P., Joshi, I., Choudhary, J.: Survey of GPU based sorting algorithms. Int. J. Parallel Program. (2017). https://doi.org/10.1007/s10766-017-0502-5
Zhou, K., Hou, Q., Wang, R., Guo, B.: Real-time KD-tree construction on graphics hardware. ACM Trans. Graph. 27(5), 126:1–126:11 (2008)

Author information

Authors and Affiliations

Virtual Reality Applications Center, Iowa State University, Ames, IA, 50011, USA
David Wehr & Rafael Radkowski

Corresponding author

Correspondence to Rafael Radkowski.

Appendix

1.1 Median Splitting Determination

I. To determine which chunk a point belongs to, we use the technique described in [7] to compute a median. In brief, we consider the width w of a chunk to be a real number $w=\frac{N}{2^l}$, where l is the zero-indexed tree level. Therefore, given a particular index i, we can determine the chunk by $c = \frac{i}{w}$.
II. Determining whether a point is a median splitting element is also necessary during the histogram calculation. That can be determined with the following criteria, as per [7].
$$splitting = \begin{cases} true, & \left\lceil \frac{i}{w} \right\rceil < \frac{i+1}{w} \wedge i \ne 0 \\ false, & \text{otherwise} \end{cases}$$
III. Finally, we need the ability to calculate the starting index of a chunk, excluding the splitting element. Because the chunk width is constant, the starting index $i_s$ of a chunk c can be computed by:
$$i_s = \begin{cases} 0, & c = 0 \\ \left\lfloor w \cdot c \right\rfloor + 1, & c \ne 0 \end{cases}$$
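The three index computations above can be sketched as plain host-side C++ helpers. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the width w is held in a `double`, the chunk number is assumed to be $\frac{i}{w}$ rounded down to an integer, and the second branch of the starting-index rule is assumed to apply to every chunk other than the first.

```cpp
#include <cmath>
#include <cstddef>

// Real-valued chunk width w = N / 2^l for the zero-indexed tree level l.
// (Hypothetical helper; n is the number of points N.)
double chunkWidth(std::size_t n, unsigned level) {
    return static_cast<double>(n) / static_cast<double>(1ull << level);
}

// Item I: chunk that index i falls into.
// Assumption: c = i / w is rounded down to an integer chunk number.
std::size_t chunkOf(std::size_t i, double w) {
    return static_cast<std::size_t>(std::floor(static_cast<double>(i) / w));
}

// Item II: index i is a median splitting element when
// ceil(i / w) < (i + 1) / w and i != 0.
bool isSplitting(std::size_t i, double w) {
    if (i == 0) {
        return false;
    }
    const double x = static_cast<double>(i);
    return std::ceil(x / w) < (x + 1.0) / w;
}

// Item III: first index of chunk c, skipping the splitting element.
// Assumption: the second case applies to every chunk with c != 0.
std::size_t chunkStart(std::size_t c, double w) {
    if (c == 0) {
        return 0;
    }
    return static_cast<std::size_t>(std::floor(w * static_cast<double>(c))) + 1;
}
```

For example, with N = 8 and l = 1 the width is w = 4, index 4 satisfies the splitting criterion, and chunk 1 starts at index 5. With N = 10 and l = 2 the width is w = 2.5, the splitting indices are 2, 5 and 7, and the chunks start at indices 0, 3, 6 and 8.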

1.2 Experimental Data

See Tables 2 and 3.

Table 2 Quantitative results for all construction operations

Table 3 Quantitative results for all nearest-neighbor queries

About this article

Cite this article

Wehr, D., Radkowski, R. Parallel kd-Tree Construction on the GPU with an Adaptive Split and Sort Strategy. Int J Parallel Prog 46, 1139–1156 (2018). https://doi.org/10.1007/s10766-018-0571-0

Received: 21 April 2017
Accepted: 09 April 2018
Published: 16 April 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10766-018-0571-0
