Efficient Neighbor Search for Particle Methods on GPUs

  • Patrick Diehl
  • Marc Alexander Schweitzer
Conference paper
Part of the Lecture Notes in Computational Science and Engineering book series (LNCSE, volume 100)


In this paper we present an efficient and general sorting-based approach for neighbor search on GPUs. Finding the neighbors of a particle is a common task in particle methods and has a significant impact on the overall computational effort, especially in dynamics simulations. We extend a space-filling curve algorithm presented in Connor and Kumar (IEEE Trans Vis Comput Graph, 2009) for use on GPUs with the parallel computing model Compute Unified Device Architecture (CUDA). To evaluate our implementation, we measure the execution time of our GPU search algorithm for the most common particle arrangements: a regular grid, uniformly distributed random points, and clustered points in 2 and 3 dimensions. The measured computational time is compared with the theoretical time complexity of the extended algorithm and with the computational time of its reference single-core implementation. The presented results show a speedup by a factor of 4 of the GPU run times over the CPU run times.
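
The idea behind a sorting-based neighbor search along a space-filling curve can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: it runs on the CPU in Python rather than in CUDA, it assumes a Z-order (Morton) curve and 2D points with coordinates in [0, 1), and it only collects approximate neighbors from a fixed window in the sorted order. The function names and the parameters `bits` and `window` are hypothetical, chosen for illustration.

```python
def interleave_bits_2d(ix, iy, bits=10):
    """Interleave the low `bits` bits of ix and iy into one Morton (Z-order) key."""
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (2 * b)       # x bits go to even positions
        key |= ((iy >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return key


def morton_sort(points, bits=10):
    """Return point indices sorted along the Z-order curve.

    Assumption: all coordinates lie in [0, 1).
    """
    scale = (1 << bits) - 1

    def key(i):
        x, y = points[i]
        return interleave_bits_2d(int(x * scale), int(y * scale), bits)

    return sorted(range(len(points)), key=key)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def candidate_knn(points, k, window=None, bits=10):
    """Approximate k nearest neighbors taken from a window in Morton order.

    Points close on the curve tend to be close in space, so each point only
    inspects `window` predecessors and successors in the sorted order.
    """
    if window is None:
        window = 2 * k  # illustrative default, not taken from the paper
    order = morton_sort(points, bits)
    result = {}
    for pos, i in enumerate(order):
        lo = max(0, pos - window)
        hi = min(len(order), pos + window + 1)
        candidates = [order[j] for j in range(lo, hi) if j != pos]
        candidates.sort(key=lambda j: squared_distance(points[i], points[j]))
        result[i] = candidates[:k]
    return result


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.1), (0.9, 0.9)]
    print(candidate_knn(pts, k=1))
```

In a GPU setting the key computation is independent per particle and the sort would be replaced by a parallel sort; the window scan here is a simplification, so the result is exact only when the window covers all true neighbors.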


Keywords: Neighbor search · GPU · Meshfree methods and particle methods


  1. S. Aluru, F.E. Sevilgen, Parallel domain decomposition and load balancing using space-filling curves, in Proceedings of the 4th IEEE Conference on High Performance Computing, Bangalore, 1997, pp. 230–235
  2. S. Arya, D.M. Mount, N.S. Netanyahu, R. Silverman, A.Y. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, in Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, Arlington, 1994, vol. 5, pp. 573–582
  3. M. Bader, Space-Filling Curves – An Introduction with Applications in Scientific Computing (Springer, Berlin/Heidelberg, 2013)
  4. C. Böhm, S. Berchtold, A.D. Keim, Searching in high-dimensional spaces: index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33, 322–373 (2001)
  5. T.M. Chan, A minimalist’s implementation of an approximate nearest neighbor algorithm in fixed dimensions, May 2006
  6.
  7. M. Connor, P. Kumar, Fast construction of k-nearest neighbor graphs for point clouds. IEEE Trans. Vis. Comput. Graph. 14(4), 599–608 (2009)
  8. A. Dashti, I. Komarov, R.M. D’Souza, Efficient computation of k-nearest neighbour graphs for large high-dimensional data sets on GPU clusters. PLoS ONE 8, e74113 (2013)
  9. V. Garcia, E. Debreuve, M. Barlaud, kNN CUDA
  10. R.A. Gingold, J.J. Monaghan, Smoothed particle hydrodynamics: theory and application to non-spherical stars. Mon. Not. R. Astron. Soc. 181, 375–389 (1977)
  11. M. Griebel, S. Knapek, G. Zumbusch, Numerical Simulation in Molecular Dynamics (Springer, Berlin/Heidelberg, 2007)
  12. P. Leite, J.M. Teixeira, T. Farias, B. Reis, V. Teichrieb, J. Kelner, Nearest neighbor searches on the GPU. Int. J. Parallel Program. 40(3), 313–330 (2012)
  13. J. Mellor-Crummey, D. Whalley, K. Kennedy, Improving memory hierarchy performance for irregular applications using data and computation reorderings. Int. J. Parallel Program. 29, 217–247 (2001)
  14. D.M. Mount, S. Arya, ANN: a library for approximate nearest neighbor searching
  15. S.A. Nene, S.K. Nayar, A simple algorithm for nearest neighbor search in high dimensions. IEEE Trans. Pattern Anal. Mach. Intell. 19, 989–1003 (1997)
  16. M.L. Parks, R.B. Lehoucq, S.J. Plimpton, S.A. Silling, Implementing peridynamics within a molecular dynamics code. Comput. Phys. Commun. (EL, ed.) 179, 777–783 (2008)
  17. S. Plimpton, Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117, 1–19 (1995)
  18. N. Satish, M. Harris, M. Garland, Designing efficient sorting algorithms for manycore GPUs, in IEEE International Symposium in Parallel & Distributed Processing, Rome, 2009, pp. 1–10
  19. M.A. Schweitzer, A Parallel Multilevel Partition of Unity Method for Elliptic Partial Differential Equations. Lecture Notes in Computational Science and Engineering, vol. 29 (Springer, New York, 2003)
  20. Y.D. Sergeyev, R.G. Strongin, D. Lera, Introduction to Global Optimization Exploiting Space-Filling Curves (Springer, New York/Heidelberg, 2013)
  21. S.A. Silling, Reformulation of elasticity theory for discontinuities and long-range forces. Sandia report SAND98-2176, Sandia National Laboratories, 1998
  22. S.A. Silling, E. Askari, A meshfree method based on the peridynamic model of solid mechanics. Comput. Struct. 83, 1526–1535 (2005)
  23. E. Sintorn, U. Assarsson, Fast parallel GPU-sorting using a hybrid algorithm. J. Parallel Distrib. Comput. 68, 1381–1388 (2008)
  24. H. Tropf, H. Herzog, Multidimensional range search in dynamically balanced trees. Angew. Inform. (Appl. Inform.) 2, 71–77 (1981). Vieweg Verlag
  25. M.S. Warren, J.K. Salmon, A parallel hashed oct-tree n-body algorithm, in Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (Supercomputing’93), Portland (ACM, New York, 1993), pp. 12–21
  26. W.W. Hwu (ed.), GPU Computing Gems Emerald Edition. Applications of GPU Computing Series, 1st edn. (Morgan Kaufmann, Burlington, Massachusetts, 2011)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute for Numerical Simulation, Bonn, Germany
