A Hybrid Parallel Barnes-Hut Algorithm for GPU and Multicore Architectures

  • Hannes Hannak
  • Hendrik Hochstetter
  • Wolfgang Blochinger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8097)


Using the Barnes-Hut algorithm as an example we deal with the design of parallel algorithms that are able to exploit multicore CPUs and GPUs conjointly. Specifically, we demonstrate how to modularize a parallel application according to specific aspects of parallel execution. This allows for a flexible assignment of individual modules to the two parallel architectures based on their actual performance characteristics. Furthermore, we discuss a hybrid module for the most time consuming part of the algorithm that utilizes CPU and GPU simultaneously employing a novel load balancing heuristic. Our experimental evaluation shows that our method greatly increases overall efficiency by allowing to deploy the optimal configuration of modules for each individual computer system.


Tree Building Force Evaluation Desktop Grid Tree Data Structure Tree Traversal 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Anderson, D.P.: Boinc: A system for public-resource computing and storage. In: 5th IEEE/ACM International Workshop on Grid Computing, pp. 4–10 (2004)Google Scholar
  2. 2.
    Barnes, J.E., Hut, P.: A hierarchical O(N log N) force-calculation algorithm. Nature 324(6096), 446–449 (1986)CrossRefGoogle Scholar
  3. 3.
    Barnes, J.E.: A modified tree code: Don’t laugh; it runs. Journal of Computational Physics 87(1), 161–170 (1990)MathSciNetzbMATHCrossRefGoogle Scholar
  4. 4.
    Bédorf, J., Gaburov, E., Zwart, S.P.: A sparse octree gravitational n-body code that runs entirely on the GPU processor. Journal of Computational Physics 231(7), 2825–2839 (2012)MathSciNetzbMATHCrossRefGoogle Scholar
  5. 5.
    Billings, J.J.: Starscream – An open source galaxy modeling and simulation tool, (accessed in February 2013)
  6. 6.
    Blochinger, W., Dangelmayr, C., Schulz, S.: Aspect-oriented parallel discrete optimization on the cohesion desktop grid platform. In: Proc. of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), pp. 49–56 (2006)Google Scholar
  7. 7.
    Burtscher, M., Pingali, K.: An efficient CUDA implementation of the tree-based Barnes Hut n-body algorithm. In: Hwu, W.-M.W. (ed.) GPU Computing Gems Emerald Edition, ch. 6, pp. 75–92. Morgan Kaufmann Publishers Inc. (2011)Google Scholar
  8. 8.
    Gaburov, E., Bédorf, J., Zwart, S.P.: Gravitational tree-code on graphics processing units: Implementation in CUDA. Procedia CS 1(1), 1119–1127 (2010)Google Scholar
  9. 9.
    Hannak, H., Blochinger, W., Trieflinger, S.: A desktop grid enabled parallel Barnes-hut algorithm. In: Proceedings of the 31st IEEE International Performance Computing and Communications Conference (IPCCC 2012), pp. 120–129 (2012)Google Scholar
  10. 10.
    Heien, E., Kondo, D., Anderson, D.: A correlated resource model of internet end hosts. IEEE Transactions on Parallel and Distributed Systems 23(6), 977–984 (2012)CrossRefGoogle Scholar
  11. 11.
    Jetley, P., Wesolowski, L., Gioachin, F., Kalé, L.V., Quinn, T.R.: Scaling hierarchical n-body simulations on GPU clusters. In: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–11 (2010)Google Scholar
  12. 12.
    Keckeisen, M., Blochinger, W.: Parallel implicit integration for cloth animations on distributed memory architectures. In: Proc. of Eurographics Symposium on Parallel Graphics and Visualization 2004, pp. 119–126 (2004)Google Scholar
  13. 13.
    Makino, J.: Vectorization of a treecode. Journal of Computational Physics 87(1), 148–160 (1990)zbMATHCrossRefGoogle Scholar
  14. 14.
    Meißner, M., Hüttner, T., Blochinger, W., Weber, A.: Parallel direct volume rendering on PC networks. In: Arabnia, H.R. (ed.) Proc. of the Intl. Conf. on Parallel and Distributed Processing Techniques and Applications, PDPTA 1998. CSREA Press (1998)Google Scholar
  15. 15.
    Nakasato, N.: Implementation of a parallel tree method on a GPU. Journal of Computational Science 3(3), 132–141 (2012)CrossRefGoogle Scholar
  16. 16.
    Schulz, S., Blochinger, W., Held, M., Dangelmayr, C.: Cohesion - a microkernel based desktop grid platform for irregular task-parallel applications. Future Generation Computer Systems - The International Journal of Grid Computing: Theory, Methods and Applications 24(5), 354–370 (2008)CrossRefGoogle Scholar
  17. 17.
    Singh, J.P., Holt, C., Totsuka, T., Gupta, A., Hennessy, J.: Load balancing and data locality in adaptive hierarchical n-body methods: Barnes-hut, fast multipole, and radiosity. Journal of Parallel and Distributed Computing 27(2), 118–141 (1995)zbMATHCrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Hannes Hannak
    • 1
  • Hendrik Hochstetter
    • 2
  • Wolfgang Blochinger
    • 1
  1. 1.Institute of Parallel and Distributed SystemsUniversity of StuttgartGermany
  2. 2.Computer Graphics and Multimedia Systems GroupUniversity of SiegenGermany

Personalised recommendations