Abstract
In this paper, we study two hierarchical N-Body methods on Network-on-Chip (NoC) architectures. Modern Chip Multiprocessor (CMP) designs are mainly based on a shared-bus communication architecture, which suffers from high communication delays as the number of cores increases. NoC-based architectures have been proposed to address this problem. The N-Body problem is the classical problem of approximating the motion of bodies that interact with one another. Two hierarchical methods, Barnes-Hut (Barnes) and the Fast Multipole Method (FMM), have been developed for fast simulation. Both algorithms have been implemented and studied on conventional computer systems and Graphics Processing Units (GPUs). However, the evaluation of N-Body methods on NoC platforms, a promising unconventional multicore architecture, has not been well addressed. We define a NoC model based on state-of-the-art systems and present evaluation results obtained with a cycle-accurate full-system simulator. Experiments show that Barnes scales better than FMM (speedups of 53.7x for Barnes and 36.6x for FMM on 64 processing elements) and requires less cache. However, we observe hot-spot traffic in Barnes. Our analysis and experimental results provide a guideline for studying N-Body methods on NoC platforms.
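To put the reported speedups in perspective, they can be converted to parallel efficiency (speedup divided by the number of processing elements). The short Python sketch below does this arithmetic using only the figures quoted in the abstract. The function and variable names are illustrative, and it assumes the speedups are measured relative to a single processing element, which the abstract does not state explicitly.

```python
def parallel_efficiency(speedup: float, processing_elements: int) -> float:
    """Return speedup divided by the number of processing elements."""
    if processing_elements <= 0:
        raise ValueError("processing_elements must be positive")
    return speedup / processing_elements


# Speedups quoted in the abstract for 64 processing elements.
# Assumption: the baseline is a single processing element.
REPORTED_SPEEDUPS = {"Barnes": 53.7, "FMM": 36.6}
PROCESSING_ELEMENTS = 64

for name, speedup in REPORTED_SPEEDUPS.items():
    efficiency = parallel_efficiency(speedup, PROCESSING_ELEMENTS)
    print(f"{name}: {speedup}x on {PROCESSING_ELEMENTS} PEs, "
          f"efficiency {efficiency:.1%}")
```

Under that assumption, Barnes runs at roughly 84% efficiency and FMM at roughly 57%, which is one way to read the statement that Barnes scales better.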
Keywords
- Graphic Processing Unit
- Memory Controller
- Cache Coherence
- Fast Multipole Method
- Cache Bank
This work is supported by the Academy of Finland and the Nokia Foundation. The authors would like to thank the anonymous reviewers for their feedback and suggestions.
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, T.C., Liljeberg, P., Tenhunen, H. (2012). Study of Hierarchical N-Body Methods for Network-on-Chip Architectures. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_41
DOI: https://doi.org/10.1007/978-3-642-29740-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29739-7
Online ISBN: 978-3-642-29740-3
eBook Packages: Computer Science, Computer Science (R0)
