Skip to main content
Log in

Particle simulation on the Cell BE architecture

  • Published:
Cluster Computing Aims and scope Submit manuscript

Abstract

This paper presents two parallel formulations for the Barnes-Hut algorithm on the Cell architecture, which differ in tree distribution and construction phases of the algorithm. In the initial parallelization, the domains are dynamically partitioned and assigned to the synergistic processing elements (SPEs), and SPEs construct local trees of the sub-domains in parallel. The enhanced parallelization scheme provides better clustering of the particles by sequentially constructing the global tree of the entire work space in the power processing element (PPE) and by partitioning the tree into sub-trees that can fit in the Local Store. SPEs operate on the sub-tree data and construct local trees in parallel. Our experimental evaluation indicates that this application performs much faster on the Cell BE compared to the Intel Xeon based system. Specifically, our first and second methods on the Cell BE outperform Intel Xeon by a factor of 5.8 and 7.1 for 8192 particles, respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Kahle, J., Day, M., Hofstee, H., Johns, C., Maeurer, T., Shippy, D.: Introduction to the Cell multiprocessor. IBM J. Res. Dev. 49(4–5), 589–604 (2005)

    Article  Google Scholar 

  2. Kongetira, P., Aingaran, K., Olukotun, K.: Niagara: a 32-way multithreaded SPARC processor. IEEE Microw. Mag. 25(2), 21–29 (2005)

    Article  Google Scholar 

  3. Olukotun, K., Hammond, L.: The future of microprocessors. ACM Queue 3(7), 26–29 (2005)

    Article  Google Scholar 

  4. Kumar, R., Tullsen, D., Jouppi, N.: Heterogeneous chip multiprocessors. IEEE Comput. Soc. 38(11), 32–38 (2005)

    Article  Google Scholar 

  5. Taylor, M.B., Kim, J., Miller, J., Wentzlaff, D., Ghodrat, F., Greenwald, B., Hoffmann, H., Johnson, P., Lee, J.-W., Lee, W., Ma, A., Saraf, A., Seneski, M., Shnidman, N., Strumpen, V., Frank, M., Amarasinghe, S., Agarwal, A.: The RAW microprocessor: a computational fabric for software circuits and general purpose programs. IEEE MICRO 22(2), 25–35 (2002)

    Article  Google Scholar 

  6. Sankaralingam, K., Nagarajan, R., Liu, H., Kim, C., Huh, J., Burger, D., Keckler, S.W., Moore, C.R.: Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture. In: Proceedings of the 30th Annual International Symposium on Computer Architecture, pp. 442–433 (2003)

    Google Scholar 

  7. Swanson, S., Michelson, K., Schwerin, A., Oskin, M.: Wavescalar. In: Proceedings of the Annual IEEE/ACM International Symposium on Microarchitecture, Washington, DC, pp. 291–302 (2003)

    Google Scholar 

  8. Mai, K., Paaske, T., Jayasena, N., Ho, R., Dally, W.J., Horowitz, M.: Smart memories: a modular reconfigurable architecture. In: Proceedings of the Annual International Symposium on Computer Architecture, New York, NY, USA, pp. 161–171 (2000)

    Google Scholar 

  9. Hofstee, H.P.: Power efficient processor architecture and the Cell processor. In: Proceedings of the International Symposium on High Performance Computer Architecture, pp. 258–262 (2005)

    Google Scholar 

  10. Pham, D., Asano, S., Bolliger, M., Day, M.N., Hofstee, H.P., Johns, C., Kahle, J., Kameyama, A., Keaty, J., Masubuchi, Y., Riley, M., Shippy, D., Stasiak, D., Suzuoki, M., Wang, M., Warnock, J., Weitzel, S., Wendel, D., Yamazaki, T., Yazawa, K.: The design and implementation of a first-generation Cell processor. In: International Solid State Circuits Conference, San Fransisco (2005)

    Google Scholar 

  11. Asanovic, K., Bodik, R., Catanzaro, B., Gebis, J., Husbands, P., Keutzer, K., Patterson, D., Plishker, W., Shalf, J., Williams, S., Yelik, K.: The landscape of parallel computing research: a view from Berkley. Technical Report, EECS Department, University of California at Berkley, UCB/EECS-2006-183 (2006)

  12. Barnes, J., Hut, P.: A hierarchical O(nlog n) force calculation algorithm. Nature 324, 446–449 (1986)

    Article  Google Scholar 

  13. Grama, A., Kumar, V., Sameh, A.: Scalable parallel formulations of the Barnes-Hut method for N-Body simulations. In: Proceedings of Supercomputing, pp. 439–448 (1994)

    Google Scholar 

  14. Greengard, L.: The Rapid Evolution of Potential Fields in Particle Systems. The MIT Press, Cambridge (1988)

    Google Scholar 

  15. Ramachandran, U., et al.: Architectural mechanisms for explicit communication in shared memory multiprocessors. In: Proceedings of Supercomputing (1995)

    Google Scholar 

  16. Warren, M., Salmon, J.: A parallel hashed oct tree n-body algorithm. In: Proceedings of Supercomputing Conference (1993)

    Google Scholar 

  17. Singh, J., Holt, C., Totsuka, T., Gupta, A., Hennessy, J.: Load balancing and data locality in hierarchical n-body methods. J. Parallel Distrib. Comput. (1994)

  18. Bader, D.A., Agarwal, V., Madduri, K., Kang, S.: High performance combinatorial algorithm design on the Cell broadband engine processor. Parallel Comput. 33(10–11), 720–740 (2007)

    Article  Google Scholar 

  19. Kistler, M., Perrone, M., Petrini, F.: Cell multiprocessor communication network: built for speed. Communication build for speed. IEEE MICRO 26(3), 10–23 (2006)

    Article  Google Scholar 

  20. Williams, S., Shalf, J., Oliker, L., Kamil, S., Husbands, P., Yelick, K.: The potential of the cell processor for scientific computing. In: Proceedings of the 3rd Conference on Computing Frontiers, pp. 9–20 (2006)

    Chapter  Google Scholar 

  21. Bader, D.A., Agarwal, V.: FFTC: fastest Fourier transform for the IBM Cell broadband. In: Engine Eleventh Annual High Performance Embedded Computing Workshop (HPEC), Lexington, MA (2007)

    Google Scholar 

  22. Chow, A.C., Fossum, G.C., Brokenshire, D.A.: A programming example: large FFT on the Cell broadband engine. In: Tec. Conf. Proc. of the Global Signal Processing Expo (GSPx) (2005)

    Google Scholar 

  23. Cico, L., Cooper, R., Greene, J.: Performance and programability of the IBM/Sony/Toshiba Cell broadband engine processor. White paper (2006)

  24. Bader, D.A., Agarwal, V., Madduri, K.: On design and analysis of irregular algorithms on the Cell processor: a case study of list ranking. In: The IEEE International Parallel and Distributed Processing Symposium, Long Beach, CA (2007)

    Google Scholar 

  25. Villa, O., Scarpazza, D.P., Petrini, F., Peinador, J.F.: Challenges in mapping graph exploration algorithms on advanced multi-Core processors. In: Parallel and Distributed Processing Symposium, Long Beach, CA (2007)

    Google Scholar 

  26. Sachdeva, V., Kistler, M., Speight, E., Tzeng, T.H.K.: Exploring the viability of the Cell broadband engine for bioinformatics applications. In: Parallel and Distributed Processing Symposium, Long Beach, CA (2007)

    Google Scholar 

  27. Brokenshire, D.A.: Maximizing the power of the cell broadband engine processor: 25 tips to optimal application performance. In: IBM Developer Works (2006)

    Google Scholar 

  28. Demiroz, B., Topcuoglu, H., Kandemir, M., Tosun, O.: Parallelizing Barnes-Hut method on the Cell BE architecture. In: HIPEAC 2010, Third Workshop on Programmability Issues for Multicore Computers (MULTIPROG’10), Pisa, Italy (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Haluk R. Topcuoglu.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Demiroz, B., Topcuoglu, H.R., Kandemir, M. et al. Particle simulation on the Cell BE architecture. Cluster Comput 14, 419–432 (2011). https://doi.org/10.1007/s10586-011-0169-4

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10586-011-0169-4

Keywords

Navigation