Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures

  • Yonghong Yan
  • Sanjay Chatterjee
  • Daniel A. Orozco
  • Elkin Garcia
  • Zoran Budimlić
  • Jun Shirako
  • Robert S. Pavel
  • Guang R. Gao
  • Vivek Sarkar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)

Abstract

Manycore architectures, with hundreds to thousands of cores per processor, are seen by many as a natural evolution of multicore processors. Taking advantage of this massive parallelism in practice requires a productive parallel programming model and an efficient runtime for scheduling and coordinating concurrent tasks. A critical prerequisite for an efficient runtime is a scalable synchronization mechanism that supports task coordination at different levels of granularity.

This paper describes the implementation of a high-level synchronization construct called phasers on the IBM Cyclops64 manycore processor, and compares phasers to the lower-level synchronization primitives currently available to Cyclops64 programmers. Phasers support synchronization of dynamic tasks by allowing tasks to register and deregister with a phaser object. They unify point-to-point and collective synchronization behind easy-to-use interfaces, thereby offering productivity advantages over hardware primitives on manycores. We have experimented with several approaches to phaser implementation, using software, hardware, and a combination of both, to explore their portability and performance. The results show that a highly optimized phaser implementation delivers performance comparable to that obtained with lower-level synchronization primitives. We also demonstrate the success of the hardware optimizations proposed for phasers.
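To make the register/deregister model concrete, here is a minimal sketch using the JDK's `java.util.concurrent.Phaser`. It is an assumption-laden illustration only: it is not the Cyclops64 implementation evaluated in the paper, and the class name `PhaserSketch`, the helper `run`, and the per-phase counters are hypothetical. A parent task registers each child before starting it, the children synchronize collectively at the end of every phase, and one extra child deregisters after the first phase so that later phases proceed without it.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Illustrative only: relies on the JDK Phaser class, not on the Cyclops64
// phaser runtime described in the paper.
class PhaserSketch {

    // Hypothetical helper: runs `tasks` workers for `phases` barrier phases,
    // plus one extra worker that deregisters after phase 0.
    // Returns how often a worker saw an incomplete phase after the barrier.
    static int run(int tasks, int phases) {
        Phaser phaser = new Phaser(1);              // the parent is registered
        AtomicIntegerArray arrivals = new AtomicIntegerArray(phases);
        AtomicInteger violations = new AtomicInteger();
        Thread[] workers = new Thread[tasks + 1];

        for (int i = 0; i <= tasks; i++) {
            final boolean leavesEarly = (i == tasks);
            phaser.register();                      // register child before it starts
            workers[i] = new Thread(() -> {
                for (int p = 0; p < phases; p++) {
                    arrivals.incrementAndGet(p);    // stands in for this phase's work
                    if (leavesEarly) {
                        phaser.arriveAndDeregister(); // signal once, then drop out
                        return;
                    }
                    phaser.arriveAndAwaitAdvance(); // collective barrier
                    int expected = (p == 0) ? tasks + 1 : tasks;
                    if (arrivals.get(p) != expected) {
                        violations.incrementAndGet();
                    }
                }
                phaser.arriveAndDeregister();       // leave when all phases are done
            });
            workers[i].start();
        }

        // The parent arrives only after every child is registered, so phase 0
        // cannot complete before all children have joined the phaser.
        phaser.arriveAndDeregister();

        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return violations.get();
    }

    public static void main(String[] args) {
        System.out.println("violations = " + run(4, 3));
    }
}
```

The early-leaving worker uses `arriveAndDeregister()` to signal without waiting, which is the point-to-point flavor of the same object that the other workers use as a barrier. How such operations map onto Cyclops64 hardware primitives is the subject of the paper and is not modeled here.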

Keywords

Phaser Implementation, Hardware Support, Parent Task, Task Parallelism, Task Synchronization
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yonghong Yan (1)
  • Sanjay Chatterjee (1)
  • Daniel A. Orozco (2)
  • Elkin Garcia (2)
  • Zoran Budimlić (1)
  • Jun Shirako (1)
  • Robert S. Pavel (2)
  • Guang R. Gao (2)
  • Vivek Sarkar (1)

  1. Department of Computer Science, Rice University, USA
  2. Department of Electrical Engineering, University of Delaware, USA
