
Evaluation of OpenMP for the Cyclops Multithreaded Architecture

  • George Almasi
  • Eduard Ayguadé
  • Călin Caşcaval
  • José Castaños
  • Jesús Labarta
  • Francisco Martínez
  • Xavier Martorell
  • José Moreira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2716)

Abstract

Multithreaded architectures have the potential to tolerate large memory and functional-unit latencies and to increase resource utilization. The Blue Gene/Cyclops (BG/C) architecture, under development at the IBM T. J. Watson Research Center, is one such system, offering massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be used effectively for a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when only the basic built-in mechanisms for thread management and synchronization are used. In this paper, we describe the implementation of an OpenMP environment, currently under development at the CEPBA-IBM Research Institute, for parallelizing applications targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that were not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.
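
To make the programming-model contrast concrete, the fragment below is a minimal sketch of the loop-level parallelism that OpenMP expresses with a single directive, as opposed to hand-written, architecture-specific thread management; the vector-scaling kernel and problem size are illustrative assumptions, not code or benchmarks from the paper.

    #include <stdio.h>
    #include <omp.h>

    /* Illustrative sketch only: a trivial numerical kernel parallelized with
     * an OpenMP work-sharing directive.  Thread creation, loop distribution,
     * and the implicit barrier at the end of the parallel region are handled
     * by the OpenMP runtime rather than by explicit thread primitives. */
    #define N 1000000

    int main(void)
    {
        static double x[N];

        #pragma omp parallel for        /* fork threads, split iterations */
        for (int i = 0; i < N; i++)
            x[i] = 2.0 * i;             /* each thread fills its own chunk */

        printf("max threads: %d, x[N-1] = %f\n",
               omp_get_max_threads(), x[N - 1]);
        return 0;
    }

Compiled with any OpenMP-capable C compiler (for example, gcc -fopenmp), the same source also runs serially when the directive is ignored; this incremental, portable style of parallelization is what motivates providing an OpenMP layer on BG/C.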

Keywords

Data Cache, Thread Creation, Hardware Thread, Global Queue, Software Thread



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • George Almasi (2)
  • Eduard Ayguadé (1)
  • Călin Caşcaval (2)
  • José Castaños (2)
  • Jesús Labarta (1)
  • Francisco Martínez (1)
  • Xavier Martorell (1)
  • José Moreira (2)

  1. CEPBA-IBM Research Institute, UPC, Barcelona, Spain
  2. IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA
