The Journal of Supercomputing, Volume 36, Issue 2, pp 101–121

Experiences with Sweep3D implementations in Co-array Fortran

  • Cristian Coarfa
  • Yuri Dotsenko
  • John Mellor-Crummey


As part of the recent focus on increasing the productivity of parallel application developers, Co-array Fortran (CAF) has emerged as an appealing alternative to the Message Passing Interface (MPI). CAF belongs to the family of global address space parallel programming languages; such languages provide the abstraction of globally addressable memory accessed using one-sided communication. At Rice University we are developing cafc, an open source, multiplatform CAF compiler. Our earlier studies show that cafc-compiled CAF programs achieve similar performance to that of corresponding MPI codes for the NAS Parallel Benchmarks. In this paper, we present a study of several CAF implementations of Sweep3D on four modern architectures. We analyze the impact of using one-sided communication in Sweep3D, identify potential sources of inefficiencies and suggest ways to address them. Our results show that we achieve comparable performance to that of the MPI version on three cluster-based architectures and outperform it by up to 10% on the SGI Altix 3000.


Keywords (machine-generated, not supplied by the authors): Open Source; Programming Language; Parallel Programming; Message Passing Interface; Addressable Memory





Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Cristian Coarfa (1)
  • Yuri Dotsenko (1)
  • John Mellor-Crummey (1)
  1. Department of Computer Science, Rice University, Houston, USA
