A Characterization of Shared Data Access Patterns in UPC Programs

  • Christopher Barton
  • Călin Caşcaval
  • José Nelson Amaral
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4382)

Abstract

The main attraction of Partitioned Global Address Space (PGAS) languages to programmers is the ability to distribute the data to exploit the affinity of threads within shared-memory domains. Thus, PGAS languages, such as Unified Parallel C (UPC), are a promising programming paradigm for emerging parallel machines that employ hierarchical data- and task-parallelism. For example, large systems are built as distributed-shared memory architectures, where multi-core nodes access a local, coherent address space and many such nodes are interconnected in a non-coherent address space to form a high-performance system.

This paper studies the access patterns of shared data in UPC programs. By analyzing the access patterns of shared data in UPC we are able to make three major observations about the characteristics of programs written in a PGAS programming model: (i) there is strong evidence to support the development of automatic identification and automatic privatization of local shared data accesses; (ii) the ability for the programmer to specify how shared data is distributed among the executing threads can result in significant performance improvements; (iii) running UPC programs on a hybrid architecture will significantly increase the opportunities for automatic privatization of local shared data accesses.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Sobel - wikipedia, the free encyclopedia. http://en.wikipedia.org/wiki/Sobel
  2. 2.
    Amza, C., et al.: TreadMarks: Shared memory computing on networks of workstations. IEEE Computer 29(2), 18–28 (1996), citeseer.ist.psu.edu/amza96treadmarks.html Google Scholar
  3. 3.
    Bailey, D., et al.: The NAS parallel benchmarks. Technical Report RNR-94-007, NASA Ames Research Center (March 1994)Google Scholar
  4. 4.
    Bailey, D., et al.: The NAS parallel benchmarks 2.0. Technical Report NAS-95-020, NASA Ames Research Center (December 1995)Google Scholar
  5. 5.
    Barton, C., et al.: Shared memory programming for large scale machines. In: PLDI, June, pp. 108–117 (2006)Google Scholar
  6. 6.
    Berlin, K., et al.: Evaluating the impact of programming language features on the performance of parallel applications on cluster architectures. In: Rauchwerger, L. (ed.) LCPC 2003. LNCS, vol. 2958, Springer, Heidelberg (2004)Google Scholar
  7. 7.
    Bonachea, D.: GASNet specification, v1.1. Technical Report CSD-02-1207, U.C. Berkeley (November 2002)Google Scholar
  8. 8.
    Cantonnet, F., et al.: Performance monitoring and evaluation of a UPC implementation on a NUMA architecture. In: International Parallel and Distributed Processing Symposium (IPDPS), Nice, France, April, p. 274 (2003)Google Scholar
  9. 9.
    Cappello, F., Etiemble, D.: MPI versus MPI+OpenMP on the IBM SP for the NAS benchmarks. In: ACM/IEEE Supercomputing, November 2000, IEEE Computer Society Press, Los Alamitos (2000)Google Scholar
  10. 10.
    Cascaval, C., et al.: Performance and environment monitoring for continuous program optimization. IBM Journal of Research and Development 50(2/3) (2006)Google Scholar
  11. 11.
    Chen, W.-Y., et al.: A performance analysis of the Berkeley UPC compiler. In: Proceedings of the 17th Annual International Conference on Supercomputing (ICS’03), June (2003)Google Scholar
  12. 12.
    Coarfa, C., et al.: An evaluation of global address space languages: Co-Array Fortran and Unified Parallel C. In: Symposium on Principles and Practice of Parallel Programming (PPoPP), Chicago, IL, June, pp. 36–47 (2005)Google Scholar
  13. 13.
    UPC Consortium. UPC Language Specification, V1.2 (May 2005)Google Scholar
  14. 14.
    IBM Corporation. IBM XL UPC compilers (November 2005), http://www.alphaworks.ibm.com/tech/upccompiler
  15. 15.
    Costa, J.J., et al.: Running OpenMP applications efficiently on an everything-shared SDSM. Journal of Parallel and Distributed Computing 66, 647–658 (2005)CrossRefGoogle Scholar
  16. 16.
    Dwarkadas, S., Cox, A.L., Zwaenepoel, W.: An integrated compile-time/run-time software distributed shared memory system. In: Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, pp. 186–197 (1996)Google Scholar
  17. 17.
    El-Ghazawi, T., Cantonnet, F.: UPC performance and potential: a NPB experimental study. In: Proceedings of the 2002 ACM/IEEE conference on Supercomputing (SC’02), Baltimore, Maryland, USA, November 2002, pp. 1–26. IEEE Computer Society Press, Los Alamitos (2002)Google Scholar
  18. 18.
    Hoeflinger, J.P.: Extending OpenMP to clusters (2006), http://www.intel.com/cd/software/products/asmo-na/eng/compilers/285865.htm
  19. 19.
    Keleher, P.: Distributed Shared Memory Using Lazy Release Consistency. PhD thesis, Rice University (December 1994)Google Scholar
  20. 20.
    Numrich, R.W., Reid, J.: Co-array fortran for parallel programming. ACM SIGPLAN Fortran Forum 17(2), 1–31 (1998)CrossRefGoogle Scholar
  21. 21.
    Smith, L., Bull, M.: Development of mixed mode mpi / openmp applications (Presented at Workshop on OpenMP Applications and Tools (WOMPAT 2000), San Diego, Calif., July 6-7, 2000). Scientific Programming 9(2-3), 83–98 (2001), citeseer.ist.psu.edu/article/smith00development.html Google Scholar
  22. 22.
    Wisniewski, R.W., et al.: Performance and environment monitoring for whole-system characterization and optimization. In: PAC2 - Power Performance equals Architecture x Circuits x Compilers, Yorktown Heights, NY, October 6-8, pp. 15–24 (2004)Google Scholar
  23. 23.
    Zhang, Z., Seidel, S.: Benchmark measurements of current UPC platforms. In: IPDPS, Denver, CO, April (2005)Google Scholar
  24. 24.
    Zhang, Z., Seidel, S.: A performance model for fine-grain accesses in UPC. In: IPDPS, Rhodes Island, Greece, April (2006)Google Scholar

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Christopher Barton
    • 1
  • Călin Caşcaval
    • 2
  • José Nelson Amaral
    • 1
  1. 1.Department of Computing Science, University of Alberta, EdmontonCanada
  2. 2.IBM T.J. Watson Research Center, Yorktown Heights, NY, 10598 

Personalised recommendations