Multidimensional Blocking in UPC

  • Christopher Barton
  • Călin Caşcaval
  • George Almasi
  • Rahul Garg
  • José Nelson Amaral
  • Montse Farreras
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5234)


Partitioned Global Address Space (PGAS) languages offer an attractive, high-productivity programming model for large-scale parallel machines. PGAS languages, such as Unified Parallel C (UPC), combine the simplicity of shared-memory programming with the efficiency of the message-passing paradigm by giving users control over the data layout. PGAS languages distinguish between private, shared-local, and shared-remote memory. Shared-remote accesses are typically much more expensive than shared-local and private accesses, especially on distributed-memory machines, where a shared-remote access implies communication over a network.

In this paper we present a simple extension to the UPC language that allows the programmer to block shared arrays in multiple dimensions. We claim that this extension allows for better control of locality, and therefore performance, in the language.
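To make the locality argument concrete, the following C sketch compares element ownership under one-dimensional and two-dimensional blocking. The function `owner_1d` follows the standard UPC rule for an array declared with a block size (blocks are dealt round-robin to threads). The function `owner_2d` is only an illustration: its name, its parameters, and the assumption that tiles are numbered in row-major order and dealt round-robin to threads are hypothetical and are not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Owner of element i in a 1-D shared array with block size `block`:
   consecutive blocks are dealt round-robin to `threads` threads. */
size_t owner_1d(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Hypothetical owner of element (i, j) in a 2-D array with `cols`
   columns, cut into b0 x b1 tiles.
   ASSUMPTION: tiles are numbered in row-major order and dealt
   round-robin to threads; the paper's actual layout may differ. */
size_t owner_2d(size_t i, size_t j,
                size_t b0, size_t b1,
                size_t cols, size_t threads)
{
    assert(b0 > 0 && b1 > 0 && threads > 0);
    size_t tiles_per_row = (cols + b1 - 1) / b1;        /* tiles across the array */
    size_t tile = (i / b0) * tiles_per_row + (j / b1);  /* row-major tile index */
    return tile % threads;
}
```

Under these assumptions, an 8 x 8 array with 2 x 2 tiles on 4 threads keeps each element's immediate neighbours within a tile on the same thread, which is the kind of locality control that blocking in a single dimension cannot express.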

We describe an analysis that allows the compiler to distinguish between local shared array accesses and remote shared array accesses. Local shared array accesses are then transformed into direct memory accesses by the compiler, saving the overhead of a locality check at runtime. We present results to show that locality analysis is able to significantly reduce the number of shared accesses.
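The distinction between local and remote shared accesses can be illustrated with a small C simulation. This is not the compiler analysis described in the paper; it is a sketch that assumes a one-dimensional array with standard UPC round-robin block distribution and a loop in which iteration i runs on the thread that owns element i (as a `upc_forall` loop with affinity `&A[i]` would arrange). The helper names `owner` and `count_remote` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Thread that owns element i of a 1-D shared array with block size
   `block`, using round-robin block distribution over `threads`. */
size_t owner(size_t i, size_t block, size_t threads)
{
    assert(block > 0 && threads > 0);
    return (i / block) % threads;
}

/* Count how many accesses to A[i + offset] are remote when iteration i
   executes on the thread that owns A[i]. Iterations whose access would
   fall outside the array (i + offset >= n) are skipped. */
size_t count_remote(size_t n, size_t block, size_t threads, size_t offset)
{
    size_t remote = 0;
    for (size_t i = 0; i + offset < n; i++) {
        if (owner(i + offset, block, threads) != owner(i, block, threads))
            remote++;
    }
    return remote;
}
```

For 16 elements, a block size of 4, and 2 threads, an access to A[i] is never remote, while an access to A[i + 1] is remote only at block boundaries. An analysis that can prove an access always falls in the first category can replace it with a direct memory access and skip the runtime locality check.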


Keywords (added by machine, not by the authors): Iteration Space; Loop Nest; Direct Memory Access; Data Layout; Innermost Loop





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Christopher Barton (1)
  • Călin Caşcaval (2)
  • George Almasi (2)
  • Rahul Garg (1)
  • José Nelson Amaral (1)
  • Montse Farreras (3)

  1. University of Alberta, Edmonton, Canada
  2. IBM T.J. Watson Research Center
  3. Barcelona Supercomputing Center, Universitat Politècnica de Catalunya
