Data and Computation Abstractions for Dynamic and Irregular Computations

  • Sriram Krishnamoorthy
  • Jarek Nieplocha
  • P. Sadayappan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3769)


Effective data distribution and parallelization of computations involving irregular data structures are challenging tasks. We address these twin problems in the context of computations on block-sparse matrices. The programming model provides a global view of a distributed block-sparse matrix, and abstractions are provided for the user to express the parallel tasks in the computation. Tasks are mapped onto processors so as to ensure load balance and locality. The abstractions are built on the Aggregate Remote Memory Copy Interface (ARMCI) and are interoperable with the Global Arrays programming suite and MPI. Experimental results demonstrate the utility of the approach.
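To make the abstract's idea concrete, the following is a minimal, purely illustrative sketch of a global-view block-sparse matrix and a task list for a block-sparse product, with tasks greedily assigned to processes for load balance. All names here (`BlockSparse`, `matmul_tasks`, `assign_tasks`) are hypothetical and are not the paper's API; the actual abstractions are layered on ARMCI and Global Arrays, which this toy sequential sketch does not model.

```python
class BlockSparse:
    """Toy global-view block-sparse matrix: only nonzero blocks are stored."""

    def __init__(self, nblocks, bsize):
        self.nblocks = nblocks  # number of blocks per dimension
        self.bsize = bsize      # edge length of each dense block
        self.blocks = {}        # (block_row, block_col) -> dense block payload

    def set_block(self, bi, bj, block):
        self.blocks[(bi, bj)] = block


def matmul_tasks(A, B):
    """One task per contributing block triple (i, k, j) of C = A * B.

    A task (i, k, j) stands for the update C[i][j] += A[i][k] * B[k][j];
    only pairs of stored (nonzero) blocks generate tasks.
    """
    tasks = []
    for (i, k) in A.blocks:
        for (k2, j) in B.blocks:
            if k == k2:
                tasks.append((i, k, j))
    return tasks


def assign_tasks(tasks, nproc, cost):
    """Greedy longest-processing-time mapping of tasks to processes.

    Sorts tasks by decreasing cost and always gives the next task to the
    least-loaded process -- a simple stand-in for the load-balancing part
    of the task-mapping problem (locality is ignored in this sketch).
    """
    loads = [0.0] * nproc
    owner = {}
    for t in sorted(tasks, key=cost, reverse=True):
        p = loads.index(min(loads))
        owner[t] = p
        loads[p] += cost(t)
    return owner, loads
```

For example, with `A` holding blocks (0,0) and (1,1) and `B` holding blocks (0,1) and (1,0), `matmul_tasks` yields the two tasks (0,0,1) and (1,1,0), and `assign_tasks` with a uniform cost places one on each of two processes. A real implementation would weight `cost` by actual block dimensions and bias the mapping toward processes already owning the operand blocks.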



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Sriram Krishnamoorthy (1)
  • Jarek Nieplocha (2)
  • P. Sadayappan (1)

  1. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
  2. Computational Sciences and Mathematics, Pacific Northwest National Laboratory, Richland, USA