The Journal of Supercomputing, Volume 36, Issue 2, pp 153–170

Layout transformation support for the disk resident arrays framework

  • Sriram Krishnamoorthy
  • Gerald Baumgartner
  • Chi-Chung Lam
  • Jarek Nieplocha
  • P. Sadayappan

Abstract

The Global Arrays (GA) toolkit provides a shared-memory programming model in which data locality is explicitly managed by the programmer. It interoperates with MPI and supports a variety of language bindings. The Disk Resident Arrays (DRA) model extends the GA programming model to secondary storage. Together, GA and DRA provide a convenient programming model that presents a high-level abstraction while encouraging locality-aware programming by the user. High performance depends on an appropriate distribution of the data in the disk-resident arrays. In this paper, we discuss the addition of layout transformation support to DRA. An efficient parallel layout transformation algorithm is implemented on top of existing GA/DRA functions; GA/DRA is thus itself used to implement the enhanced DRA functionality. Experimental performance data demonstrate the effectiveness of the new layout transformation functionality.
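The abstract does not spell out the transformation algorithm or the DRA calls it relies on. As a rough illustration of what a disk-level layout transformation involves, the sketch below rewrites a row-major array stored in a file into a tiled (blocked) layout while holding only one band of rows in memory. It is a sequential sketch under stated assumptions: float64 elements, array dimensions divisible by the tile sizes, and hypothetical helper names (`read_elems`, `write_elems`, `rowmajor_to_tiled`). It is not the DRA API and not the parallel algorithm described in the paper.

```python
import io
import struct

# Assumption: every element is stored on "disk" as a little-endian float64.
ELEM = struct.Struct("<d")


def read_elems(f, offset, count):
    """Read `count` consecutive elements starting at element index `offset`."""
    f.seek(offset * ELEM.size)
    data = f.read(count * ELEM.size)
    return list(struct.unpack(f"<{count}d", data))


def write_elems(f, offset, values):
    """Write `values` as consecutive elements starting at element index `offset`."""
    f.seek(offset * ELEM.size)
    f.write(struct.pack(f"<{len(values)}d", *values))


def rowmajor_to_tiled(src, dst, rows, cols, tr, tc):
    """Hypothetical helper: rewrite a rows x cols row-major array in `src`
    into a tiled layout in `dst`.

    Tiles are tr x tc, ordered row-major across the tile grid, and each tile
    is stored contiguously in row-major order. Only one band of `tr` full rows
    is held in memory at a time. Assumes rows % tr == 0 and cols % tc == 0.
    """
    assert rows % tr == 0 and cols % tc == 0
    tiles_per_band = cols // tc
    tile_size = tr * tc
    for band in range(rows // tr):
        # Read one contiguous band of tr rows: the in-memory working set.
        buf = read_elems(src, band * tr * cols, tr * cols)
        for t in range(tiles_per_band):
            # Gather the tr x tc tile from the band, one row segment at a time.
            tile = []
            for r in range(tr):
                start = r * cols + t * tc
                tile.extend(buf[start:start + tc])
            # Each tile occupies a contiguous run in the destination file.
            tile_index = band * tiles_per_band + t
            write_elems(dst, tile_index * tile_size, tile)


if __name__ == "__main__":
    rows, cols = 4, 6
    src = io.BytesIO()
    write_elems(src, 0, [float(i) for i in range(rows * cols)])
    dst = io.BytesIO()
    rowmajor_to_tiled(src, dst, rows, cols, 2, 3)
    print(read_elems(dst, 0, rows * cols))
```

With 2 x 3 tiles over a 4 x 6 array holding 0..23, the destination holds the four tiles back to back, so the first six values are 0, 1, 2, 6, 7, 8. Reading a full band keeps disk reads contiguous at the cost of a memory footprint of `tr * cols` elements; a real out-of-core or parallel implementation would tune that trade-off.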

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • Sriram Krishnamoorthy (1)
  • Gerald Baumgartner (2)
  • Chi-Chung Lam (1)
  • Jarek Nieplocha (3)
  • P. Sadayappan (1)

  1. Department of Computer Science and Engineering, The Ohio State University, Columbus, USA
  2. Department of Computer Science, Louisiana State University, Baton Rouge, USA
  3. Computational Sciences and Mathematics, Pacific Northwest National Laboratory, Richland, USA