Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization

Cociorva, D.; Wilkins, J.; Baumgartner, G.; Sadayappan, P.; Ramanujam, J.; Nooijen, M.; Bernholdt, D.; Harrison, R.

doi:10.1007/3-540-45307-5_21


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2228)

Included in the following conference series:

International Conference on High-Performance Computing


Abstract

The goal of our project is the development of a program synthesis system to facilitate the development of high-performance parallel programs for a class of computations encountered in computational chemistry and computational physics. These computations are expressible as a set of tensor contractions and arise in electronic structure calculations. This paper provides an overview of a planned synthesis system that will take as input a high-level specification of the computation and generate high-performance parallel code for a number of target architectures. We focus on an approach to performing data locality optimization in this context. Preliminary experimental results on an SGI Origin 2000 are encouraging and demonstrate that the approach is effective.
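As a rough illustration of what data locality optimization means for a tensor contraction, the sketch below writes the simplest two-index contraction, C[i][j] = sum over k of A[i][k] * B[k][j], first as a plain loop nest and then as a tiled (blocked) loop nest that reuses small blocks of the operands. This is not the paper's synthesis system or its algorithm: the function names `contract_naive` and `contract_tiled` and the `tile` parameter are hypothetical, chosen only for this example.

```python
# Hypothetical illustration only: not code from the paper or its synthesis system.

def contract_naive(A, B, n_i, n_j, n_k):
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] with a plain loop nest."""
    C = [[0.0] * n_j for _ in range(n_i)]
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                C[i][j] += A[i][k] * B[k][j]
    return C


def contract_tiled(A, B, n_i, n_j, n_k, tile=2):
    """Same contraction, with each loop split into tiles of size `tile`.

    The three outer loops walk over blocks; the three inner loops work
    inside one block, so a small set of elements is reused before the
    computation moves on to the next block.
    """
    C = [[0.0] * n_j for _ in range(n_i)]
    for ii in range(0, n_i, tile):
        for jj in range(0, n_j, tile):
            for kk in range(0, n_k, tile):
                for i in range(ii, min(ii + tile, n_i)):
                    for j in range(jj, min(jj + tile, n_j)):
                        acc = C[i][j]
                        for k in range(kk, min(kk + tile, n_k)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]      # shape 2 x 3
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # shape 3 x 2
C_naive = contract_naive(A, B, 2, 2, 3)
C_tiled = contract_tiled(A, B, 2, 2, 3, tile=2)
```

Both versions perform the same arithmetic in the same order for each output element, so they produce identical results; only the traversal order over the index space differs. In compiled code on a cache-based machine, that traversal order is what a locality optimization would tune. Pure Python is used here only to show the loop structure, not to demonstrate a speedup.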



Author information

Authors and Affiliations

Physics Dept., Ohio State Univ., USA
D. Cociorva & J. Wilkins
CIS Department, Ohio State University, USA
G. Baumgartner & P. Sadayappan
ECE Department, Louisiana State University, USA
J. Ramanujam
Chemistry Department, Princeton University, USA
M. Nooijen
Oak Ridge National Laboratory, USA
D. Bernholdt
Pacific Northwest National Laboratory, USA
R. Harrison


Editor information

Editors and Affiliations

Department of Mathematics and Computer Science, University of Paderborn, Fürstenallee 11, 33102, Paderborn, Germany
Burkhard Monien
Department of EE-Systems, Computer Engineering Division, University of Southern California, 3740 McClintock Avenue, EEB 200C, 90089-2562, Los Angeles, CA, USA
Viktor K. Prasanna
Independent Consultant, c/o Infosys Ltd., “Mangala”, Kuloor Ferry Road, Kottara, 575006, Mangalore, India
Sriram Vajapeyam


About this paper

Cite this paper

Cociorva, D. et al. (2001). Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization. In: Monien, B., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2001. HiPC 2001. Lecture Notes in Computer Science, vol 2228. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45307-5_21


DOI: https://doi.org/10.1007/3-540-45307-5_21
Published: 04 December 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43009-4
Online ISBN: 978-3-540-45307-9
eBook Packages: Springer Book Archive
