Abstract
The goal of our project is to develop a program synthesis system that eases the development of high-performance parallel programs for a class of computations encountered in computational chemistry and computational physics. These computations can be expressed as sets of tensor contractions and arise in electronic structure calculations. This paper gives an overview of a planned synthesis system that will take a high-level specification of the computation as input and generate high-performance parallel code for several target architectures. We focus on an approach to data locality optimization in this context. Preliminary experimental results on an SGI Origin 2000 are encouraging and demonstrate that the approach is effective.
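To make the terms "tensor contraction" and "data locality optimization" concrete, here is a minimal sketch that is not taken from the paper. It assumes the simplest possible contraction, a matrix product C[i][j] = sum over k of A[i][k] * B[k][j], and applies loop tiling, a standard locality transformation. The function names, the problem size, and the tile size are all hypothetical; the abstract does not describe the system's actual algorithm or the form of its generated code.

```python
# Illustrative only: names, shapes, and tile size are assumptions,
# not details from the paper.

def contract_naive(A, B, n):
    """C[i][j] = sum over k of A[i][k] * B[k][j], as a plain loop nest."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def contract_tiled(A, B, n, tile):
    """Same contraction with each loop split into blocks of size `tile`.

    Each (it, jt, kt) step touches only tile-sized blocks of A, B, and C,
    so a block can be reused while it is still resident in cache.
    """
    C = [[0.0] * n for _ in range(n)]
    for it in range(0, n, tile):
        for jt in range(0, n, tile):
            for kt in range(0, n, tile):
                for i in range(it, min(it + tile, n)):
                    for j in range(jt, min(jt + tile, n)):
                        acc = C[i][j]
                        for k in range(kt, min(kt + tile, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


n = 6
A = [[float(i + k) for k in range(n)] for i in range(n)]
B = [[float(k * j + 1) for j in range(n)] for k in range(n)]

# Tiling reorders the work but must not change the result.
assert contract_tiled(A, B, n, tile=4) == contract_naive(A, B, n)
```

The tiled version performs the same arithmetic as the naive one; only the traversal order changes. In a compiled language, choosing the tile size to fit the cache is what would yield the locality benefit; in pure Python the sketch only shows the loop structure.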
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cociorva, D. et al. (2001). Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization. In: Monien, B., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2001. HiPC 2001. Lecture Notes in Computer Science, vol 2228. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45307-5_21
DOI: https://doi.org/10.1007/3-540-45307-5_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43009-4
Online ISBN: 978-3-540-45307-9
eBook Packages: Springer Book Archive