Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization

  • Conference paper
High Performance Computing — HiPC 2001 (HiPC 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2228)

Abstract

The goal of our project is a program synthesis system that facilitates the development of high-performance parallel programs for a class of computations encountered in computational chemistry and computational physics. These computations arise in electronic structure calculations and are expressible as a set of tensor contractions. This paper provides an overview of a planned synthesis system that will take a high-level specification of the computation as input and generate high-performance parallel code for a number of target architectures. We focus on an approach to data locality optimization in this context. Preliminary experimental results on an SGI Origin 2000 are encouraging and demonstrate that the approach is effective.


© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cociorva, D. et al. (2001). Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization. In: Monien, B., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2001. HiPC 2001. Lecture Notes in Computer Science, vol 2228. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45307-5_21

  • DOI: https://doi.org/10.1007/3-540-45307-5_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43009-4

  • Online ISBN: 978-3-540-45307-9

  • eBook Packages: Springer Book Archive
