Optimizing Cache Access: A Tool for Source-to-Source Transformations and Real-Life Compiler Tests

  • Ralph Müller-Pfefferkorn
  • Wolfgang E. Nagel
  • Bernd Trenkler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3149)


Loop transformations are well known to be a very useful tool for improving performance by optimizing cache access. Nevertheless, applying them automatically is a complex and challenging task, especially for parallel codes. Since the end of the 1980s, most compiler vendors have promised that these features would be implemented "in the next release". We tested current Fortran 90 compilers (on IBM, Intel and SGI hardware) for their capabilities in this field. This paper presents the results of our analysis. Motivated by this experience, we have developed the optimization environment Goofi to assist programmers in applying loop transformations to their code, thus gaining better performance for parallel codes even today.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Ralph Müller-Pfefferkorn (1)
  • Wolfgang E. Nagel (1)
  • Bernd Trenkler (1)

  1. Center for High Performance Computing (ZHR), Dresden University of Technology, Dresden, Germany
