Abstract
Loop transformations are a well-known tool for improving performance by optimizing cache access. Nevertheless, applying them automatically is a complex and challenging task, especially for parallel codes. Since the end of the 1980s, most compiler vendors have promised that these features will be implemented "in the next release". We tested current Fortran 90 compilers (on IBM, Intel, and SGI hardware) for their capabilities in this field, and this paper presents the results of our analysis. Motivated by this experience, we have developed the optimization environment Goofi to assist programmers in applying loop transformations to their code, so that they can gain better performance for parallel codes even today.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Müller-Pfefferkorn, R., Nagel, W.E., Trenkler, B. (2004). Optimizing Cache Access: A Tool for Source-to-Source Transformations and Real-Life Compiler Tests. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds) Euro-Par 2004 Parallel Processing. Euro-Par 2004. Lecture Notes in Computer Science, vol 3149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27866-5_10
DOI: https://doi.org/10.1007/978-3-540-27866-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22924-7
Online ISBN: 978-3-540-27866-5
eBook Packages: Springer Book Archive