Locality optimizations for parallel machines
This paper addresses locality optimization for high-performance uniprocessor and multiprocessor systems. It shows that minimizing interprocessor communication and optimizing cache locality can be formulated in a similar manner, and it outlines algorithms that optimize for the various levels of the memory hierarchy simultaneously.