Data distribution and loop parallelization for shared-memory multiprocessors
Shared-memory multiprocessor systems can achieve high performance when work parallelization and data distribution are performed appropriately. These two actions are not independent, and decisions must be taken in a unified way so as to minimize both execution time and data-movement costs. The first goal is achieved by parallelizing loops (the main components suitable for parallel execution in scientific codes) and assigning work to processors with good load balancing in mind. The second goal is achieved by storing data in the processors' cache memories so as to minimize both true and false sharing of cache lines. This paper describes the main features of our automatic parallelization and data distribution research tool and reports the performance of the parallelization strategies it generates. The tool (named PDDT) accepts programs written in Fortran77 and generates directives for shared-memory programming models (such as Power Fortran from SGI or Exemplar from Convex).
Keywords: High Performance Compilers · Loop Parallelization · Static and Dynamic Data Mappings · Cache Behavior · Shared Memory Multiprocessors