Automatic Parallelization, pp. 110–135
Knowledge-Based Automatic Parallelization by Pattern Recognition
Abstract
We present the top-down design of a new system that performs automatic parallelization of numerical Fortran77, Fortran90, or C source programs for execution on distributed-memory message-passing multiprocessors such as the Intel iPSC/860 or the TMC CM-5.
The key idea is a high-level pattern matching approach that permits partial reverse-engineering of a wide class of numerical programs. With only a few hundred patterns, we are able to completely match many important numerical algorithms. The approach also applies to so-called 'dusty deck' sources that may be obscured by earlier machine-specific optimizations.
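To illustrate the idea, the following sketch matches a dot-product idiom in a tiny, invented expression-tree IR. All names (`Loop`, `Assign`, `match_dot_product`) are hypothetical and chosen for exposition; this is not the system's actual pattern library or matcher, only a minimal example of recognizing one numerical pattern in a syntax tree.

```python
# Hypothetical sketch: recognize the reduction idiom
#   for i: s = s + a[i]*b[i]
# in a toy IR. Names and IR shape are invented for illustration.

from dataclasses import dataclass

@dataclass
class Assign:
    target: str
    expr: tuple            # prefix form, e.g. ('+', 's', ('*', 'a[i]', 'b[i]'))

@dataclass
class Loop:
    var: str               # loop index variable
    lo: int
    hi: int
    body: Assign

def match_dot_product(loop):
    """Return (accumulator, x, y) if the loop body is acc = acc + x*y, else None."""
    a = loop.body
    e = a.expr
    # top operator must be '+' with the accumulator on the left
    if e[0] == '+' and e[1] == a.target:
        mul = e[2]
        if isinstance(mul, tuple) and mul[0] == '*':
            return (a.target, mul[1], mul[2])
    return None

loop = Loop('i', 1, 100, Assign('s', ('+', 's', ('*', 'a[i]', 'b[i]'))))
print(match_dot_product(loop))  # ('s', 'a[i]', 'b[i]')
```

Once such an instance is matched, the whole loop can be replaced by a known-parallel equivalent (e.g. a library reduction), which is the sense in which matching enables safe algorithm replacement.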
We show how successful pattern matching enables safe algorithm replacement and allows more exact prediction of the parallelized target code's performance than is usually possible. Together with mathematical background knowledge and parallel compiler engineering, this opens access to a potential for automatic parallelization that has not been exploited before.
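The claim about sharper performance prediction can be made concrete with a toy cost model. Once a loop is known to be, say, a global reduction, its parallel running time has a closed form instead of a worst-case bound. The constants and the function below are invented for illustration and do not come from the chapter.

```python
# Hypothetical cost model (all machine constants invented) for an
# n-element reduction on p message-passing processors: local partial
# sums followed by a log2(p)-step tree combine.

import math

def reduction_time(n, p, t_flop=1e-7, t_startup=1e-4, t_word=1e-6):
    """Predicted time: local arithmetic plus tree-combine communication."""
    local = (n / p) * t_flop                               # local partial sums
    comm = math.ceil(math.log2(p)) * (t_startup + t_word)  # combine phase
    return local + comm

seq = reduction_time(10**6, 1)    # 0.1 s of pure arithmetic, no communication
par = reduction_time(10**6, 64)
print(par < seq)  # True: parallel execution pays off at this problem size
```

Such a formula, parameterized by measured machine constants, is what allows the parallelizer to decide whether replacing a matched pattern by its parallel counterpart is actually profitable on the target machine.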
Keywords
Pattern Match, Syntax Tree, Pattern Instance, Cross Edge, Target Machine