Methods for Automated Problem Mapping
It is anticipated that, in order to make effective use of many future high-performance architectures, programs will have to exhibit at least medium-grained parallelism. Methods for aggregating work represented by a directed acyclic graph are of particular interest for use in conjunction with techniques now under development for the automated exploitation of parallelism.
In this paper we present a framework for partitioning very sparse triangular systems of linear equations that is designed to produce favorable performance on a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because (1) they provide a useful model problem for exploring heuristics for the aggregation, mapping, and scheduling of relatively fine-grained computations whose data dependencies are specified by directed acyclic graphs, and (2) such methods find direct application in the development of parallel algorithms for scientific computation.
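To make the DAG structure concrete, the sketch below computes the level sets (wavefronts) of a sparse lower triangular solve: row i of Lx = b depends on every earlier row j that has a nonzero L[i][j], and rows that land in the same level have no mutual dependencies, so they can be solved concurrently. This is a minimal illustration of the general idea, not the partitioning scheme of the paper; the function name and the dictionary-of-sets matrix representation are illustrative assumptions.

```python
def level_sets(rows):
    """Level-set (wavefront) partition of a lower triangular solve.

    rows: dict mapping each row index i to the set of column indices
          j < i where L has an off-diagonal nonzero (i.e. i's DAG
          predecessors). Returns a list of levels, each a sorted list
    of row indices; all rows within a level are independent.
    """
    level = {}
    # Rows are processed in increasing index order, which is a valid
    # topological order for a lower triangular matrix (deps have j < i).
    for i in sorted(rows):
        level[i] = 1 + max((level[j] for j in rows[i]), default=-1)
    nlev = max(level.values()) + 1
    levels = [[] for _ in range(nlev)]
    for i, lev in level.items():
        levels[lev].append(i)
    return [sorted(group) for group in levels]
```

For example, a 4-row system where rows 1 and 2 depend only on row 0, and row 3 depends on rows 1 and 2, yields the levels `[[0], [1, 2], [3]]`: rows 1 and 2 can be solved in parallel after row 0, with a synchronization point before row 3.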
Simple expressions are presented that describe how to schedule computational work with varying degrees of granularity. We use the Encore Multimax as a hardware simulator to investigate the performance effects of using the partitioning techniques presented here in shared memory architectures with varying relative synchronization costs.
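One simple way to vary granularity, sketched below under assumed names (`window_schedule` and the window-size parameter `w` are illustrative, not the paper's notation), is to merge consecutive levels of the dependency DAG into windows: a barrier is then needed only between windows rather than between every level, trading a lower synchronization count against coarser units of work and potential load imbalance.

```python
def window_schedule(levels, w):
    """Aggregate every w consecutive levels into one scheduling window.

    levels: list of levels, each a list of row indices (as produced by
            a level-set partition of the dependency DAG).
    w:      window size; w = 1 reproduces fine-grained level scheduling,
            larger w cuts the number of inter-window barriers by ~w while
            serializing dependent rows that fall inside the same window.
    """
    return [sum(levels[k:k + w], []) for k in range(0, len(levels), w)]
```

With the three levels `[[0], [1, 2], [3]]` and `w = 2`, the schedule becomes `[[0, 1, 2], [3]]`: one barrier instead of two, at the cost of handling the dependence of rows 1 and 2 on row 0 within the first window.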
Keywords: Window Size, Block Size, Directed Acyclic Graph, Load Imbalance, Triangular System