Methods for Automated Problem Mapping

  • Joel Saltz
Conference paper
Part of the IMA Volumes in Mathematics and Its Applications book series (IMA, volume 13)


It is anticipated that, to make effective use of many future high-performance architectures, programs will have to exhibit at least medium-grained parallelism. Methods for aggregating work represented by a directed acyclic graph are of particular interest for use in conjunction with techniques now under development for the automated exploitation of parallelism.

In this paper we present a framework for partitioning very sparse triangular systems of linear equations that is designed to produce favorable performance results on a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because (1) they provide a useful model problem for exploring heuristics for the aggregation, mapping and scheduling of relatively fine-grained computations whose data dependencies are specified by directed acyclic graphs, and (2) they can find direct application in the development of parallel algorithms for scientific computation.
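
To make concrete how a sparse triangular system gives rise to a directed acyclic graph, the following minimal sketch groups the rows of a lower triangular system into wavefronts: sets of unknowns that depend only on earlier wavefronts and could therefore be computed concurrently. This is a generic, well-known level-scheduling illustration and is not taken from the paper; it should not be read as the partitioning framework presented there. The names `wavefronts` and `deps`, and the small example system, are hypothetical.

```python
from collections import defaultdict


def wavefronts(n, deps):
    """Group the rows of a sparse lower triangular system into wavefronts.

    n    -- number of unknowns (rows), indexed 0..n-1
    deps -- hypothetical input format: deps[i] lists the columns j < i
            where row i has a nonzero off-diagonal entry, i.e. the
            unknowns that x[i] depends on
    """
    level = [0] * n
    for i in range(n):
        # A row with no off-diagonal entries sits in wavefront 0;
        # otherwise it sits one level past its deepest dependency.
        level[i] = 1 + max((level[j] for j in deps.get(i, [])), default=-1)

    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]


# Rows 1 and 2 depend on row 0; row 3 depends on rows 1 and 2;
# rows 0 and 4 have no off-diagonal entries.
example = wavefronts(5, {1: [0], 2: [0], 3: [1, 2]})
print(example)
```

Rows within one wavefront are mutually independent, so the number of wavefronts bounds the number of synchronization points in this simple scheme. Aggregating several rows into a single schedulable unit, which is the kind of granularity question the abstract raises, trades some of that parallelism for fewer synchronizations.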

Simple expressions are presented that describe how to schedule computational work with varying degrees of granularity. We use the Encore Multimax as a hardware simulator to investigate the performance effects of using the partitioning techniques presented here in shared memory architectures with varying relative synchronization costs.


Window Size, Block Size, Directed Acyclic Graph, Load Imbalance, Triangular System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag New York Inc. 1988

Authors and Affiliations

  • Joel Saltz (1, 2)
  1. Research Center for Scientific Computation, Yale University, New Haven, USA
  2. Department of Computer Science, Yale University, New Haven, USA
