Sparse Matrix Structure for Dynamic Parallelisation Efficiency

  • Markus Ast
  • Cristina Barrado
  • José Cela
  • Rolf Fischer
  • Jesús Labarta
  • Óscar Laborda
  • Hartmut Manz
  • Uwe Schulz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1900)

Abstract

The simulated models and requirements of engineering programs, such as computational fluid dynamics and structural mechanics codes, grow more rapidly than single-processor performance. Automatic parallelisation seems the obvious approach for large, long-established packages like PERMAS. Our approach is based on dynamic scheduling, which is more flexible than domain decomposition, is completely transparent to the end user, and achieves good speedups because it can extract parallelism where other approaches cannot. In this paper we show that some preparatory steps on the large input matrices are needed for good performance. We present a new blocking approach that saves storage and shortens the computational critical path. We also propose a data distribution step that guides the dynamic scheduler's decisions so that an efficient parallelisation can be achieved even on slow multiprocessor networks. A final and important step is the interleaving of the array blocks that are distributed to different processors; this step is essential to expose the parallelism to the scheduler.
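To illustrate the interleaving idea mentioned above, the following is a minimal sketch, not taken from the paper or from PERMAS, that contrasts a contiguous block-to-processor mapping with an interleaved (round-robin) one for the column blocks of a blocked sparse matrix. The function names and block/processor counts are hypothetical; the point is only that with interleaving, neighbouring blocks land on different processors, so a dynamic scheduler can run them concurrently.

```python
# Hypothetical illustration (not the PERMAS implementation):
# map column blocks of a blocked sparse matrix onto processors.

def contiguous_mapping(num_blocks: int, num_procs: int) -> list[int]:
    """Assign blocks to processors in contiguous chunks."""
    chunk = (num_blocks + num_procs - 1) // num_procs
    return [b // chunk for b in range(num_blocks)]

def interleaved_mapping(num_blocks: int, num_procs: int) -> list[int]:
    """Assign blocks to processors round-robin (interleaved)."""
    return [b % num_procs for b in range(num_blocks)]

if __name__ == "__main__":
    blocks, procs = 12, 4
    # Contiguous: neighbouring blocks share a processor, serialising them.
    print("contiguous :", contiguous_mapping(blocks, procs))
    # Interleaved: neighbouring blocks fall on different processors,
    # so independent work on adjacent blocks can proceed in parallel.
    print("interleaved:", interleaved_mapping(blocks, procs))
```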



Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Markus Ast (1)
  • Cristina Barrado (2)
  • José Cela (2)
  • Rolf Fischer (1)
  • Jesús Labarta (2)
  • Óscar Laborda (2)
  • Hartmut Manz (1)
  • Uwe Schulz (1)
  1. INTES Ingenieurgesellschaft für technische Software mbH, Stuttgart, Germany
  2. Universitat Politécnica de Catalunya, Barcelona, Spain
