Sparse Matrix Structure for Dynamic Parallelisation Efficiency
The simulated models and requirements of engineering programs, such as those for computational fluid dynamics and structural mechanics, grow more rapidly than single-processor performance. Automatic parallelisation seems to be the obvious approach for huge and historic packages like PERMAS. The approach is based on dynamic scheduling, which is more flexible than domain decomposition, is totally transparent to the end user, and shows good speedups because it is able to extract parallelism where other approaches cannot. In this paper we show that some preparatory steps on the large input matrices are needed for good performance. We present a new approach to blocking that saves storage and shortens the critical path of the computation. We also propose a data distribution step that drives the decisions of the dynamic scheduler so that an efficient parallelisation can be achieved even on slow multiprocessor networks. A final and important step is the interleaving of the array blocks that are distributed to different processors. This step is essential to expose the parallelism to the scheduler.
Keywords: Critical Path, Domain Decomposition, Task Graph, Parallelisation Strategy, Preparatory Step
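The abstract does not spell out the interleaving scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: blocks become ready roughly in index order, and each block is processed by the processor that owns it. The function names, the round-robin rule, and the sizes used here are hypothetical and are not taken from PERMAS or from the paper. The sketch compares a contiguous assignment of blocks to processors with an interleaved one and counts how many distinct processors own the first few blocks.

```python
def contiguous_owner(block, n_blocks, n_procs):
    """Owner when blocks are split into n_procs consecutive chunks."""
    chunk = -(-n_blocks // n_procs)  # ceiling division
    return block // chunk


def interleaved_owner(block, n_procs):
    """Owner when consecutive blocks are dealt out round-robin."""
    return block % n_procs


def busy_processors(owners, window):
    """Number of distinct processors owning the first `window` blocks."""
    return len(set(owners[:window]))


# Hypothetical sizes chosen only for illustration.
n_blocks, n_procs, window = 16, 4, 4

contiguous = [contiguous_owner(b, n_blocks, n_procs) for b in range(n_blocks)]
interleaved = [interleaved_owner(b, n_procs) for b in range(n_blocks)]

print("contiguous :", contiguous)
print("interleaved:", interleaved)
print("processors owning the first", window, "blocks:",
      busy_processors(contiguous, window), "vs",
      busy_processors(interleaved, window))
```

Under these assumptions, the contiguous layout places the first four blocks on a single processor, whereas the interleaved layout spreads them over all four, which is one way to picture how interleaving can expose parallelism to a scheduler.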