Task-Queue Based Hybrid Parallelism: A Case Study
In this paper we report on our experiences with hybrid parallelism in PARDISO, a high-performance sparse linear solver. We start with the OpenMP-parallel numerical factorization algorithm and reorganize it around a central dynamic task queue so that message-passing functionality can be added. The hybrid version allows the solver to run on a larger number of processors in a cost-effective way with reasonable performance. A speed-up of more than nine is achieved on a four-node quad Itanium 2 SMP cluster, even though the first version of the implementation does not yet exploit a large potential for minimizing MPI communication.
Keywords: Shared Memory, Message Passing, Factorization Algorithm, Block Column, Hybrid Version
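To illustrate the general idea of a central dynamic task queue mentioned in the abstract, the following is a minimal sketch and not PARDISO's implementation. It assumes tasks (for example, block-column updates) are identified by integers with an acyclic dependency map; the names `run_task_queue`, `deps`, and `work` are hypothetical, Python threads stand in for OpenMP threads, and no message passing is shown.

```python
import queue
import threading


def run_task_queue(deps, work, num_workers=4):
    """Run every task in `deps` once, respecting dependencies.

    deps: dict mapping task id -> set of prerequisite task ids (assumed acyclic).
    work: callable invoked with a task id (hypothetical stand-in for numerical work).
    Returns the order in which tasks completed.
    """
    if not deps:
        return []

    # Count unfinished prerequisites and record who depends on whom.
    remaining = {t: len(prereqs) for t, prereqs in deps.items()}
    dependents = {t: [] for t in deps}
    for t, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(t)

    ready = queue.Queue()      # the central queue shared by all workers
    lock = threading.Lock()    # protects `remaining` and `order`
    order = []

    # Tasks without prerequisites can start immediately.
    for t, n in remaining.items():
        if n == 0:
            ready.put(t)

    def worker():
        while True:
            task = ready.get()
            if task is None:   # sentinel: all tasks are done
                return
            work(task)
            with lock:
                order.append(task)
                # Release tasks whose last prerequisite just finished.
                for d in dependents[task]:
                    remaining[d] -= 1
                    if remaining[d] == 0:
                        ready.put(d)
                # After the final task, wake every worker with a sentinel.
                if len(order) == len(deps):
                    for _ in range(num_workers):
                        ready.put(None)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return order
```

Because workers pull the next ready task rather than receiving a fixed assignment, load is balanced dynamically. In a hybrid setting one could imagine that such a queue is also the natural place to hand tasks to remote processes, but that part is outside this sketch.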