Abstract
We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on distributed memory systems (DMS). These problems include matrix multiplication, matrix chain product, computing the powers, the inverse, the characteristic polynomial, the determinant, the rank, the Krylov matrix, and an LU- and a QR-factorization of a matrix, and solving linear systems of equations. Our parallel computations for these problems are based on a highly scalable implementation of the fastest sequential matrix multiplication algorithm on DMS. We show that, compared with the best known parallel time complexities on parallel random access machines (PRAM), the most powerful but unrealistic shared memory model of parallel computing, our parallel matrix computations achieve the same speeds on distributed memory parallel computers (DMPC) and incur an extra polylog factor in the time complexities on DMS with hypercubic networks. Furthermore, our parallel matrix computations are fully scalable on DMPC and highly scalable over a wide range of system sizes on DMS with hypercubic networks. Such fast (in terms of parallel time complexity) and highly scalable (in terms of our definition of scalability) parallel matrix computations have rarely been seen before on any distributed memory system.
Cite this article
Li, K. Fast and highly scalable parallel computations for fundamental matrix problems on distributed memory systems. J Supercomput 54, 271–297 (2010). https://doi.org/10.1007/s11227-009-0319-0
DOI: https://doi.org/10.1007/s11227-009-0319-0