Adaptive Matrix Transpose Algorithms for Distributed Multicore Processors

Bowman, John C.; Roberts, Malcolm

doi:10.1007/978-3-319-12307-3_14

Conference paper
First Online: 01 January 2015

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 117)

Abstract

An adaptive parallel matrix transpose algorithm optimized for distributed multicore architectures running in a hybrid OpenMP/MPI configuration is presented. Significant boosts in speed are observed relative to the distributed transpose used in the state-of-the-art adaptive FFTW library. In some cases, a hybrid configuration allows one to reduce communication costs by reducing the number of message passing interface (MPI) nodes, and thereby increasing message sizes. This also allows for a more slab-like than pencil-like domain decomposition for multidimensional fast Fourier transforms (FFT), reducing the cost of, or even eliminating the need for, a second distributed transpose. Nonblocking all-to-all transfers enable user computation and communication to be overlapped.
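
The communication pattern behind a distributed transpose can be sketched as follows. This is a serial, pure-Python simulation under stated assumptions, not the authors' adaptive algorithm: the names `distributed_transpose` and `message_elements` are hypothetical, the list shuffle in step 2 merely stands in for an MPI all-to-all, and the number of processes `P` is assumed to divide both matrix dimensions.

```python
def distributed_transpose(slabs):
    """Transpose an N x M matrix stored as P row slabs, one per simulated process.

    slabs[p] is the list of rows held by process p (hypothetical layout).
    Returns P row slabs of the M x N transpose. Assumes P divides N and M.
    """
    P = len(slabs)
    n_loc = len(slabs[0])        # rows held by each process
    M = len(slabs[0][0])         # global number of columns
    m_loc = M // P               # columns destined for each process

    # Step 1: each process p cuts its slab into P column blocks;
    # block q is the message that p sends to q.
    send = [[[row[q * m_loc:(q + 1) * m_loc] for row in slabs[p]]
             for q in range(P)] for p in range(P)]

    # Step 2: all-to-all exchange (a stand-in for MPI communication):
    # process q receives block q from every process p.
    recv = [[send[p][q] for p in range(P)] for q in range(P)]

    # Step 3: each process transposes the received blocks locally.
    out = []
    for q in range(P):
        rows = []
        for j in range(m_loc):
            row = []
            for p in range(P):
                row.extend(recv[q][p][i][j] for i in range(n_loc))
            rows.append(row)
        out.append(rows)
    return out


def message_elements(N, M, P):
    """Elements in each point-to-point message when P divides N and M."""
    return (N // P) * (M // P)


if __name__ == "__main__":
    A = [[r * 8 + c for c in range(8)] for r in range(4)]
    result = distributed_transpose([A[0:2], A[2:4]])
    print([row for slab in result for row in slab] == [list(c) for c in zip(*A)])
    # Fewer MPI processes (with more threads each) means larger messages.
    print(message_elements(1024, 1024, 16), message_elements(1024, 1024, 4))
```

In this simplified model each message carries (N/P)(M/P) elements, so replacing 16 single-threaded processes by 4 processes of 4 threads each makes every message 16 times larger, which illustrates the trade-off the abstract describes.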

Notes

1. However, the recent availability of serial cache-oblivious in-place transposition algorithms in some cases tips the balance in favor of local transposition, if transposed output is acceptable.
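
To illustrate what a serial cache-oblivious in-place transposition looks like, here is a minimal recursive sketch for a square matrix stored row-major in a flat list. It is an assumption-laden illustration, not necessarily the algorithm this note refers to: the function name `transpose_in_place` and the `base` cutoff are hypothetical, only square matrices are handled, and Python itself will not show the cache benefit that the same recursion gives in a compiled language.

```python
def transpose_in_place(a, n, base=16):
    """Transpose the n x n row-major matrix stored in the flat list a, in place.

    The recursion halves the index ranges until blocks are at most
    base x base, so no cache size needs to be known in advance.
    """
    def swap_blocks(r0, r1, c0, c1):
        # Swap a[r][c] with a[c][r] for r in [r0, r1), c in [c0, c1).
        # Only called for blocks that lie strictly above the diagonal.
        if r1 - r0 <= base and c1 - c0 <= base:
            for r in range(r0, r1):
                for c in range(c0, c1):
                    a[r * n + c], a[c * n + r] = a[c * n + r], a[r * n + c]
        elif r1 - r0 >= c1 - c0:
            rm = (r0 + r1) // 2          # split the longer (row) range
            swap_blocks(r0, rm, c0, c1)
            swap_blocks(rm, r1, c0, c1)
        else:
            cm = (c0 + c1) // 2          # split the longer (column) range
            swap_blocks(r0, r1, c0, cm)
            swap_blocks(r0, r1, cm, c1)

    def transpose_diagonal(lo, hi):
        # Transpose the square diagonal block [lo, hi) x [lo, hi).
        if hi - lo <= base:
            for r in range(lo, hi):
                for c in range(r + 1, hi):
                    a[r * n + c], a[c * n + r] = a[c * n + r], a[r * n + c]
        else:
            mid = (lo + hi) // 2
            transpose_diagonal(lo, mid)
            transpose_diagonal(mid, hi)
            swap_blocks(lo, mid, mid, hi)  # exchange the off-diagonal blocks

    transpose_diagonal(0, n)


if __name__ == "__main__":
    n = 5
    m = list(range(n * n))
    transpose_in_place(m, n, base=2)
    print(all(m[r * n + c] == c * n + r for r in range(n) for c in range(n)))
```

Because the transposition happens within one process's memory, the result is left in transposed order, which is the condition the note attaches to preferring local transposition.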

Acknowledgements

The authors gratefully acknowledge Professor Wendell Horton for providing access to state-of-the-art computing facilities at the Texas Advanced Computing Center.

Author information

Authors and Affiliations

University of Alberta, Edmonton, AB T6G 2G1, Canada
John C. Bowman
University of Strasbourg, Strasbourg, France
Malcolm Roberts

Corresponding author

Correspondence to John C. Bowman.

Editor information

Editors and Affiliations

Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada
Monica G. Cojocaru
Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, Ontario, Canada
Ilias S. Kotsireas
Department of Mathematics and MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
Roman N. Makarov
Department of Mathematics & MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
Roderick V. N. Melnik
MS2Discovery Interdisciplinary Research Institute, Wilfrid Laurier University, Waterloo, Ontario, Canada
Hasan Shodiev

About this paper

Cite this paper

Bowman, J., Roberts, M. (2015). Adaptive Matrix Transpose Algorithms for Distributed Multicore Processors. In: Cojocaru, M., Kotsireas, I., Makarov, R., Melnik, R., Shodiev, H. (eds) Interdisciplinary Topics in Applied Mathematics, Modeling and Computational Science. Springer Proceedings in Mathematics & Statistics, vol 117. Springer, Cham. https://doi.org/10.1007/978-3-319-12307-3_14

DOI: https://doi.org/10.1007/978-3-319-12307-3_14
Published: 04 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12306-6
Online ISBN: 978-3-319-12307-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)