Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture

Zekri, Ahmed S.

doi:10.1007/978-1-4614-3535-8_94

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 152)

Abstract

Current applications in image and media processing and in scientific and engineering computing require tremendous processing power and high memory bandwidth to achieve high performance. Three-dimensional multi/many-core processors stacked with one or more memory layers may provide the processing facilities needed to improve the performance of these applications. In this paper, we propose a 3-D stacked many-core processor architecture composed of a number of layers of processing elements (PEs) stacked with one or more memory layers shared among all PEs. Unlike many 3-D machine architectures, the proposed model uses local communication between PEs over both horizontal and vertical links, avoiding the cost of building specialized interconnection networks. We present a novel memory-efficient SPMD blocked algorithm for performing the kernel matrix–matrix multiply operation (MMM) on the 3-D processor architecture. Our analytical evaluation of the 3-D stacked architecture shows a near-linear speedup as the number of PE layers increases when data communication and redistribution are overlapped with computation.
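The general idea behind a three-dimensional blocked MMM can be illustrated with a small sketch. The code below is not the chapter's algorithm, whose details are not given in the abstract. It is a generic, sequential simulation under stated assumptions: the inner-product (k) dimension is split evenly across a hypothetical number of PE layers, each layer computes a partial product tile by tile, and the partial results are summed along the vertical direction. The function name `matmul_3d_blocked` and the parameters `layers` and `block` are illustrative only, and the sketch makes no attempt at memory efficiency or at overlapping communication with computation.

```python
# Illustrative sketch only: a generic 3-D blocked matrix multiply,
# simulated sequentially. It is NOT the algorithm from the chapter.

def matmul_3d_blocked(A, B, layers, block):
    """Multiply two n x n matrices given as lists of lists.

    Assumptions (hypothetical, for illustration):
      - the k dimension is split evenly across `layers` PE layers;
      - within a layer, each PE owns one block x block tile of C.
    """
    n = len(A)
    if n % layers != 0 or n % block != 0:
        raise ValueError("n must be divisible by both layers and block")
    k_per_layer = n // layers

    partials = []
    for layer in range(layers):
        # This layer only touches its own slice of the k dimension.
        k_lo = layer * k_per_layer
        k_hi = k_lo + k_per_layer
        C_part = [[0.0] * n for _ in range(n)]
        # Loop over tiles; each (bi, bj) tile stands for one PE's work.
        for bi in range(0, n, block):
            for bj in range(0, n, block):
                for i in range(bi, bi + block):
                    for k in range(k_lo, k_hi):
                        a_ik = A[i][k]
                        for j in range(bj, bj + block):
                            C_part[i][j] += a_ik * B[k][j]
        partials.append(C_part)

    # Vertical reduction: sum the partial products from all layers.
    return [[sum(p[i][j] for p in partials) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_3d_blocked(A, B, layers=2, block=1))
```

In this arrangement the layers work on disjoint slices of k, so their partial products are independent until the final reduction. That independence is what makes it plausible for a real implementation to overlap data movement with computation.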


Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Faculty of Science, Alexandria University, El-Shatbi, Alexandria, 21526, Egypt
Ahmed S. Zekri

Corresponding author

Correspondence to Ahmed S. Zekri.

Editor information

Editors and Affiliations

School of Engineering, University of Bridgeport, University Avenue 221, Bridgeport, 06604, Connecticut, USA
Khaled Elleithy
School of Engineering, University of Bridgeport, University Avenue 221, Bridgeport, 06604, Connecticut, USA
Tarek Sobh

About this paper

Cite this paper

Zekri, A.S. (2013). Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture. In: Elleithy, K., Sobh, T. (eds) Innovations and Advances in Computer, Information, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 152. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3535-8_94

DOI: https://doi.org/10.1007/978-1-4614-3535-8_94
Published: 28 August 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3534-1
Online ISBN: 978-1-4614-3535-8
eBook Packages: Engineering, Engineering (R0)
