Abstract
Current applications in image and media processing and in scientific and engineering computing require tremendous processing power and high memory bandwidth to achieve high performance. Three-dimensional multi/many-core processors stacked with one or more memory layers may provide the processing facilities needed to improve the performance of these applications. In this paper, we propose a 3-D stacked many-core processor architecture composed of a number of layers of processing elements (PEs) stacked with one or more memory layers shared among all PEs. Unlike many 3-D machine architectures, the proposed model uses local communication between PEs over both horizontal and vertical links, avoiding the cost of building specialized interconnection networks. We present a novel memory-efficient SPMD blocked algorithm for performing the kernel matrix–matrix multiply (MMM) operation on the 3-D processor architecture. Our analytical evaluation of the 3-D stacked architecture shows a near-linear speedup as the number of PE layers increases, with data communication and redistribution overlapped with computation.
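For readers unfamiliar with the terms, the following is a minimal sketch of a generic blocked matrix–matrix multiply written in SPMD style, where every worker runs the same routine on a different slice of the data. It is not the chapter's algorithm: the abstract does not give the data layout or communication schedule, so the round-robin assignment of block-rows to layers, the names `spmd_worker` and `blocked_matmul`, and the parameters `num_layers` and `block` are all assumptions made for illustration. The workers run one after another here, so the overlap of communication with computation is not modeled.

```python
# Illustrative sketch only: a generic blocked matrix multiply in SPMD style.
# The mapping of block-rows to "layers" is an assumption, not the chapter's
# algorithm; all names and parameters here are hypothetical.

def spmd_worker(rank, num_layers, A, B, block):
    """Compute the rows of C = A * B owned by one hypothetical PE layer."""
    n, m, p = len(A), len(B), len(B[0])
    partial = {}
    # Block-rows of C are dealt out round-robin across the layers.
    for i0 in range(rank * block, n, num_layers * block):
        i1 = min(i0 + block, n)
        for i in range(i0, i1):
            partial[i] = [0.0] * p
        # Walk over blocks of the inner dimension and of the columns of C.
        for k0 in range(0, m, block):
            k1 = min(k0 + block, m)
            for j0 in range(0, p, block):
                j1 = min(j0 + block, p)
                # Multiply one block of A by one block of B.
                for i in range(i0, i1):
                    row_c = partial[i]
                    for k in range(k0, k1):
                        a = A[i][k]
                        row_b = B[k]
                        for j in range(j0, j1):
                            row_c[j] += a * row_b[j]
    return partial


def blocked_matmul(A, B, num_layers=2, block=2):
    """Run every worker in turn (a sequential stand-in for parallel layers)."""
    C = [None] * len(A)
    for rank in range(num_layers):
        for i, row in spmd_worker(rank, num_layers, A, B, block).items():
            C[i] = row
    return C
```

Calling `blocked_matmul(A, B, num_layers=4)` on two lists of lists returns the product as a list of rows. Because each worker owns disjoint rows of the result, no reduction step is needed in this sketch; a layout that splits the inner dimension across layers would instead require summing partial results.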
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Zekri, A.S. (2013). Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture. In: Elleithy, K., Sobh, T. (eds) Innovations and Advances in Computer, Information, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 152. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3535-8_94
DOI: https://doi.org/10.1007/978-1-4614-3535-8_94
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3534-1
Online ISBN: 978-1-4614-3535-8
eBook Packages: Engineering, Engineering (R0)