
Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture

  • Conference paper
Innovations and Advances in Computer, Information, Systems Sciences, and Engineering

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 152)

Abstract

Current applications in image and media processing and in scientific and engineering computing require tremendous processing power and high memory bandwidth to achieve high performance. Three-dimensional multi/many-core processors stacked with memory layer(s) may provide the processing facilities needed to enhance the performance of these applications. In this paper, we propose a 3-D stacked many-core processor architecture composed of a number of layers of processing elements (PEs) stacked with one or more memory layers shared among all PEs. Unlike many 3-D machine architectures, the proposed model uses local communication between PEs over both horizontal and vertical links, avoiding the cost of building specialized interconnection networks. We present a novel memory-efficient SPMD blocked algorithm for performing the kernel matrix–matrix multiply operation (MMM) on the 3-D processor architecture. Our analytical evaluation of the 3-D stacked architecture shows a near-linear speedup as the number of PE layers increases, with data communication and redistribution overlapped with computation.
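The abstract does not give the algorithm itself, so the following is only a minimal sketch of a generic 3-D blocked decomposition of C = A × B, simulated sequentially in Python. It is not the paper's SPMD algorithm. The grid shape (q × q PEs per layer, `layers` PE layers), the function names, and the choice to split the shared dimension across layers are all assumptions made for illustration. Each hypothetical PE (pi, pj, l) computes a partial product for its own block of C over one slice of the shared dimension, and the partial results are then summed across layers.

```python
# Illustrative sketch only: a generic 3-D blocked decomposition of C = A * B,
# simulated sequentially. This is NOT the paper's algorithm; the grid shape
# (q x q PEs per layer, `layers` layers) and all names are assumptions.

def matmul_3d_blocked(A, B, q, layers):
    """Multiply square matrices A and B as a q x q x layers PE grid would."""
    n = len(A)
    assert n % q == 0 and n % layers == 0, "n must be divisible by q and layers"
    bs = n // q        # side length of the C block owned by one PE
    ks = n // layers   # slice of the shared (k) dimension handled per layer

    # partial[l][i][j] holds the contribution of layer l to C[i][j].
    partial = [[[0.0] * n for _ in range(n)] for _ in range(layers)]

    for l in range(layers):            # one iteration per PE layer
        for pi in range(q):            # PE row within the layer
            for pj in range(q):        # PE column within the layer
                # Work of the hypothetical PE (pi, pj, l): its block of C,
                # restricted to the k-slice assigned to layer l.
                for i in range(pi * bs, (pi + 1) * bs):
                    for j in range(pj * bs, (pj + 1) * bs):
                        acc = 0.0
                        for k in range(l * ks, (l + 1) * ks):
                            acc += A[i][k] * B[k][j]
                        partial[l][i][j] = acc

    # Reduction across layers, standing in for communication over vertical links.
    return [[sum(partial[l][i][j] for l in range(layers)) for j in range(n)]
            for i in range(n)]


def matmul_reference(A, B):
    """Plain triple-loop product, used only to check the blocked version."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

In this idealized model each PE performs n³ / (q² · layers) multiply-adds, so adding layers reduces per-PE work proportionally, provided the cross-layer reduction is cheap or overlapped with computation. The paper's own analytical evaluation is not reproduced here.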


Author information

Corresponding author

Correspondence to Ahmed S. Zekri.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Zekri, A.S. (2013). Three Dimensional SPMD Matrix–Matrix Multiplication Algorithm and a Stacked Many-Core Processor Architecture. In: Elleithy, K., Sobh, T. (eds) Innovations and Advances in Computer, Information, Systems Sciences, and Engineering. Lecture Notes in Electrical Engineering, vol 152. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3535-8_94

  • DOI: https://doi.org/10.1007/978-1-4614-3535-8_94

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-3534-1

  • Online ISBN: 978-1-4614-3535-8

  • eBook Packages: Engineering, Engineering (R0)
