FPGA-Based Systolic Computational-Memory Array for Scalable Stencil Computations

Sano, Kentaro

doi:10.1007/978-1-4614-1791-0_9

Abstract

Stencil computation is a typical kernel of numerical simulations and a common target for acceleration in high-performance computing (HPC). However, its low operational intensity makes it difficult to fully exploit the peak performance of recent multi-core CPUs and accelerators such as GPUs. Building custom computing machines with programmable-logic devices, such as FPGAs, has recently been considered as a way to efficiently accelerate numerical simulations. Given their many logic elements and embedded coarse-grained modules, state-of-the-art FPGAs are now expected to perform floating-point operations efficiently, with sustained performance comparable to or higher than that of CPUs and GPUs. This chapter describes a case study of an FPGA-based custom computing machine (CCM) for high-performance stencil computations: a systolic computational-memory array (SCM array) implemented on multiple FPGAs.
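
To make the phrase "low operational intensity" concrete, the following minimal sketch shows one sweep of a 2D 5-point averaging stencil and a rough flop-per-byte estimate for it. It is an illustration only and is not taken from the chapter: the function names, the single-precision byte counts, and the peak and bandwidth figures are all assumptions chosen for the example, and the bound used is the usual roofline-style minimum of compute and memory limits.

```python
# Illustrative sketch only; it does not model the chapter's SCM array.

def jacobi_step(grid):
    """One sweep of a 2D 5-point averaging stencil.

    Interior cells become the mean of their four neighbours;
    boundary cells are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


def operational_intensity(flops_per_point, bytes_per_point):
    """Floating-point operations per byte of memory traffic."""
    return flops_per_point / bytes_per_point


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline-style bound: the lower of the compute and memory limits."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Per grid point: 3 additions + 1 multiplication.
FLOPS_PER_POINT = 4
# Assumed single precision (4 bytes per value).
BYTES_IDEAL_REUSE = 2 * 4  # each point read once and written once
BYTES_NO_REUSE = 5 * 4     # 4 neighbour reads + 1 write, no caching

# Hypothetical machine: 100 GFlop/s peak, 20 GB/s memory bandwidth.
HYPOTHETICAL_PEAK_GFLOPS = 100.0
HYPOTHETICAL_BANDWIDTH_GBS = 20.0

ideal = operational_intensity(FLOPS_PER_POINT, BYTES_IDEAL_REUSE)
print(ideal)  # 0.5 flop/byte
print(attainable_gflops(HYPOTHETICAL_PEAK_GFLOPS, HYPOTHETICAL_BANDWIDTH_GBS, ideal))  # 10.0
```

Under these assumed figures, even with ideal data reuse the stencil is limited by memory traffic to a tenth of the hypothetical peak, which is the kind of gap the abstract refers to.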

Acknowledgements

This research and development were supported by Grant-in-Aid for Young Scientists (B) No. 20700040, Grant-in-Aid for Scientific Research (B) No. 23300012, and Grant-in-Aid for Challenging Exploratory Research No. 23650021 from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Author information

Authors and Affiliations

Tohoku University, 6-6-01 Aramaki Aza Aoba, Sendai, Miyagi, 980-8579, Japan
Kentaro Sano

Corresponding author

Correspondence to Kentaro Sano.

Editor information

Editors and Affiliations

School of Computing Science, University of Glasgow, Glasgow, UK
Wim Vanderbauwhede
School of Engineering and Electronics, The University of Edinburgh, Edinburgh, UK
Khaled Benkrid

About this chapter

Cite this chapter

Sano, K. (2013). FPGA-Based Systolic Computational-Memory Array for Scalable Stencil Computations. In: Vanderbauwhede, W., Benkrid, K. (eds) High-Performance Computing Using FPGAs. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1791-0_9

DOI: https://doi.org/10.1007/978-1-4614-1791-0_9
Published: 28 February 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1790-3
Online ISBN: 978-1-4614-1791-0
eBook Packages: Engineering, Engineering (R0)
