Systolic Arrays

  • Yu Hen Hu
  • Sun-Yuan Kung


This chapter reviews the basic ideas of systolic arrays, their design methodologies, and the historical development of various hardware implementations. Two modern applications, namely motion estimation in video coding and wireless communication baseband processing, are also discussed.


Keywords: Motion Estimation; Dependence Graph; Systolic Array; Index Point; Very Large Scale Integration
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of Wisconsin, Madison, USA
  2. Department of Electrical Engineering, Princeton University, Princeton, USA
