VLSI Engineering, pp. 69-104

# Putting inner loops automatically in silicon

## Abstract

Many time-consuming inner loops are inherently regular and parallel. These are exactly the structures that are well suited for VLSI implementation. As a result, it will become increasingly common to have subroutines that are directly executable in silicon. Does this imply that in the near future many large computations can be effectively carried out by small computers equipped with silicon subroutines? This talk will present a simplified characterization of the silicon subroutine approach and discuss systolic architectures, a powerful method for implementing cost-effective silicon subroutines for computations such as pattern matching and error correction. CAD systems at CMU that have made it possible for us to design some rather complex chips, such as a programmable systolic chip, will also be briefly described.
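To illustrate what a systolic architecture for pattern matching can look like, here is a small cycle-by-cycle simulation of a linear systolic array. It is a rough sketch and does not necessarily match the chips described in the talk. The design is an assumption: each cell permanently stores one pattern character, in reversed order. Text characters flow to the right while partial match results flow to the left. Because the two streams move in opposite directions, text characters are fed only on every other cycle, so that each partial result meets every text character it needs. The function name `systolic_match` is hypothetical.

```python
from typing import List, Optional, Tuple


def systolic_match(text: str, pattern: str) -> List[int]:
    """Hypothetical sketch: simulate a linear systolic array that reports every
    start position i with text[i:i+k] == pattern."""
    n, k = len(text), len(pattern)
    if k == 0 or n < k:
        return []
    # Assumed design: cell j permanently stores pattern[k-1-j] (pattern reversed).
    weights = pattern[::-1]
    x_reg: List[Optional[str]] = [None] * k                # text chars, moving right
    y_reg: List[Optional[Tuple[int, bool]]] = [None] * k   # (window start, still matching?), moving left
    y_out: List[Optional[Tuple[int, bool]]] = [None] * k
    matches: List[int] = []

    for t in range(2 * n - 1):
        # Shift one cell per cycle: text moves right, partial results move left.
        x_reg = [None] + x_reg[:-1]
        y_reg = y_out[1:] + [None]
        # Text char m enters cell 0 at cycle 2m; odd cycles carry bubbles.
        if t % 2 == 0 and t // 2 < n:
            x_reg[0] = text[t // 2]
        # A fresh result for window i enters cell k-1 at cycle 2i + k - 1.
        s = t - (k - 1)
        if s % 2 == 0 and 0 <= s // 2 <= n - k:
            y_reg[k - 1] = (s // 2, True)
        # All cells act in parallel: AND in the comparison with the stored char.
        y_out = [None] * k
        for j in range(k):
            if y_reg[j] is not None:
                i, ok = y_reg[j]
                y_out[j] = (i, ok and x_reg[j] == weights[j])
        # A result leaving cell 0 has met all k text characters of its window.
        if y_out[0] is not None and y_out[0][1]:
            matches.append(y_out[0][0])
    return matches
```

For example, `systolic_match("abcabcab", "abc")` gives `[0, 3]`. In this scheme the result for window `i` meets `text[i + k - 1 - j]` at cell `j`. That is why the pattern is stored reversed. After a startup latency, the array completes one window every two cycles, and each cell does only one comparison and one AND per cycle.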

## Keywords

Systolic Array; Computer Science Department; Very Large Scale Integration; Systolic Architecture; Systolic Algorithm
