Putting inner loops automatically in silicon

  • H. T. Kung
Chapter 3 VLSI Algorithms
Part of the Lecture Notes in Computer Science book series (LNCS, volume 163)

Abstract

Many time-consuming inner loops are inherently regular and parallel. These are exactly the structures that are well suited to VLSI implementation. As a result, it will become increasingly common to have subroutines that are directly executable in silicon. Does this imply that in the near future many large computations can be carried out effectively by small computers equipped with silicon subroutines? This talk will present a simplified characterization of the silicon subroutine approach and discuss systolic architectures, a powerful method for implementing cost-effective silicon subroutines for computations such as pattern matching and error correction. CAD systems at CMU that have made it possible for us to design some rather complex chips, such as a programmable systolic chip, will also be briefly described.
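To make the idea of a systolic pattern matcher concrete, the following is a minimal software sketch of one possible linear array. It is an illustration under stated assumptions, not the design presented in the talk: the function name `systolic_match`, the choice of data flow (pattern characters stay in the cells, text characters advance one cell per clock tick, partial match results advance one cell every two ticks), and the use of `None` as a filler symbol are all hypothetical choices made for this example.

```python
def systolic_match(pattern, text):
    """Simulate a linear systolic array that reports every index i
    where text[i:i+len(pattern)] == pattern.

    Assumed data flow (illustrative only):
      - cell j permanently stores pattern[j];
      - text characters move right one cell per tick (register x);
      - partial results move right one cell per two ticks
        (registers ya then yb), so each result meets the next
        text character at the next cell.
    """
    k, n = len(pattern), len(text)
    if k == 0 or n < k:
        return []

    x = [None] * k    # text register of each cell
    ya = [False] * k  # first result register of each cell
    yb = [False] * k  # second result register (extra delay stage)
    hits = []

    # Tick n + k - 2 is when the last valid result leaves the array.
    for t in range(n + k - 1):
        stream = text[t] if t < n else None  # None pads the input
        new_x, new_ya, new_yb = [], [], []
        for j in range(k):
            # Every cell reads only its left neighbour's old registers,
            # which models a synchronous, locally connected array.
            x_in = stream if j == 0 else x[j - 1]
            y_in = True if j == 0 else yb[j - 1]
            new_x.append(x_in)
            new_ya.append(y_in and x_in == pattern[j])
            new_yb.append(ya[j])
        x, ya, yb = new_x, new_ya, new_yb

        # The result for start index i leaves the last cell at tick
        # i + 2 * (k - 1); anything else is pipeline fill or drain.
        i = t - 2 * (k - 1)
        if 0 <= i <= n - k and ya[k - 1]:
            hits.append(i)
    return hits
```

For example, `systolic_match("aba", "ababa")` returns `[0, 2]`. Each cell does a single comparison per tick and communicates only with its neighbour, which is the kind of regular, parallel inner-loop structure the abstract describes as well suited to silicon; a real chip would differ in many details that this sketch does not attempt to model.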

Keywords

Systolic Array; Computer Science Department; Very Large Scale Integration; Systolic Architecture; Systolic Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1984

Authors and Affiliations

  • H. T. Kung
    • 1
  1. Department of Computer Science, Carnegie-Mellon University, Pittsburgh, USA
