
Abstract

This survey paper reviews high-level transformation techniques that can be applied at the algorithm or architecture level to improve the performance of digital signal and image processing architectures and circuits implemented in VLSI technology. Successful design of VLSI signal and image processors requires careful selection of algorithms, architectures, implementation styles, and synthesis techniques. High-level transformations can play an important role in reducing silicon area or power at the same speed, or in increasing speed for the same area. These transformations can also make an algorithm better suited to a particular architectural style. The transformation techniques reviewed in this paper include pipelining, parallel processing, retiming, unfolding, folding, look-ahead, relaxed look-ahead, associativity, distributivity, and reduction in strength.
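As an illustration of one of the listed techniques, the sketch below applies a one-step look-ahead to a first-order recursion. It is not taken from the paper: the filter y[n] = a*y[n-1] + x[n], the zero initial state, and the function names `direct_recursion` and `look_ahead_recursion` are assumptions chosen for the example. The rewritten form depends on y[n-2] instead of y[n-1], which is what would let a hardware realization place an extra pipeline stage inside the feedback loop.

```python
def direct_recursion(a, x):
    """Original form: y[n] = a*y[n-1] + x[n], assuming y[-1] = 0."""
    y = []
    prev = 0.0
    for sample in x:
        prev = a * prev + sample
        y.append(prev)
    return y


def look_ahead_recursion(a, x):
    """One-step look-ahead form: y[n] = a^2*y[n-2] + a*x[n-1] + x[n].

    Obtained by substituting y[n-1] = a*y[n-2] + x[n-1] into the
    original recursion. Samples before n = 0 are assumed to be zero.
    """
    a_squared = a * a
    y = []
    for n, sample in enumerate(x):
        y_two_back = y[n - 2] if n >= 2 else 0.0
        x_one_back = x[n - 1] if n >= 1 else 0.0
        y.append(a_squared * y_two_back + a * x_one_back + sample)
    return y


if __name__ == "__main__":
    coefficient = 0.5
    samples = [1.0, 2.0, -1.0, 0.5, 3.0]
    print(direct_recursion(coefficient, samples))
    print(look_ahead_recursion(coefficient, samples))
```

Both functions compute the same output sequence; only the dependence structure of the recursion changes, at the cost of one extra multiply-add per sample.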




Cite this article

Parhi, K.K. High-level algorithm and architecture transformations for DSP synthesis. Journal of VLSI Signal Processing 9, 121–143 (1995). https://doi.org/10.1007/BF02406474


DOI: https://doi.org/10.1007/BF02406474
