Abstract
Real-time signal processing requires fast computation of inner products. Distributed arithmetic is a method of inner product computation that uses table lookup and addition in place of multiplication. Distributed arithmetic has previously been shown to produce novel and seemingly efficient architectures for a variety of signal processing computations; however, the methods of design, analysis and comparison have been ad hoc. We propose a systematic method for synthesizing optimal VLSI architectures using distributed arithmetic.
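The following minimal Python sketch makes the lookup-and-add idea concrete. It assumes K fixed integer coefficients, B-bit two's-complement integer inputs, and a bit-serial, least-significant-bit-first schedule. A table holds all 2^K partial sums of the coefficients. At each bit position, the b-th bits of all inputs form the table address, and the looked-up value is shifted and accumulated. The sign bit carries negative weight. The function names are illustrative and are not taken from the paper.

```python
from typing import List, Sequence


def build_da_table(coeffs: Sequence[int]) -> List[int]:
    """Precompute T[addr] = sum of coeffs[k] over every k whose bit is set in addr."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_inner_product(coeffs: Sequence[int], xs: Sequence[int], nbits: int) -> int:
    """Compute sum(coeffs[k] * xs[k]) using only table lookups, shifts and adds.

    Assumes each xs[k] fits in nbits-bit two's complement.
    """
    table = build_da_table(coeffs)  # 2**K words of memory
    acc = 0
    for b in range(nbits):  # one table lookup per input bit, LSB first
        # Gather bit b of every input word into a K-bit table address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        partial = table[addr] << b  # weight the lookup by 2**b
        # The most significant bit of a two's-complement word has negative weight.
        acc += -partial if b == nbits - 1 else partial
    return acc
```

For example, `da_inner_product([3, -2, 5], [1, -4, 7], 4)` yields 3·1 + (−2)(−4) + 5·7 = 46. It uses four lookups and no multiplications. The price is a table whose size grows as 2^K.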
A partition of the inner product computation at the word and bit level produces a computation consisting of lookups and additions. We study two classes of algorithms to implement this computation, regular iterative algorithms and tree algorithms, each of which can be expressed in the form of a dependency graph. We use linear and nonlinear maps to assign computations to processors in space and time. Expressions are developed for the area, latency, period and arithmetic error for a particular partition and space/time map of the dependency graph. We use these expressions to formulate a constrained optimization problem over a large class of architectures. We compare distributed arithmetic with more conventional methods for inner product computation and show how area, latency and period may be traded off while maintaining constant error.
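The next sketch illustrates one way a word-level partition changes the computation. It assumes the K coefficients are split into consecutive groups of size `group`, and that each group has its own 2^group-entry table. The per-bit lookups from all groups are then combined by a pairwise adder tree, the kind of structure a tree algorithm would schedule. All names are hypothetical. The sketch omits the paper's area, latency, period and error expressions. It only shows the basic exchange of table memory for additions.

```python
from typing import List, Sequence


def build_table(coeffs: Sequence[int]) -> List[int]:
    """All 2**len(coeffs) partial sums of one coefficient group."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def adder_tree(values: List[int]) -> int:
    """Pairwise, log-depth reduction of the group lookups."""
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # an odd element passes through to the next level
            paired.append(values[-1])
        values = paired
    return values[0]


def partitioned_da(coeffs: Sequence[int], xs: Sequence[int],
                   nbits: int, group: int) -> int:
    """Inner product with one lookup table per group of `group` coefficients."""
    groups = [range(i, min(i + group, len(coeffs)))
              for i in range(0, len(coeffs), group)]
    tables = [build_table([coeffs[k] for k in g]) for g in groups]
    acc = 0
    for b in range(nbits):  # LSB first; inputs are nbits-bit two's complement
        lookups = []
        for g, table in zip(groups, tables):
            addr = 0
            for j, k in enumerate(g):  # bit b of each input in this group
                addr |= ((xs[k] >> b) & 1) << j
            lookups.append(table[addr])
        partial = adder_tree(lookups) << b
        acc += -partial if b == nbits - 1 else partial  # sign bit weighs -2**b
    return acc


def table_words(num_coeffs: int, group: int) -> int:
    """Total table entries across all groups (the last group may be smaller)."""
    full, rem = divmod(num_coeffs, group)
    return full * (1 << group) + ((1 << rem) if rem else 0)
```

For eight coefficients, a single table (`group=8`) needs 256 words, while two groups of four need only 32. The cost of the smaller tables is one extra addition per bit to merge the two lookups.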
Additional information
This work was supported by Ball Aerospace, Boulder, CO and by the Office of Naval Research, Electronics Branch, Arlington, VA under contract ONR 89-J-1070.
Cite this article
Burleson, W.P., Scharf, L.L. A VLSI design methodology for distributed arithmetic. J VLSI Sign Process Syst Sign Image Video Technol 2, 235–252 (1991). https://doi.org/10.1007/BF00925468