Abstract
We present an efficient approach to partitioning algorithms that implement long convolutions. The dependence graph (DG) of a convolution algorithm is partitioned, in locally sequential globally parallel (LSGP) fashion, into smaller, less complex convolution algorithms. The LSGP-partitioned DG is mapped onto a signal flow graph (SFG) in which each processor element (PE) performs a small convolution. The key is then to reduce the complexity of the SFG in two steps: (1) local reduction of complexity, in which a short Fast Fourier Transform (FFT) performs the small convolution within each PE; and (2) global reduction of complexity, in which the short FFTs within the PEs are relocated to the global level, where redundant short FFT operations are eliminated. The operation remaining within each PE is then a simple element-wise multiply-add. After a graph transformation, the structure of the SFG kernel is recognized as a set of parallel small convolutions. Using the short FFT to perform these short convolutions as well yields our final realization of the long convolution algorithm. The computational complexity of this realization is close to the optimum for convolutions, that is, O(N log N). Our approach thus attains this O(N log N) complexity without large-size FFTs; it uses small FFT blocks instead. The advantage is that small FFTs are commercially available and can even be implemented in single-chip VLSI architectures. Our final SFG is three-dimensional and can be mapped efficiently onto prototype architectures or dedicated VLSI processors. We demonstrate the procedure with a design example: the implementation of a prototype convolution architecture that we designed for a real-time radar imaging system.
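The two reduction steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture or their SFG. It is a generic block-partitioned convolution written under the following assumptions: both sequences are split into blocks of a hypothetical length `block` (a power of two), every block is transformed exactly once by a short FFT of size `2 * block` (the "global" step), the inner loop is an element-wise multiply-add per frequency bin (the role the abstract assigns to the PEs), and the results are recombined by overlap-add. All function names here are illustrative.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT with 1/n scaling."""
    n = len(a)
    return [v / n for v in fft(a, inverse=True)]


def direct_convolution(x, h):
    """Reference O(N^2) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def block_spectra(seq, block):
    """Split seq into blocks, zero-pad each to 2*block, transform each once."""
    padded = list(seq) + [0.0] * (-len(seq) % block)
    return [fft(padded[s:s + block] + [0.0] * block)
            for s in range(0, len(padded), block)]


def partitioned_convolution(x, h, block):
    """Linear convolution of x and h using only FFTs of size 2*block."""
    X = block_spectra(x, block)   # short FFTs computed once, at the "global" level
    H = block_spectra(h, block)
    size = 2 * block
    nblocks = len(X) + len(H) - 1
    # Element-wise multiply-add: Y[m][k] = sum over i + j == m of X[i][k] * H[j][k].
    # For a fixed bin k this is itself a short convolution along the block index.
    Y = [[0j] * size for _ in range(nblocks)]
    for i, Xi in enumerate(X):
        for j, Hj in enumerate(H):
            acc = Y[i + j]
            for k in range(size):
                acc[k] += Xi[k] * Hj[k]
    # One inverse short FFT per output block, then overlap-add.
    y = [0.0] * ((nblocks + 1) * block)
    for m, Ym in enumerate(Y):
        for n, v in enumerate(ifft(Ym)):
            y[m * block + n] += v.real
    return y[:len(x) + len(h) - 1]
```

For a fixed frequency bin, the multiply-add loop is a short convolution over the block index, which is the kind of structure the abstract describes as a set of parallel small convolutions. In this sketch that inner convolution is evaluated directly; replacing it with another short FFT would be the further step the abstract refers to.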
Cite this article
Bierens, L., Deprettere, E. Efficient Partitioning of Algorithms for Long Convolutions and their Mapping onto Architectures. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 18, 51–64 (1998). https://doi.org/10.1023/A:1007993310185
DOI: https://doi.org/10.1023/A:1007993310185