Implementation on RISC Architectures
Abstract
A wide variety of DFT and convolution algorithms have been designed to minimize the number of arithmetic operations, especially multiplications. Blahut (1985) [1] offers an excellent survey of many algorithms designed using this methodology. Today, with rapid advances in VLSI technology and the availability of fast, inexpensive floating-point processors, a fixed-point addressing operation or a floating-point addition can take effectively the same time as a floating-point multiplication. Some advanced architectures go further and run these functional units in parallel, so that several operations complete together in one or a few cycles. The traditional design strategy of trading multiplications for additions is therefore not only ineffective but can significantly degrade performance!
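The central claim can be checked with a toy operation count. The sketch below is an assumed illustration, not code from the chapter: it uses a single complex multiplication (rather than a full DFT or convolution module) and a hypothetical cost model in which a multiply, an add/subtract, and a fused multiply-add (fma) each take one cycle. `OpCounter` and the `cmul_*` functions are illustrative names. It compares the classic 4-multiplication product with Gauss's 3-multiplication variant, which saves one multiplication at the price of three extra additions.

```python
class OpCounter:
    """Counts arithmetic instructions under an assumed cost model in which
    every mul, add/sub, and fused multiply-add takes one cycle."""

    def __init__(self):
        self.counts = {"mul": 0, "add": 0, "fma": 0}

    def mul(self, x, y):
        self.counts["mul"] += 1
        return x * y

    def add(self, x, y):
        self.counts["add"] += 1
        return x + y

    def sub(self, x, y):
        self.counts["add"] += 1  # subtraction costs the same as addition
        return x - y

    def fma(self, x, y, z):
        # x*y + z issued as a single fused instruction
        self.counts["fma"] += 1
        return x * y + z

    def total(self):
        return sum(self.counts.values())


def cmul_classic(ops, a, b, c, d):
    """(a+bi)(c+di) with 4 multiplications and 2 additions."""
    re = ops.sub(ops.mul(a, c), ops.mul(b, d))
    im = ops.add(ops.mul(a, d), ops.mul(b, c))
    return re, im


def cmul_gauss(ops, a, b, c, d):
    """Gauss's 3-multiplication form: 3 multiplications, 5 additions."""
    k1 = ops.mul(c, ops.add(a, b))  # ac + bc
    k2 = ops.mul(a, ops.sub(d, c))  # ad - ac
    k3 = ops.mul(b, ops.add(c, d))  # bc + bd
    return ops.sub(k1, k3), ops.add(k1, k2)


def cmul_classic_fma(ops, a, b, c, d):
    # Assumption: negating an operand is free (many FMA units provide
    # negated or subtract variants), so the sign flip is not counted.
    re = ops.fma(-b, d, ops.mul(a, c))
    im = ops.fma(b, c, ops.mul(a, d))
    return re, im


def cmul_gauss_fma(ops, a, b, c, d):
    t1 = ops.add(a, b)
    t2 = ops.sub(d, c)
    t3 = ops.add(c, d)
    k1 = ops.mul(c, t1)
    # Fuse the two remaining products into the final add/subtract.
    return ops.fma(-b, t3, k1), ops.fma(a, t2, k1)


if __name__ == "__main__":
    for f in (cmul_classic, cmul_gauss, cmul_classic_fma, cmul_gauss_fma):
        ops = OpCounter()
        result = f(ops, 1, 2, 3, 4)  # (1+2i)(3+4i) = -5+10i
        print(f"{f.__name__:18s} {result} {ops.counts} total={ops.total()}")
```

Under this one-cycle-per-operation model the 3-multiplication form needs 8 operations against 6 for the classic form, and with fused multiply-add 6 against 4, so the "saved" multiplication costs time in both cases. The model deliberately ignores pipelining, parallel issue, and register pressure, which the abstract notes can shift the balance further.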
Keywords
Data Cache, Assembly Code, Intel i860, Machine Cycle, Convolution Algorithm
Bibliography
- [1] Blahut, R.E. (1985), Fast Algorithms for Digital Signal Processing, Addison-Wesley, Reading, MA.
- [2] Bogoch, S., Bason, I., Williams, J. and Russell, M. (1990), “Supercomputers Get Personal,” BYTE Magazine, 231–237.
- [3] Dewar, R.B. and Smosna, M. (1990), Microprocessors: A Programmer’s View, McGraw-Hill Publishing Co., New York.
- [4] Granata, J.A. (1990), The Design of Discrete Fourier Transform and Convolution Algorithms for RISC Architectures, Ph.D. dissertation, the City University of New York.
- [5] Hennessy, J.L. (1984), “VLSI Processor Architecture,” IEEE Trans. Computers C-33, 1221–1246.
- [6] Linzer, E. and Feig, E. (1991), “Implementation of Efficient FFT Algorithms on Fused Multiply-Add Architectures,” to appear.
- [7] Lu, C. (1991), “Implementation of ‘Multiply-Add’ FFT Algorithms for Complex and Real Data Sequences,” Proceedings of the IEEE International Conference on Circuits and Systems, Singapore.
- [8] Lu, C., Cooley, J.W. and Tolimieri, R. (1993), “FFT Algorithms for Prime Transform Sizes and Their Implementations on VAX, IBM 3090VF and RS/6000,” IEEE Trans. Signal Processing 41 (2), February.
- [9] Lu, C., Cooley, J.W. and Tolimieri, R. (1991), “Variants of the Winograd Multiplicative FFT Algorithms and Their Implementation on RS/6000,” Proceedings ICASSP-91, Toronto.
- [10] Lu, C., An, M., Qian, Z. and Tolimieri, R. (1992), “Small FFT Module Implementation on the Intel i860 Processor,” Proc. ICSPAT, November 2–5, Cambridge, MA.
- [11] Margulis, N. (1990), i860 Microprocessor Architecture, McGraw-Hill Publishing Co., New York.
- [12] Patterson, D.A. (1985), “Reduced Instruction Set Computers,” Communications of the ACM 28 (1), 8–21.
- [13] Patterson, D.A. and Sequin, C.H. (1981), “RISC I: A Reduced Instruction Set VLSI Computer,” Proc. 8th Internat. Sympos. Computer Architecture, ACM, 443–457.
- [14] Patterson, D.A. and Sequin, C.H. (1982), “A VLSI RISC,” IEEE Computer, September, 8–22.
- [15] Radin, G. (1982), “The 801 Minicomputer,” Computer Architecture News 10, 39–47.
- [16] Stallings, W. (1990), Reduced Instruction Set Computers (RISC), Second Edition, IEEE Computer Society Press.
- [17] Tolimieri, R., An, M. and Lu, C. (1989), Algorithms for Discrete Fourier Transform and Convolutions, Springer-Verlag, New York.
- [18] IBM Journal of Research and Development: Special Issue on IBM RISC System/6000 Processor, June 1990.
- [19] Intel (1990), iPSC/860 Supercomputer Advanced Information Fact Sheet, Intel.
- [20] AT&T (1988), DSP Parallel Processor BTAOO User Manual, AT&T.