# A New Overlap Save Algorithm for Fast Block Convolution and Its Implementation Using FFT

## Abstract

Convolution of data with a long-tap filter is often implemented with the overlap save algorithm (OSA) using the fast Fourier transform (FFT). The traditional OSA, however, contains redundant computations: the FFT is applied to the overlapped data (the concatenation of the previous block and the current block), even though the DFT computations are recursive. In this paper, we first analyze the redundancy by decomposing the OSA into two processes, one related to the previous block and one to the current block. We then eliminate the redundant computations by introducing a new transform that is applied only to the current data, not to the whole overlapped data. The transform size is therefore half that of the traditional OSA. The new transform has the form of a DFT and can be implemented by defining a new butterfly structure. In this paper, however, we implement it as a cascade of twiddle factors and a conventional FFT, so that the FFT libraries available on PCs and DSPs can be used. The computational complexity of this implementation is analyzed and compared with existing methods. In the experiments, the proposed method is applied to several block convolutions and partitioned-block convolutions. The CPU time is reduced by more than the arithmetic analysis predicts, which implies that the reduced transform size gives an additional advantage in data manipulation.
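To make the decomposition described above concrete, the following is a minimal NumPy sketch, not the paper's actual transform or butterfly structure. It assumes real-valued input, a block length `N`, and a filter of at most `N + 1` taps. The function names `overlap_save`, `half_block_spectrum`, and `overlap_save_split` are hypothetical. The first function is the conventional OSA with a 2N-point FFT over the overlapped data. The other two rely on the identity X[k] = P[k] + (-1)^k C[k], where P and C are the 2N-bin spectra of the previous and current N-point blocks, and each of them is obtained from twiddle factors followed by ordinary N-point FFTs.

```python
import numpy as np


def overlap_save(x, h, N):
    """Conventional overlap-save for real x and h: one 2N-point FFT per block."""
    assert len(h) <= N + 1, "last N circular outputs must be alias-free"
    H = np.fft.fft(h, 2 * N)
    n_blocks = -(-len(x) // N)  # ceiling division
    xp = np.concatenate([x, np.zeros(n_blocks * N - len(x))])
    y = np.empty(n_blocks * N)
    prev = np.zeros(N)
    for b in range(n_blocks):
        cur = xp[b * N:(b + 1) * N]
        X = np.fft.fft(np.concatenate([prev, cur]))   # overlapped data
        y[b * N:(b + 1) * N] = np.fft.ifft(X * H)[N:].real
        prev = cur
    return y[:len(x)]


def half_block_spectrum(block):
    """2N frequency samples of one N-point block via two N-point FFTs.

    Even bins are a plain N-point FFT; odd bins are an N-point FFT of the
    block pre-multiplied by the twiddle factors exp(-j*pi*n/N).
    """
    N = len(block)
    twiddle = np.exp(-1j * np.pi * np.arange(N) / N)
    C = np.empty(2 * N, dtype=complex)
    C[0::2] = np.fft.fft(block)
    C[1::2] = np.fft.fft(block * twiddle)
    return C


def overlap_save_split(x, h, N):
    """Overlap-save that transforms only the current block in each step.

    The spectrum of the previous block is reused, so no forward transform is
    ever applied to the concatenated 2N-point data.
    """
    assert len(h) <= N + 1, "last N circular outputs must be alias-free"
    H = np.fft.fft(h, 2 * N)
    sign = np.where(np.arange(2 * N) % 2 == 0, 1.0, -1.0)  # (-1)^k
    n_blocks = -(-len(x) // N)
    xp = np.concatenate([x, np.zeros(n_blocks * N - len(x))])
    y = np.empty(n_blocks * N)
    P = np.zeros(2 * N, dtype=complex)  # spectrum of the (zero) block before the first
    for b in range(n_blocks):
        C = half_block_spectrum(xp[b * N:(b + 1) * N])
        X = P + sign * C                 # equals fft([prev, cur])
        y[b * N:(b + 1) * N] = np.fft.ifft(X * H)[N:].real
        P = C                            # current block becomes the previous one
    return y[:len(x)]
```

Both functions should agree with `np.convolve(x, h)[:len(x)]` up to rounding error. The sketch only illustrates where the redundancy sits; it still uses a 2N-point inverse FFT and makes no claim to reproduce the operation counts or CPU times reported in the paper.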

## Keywords

Overlap save algorithm, QDFT, Block convolution

## Notes

### Acknowledgements

This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Knowledge Economy (MKE).
