
A New Overlap Save Algorithm for Fast Block Convolution and Its Implementation Using FFT

Journal of Signal Processing Systems


Convolution of data with a long-tap filter is often implemented by the overlap-save algorithm (OSA) using the fast Fourier transform (FFT). However, the traditional OSA contains redundant computations, because the FFT is applied to the overlapped data (the concatenation of the previous block and the current block) even though the DFT computations are recursive. In this paper, we first analyze the redundancy by decomposing the OSA into two processes related to the previous and current blocks. We then eliminate the redundant computations by introducing a new transform that is applied only to the current data, not to the whole overlapped data. Hence the transform size is halved compared with the traditional OSA. The new transform has the form of a DFT and can be implemented by defining a new butterfly structure. In this paper, however, we implement it as a cascade of twiddle-factor multiplication and a conventional FFT, so that the FFT libraries available on PCs and DSPs can be used. The computational complexity in this case is analyzed and compared with existing methods. In the experiments, the proposed method is applied to several block convolutions and partitioned-block convolutions. The CPU time is reduced by more than the arithmetic analysis predicts, which implies that the reduced transform size gives an additional advantage in data manipulation.





This research was performed for the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Knowledge Economy (MKE).

Author information


Corresponding author

Correspondence to Jung Gap Kuk.


Appendix 1: Properties of QDFT

1.1 Reversibility of QDFT


Substituting the forward QDFT for \(X_k^q\) in the inverse QDFT of Eq. 17, we have

$$ \frac{1}{N} \sum\limits_{k=0}^{N-1}\left(\sum\limits_{m=0}^{N-1} x_m W_N^{m(k+3/4)}\right)W_N^{-n(k+3/4)} $$

and several algebraic steps give

$$ \begin{array}{lll} &&\frac{1}{N} \sum\limits_{k=0}^{N-1}\left(\sum\limits_{m=0}^{N-1} x_m W_N^{m(k+3/4)}\right)W_N^{-n(k+3/4)} \\ &=&\frac{1}{N} \sum\limits_{k=0}^{N-1}\sum\limits_{m=0}^{N-1} x_m W_N^{(m-n)(k+3/4)} \\ &=&\sum\limits_{m=0}^{N-1} x_m \frac{1}{N} \sum\limits_{k=0}^{N-1} W_N^{(m-n)(k+3/4)}\\ &=&\sum\limits_{m=0}^{N-1} x_m \frac{1}{N} W_N^{(m-n)(3/4)} \sum\limits_{k=0}^{N-1} W_N^{(m-n)k}. \label{eq:de} \end{array} $$

The summation \(\sum_{k=0}^{N-1} W_N^{(m-n)k}\) in Eq. 23 is zero for all values of m except when m − n = pN (p ∈ ℤ), in which case it equals N. It can therefore be replaced by N times an infinite sum of Kronecker delta functions with respect to p, which cancels the factor 1/N, and Eq. 23 is reduced to

$$ \begin{array}{lll} &&\sum\limits_{m=0}^{N-1} x_m W_N^{(m-n)(3/4)} \sum\limits_{p=-\infty}^{\infty} \delta_{m,n+pN} \\ &=&\sum\limits_{p=-\infty}^{\infty} x_{n+pN} W_N^{pN(3/4)}. \label{eq:de1} \end{array} $$

In Eq. 24, only the p = 0 term of the summation is non-zero, because \(x_n\) is defined as 0 outside [0, N − 1], and its value is \(x_n W_N^{0} = x_n\). Hence the reversibility of QDFT is proved. □
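The reversibility result can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names `w`, `dft`, `qdft` and `iqdft` are hypothetical, and the naive O(N²) `dft` merely stands in for an FFT library call. The forward QDFT is written as a twiddle-factor multiplication followed by a plain DFT, which follows from \(W_N^{m(k+3/4)} = W_N^{3m/4} W_N^{mk}\).

```python
import cmath

def w(N, e):
    """Twiddle factor W_N^e = exp(-2*pi*j*e/N); the exponent e may be fractional."""
    return cmath.exp(-2j * cmath.pi * e / N)

def dft(x):
    """Plain N-point DFT by direct summation (stand-in for an FFT library call)."""
    N = len(x)
    return [sum(x[m] * w(N, m * k) for m in range(N)) for k in range(N)]

def qdft(x):
    """Forward QDFT as a cascade: pre-multiply x_m by W_N^{3m/4}, then take a DFT."""
    N = len(x)
    return dft([x[m] * w(N, 0.75 * m) for m in range(N)])

def iqdft(X):
    """Inverse QDFT by direct summation: x_n = (1/N) sum_k X_k W_N^{-n(k+3/4)}."""
    N = len(X)
    return [sum(X[k] * w(N, -n * (k + 0.75)) for k in range(N)) / N
            for n in range(N)]

# Arbitrary test block (illustrative values only).
x = [1.0, -2.0, 3.5, 0.25, -1.0, 4.0, 0.0, 2.0]
x_rec = iqdft(qdft(x))

# Largest deviation between the original block and its round trip.
max_err = max(abs(a - b) for a, b in zip(x, x_rec))
print("round-trip error:", max_err)
```

The printed error should be at the level of floating-point rounding, in agreement with the proof above.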

1.2 Convolution Property of QDFT

Property Multiplication of two sequences in the QDFT domain, \(X_k^q G_k^q\), corresponds to \(\sum_{l=0}^n x_l g_{n-l} +j\sum_{l=n+1}^{N-1} x_l g_{n-l+N}\) in the time domain.


The N-point QDFT-based block convolution can be written as \(\frac{1}{N} \sum_{k=0}^{N-1}X_k^q G_k^q W_N^{-n(k+\frac{3}{4})}\), and several algebraic steps give

$$ \begin{array}{lll} & &\frac{1}{N} \sum\limits_{k=0}^{N-1}X_k^q G_k^q W_N^{-n(k+\frac{3}{4})}\\ &=&\frac{1}{N}\sum\limits_{k=0}^{N-1}\sum\limits_{l=0}^{N-1}x_l W_N^{l(k+\frac{3}{4})} \sum\limits_{m=0}^{N-1}g_m W_N^{m(k+\frac{3}{4})} W_N^{-n(k+\frac{3}{4})}\\ &=&\frac{1}{N}\sum\limits_{l=0}^{N-1}x_l \sum\limits_{m=0}^{N-1} g_m \sum\limits_{k=0}^{N-1} W_N^{(l+m-n)(k+\frac{3}{4})}\\ &=&\sum\limits_{l=0}^{N-1}x_l \sum\limits_{m=0}^{N-1} g_m W_N^{\frac{3}{4}(l+m-n)} \frac{1}{N}\sum\limits_{k=0}^{N-1} W_N^{(l+m-n)k}. \label{eq:k} \end{array} $$

As with Eq. 23, the summation over the index k in Eq. 25 is zero for all values of m except when l + m − n = pN (p ∈ ℤ), in which case it equals N and cancels the factor 1/N. It can therefore be replaced by an infinite sum of Kronecker delta functions with respect to p. We may also extend the limits of m to infinity, with the understanding that the x and g sequences are defined as 0 outside [0, N − 1]. Continuing with the derivation, we have

$$ \begin{array}{lll} &&\sum\limits_{l=0}^{N-1}x_l \sum\limits_{m=-\infty}^{\infty} g_m W_N^{\frac{3}{4}(l+m-n)} \sum\limits_{p=-\infty}^{\infty} \delta_{m,n-l+pN} \\ &=&\sum\limits_{l=0}^{N-1} x_l \sum\limits_{p=-\infty}^{\infty} j^p g_{n-l+pN} \label{eq:pr} \end{array} $$

where \(W_N^{\frac{3}{4}pN} = e^{-j3\pi p/2} = j^p\) has been used, and \(g_{n-l+pN}\) has non-zero values only if p = 0 (when l ≤ n) or p = 1 (when l > n). Hence this can be rewritten as

$$ \sum\limits_{l=0}^n x_l g_{n-l} +j\sum\limits_{l=n+1}^{N-1} x_l g_{n-l+N} $$

and the convolution property of QDFT is proved. □
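The convolution property can also be checked numerically. The sketch below is illustrative only and assumes real-valued x and g; the helper names (`w`, `qdft`, `iqdft`, `linear_convolution`) are hypothetical, and the transforms are computed by direct summation rather than by an FFT. For real data, the property implies that the real part of the result holds samples 0 to N − 1 of the linear convolution and the imaginary part holds samples N to 2N − 1.

```python
import cmath

def w(N, e):
    """Twiddle factor W_N^e = exp(-2*pi*j*e/N); the exponent e may be fractional."""
    return cmath.exp(-2j * cmath.pi * e / N)

def qdft(x):
    """Forward QDFT by direct summation: X_k = sum_m x_m W_N^{m(k+3/4)}."""
    N = len(x)
    return [sum(x[m] * w(N, m * (k + 0.75)) for m in range(N)) for k in range(N)]

def iqdft(X):
    """Inverse QDFT by direct summation: x_n = (1/N) sum_k X_k W_N^{-n(k+3/4)}."""
    N = len(X)
    return [sum(X[k] * w(N, -n * (k + 0.75)) for k in range(N)) / N
            for n in range(N)]

def linear_convolution(x, g):
    """Reference linear convolution of length len(x) + len(g) - 1."""
    y = [0.0] * (len(x) + len(g) - 1)
    for a, xa in enumerate(x):
        for b, gb in enumerate(g):
            y[a + b] += xa * gb
    return y

# Arbitrary real test sequences of equal length N (illustrative values only).
x = [1.0, 2.0, -1.0, 0.5]
g = [0.5, -1.0, 2.0, 3.0]
N = len(x)

# Pointwise product in the QDFT domain, then inverse QDFT.
y = iqdft([a * b for a, b in zip(qdft(x), qdft(g))])

# Pad the (2N-1)-sample reference with one zero so it has 2N samples.
full = linear_convolution(x, g) + [0.0]

# Real part vs. first N samples, imaginary part vs. last N samples.
err_head = max(abs(y[n].real - full[n]) for n in range(N))
err_tail = max(abs(y[n].imag - full[n + N]) for n in range(N))
print("head error:", err_head, "tail error:", err_tail)
```

Both errors should be at the level of floating-point rounding.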

Appendix 2: Direct radix-2 and radix-4 Implementation of QDFT

QDFT can be implemented in a similar way to the conventional FFT. To generalize the discussion, we consider the N-point transform \(T_s^N\) as

$$ \label{eq:t} T_s^N : X_k^s=\sum\limits_{n=0}^{N-1} x_n W_N^{n(k+s)} $$

where s is the shift of the frequency index. Eq. 27 gives the DFT for s = 0, the ODFT for s = 1/2 and the QDFT for s = 3/4. Based on the definition of \(T_s^N\), applying the Cooley–Tukey decomposition to \(T_s^N\) yields two shorter transforms of size N/2 and thus the radix-2 structure

$$ \begin{array}{rll} \label{eq:radix2} T_{s/2}^{N/2} &: X_{2k}^s=\sum\limits_{n=0}^{N/2-1} (x_n+W_2^s x_{n+N/2} )W_{N/2}^{n(k+s/2)}\\ T_{(s+1)/2}^{N/2} &: X_{2k+1}^s=\sum\limits_{n=0}^{N/2-1} (x_n-W_2^s x_{n+N/2} )W_{N/2}^{n(k+(s+1)/2)} \end{array} $$

where 0 ≤ k < N/2. That is, the N-point transform \(T_s^N\) is decomposed into two N/2-point transforms \(T_{s/2}^{N/2}\) and \(T_{(s+1)/2}^{N/2}\). The butterfly structure corresponding to Eq. 28 is shown in Fig. 2, where a directed line means that the data is multiplied by −1.

Fig. 2 The butterfly structure for the radix-2 QDFT
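The radix-2 decomposition of Eq. 28 can be sketched as a recursion and compared against the direct definition of Eq. 27. This is a minimal illustration under the assumption that N is a power of two; the function names `w`, `t_direct` and `t_radix2` are hypothetical, and the code is not the optimized butterfly implementation discussed in the paper.

```python
import cmath

def w(N, e):
    """Twiddle factor W_N^e = exp(-2*pi*j*e/N); the exponent e may be fractional."""
    return cmath.exp(-2j * cmath.pi * e / N)

def t_direct(x, s):
    """T_s^N by direct summation: X_k = sum_n x_n W_N^{n(k+s)}."""
    N = len(x)
    return [sum(x[n] * w(N, n * (k + s)) for n in range(N)) for k in range(N)]

def t_radix2(x, s):
    """T_s^N by the radix-2 recursion; len(x) is assumed to be a power of two."""
    N = len(x)
    if N == 1:
        return [complex(x[0])]  # a 1-point transform is the identity
    half = N // 2
    c = w(2, s)  # W_2^s, applied to the second half of the block
    # Butterfly: sums feed T_{s/2}^{N/2}, differences feed T_{(s+1)/2}^{N/2}.
    sums = [x[n] + c * x[n + half] for n in range(half)]
    diffs = [x[n] - c * x[n + half] for n in range(half)]
    out = [0j] * N
    out[0::2] = t_radix2(sums, s / 2)         # even-indexed outputs X_{2k}
    out[1::2] = t_radix2(diffs, (s + 1) / 2)  # odd-indexed outputs X_{2k+1}
    return out

# Arbitrary test block; s = 3/4 selects the QDFT.
x = [1.0, -2.0, 3.5, 0.25, -1.0, 4.0, 0.0, 2.0]
ref = t_direct(x, 0.75)
fast = t_radix2(x, 0.75)
max_err = max(abs(a - b) for a, b in zip(ref, fast))
print("radix-2 vs direct:", max_err)
```

Setting s = 0 or s = 1/2 in the same routine gives the DFT and the ODFT, which is the reason for treating the shift as a parameter.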

\(T_s^N\) can also be implemented by a radix-4 structure, where \(T_s^N\) is decomposed into four shorter transforms of size N/4: \(T_{s/4}^{N/4}\), \(T_{(s+1)/4}^{N/4}\), \(T_{(s+2)/4}^{N/4}\) and \(T_{(s+3)/4}^{N/4}\), as in Eq. 29.

$$ \begin{array}{rll} T_{s/4}^{N/4} &: X_{4k}^s = \sum\limits_{n=0}^{N/4-1} (x_n + W_4^s x_{n+N/4} + W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad + W_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+s/4)} \\ T_{(s+1)/4}^{N/4} &: X_{4k+1}^s = \sum\limits_{n=0}^{N/4-1} (x_n - jW_4^s x_{n+N/4} - W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad + jW_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+(s+1)/4)}\\ T_{(s+2)/4}^{N/4} &: X_{4k+2}^s = \sum\limits_{n=0}^{N/4-1} (x_n - W_4^s x_{n+N/4} + W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad - W_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+(s+2)/4)}\\ T_{(s+3)/4}^{N/4} &: X_{4k+3}^s = \sum\limits_{n=0}^{N/4-1} (x_n + jW_4^s x_{n+N/4} - W_4^{2s} x_{n+2N/4} \\ & \qquad \qquad - jW_4^{3s} x_{n+3N/4}) W_{N/4}^{n(k+(s+3)/4)} \label{eq:radix4} \end{array} $$

The butterfly structure of Eq. 29 is shown in Fig. 3, where a directed line and a dotted line mean that the data is multiplied by −1 and j, respectively. A line that is both directed and dotted therefore means multiplication by −j.

Fig. 3 The butterfly structure for the radix-4 QDFT
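A single radix-4 stage can be checked in the same spirit. In the sketch below, which assumes that N is a multiple of 4, the butterfly coefficient of \(x_{n+pN/4}\) in the output group r is written as \(W_4^{pr} W_4^{ps}\) with \(W_4 = -j\); this follows from substituting n + pN/4 for n and 4k + r for k in Eq. 27. The names `w`, `t_direct`, `COEF` and `t_radix4_stage` are hypothetical, and the four N/4-point sub-transforms are evaluated by direct summation for simplicity.

```python
import cmath

def w(N, e):
    """Twiddle factor W_N^e = exp(-2*pi*j*e/N); the exponent e may be fractional."""
    return cmath.exp(-2j * cmath.pi * e / N)

def t_direct(x, s):
    """T_s^N by direct summation: X_k = sum_n x_n W_N^{n(k+s)}."""
    N = len(x)
    return [sum(x[n] * w(N, n * (k + s)) for n in range(N)) for k in range(N)]

# COEF[r][p] = W_4^{p*r} with W_4 = -j: the +1, -1, +j, -j butterfly weights.
COEF = [
    [1, 1, 1, 1],
    [1, -1j, -1, 1j],
    [1, -1, 1, -1],
    [1, 1j, -1, -1j],
]

def t_radix4_stage(x, s):
    """One radix-4 stage of T_s^N; len(x) is assumed to be a multiple of 4."""
    N = len(x)
    q = N // 4
    out = [0j] * N
    for r in range(4):
        # Fold the four quarters of the block into one N/4-point sequence.
        folded = [
            sum(COEF[r][p] * w(4, p * s) * x[n + p * q] for p in range(4))
            for n in range(q)
        ]
        # Outputs X_{4k+r} come from the N/4-point transform with shift (s+r)/4.
        out[r::4] = t_direct(folded, (s + r) / 4)
    return out

# Arbitrary 16-sample test block; s = 3/4 selects the QDFT.
x = [float((3 * n) % 7) - 2.5 for n in range(16)]
ref = t_direct(x, 0.75)
stage = t_radix4_stage(x, 0.75)
max_err = max(abs(a - b) for a, b in zip(ref, stage))
print("radix-4 stage vs direct:", max_err)
```

Applying the same stage recursively to each folded sequence would give a full radix-4 algorithm; the single stage is kept here so that the correspondence with the four shifted sub-transforms stays visible.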

About this article

Cite this article

Kuk, J.G., Kim, S. & Cho, N.I. A New Overlap Save Algorithm for Fast Block Convolution and Its Implementation Using FFT. J Sign Process Syst 63, 143–152 (2011).
