Journal of Real-Time Image Processing, Volume 2, Issue 4, pp 281–291

A hierarchical pipelining architecture and FPGA implementation for lifting-based 2-D DWT

  • Chunhui Zhang
  • Yun Long
  • Fadi Kurdahi
Special Issue


Numerous VLSI architectures for the 2-D discrete wavelet transform (DWT) have been proposed. Most of these designs achieve good performance through parallel processing. However, few of them thoroughly address how to sustain such high-throughput computation, which is crucial in real-time applications. Affordable data-transfer bandwidth has increased tremendously over the past decade, yet stream-intensive applications still place heavy pressure on data communication, and 2-D DWT is one such case. In this paper, we expose the performance gap between the computing core and the entire system. We distinguish the two quantitatively with the metrics of peak performance and mean-time performance. To narrow this gap without degrading either metric, we introduce a software-pipelined, lifting-based computing kernel that removes data dependences to raise peak performance. We also apply loop fusion and a hierarchical pipelining method to enhance data locality and boost mean-time performance. The architecture has been implemented on a Xilinx Virtex-II FPGA, taking advantage of the Virtex-II's embedded multipliers and block RAMs. We use the Daubechies (9, 7) and LeGall (5, 3) filters (the default lossy and lossless filters in JPEG2000) for illustration, although the method applies generally to other DWT filters. The post-place-and-route operating frequency for the Daubechies (9, 7) filter is 138 MHz. Notably, the mean-time performance, parameterized by image size and decomposition level, comes close to the peak performance.
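The abstract names the LeGall (5, 3) lifting filter and a software-pipelined lifting kernel but does not spell either out. The following is a minimal Python sketch of the reversible integer (5, 3) lifting transform. It assumes the JPEG2000-style integer form with whole-sample symmetric extension at both borders and inputs of at least two samples. The sketch has three parts:

- the plain split/predict/update passes;
- a single-pass variant that fuses predict and update, so each loop iteration emits one highpass/lowpass pair;
- one 2-D decomposition level, applied row-wise and then column-wise.

The function names (`fwd53`, `inv53`, `fwd53_pipelined`, `fwd53_2d`) are hypothetical. The fused loop only illustrates the general idea of overlapping lifting steps to break the pass-to-pass dependence. It is not the authors' hardware kernel.

```python
# Sketch of the reversible LeGall (5, 3) lifting DWT in JPEG2000-style integer form.
# Assumptions: whole-sample symmetric extension at both borders, inputs of length >= 2.
# All function names are hypothetical; this is not the paper's FPGA kernel.


def fwd53(x):
    """Two-pass lifting: split into even/odd, predict highpass d, then update lowpass s."""
    assert len(x) >= 2
    s, d = list(x[0::2]), list(x[1::2])
    # Predict: d[i] -= floor((s[i] + s[i+1]) / 2); mirror s at the right border.
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] -= (s[i] + right) // 2
    # Update: s[i] += floor((d[i-1] + d[i] + 2) / 4); mirror d at both borders.
    for i in range(len(s)):
        left = d[i - 1] if i >= 1 else d[0]
        cur = d[i] if i < len(d) else d[i - 1]
        s[i] += (left + cur + 2) // 4
    return s, d


def inv53(s, d):
    """Undo the lifting steps in reverse order, then interleave even/odd samples."""
    s, d = list(s), list(d)
    for i in range(len(s)):
        left = d[i - 1] if i >= 1 else d[0]
        cur = d[i] if i < len(d) else d[i - 1]
        s[i] -= (left + cur + 2) // 4
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] += (s[i] + right) // 2
    x = [0] * (len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x


def fwd53_pipelined(x):
    """Single fused loop: iteration i computes d[i], then s[i].

    The update of s[i] needs only d[i-1] (carried over from the previous
    iteration in prev_d) and d[i], so predict and update can share one pass.
    """
    n = len(x)
    assert n >= 2
    ns, nd = (n + 1) // 2, n // 2
    s, d = [0] * ns, [0] * nd
    prev_d = None
    for i in range(ns):
        even = x[2 * i]
        if i < nd:
            right = x[2 * i + 2] if 2 * i + 2 < n else even  # mirror at right border
            cur_d = x[2 * i + 1] - (even + right) // 2
            d[i] = cur_d
        else:
            cur_d = prev_d  # odd length: mirror the last highpass sample
        left_d = prev_d if prev_d is not None else cur_d  # mirror at left border
        s[i] = even + (left_d + cur_d + 2) // 4
        prev_d = cur_d
    return s, d


def fwd53_2d(img):
    """One decomposition level: 1-D transform on every row, then on every column.

    Lowpass halves come first, so the LL subband lands in the top-left corner.
    A streaming design could start the column pass as soon as enough rows are
    ready; this sketch simply runs the two passes back to back.
    """
    rows = []
    for r in img:
        s, d = fwd53(r)
        rows.append(s + d)
    h, w = len(rows), len(rows[0])
    out = [[0] * w for _ in range(h)]
    for c in range(w):
        s, d = fwd53([rows[r][c] for r in range(h)])
        for r, v in enumerate(s + d):
            out[r][c] = v
    return out
```

For example, `fwd53([1, 2, 3, 4])` returns `([1, 3], [0, 1])`. `inv53` recovers the input exactly, because each integer lifting step is undone by subtracting the same rounded quantity. The Daubechies (9, 7) filter follows the same split/predict/update pattern, with two predict/update pairs and a final scaling step, but uses floating-point coefficients.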


Keywords: Discrete Wavelet Transform; Peak Performance; Lift Scheme; Frame Buffer; Data Flow Graph
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, ET508, zotcode 2625, University of California, Irvine, USA
