A Structured Fast Algorithm for the VLSI Pipeline Implementation of Inverse Discrete Cosine Transform

Circuits, Systems, and Signal Processing

Abstract

The forward and inverse discrete cosine transforms (DCT and IDCT) have many applications in digital signal processing, but their high arithmetic complexity makes it necessary to find efficient software implementations or even dedicated VLSI implementations. Existing fast algorithms for the IDCT or DCT have a signal flow graph (SFG) that is not very regular or modular and, more importantly, an interconnection network whose topology is hard to implement in VLSI because of the so-called retrograde indexing. As a result, there are few pipeline implementations of the IDCT and DCT, although pipelining is an efficient engineering solution that allows high-speed operation with reduced hardware complexity and power consumption. In this paper, we reformulate the IDCT algorithm with a focus on obtaining a modular and regular computational structure that can be easily implemented in a pipelined VLSI architecture. Using new restructured input sequences that can be computed in parallel with the new SFG in a pipelined manner, we present a novel and efficient fast algorithm for computing the inverse discrete cosine transform. The resulting SFG has the best structure that can be obtained for the IDCT: it avoids retrograde indexing and is highly regular and modular. Moreover, the SFG is scalable and extends easily to larger values of N that are powers of 2. It can also be used to obtain a generalization of a radix-2 algorithm for lengths \(N = p \cdot 2^{m}\), where \(p\) is a prime number. The algorithm is based on a recursive decomposition of the inverse DCT that requires a reduced number of arithmetic operations. Its main advantages are its simple, regular and modular computational structure and its high potential for pipelining, which together allow an efficient pipelined VLSI implementation.
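
To make the idea of a recursive radix-2 decomposition of the IDCT concrete, the sketch below implements a textbook decomposition in the style of Lee's algorithm (listed as B.G. Lee, 1984, in the article's references). It is not the restructured algorithm proposed in this paper, whose details are not given in the abstract. It assumes the unnormalised definition \(x(n) = \sum_{k=0}^{N-1} X(k)\cos\left[\pi(2n+1)k/(2N)\right]\) with N a power of 2; the function names idct_direct and idct_radix2 are illustrative only.

```python
import math


def idct_direct(X):
    """Reference O(N^2) unnormalised IDCT (DCT-III), used only for checking."""
    N = len(X)
    return [
        sum(X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for k in range(N))
        for n in range(N)
    ]


def idct_radix2(X):
    """Recursive radix-2 IDCT (Lee-style sketch); len(X) must be a power of 2."""
    N = len(X)
    if N == 1:
        return [X[0]]
    if N % 2:
        raise ValueError("length must be a power of 2")
    half = N // 2

    # Even-indexed coefficients feed one half-length IDCT directly.
    even = [X[2 * k] for k in range(half)]
    # Odd-indexed coefficients are first combined pairwise:
    # H[0] = X[1], H[k] = X[2k+1] + X[2k-1] for k >= 1.
    odd_sum = [X[1]] + [X[2 * k + 1] + X[2 * k - 1] for k in range(1, half)]

    g = idct_radix2(even)
    h = idct_radix2(odd_sum)

    x = [0.0] * N
    for n in range(half):
        # The divisor is strictly positive for 0 <= n < N/2.
        t = h[n] / (2.0 * math.cos(math.pi * (2 * n + 1) / (2 * N)))
        x[n] = g[n] + t          # first half of the output
        x[N - 1 - n] = g[n] - t  # mirrored second half
    return x


if __name__ == "__main__":
    X = [1.0, -2.0, 3.5, 0.25, -1.0, 4.0, 2.0, -0.5]
    err = max(abs(a - b) for a, b in zip(idct_direct(X), idct_radix2(X)))
    print("max abs difference:", err)
```

Each recursion level splits one length-N problem into two length-N/2 problems plus N/2 butterflies, which is where the reduction in arithmetic operations comes from. Any claim about regularity, modularity or pipeline suitability applies to the paper's own SFG, not to this sketch.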

Data availability

This manuscript has no associated data.

Acknowledgement

This work was supported by a grant of the Romanian Ministry of Education and Research, CNCS – UEFISCDI, project number PN-III-P4-ID-PCE2020-0713, within PNCDI III.

Author information

Corresponding author

Correspondence to Doru Florin Chiper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chiper, D.F. A Structured Fast Algorithm for the VLSI Pipeline Implementation of Inverse Discrete Cosine Transform. Circuits Syst Signal Process 40, 5351–5366 (2021). https://doi.org/10.1007/s00034-021-01718-5

  • DOI: https://doi.org/10.1007/s00034-021-01718-5