
Mapping of Discrete Cosine Transforms onto Distributed Hardware Architectures



We present an algorithmically aware, high-level partitioning methodology for discrete cosine transforms (DCTs) targeted to distributed hardware architectures. The methodology explores alternate DCT formulations as part of the partition optimization process. To the best of our knowledge, no previously proposed DCT algorithm consistently produces alternate regular formulations for a DCT of size n. We therefore developed a new Cooley-Tukey-like DCT factorization algorithm that makes this exploration possible. Combining our factorization mechanism with a greedy strategy for exploring the space of equivalent DCT formulations yielded partitioning solutions with up to 18% lower latency and up to 83% shorter run-time than those obtained with previously proposed regular DCT formulations.
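To make the idea of "equivalent DCT formulations" concrete, the following minimal sketch computes an unnormalized DCT-II in two ways: directly from its definition, and through the classical even/odd split of a size-n DCT-II into a size-n/2 DCT-II and a size-n/2 DCT-IV. This is a textbook decomposition chosen for illustration only; it is not the Cooley-Tukey-like factorization developed in the paper, and the function names (`dct2_direct`, `dct4_direct`, `dct2_split`) are hypothetical.

```python
import math


def dct2_direct(x):
    """Unnormalized DCT-II evaluated directly from its definition."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n))
        for k in range(n)
    ]


def dct4_direct(x):
    """Unnormalized DCT-IV evaluated directly from its definition."""
    n = len(x)
    return [
        sum(
            x[j] * math.cos(math.pi * (2 * j + 1) * (2 * k + 1) / (4 * n))
            for j in range(n)
        )
        for k in range(n)
    ]


def dct2_split(x):
    """DCT-II via the classical split into a half-size DCT-II and DCT-IV.

    Illustrative assumption: this is a standard decomposition, not the
    factorization algorithm proposed in the paper.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        # Odd sizes cannot be halved; fall back to the definition.
        return dct2_direct(x)
    half = n // 2
    # Butterfly stage: sums feed the even outputs, differences the odd ones.
    sums = [x[j] + x[n - 1 - j] for j in range(half)]
    diffs = [x[j] - x[n - 1 - j] for j in range(half)]
    out = [0.0] * n
    out[0::2] = dct2_split(sums)    # even-indexed coefficients
    out[1::2] = dct4_direct(diffs)  # odd-indexed coefficients
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    direct = dct2_direct(signal)
    split = dct2_split(signal)
    print(max(abs(a - b) for a, b in zip(direct, split)))
```

Both routines produce the same coefficients up to floating-point rounding, but they imply different dataflow graphs, which is the kind of difference a partitioner for distributed hardware could exploit.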






This work was performed at the University of Puerto Rico at Mayagüez with support from NSF grants CNS-0424546 and HRD-9817642.

Author information

Correspondence to Rafael A. Arce-Nazario.


About this article

Cite this article

Arce-Nazario, R.A., Jiménez, M. & Rodríguez, D. Mapping of Discrete Cosine Transforms onto Distributed Hardware Architectures. J Sign Process Syst Sign Image Video Technol 53, 367–382 (2008). https://doi.org/10.1007/s11265-008-0239-x



  • Discrete cosine transforms
  • Distributed hardware architecture
  • Partitioning methodology