Mapping IDCT of MPEG2 on Coarse-Grained Reconfigurable Array for Matching 1080p Video Decoding
Coarse-grained reconfigurable arrays (CGRAs) offer both flexibility and high efficiency for computation-intensive applications such as multimedia and baseband processing. MPEG2 is a widely used multimedia algorithm that is well suited to CGRAs. The IDCT is one of the main parts of MPEG2 decoding, accounting for around 29% of total decoding time, and its computation-intensive nature makes it a good fit for a CGRA. This paper explores the parallelism of the IDCT algorithm and maps it onto a coarse-grained reconfigurable array. Simulation results show that an 8 × 8 IDCT completes in 693 clock cycles on REMUS, which is just 36% of the cycles needed on XPP and 24.7% of those needed on ARM. The method improves MPEG2 decoding performance, which is sufficient to decode 1080p @30 fps streams at a clock frequency of 200 MHz.
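To illustrate the kind of parallelism an 8 × 8 IDCT exposes, the following is a minimal sketch of the standard row-column (separable) decomposition. It is not the REMUS mapping described in the paper; the function names `idct_1d` and `idct_2d` are hypothetical, and the direct cosine-sum formulation is used for clarity rather than speed.

```python
import math

N = 8  # block size used by MPEG2


def idct_1d(coeffs):
    """Direct 8-point 1-D IDCT (cosine-sum form, no fast algorithm)."""
    out = []
    for x in range(N):
        s = 0.0
        for u in range(N):
            cu = math.sqrt(0.5) if u == 0 else 1.0  # DC term scaling
            s += cu * coeffs[u] * math.cos((2 * x + 1) * u * math.pi / (2 * N))
        out.append(0.5 * s)
    return out


def idct_2d(block):
    """2-D IDCT as two passes of 1-D IDCTs over an 8x8 list of lists."""
    # Pass 1: the 8 row transforms do not depend on each other,
    # so a parallel array could in principle evaluate them side by side.
    rows = [idct_1d(row) for row in block]
    # Pass 2: the 8 column transforms are likewise mutually independent.
    cols = [idct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # cols is indexed [x][y]; transpose back to [y][x].
    return [[cols[x][y] for x in range(N)] for y in range(N)]


# Usage: a block holding only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 80.0
pixels = idct_2d(dc_only)
```

With this scaling, a DC coefficient of 80 yields a constant block of value 10, since each 1-D pass scales the DC term by 1/(2√2). The only data dependency is between the two passes, which is what makes the transform attractive for array-style hardware.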
Keywords: IDCT, MPEG2, CGRA, Mapping, REMUS
This work is supported in part by the China National High Technologies Research Program (No. 2012AA012701), the Tsinghua Information S&T National Lab Creative Team Project, the International S&T Cooperation Project of China grant (No. 2012DFA11170), the Tsinghua Indigenous Research Project (No. 20111080997), the Special Scientific Research Funds for Commonweal Section (No. 200903010), the Science and Technology Project of Jiangxi Province (No. 20112BBF60050) and the NNSF of China grant (No. 61274131).