Journal of Signal Processing Systems, Volume 60, Issue 2, pp 183–210

Optimizing the H.264/AVC Video Encoder Application Structure for Reconfigurable and Application-Specific Platforms

  • Muhammad Shafique
  • Lars Bauer
  • Jörg Henkel


The H.264/AVC video coding standard features diverse computational hot spots that need to be accelerated to cope with the significantly increased complexity compared to previous standards. In this paper, we propose an optimized application structure (i.e. the arrangement of functional components of an application determining the data flow properties) for the H.264 encoder which is suitable for application-specific and reconfigurable hardware platforms. Our proposed application structural optimization for the computational reduction of the Motion Compensated Interpolation is independent of the actual hardware platform that is used for execution. For a MIPS processor we achieve an average speedup of approximately 60× for Motion Compensated Interpolation. Our proposed application structure reduces the overhead for Reconfigurable Platforms by distributing the actual hardware requirements amongst the functional blocks. This increases the amount of available reconfigurable hardware per Special Instruction (within a functional block) which leads to a 2.84× performance improvement of the complete encoder when compared to a Benchmark Application with standard optimizations. We evaluate our application structure by means of four different hardware platforms.


H.264; MPEG-4 AVC; Motion compensation; Motion estimation; Rate distortion; In-loop de-blocking filter; ASIP; Reconfigurable platform; RISPP; Special instructions; Hardware accelerators



Copyright information

© Springer Science + Business Media, LLC 2008

Authors and Affiliations

  1. Chair for Embedded Systems, University of Karlsruhe, Karlsruhe, Germany
