Memory Performance Optimizations For Real-Time Software HDTV Decoding

Abstract

Pure software HDTV video decoding is still a challenging task on entry-level to mid-range desktop and notebook PCs, even with today’s microprocessor clock frequencies measured in GHz. This paper shows that the performance bottleneck in a software MPEG-2 decoder has shifted to memory operations, as microprocessor technologies, including multimedia instruction extensions, have improved rapidly in recent years.

Our study exploits concurrency at the macroblock level to alleviate the performance bottleneck in a software MPEG-2 decoder. First, the paper introduces an interleaved block-order data layout to improve CPU cache performance. Second, the paper describes an algorithm to explicitly prefetch macroblocks for motion compensation. Finally, the paper presents an algorithm to schedule interleaved decoding and output at the macroblock level. Our implementation and experiments show that these methods can effectively hide the latency of memory and the frame buffer. The optimizations improve the performance of a multimedia-instruction-optimized software MPEG-2 decoder by a factor of about two. On a PC with a 933 MHz Pentium III CPU, the decoder can decode and display 1280 × 720-resolution HDTV streams at over 62 frames per second.
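
The first two ideas can be illustrated with a minimal sketch in C. It is not the authors' implementation: the abstract does not specify how the layout is interleaved, so the sketch shows only a plain block-order layout for the luma plane, in which each 16 × 16 macroblock occupies 256 contiguous bytes. The function names, the 32-byte cache line size, and the use of the GCC/Clang builtin `__builtin_prefetch` are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    MB_SIZE    = 16,  /* an MPEG-2 luma macroblock is 16x16 pixels */
    CACHE_LINE = 32   /* assumed cache line size in bytes */
};

/* Offset of luma pixel (x, y) in a conventional raster-order frame.
 * Vertically adjacent pixels are `width` bytes apart, so one
 * macroblock touches 16 widely separated memory regions. */
static size_t raster_offset(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Offset of luma pixel (x, y) in a hypothetical block-order frame.
 * Macroblocks are stored one after another, each as 256 contiguous
 * bytes. Assumes `width` is a multiple of 16. */
static size_t block_offset(size_t x, size_t y, size_t width)
{
    size_t mbs_per_row = width / MB_SIZE;
    size_t mb_index    = (y / MB_SIZE) * mbs_per_row + (x / MB_SIZE);
    return mb_index * MB_SIZE * MB_SIZE
         + (y % MB_SIZE) * MB_SIZE
         + (x % MB_SIZE);
}

/* Issue prefetch hints for one reference macroblock ahead of motion
 * compensation. With the block-order layout the macroblock is one
 * contiguous run, so a single short loop covers it. */
static void prefetch_macroblock(const uint8_t *frame,
                                size_t mb_x, size_t mb_y, size_t width)
{
    const uint8_t *p =
        frame + block_offset(mb_x * MB_SIZE, mb_y * MB_SIZE, width);
    size_t off;

    for (off = 0; off < (size_t)MB_SIZE * MB_SIZE; off += CACHE_LINE) {
#if defined(__GNUC__)
        __builtin_prefetch(p + off, 0, 3); /* read access, keep in cache */
#else
        (void)p; /* no portable prefetch hint; do nothing */
#endif
    }
}
```

For a 1280-pixel-wide frame, pixel (17, 1) sits at offset 1297 in raster order but at offset 273 in this block order, inside the second macroblock's contiguous 256-byte run. A decoder loop would call `prefetch_macroblock` for the reference block of an upcoming macroblock while it is still reconstructing the current one, so that the memory latency overlaps with computation.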

Keywords

MPEG-2 decompression, motion compensation, concurrency, CPI, cache locality, prefetching

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. IBM T.J. Watson Research Center, NY, USA
  2. Princeton University, Princeton, USA
  3. AT&T Labs Research, Florham Park, USA
