Performance Scalability of Multimedia Instruction Set Extensions
Current media ISA extensions such as Sun’s VIS consist of SIMD-like instructions that operate on short vector registers. To exploit more parallelism in a superscalar processor equipped with such instructions, the issue width has to be increased. In the Complex Streamed Instruction (CSI) set, by contrast, exploiting more parallelism does not involve issuing more instructions. In this paper we study how the performance of superscalar processors extended with CSI or VIS scales with the amount of parallel execution hardware. Results show that the performance of the CSI-enhanced processor scales very well. For example, increasing the datapath width of the CSI execution unit from 16 to 32 bytes improves the kernel-level performance by a factor of 1.56 on average. The VIS-enhanced machine, on the other hand, is unable to utilize large amounts of parallel execution hardware efficiently: because of the huge number of instructions that need to be executed, the decode-issue logic becomes a bottleneck.
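The scaling argument can be illustrated with a toy calculation. The sketch below is not the paper's simulator or methodology. It assumes an idealized model in which a short-vector SIMD extension needs one instruction per 8-byte register (VIS operates on 64-bit registers), while a single streamed instruction processes the whole stream at one datapath width per cycle. The function names and the 1024-byte stream length are hypothetical and chosen only for illustration.

```python
import math

VIS_REGISTER_BYTES = 8  # VIS operates on 64-bit registers


def simd_instruction_count(stream_bytes, register_bytes=VIS_REGISTER_BYTES):
    """Instructions needed to apply one operation to a stream using
    short-vector SIMD. The count does not depend on how many execution
    units exist, so extra units only help if more instructions issue
    per cycle (toy model, assumption)."""
    return math.ceil(stream_bytes / register_bytes)


def stream_cycles(stream_bytes, datapath_bytes):
    """Idealized cycles for ONE streamed instruction whose execution unit
    consumes datapath_bytes per cycle (toy model, assumption)."""
    return math.ceil(stream_bytes / datapath_bytes)


def scaling_efficiency(speedup, resource_factor):
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / resource_factor


stream_bytes = 1024  # hypothetical stream length

print("SIMD instructions:", simd_instruction_count(stream_bytes))
print("Stream cycles at 16 bytes:", stream_cycles(stream_bytes, 16))
print("Stream cycles at 32 bytes:", stream_cycles(stream_bytes, 32))
print("Efficiency of the reported 1.56x:", scaling_efficiency(1.56, 32 / 16))
```

Under this model, widening the datapath halves the idealized cycle count without changing the number of instructions issued, whereas the SIMD instruction count stays fixed at the stream length divided by the register size. The reported factor of 1.56 for doubling the datapath from 16 to 32 bytes corresponds to roughly 78% of ideal linear scaling.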