A Vectorizing Compiler for Multimedia Extensions

Sreraman, N.; Govindarajan, R.

doi:10.1023/A:1007559022013

Published: August 2000

Volume 28, pages 363–400, (2000)

International Journal of Parallel Programming

Abstract

In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). The compiler identifies data-parallel sections of the code using scalar and array dependence analysis. To widen the scope for applying subword semantics, it performs several code transformations, including strip mining, scalar expansion, grouping and reduction, and distribution. It then generates inline assembly instructions corresponding to the data-parallel sections. We used the Stanford University Intermediate Format (SUIF), a public domain compiler tool, for our implementation. We evaluated the performance of the code generated by our compiler on a number of benchmarks. Initial results show that the compiler-generated code achieves a speedup of 2 to 6.5 over the code generated without the vectorizing transformations and inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of hand-tuned code for the MMX architecture.
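As an illustration of the strip mining transformation named in the abstract, the following is a minimal sketch in plain C, not the code the authors' compiler emits. It assumes 16-bit elements, so that four of them fit in one 64-bit MMX register; the function names and the `STRIP` constant are hypothetical. The inner loop over `j` stands in for what a single packed add would do in generated inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: four 16-bit subwords fit in one 64-bit MMX register. */
#define STRIP 4

/* Original scalar loop: one element per iteration. */
void add_scalar(const int16_t *a, const int16_t *b, int16_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (int16_t)(a[i] + b[i]);
}

/* Strip-mined form: the outer loop advances one strip at a time and the
 * inner loop covers exactly one strip. A vectorizer could replace the
 * inner loop with a single packed operation on STRIP subwords. */
void add_strip_mined(const int16_t *a, const int16_t *b, int16_t *c, size_t n)
{
    size_t i = 0;

    /* Full strips of STRIP elements. */
    for (; i + STRIP <= n; i += STRIP) {
        for (size_t j = 0; j < STRIP; j++)
            c[i + j] = (int16_t)(a[i + j] + b[i + j]);
    }

    /* Remainder loop for the last n % STRIP elements. */
    for (; i < n; i++)
        c[i] = (int16_t)(a[i] + b[i]);
}
```

Both functions compute the same result for inputs whose sums fit in 16 bits; the strip-mined version only restructures the iteration space so that each inner-loop instance matches the register width.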

Author information

Authors and Affiliations

Microsoft Corporation, One Microsoft Way, Redmond, Washington, 98052
N. Sreraman
Supercomputer Education and Research Centre, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, 560 012, India
R. Govindarajan

About this article

Cite this article

Sreraman, N., Govindarajan, R. A Vectorizing Compiler for Multimedia Extensions. International Journal of Parallel Programming 28, 363–400 (2000). https://doi.org/10.1023/A:1007559022013
