Abstract
In this paper, we present an implementation of a vectorizing C compiler for Intel's MMX (Multimedia Extension). The compiler identifies data-parallel sections of the code using scalar and array dependence analysis. To widen the scope for applying subword semantics, it performs several code transformations, including strip mining, scalar expansion, grouping and reduction, and distribution. It then generates inline assembly instructions for the data-parallel sections. We used the Stanford University Intermediate Format (SUIF), a public-domain compiler tool, for our implementation. We evaluated the performance of the generated code on a number of benchmarks. Initial results show that the compiler-generated code achieves a speedup of 2 to 6.5 over code generated without the vectorizing transformations and inline assembly. In certain cases, the performance of the compiler-generated code is within 85% of that of hand-tuned code for the MMX architecture.
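As an illustration of one of the transformations named above, the following is a minimal sketch of strip mining in plain C. It is not taken from the compiler described here: the function names, the `STRIP` constant, and the choice of 16-bit elements are assumptions made for the example. The strip length of four reflects that a 64-bit MMX register holds four 16-bit subwords, so each pass of the fixed-length inner loop corresponds to the work a single packed add (such as `PADDW`) could perform.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: four 16-bit subwords fit in one 64-bit MMX register. */
#define STRIP 4

/* Original scalar loop: one element per iteration. */
void add_scalar(int16_t *c, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = (int16_t)(a[i] + b[i]);
}

/* Strip-mined form of the same loop (hypothetical helper name).
 * The outer loop steps over whole strips; the inner loop has a fixed
 * trip count of STRIP and is the part a vectorizer could replace with
 * one packed instruction. A scalar cleanup loop handles the remainder. */
void add_strip_mined(int16_t *c, const int16_t *a, const int16_t *b, size_t n)
{
    size_t i = 0;

    for (; i + STRIP <= n; i += STRIP) {
        for (size_t j = 0; j < STRIP; j++)
            c[i + j] = (int16_t)(a[i + j] + b[i + j]);
    }

    /* Remainder: fewer than STRIP elements are left. */
    for (; i < n; i++)
        c[i] = (int16_t)(a[i] + b[i]);
}
```

Both functions compute the same result for any `n`; the strip-mined version only restructures the iteration space so that the inner loop matches the register width.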
Sreraman, N., Govindarajan, R. A Vectorizing Compiler for Multimedia Extensions. International Journal of Parallel Programming 28, 363–400 (2000). https://doi.org/10.1023/A:1007559022013
DOI: https://doi.org/10.1023/A:1007559022013