Abstract
Modulo scheduling is a major optimization in high-performance compilers, wherein the body of a loop is replaced by an overlapping of instructions from different iterations. Hence the compiler can schedule more instructions in parallel than in the original loop. Being a scheduling optimization, modulo scheduling is a typical back-end optimization that relies on a detailed description of the underlying CPU and its instructions to produce a good schedule. This work considers the problem of applying modulo scheduling at source level as a loop transformation, using only general information about the underlying CPU architecture. Doing so makes it possible to: a) create a more retargetable compiler, as modulo scheduling is now applied at source level; b) study possible interactions between modulo scheduling and common loop transformations; c) obtain a source-level optimizer whose output is readable to the programmer, yet can still be efficiently compiled by a relatively “simple” compiler.
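To make the idea of overlapping iterations at source level concrete, here is a minimal C sketch. It is not the algorithm implemented in Tiny; the function names, the two-stage split of the loop body, and the assumption that `a` and `b` do not alias are all illustrative. The rewritten loop has a prologue, a kernel in which stage B of iteration i runs alongside stage A of iteration i + 1, and an epilogue.

```c
#include <stddef.h>

/* Original loop: each iteration runs stage A (load a[i], multiply)
   and then stage B (add, store b[i]) strictly in order. */
void scale_add_plain(const int *a, int *b, size_t n, int k, int c) {
    for (size_t i = 0; i < n; i++) {
        int t = a[i] * k;  /* stage A */
        b[i] = t + c;      /* stage B */
    }
}

/* Hypothetical source-level modulo-scheduled version with two stages.
   The kernel overlaps stage B of iteration i with stage A of
   iteration i + 1, so the two statements in the kernel body are
   independent and a back end may issue them in parallel.
   Assumes a and b do not overlap (hence restrict). */
void scale_add_pipelined(const int *restrict a, int *restrict b,
                         size_t n, int k, int c) {
    if (n == 0)
        return;
    int t = a[0] * k;               /* prologue: stage A of iteration 0 */
    for (size_t i = 0; i + 1 < n; i++) {
        int t_next = a[i + 1] * k;  /* stage A of iteration i + 1 */
        b[i] = t + c;               /* stage B of iteration i */
        t = t_next;                 /* carry the stage-A result forward */
    }
    b[n - 1] = t + c;               /* epilogue: stage B of the last iteration */
}
```

Both functions compute the same `b` for non-overlapping arrays. The pipelined form is still ordinary, readable C that a simple back-end compiler can handle, but the reordering is only legal once dependence analysis has shown that the store to `b[i]` cannot affect the load of `a[i + 1]`.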
Experimental results show that source-level modulo scheduling can improve performance even when low-level modulo scheduling is applied by the final compiler, indicating that high-level and low-level modulo scheduling can coexist to improve performance. An algorithm for source-level modulo scheduling that modifies the abstract syntax tree of a program is presented. This algorithm has been implemented in an automatic parallelizer (Tiny). Preliminary experiments also yield runtime and power improvements for the ARM CPU for embedded systems.
Copyright information
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Ben-Asher, Y., Meisler, D. (2007). Towards a Source Level Compiler: Source Level Modulo Scheduling. In: Reps, T., Sagiv, M., Bauer, J. (eds) Program Analysis and Compilation, Theory and Practice. Lecture Notes in Computer Science, vol 4444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71322-7_16
DOI: https://doi.org/10.1007/978-3-540-71322-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71315-9
Online ISBN: 978-3-540-71322-7
eBook Packages: Computer Science, Computer Science (R0)