
Part of the book series: Lecture Notes in Computer Science ((THIPEAC,volume 4050))

Abstract

Well-engineered compilers use a carefully selected set of optimizations, heuristic optimization policies, and a phase ordering. Designing a single optimization heuristic that works well with other optimization phases is a challenging task. Although compiler designers evaluate heuristics and phase orderings before deployment, compilers typically neither statically evaluate nor refine the quality of their optimization decisions during a specific compilation.

This paper identifies a class of optimizations for which the compiler can statically evaluate the effectiveness of its heuristics and phase interactions. When necessary, it then modifies and reapplies its optimization policies. We call this approach convergent compilation, since it iterates to converge on high-quality code. This model incurs additional compilation time to avoid some of the difficulties of predicting phase interactions and perfecting heuristics.

This work was motivated by the TRIPS architecture, which has resource constraints with conflicting phase-ordering requirements. For example, each atomic execution unit (a TRIPS block) holds a maximum of 128 instructions and has a fixed minimum execution-time cost. Loop unrolling and other optimizations therefore seek to maximize the number of mostly full blocks. Because unrolling enables many downstream optimizations, it needs to occur well before code generation, but this position makes it impossible to predict the final number of instructions accurately. After the compiler generates code, it knows the exact instruction count, and consequently whether it unrolled too much, too little, or just the right amount. If necessary, convergent unrolling then goes back, adjusts the unroll amount accordingly, and reapplies subsequent optimization phases. We implement convergent unrolling, which automatically matches the best hand-unrolled version for a set of microbenchmarks on the TRIPS architectural simulator.
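The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names and the toy cost model (`body_size * factor + 4`) are assumptions standing in for real code generation; only the 128-instruction TRIPS block limit comes from the text.

```python
# Hedged sketch of convergent unrolling. The cost model and names are
# illustrative assumptions; only BLOCK_CAPACITY (128) is from the paper.

BLOCK_CAPACITY = 128  # maximum instructions per TRIPS block

def estimated_instructions(body_size, unroll_factor):
    # Toy stand-in for code generation: unrolled loop body plus a
    # small fixed loop-control overhead.
    return body_size * unroll_factor + 4

def convergent_unroll(body_size, initial_factor=8, max_iterations=10):
    """Adjust the unroll factor after 'code generation' reveals the
    exact instruction count, iterating toward a mostly full block."""
    factor = initial_factor
    for _ in range(max_iterations):
        count = estimated_instructions(body_size, factor)
        if count > BLOCK_CAPACITY and factor > 1:
            factor -= 1   # unrolled too much: shrink and reapply phases
        elif estimated_instructions(body_size, factor + 1) <= BLOCK_CAPACITY:
            factor += 1   # block has room: unroll more and reapply phases
        else:
            return factor  # converged: block is as full as it can be
    return factor
```

For example, a 10-instruction loop body converges on an unroll factor of 12 (124 of 128 slots used), while a 50-instruction body shrinks to a factor of 2. In the real compiler each iteration reruns the downstream optimization phases, which is where the extra compilation time goes.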

Convergent compilation can help solve other phase-ordering and heuristic-tuning compilation challenges. It is particularly well suited to resource constraints that the compiler can statically evaluate, such as register usage, instruction-level parallelism, and code size. Because these resource constraints are key performance indicators for embedded, VLIW, and partitioned hardware, convergent compilation should be broadly applicable.

This work is supported by DARPA F33615-03-C-4106, DARPA NBCH30390004, NSF ITR CCR-0085792, NSF CCR-0311829, NSF EIA-0303609, and IBM. Any opinions, findings and conclusions are the authors’ and do not necessarily reflect those of the sponsors.




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nethercote, N., Burger, D., McKinley, K.S. (2007). Convergent Compilation Applied to Loop Unrolling. In: Stenström, P. (eds) Transactions on High-Performance Embedded Architectures and Compilers I. Lecture Notes in Computer Science, vol 4050. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71528-3_10


  • DOI: https://doi.org/10.1007/978-3-540-71528-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71527-6

  • Online ISBN: 978-3-540-71528-3
