A Framework for Loop Distribution on Limited On-Chip Memory Processors

Wang, Lei; Tembe, Waibhav; Pande, Santosh

doi:10.1007/3-540-46423-9_10

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1781)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

This work proposes a framework for analyzing the flow of values and their reuse in loop nests, with the goal of minimizing data traffic under two constraints: limited on-chip memory capacity and data dependences. Our analysis first fuses loop nests intra-procedurally wherever possible and then performs loop distribution. The analysis computes the closeness factor of two statements, a quantitative measure of the data traffic saved per unit of memory occupied when the statements are placed under the same loop nest rather than under different loop nests. We then develop a greedy algorithm that traverses the program dependence graph (PDG) to group statements legally under the same loop nest. The main idea of this greedy algorithm is to transitively build a group of statements that can legally execute under a given loop nest and that leads to minimum data traffic. We implemented our framework in Petit [2], a tool for dependence analysis and loop transformations. Our results show that the approach eliminates as much as 30% of the data traffic in some cases, improving overall completion time by 23.33% on processors such as TI’s TMS320C5x.
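The grouping idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out; it is a simplified greedy merge written under several assumptions. The names `closeness`, `greedy_groups`, `footprint`, `saved`, `illegal`, and `capacity` are all hypothetical, the memory occupied by a group is approximated as the sum of per-statement footprints, and legality is reduced to a set of statement pairs that must not share a loop nest (a real implementation would consult the PDG).

```python
def closeness(saved_traffic, memory_occupied):
    """Assumed form of the closeness factor: traffic saved per unit memory."""
    return saved_traffic / memory_occupied


def greedy_groups(footprint, saved, illegal, capacity):
    """Greedily merge statements into loop-nest groups (illustrative only).

    footprint: statement -> on-chip memory words it needs (assumed additive)
    saved:     (a, b) -> data traffic saved if a and b share a loop nest
    illegal:   set of frozenset({a, b}) pairs that may not share a loop nest
    capacity:  on-chip memory available to one group
    """
    group_of = {s: frozenset([s]) for s in footprint}

    def pair_closeness(pair):
        a, b = pair
        return closeness(saved[pair], footprint[a] + footprint[b])

    # Visit candidate pairs from highest to lowest closeness factor.
    for a, b in sorted(saved, key=pair_closeness, reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already under the same loop nest
        merged = ga | gb
        fits = sum(footprint[s] for s in merged) <= capacity
        legal = not any(frozenset((x, y)) in illegal for x in ga for y in gb)
        if fits and legal:
            # Merging extends the group transitively to all members.
            for s in merged:
                group_of[s] = merged
    return sorted({tuple(sorted(g)) for g in group_of.values()})


# Made-up example data; the values carry no meaning beyond the illustration.
footprint = {"S1": 4, "S2": 4, "S3": 6, "S4": 2}
saved = {("S1", "S2"): 16, ("S2", "S3"): 10, ("S3", "S4"): 12, ("S1", "S4"): 3}
illegal = {frozenset(("S2", "S3"))}
groups = greedy_groups(footprint, saved, illegal, capacity=10)
```

With this made-up input, S1 and S2 are merged first because their pair has the highest closeness factor, S3 and S4 are merged next, and the two resulting groups stay separate because merging them would exceed the assumed capacity and would place the illegal pair S2 and S3 under one loop nest.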

Supported in part by NSF through grant no. EIA 9871345.

Contact author for future communications about this paper

References

[1] J. Eyre and J. Bier, “DSP Processors Hit the Mainstream”, Computer, 31(8):51–59, August 1998.
[2] Petit, Uniform Library, Omega Library, Omega Calculator. http://www.cs.umd.edu/projects/omega/index.html
[3] Texas Instruments, TMS320C5x User’s Guide.
[4] Embedded Java. http://java.sun.com/products/embeddedjava/
[5] A. Sundaram and S. Pande, “Compiler Optimizations for Real Time Execution of Loops on Limited Memory Embedded Systems”, Proceedings of IEEE International Real Time Systems Symposium, Madrid, Spain, pp. 154–164.
[6] A. Sundaram and S. Pande, “An Efficient Data Partitioning Method for Limited Memory Embedded Systems”, 1998 ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (in conjunction with PLDI ’98), Montreal, Canada, Springer-Verlag, pp. 205–218.
[7] F. Irigoin and R. Triolet, “Supernode Partitioning”, in 15th Symposium on Principles of Programming Languages (POPL XV), pp. 319–329, 1988.
[8] J. Ramanujam and P. Sadayappan, “Tiling Multidimensional Iteration Spaces for Multicomputers”, Journal of Parallel and Distributed Computing, 16:108–120, 1992.
[9] U. Banerjee, Loop Transformations for Restructuring Compilers, Boston: Kluwer Academic, 1994.
[10] W. Li, “Compiling for NUMA Parallel Machines”, Ph.D. Thesis, Cornell University, Ithaca, NY, 1993.
[11] M. Wolfe, High Performance Compilers for Parallel Computing, Addison Wesley, 1996.
[12] M. Wolfe, “Iteration Space Tiling for Memory Hierarchies”, in Third SIAM Conference on Parallel Processing for Scientific Computing, December 1987.
[13] R. Schreiber and J. Dongarra, “Automatic Blocking of Nested Loops”, Technical report, RIACS, NASA Ames Research Center, and Oak Ridge National Laboratory, May 1990.
[14] I. Kodukula, N. Ahmed and K. Pingali, “Data Centric Multi-level Blocking”, in ACM Programming Language Design and Implementation 1997 (PLDI ’97), pp. 346–357.
[15] P. Panda, A. Nicolau and N. Dutt, “Memory Organization for Improved Data Cache Performance in Embedded Processors”, Proceedings of 1996 International Symposium on System Synthesis.
[16] N. Mitchell, K. Hogstedt, L. Carter and J. Ferrante, “Quantifying the Multi-level Nature of Tiling Interactions”, International Journal of Parallel Programming, 26(6):641–670, 1998.
[17] K. Cooper and T. Harvey, “Compiler Controlled Memory”, Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 3–7, 1998, San Jose, CA.
[18] K. McKinley, S. Carr, and C.-W. Tseng, “Improving Data Locality with Loop Transformations”, ACM Transactions on Programming Languages and Systems (TOPLAS), 18(4):424–453, July 1996.
[19] M. Wolf and M. Lam, “A Data Locality Optimizing Algorithm”, in Proceedings of the ACM SIGPLAN ’91 Conference on Programming Language Design and Implementation (PLDI ’91), pp. 30–44, Toronto, Canada, June 1991.
[20] K. Kennedy and K. McKinley, “Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution”, in Languages and Compilers for Parallel Computing (LCPC) 1993.
[21] G. Gao, R. Olsen, V. Sarkar and R. Thekkath, “Collective Loop Fusion for Array Contraction”, in Languages and Compilers for Parallel Computing (LCPC) 1992.
[22] E. Duesterwald, R. Gupta and M. Soffa, “A Practical Data-flow Framework for Array Reference Analysis and its Application in Optimization”, in ACM Programming Language Design and Implementation (PLDI) 1993, pp. 68–77.
[23] R. Gupta and R. Bodik, “Array Data-Flow Analysis for Load-Store Optimizations in Superscalar Architectures”, in Eighth Annual Workshop on Languages and Compilers for Parallel Computing (LCPC) 1995. Also published in International Journal of Parallel Computing, Vol. 24, No. 6, pp. 481–512, 1996.

Author information

Authors and Affiliations

Compiler Research Laboratory, Department of ECECS, ML 0030, University of Cincinnati, PO Box 210030, Cincinnati, OH, 45221-0030
Lei Wang, Waibhav Tembe & Santosh Pande

Editor information

Editors and Affiliations

Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, Scotland
David A. Watt

About this paper

Cite this paper

Wang, L., Tembe, W., Pande, S. (2000). A Framework for Loop Distribution on Limited On-Chip Memory Processors. In: Watt, D.A. (eds) Compiler Construction. CC 2000. Lecture Notes in Computer Science, vol 1781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46423-9_10

DOI: https://doi.org/10.1007/3-540-46423-9_10
Published: 01 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67263-0
Online ISBN: 978-3-540-46423-5
eBook Packages: Springer Book Archive
