Abstract
This work proposes a framework for analyzing the flow of values and their reuse in loop nests, with the goal of minimizing data traffic under the constraints of limited on-chip memory capacity and data dependences. Our analysis first fuses all candidate loop nests intra-procedurally and then performs loop distribution. It computes the closeness factor of two statements, a quantitative measure of the data traffic saved per unit of memory occupied when the statements are placed under the same loop nest rather than under different loop nests. We then develop a greedy algorithm that traverses the program dependence graph (PDG) to group statements under the same loop nest legally. The main idea of this greedy algorithm is to transitively build a group of statements that can legally execute under a given loop nest and that leads to minimum data traffic. We implemented our framework in Petit [2], a tool for dependence analysis and loop transformations. We show that our approach eliminates as much as 30% of the data traffic in some cases, improving overall completion time by 23.33% on processors such as TI’s TMS320C5x.
Supported in part by NSF through grant no. #EIA 9871345
Contact author for future communications about this paper
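The grouping idea described in the abstract can be illustrated with a small sketch. It is not the algorithm from the paper: the abstract does not give the exact definition of the closeness factor or the PDG traversal order, so everything below is an assumption made for illustration. The names `closeness` and `greedy_groups`, the per-statement `footprint` table, the pairwise `saved` table, the `illegal` set of fusion-preventing pairs, and the rule that a group's memory use is the sum of its statements' footprints are all hypothetical.

```python
def closeness(traffic_saved, memory_occupied):
    """Data traffic saved per unit of on-chip memory occupied.

    Assumption: both quantities are measured in the same unit (e.g. words).
    """
    return traffic_saved / memory_occupied


def greedy_groups(footprint, saved, illegal, capacity):
    """Greedily place statements under common loop nests.

    footprint: dict mapping statement -> memory it occupies (hypothetical model)
    saved:     dict mapping (a, b) -> traffic saved if a and b share a loop nest
    illegal:   set of frozenset({a, b}) pairs that dependences forbid from sharing
    capacity:  on-chip memory available to one group
    """
    # Every statement starts in its own group.
    group_of = {s: frozenset([s]) for s in footprint}

    def pair_closeness(pair):
        a, b = pair
        return closeness(saved[pair], footprint[a] + footprint[b])

    # Visit candidate pairs from highest to lowest closeness.
    for a, b in sorted(saved, key=pair_closeness, reverse=True):
        ga, gb = group_of[a], group_of[b]
        if ga == gb:
            continue  # already under the same loop nest
        merged = ga | gb
        # Respect the memory capacity constraint.
        if sum(footprint[s] for s in merged) > capacity:
            continue
        # Respect legality: no forbidden pair may end up in the same group.
        if any(frozenset((x, y)) in illegal for x in ga for y in gb):
            continue
        for s in merged:
            group_of[s] = merged

    groups = set(group_of.values())
    return [sorted(g) for g in sorted(groups, key=min)]


if __name__ == "__main__":
    footprint = {"S1": 4, "S2": 4, "S3": 8}
    saved = {("S1", "S2"): 16, ("S2", "S3"): 12, ("S1", "S3"): 6}
    print(greedy_groups(footprint, saved, illegal=set(), capacity=10))
```

With these made-up numbers, S1 and S2 have the highest closeness (16 / 8 = 2.0) and are grouped first; adding S3 would need 16 units of memory, which exceeds the assumed capacity of 10, so S3 stays in its own loop nest.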
References
[1] J. Eyre and J. Bier, “DSP Processors Hit the Mainstream”, Computer, 31(8):51–59, August 1998.
[2] Petit, Uniform Library, Omega Library, Omega Calculator. http://www.cs.umd.edu/projects/omega/index.html
[3] Texas Instruments, “TMS320C5x User’s Guide”.
[4] Embedded Java. http://java.sun.com/products/embeddedjava/
[5] A. Sundaram and S. Pande, “Compiler Optimizations for Real Time Execution of Loops on Limited Memory Embedded Systems”, Proceedings of IEEE International Real Time Systems Symposium, Madrid, Spain, pp. 154–164.
[6] A. Sundaram and S. Pande, “An Efficient Data Partitioning Method for Limited Memory Embedded Systems”, 1998 ACM SIGPLAN Workshop on Languages, Compilers and Tools for Embedded Systems (in conjunction with PLDI ’98), Montreal, Canada, Springer-Verlag, pp. 205–218.
[7] F. Irigoin and R. Triolet, “Supernode Partitioning”, in 15th Symposium on Principles of Programming Languages (POPL XV), pp. 319–329, 1988.
[8] J. Ramanujam and P. Sadayappan, “Tiling Multidimensional Iteration Spaces for Multicomputers”, Journal of Parallel and Distributed Computing, 16:108–120, 1992.
[9] U. Banerjee, “Loop Transformations for Restructuring Compilers”, Boston: Kluwer Academic, 1994.
[10] W. Li, “Compiling for NUMA Parallel Machines”, Ph.D. Thesis, Cornell University, Ithaca, NY, 1993.
[11] M. Wolfe, “High Performance Compilers for Parallel Computing”, Addison Wesley, 1996.
[12] M. Wolfe, “Iteration Space Tiling for Memory Hierarchies”, in Third SIAM Conference on Parallel Processing for Scientific Computing, December 1987.
[13] R. Schreiber and J. Dongarra, “Automatic Blocking of Nested Loops”, Technical report, RIACS, NASA Ames Research Center, and Oak Ridge National Laboratory, May 1990.
[14] I. Kodukula, N. Ahmed and K. Pingali, “Data Centric Multi-level Blocking”, in ACM Programming Language Design and Implementation 1997 (PLDI ’97), pp. 346–357.
[15] P. Panda, A. Nicolau and N. Dutt, “Memory Organization for Improved Data Cache Performance in Embedded Processors”, Proceedings of 1996 International Symposium on System Synthesis.
[16] N. Mitchell, K. Hogstedt, L. Carter and J. Ferrante, “Quantifying the Multi-level Nature of Tiling Interactions”, International Journal of Parallel Programming, Vol. 26, No. 6, 1998, pp. 641–670.
[17] K. Cooper and T. Harvey, “Compiler Controlled Memory”, Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 3–7, 1998, San Jose, CA.
[18] K. McKinley, S. Carr, and C.-W. Tseng, “Improving Data Locality with Loop Transformation”, ACM Transactions on Programming Languages and Systems (TOPLAS), 18(4):424–453, July 1996.
[19] M. Wolf and M. Lam, “A Data Locality Optimizing Algorithm”, in Proceedings of the ACM SIGPLAN ’91 Conference on Programming Language Design and Implementation (PLDI ’91), pp. 30–44, Toronto, Canada, June 1991.
[20] K. Kennedy and K. McKinley, “Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution”, in Languages and Compilers for Parallel Computing (LCPC) 1993.
[21] G. Gao, R. Olsen, V. Sarkar and R. Thekkath, “Collective Loop Fusion for Array Contraction”, in Languages and Compilers for Parallel Computing (LCPC) 1992.
[22] E. Duesterwald, R. Gupta and M. Soffa, “A Practical Data-flow Framework for Array Reference Analysis and its Application in Optimization”, in ACM Programming Language Design and Implementation (PLDI) 1993, pp. 68–77.
[23] R. Gupta and R. Bodik, “Array Data-Flow Analysis for Load-Store Optimizations in Superscalar Architectures”, in Eighth Annual Workshop on Languages and Compilers for Parallel Computing (LCPC) 1995. Also published in International Journal of Parallel Computing, Vol. 24, No. 6, pp. 481–512, 1996.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Tembe, W., Pande, S. (2000). A Framework for Loop Distribution on Limited On-Chip Memory Processors. In: Watt, D.A. (eds) Compiler Construction. CC 2000. Lecture Notes in Computer Science, vol 1781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46423-9_10
DOI: https://doi.org/10.1007/3-540-46423-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67263-0
Online ISBN: 978-3-540-46423-5
eBook Packages: Springer Book Archive