Precise Data Locality Optimization of Nested Loops

Loechner, Vincent; Meister, Benoît; Clauss, Philippe

doi:10.1023/A:1013535431127


Published: January 2002

Volume 21, pages 37–76 (2002)

The Journal of Supercomputing

124 Accesses
21 Citations

Abstract

A significant means of enhancing application performance and reducing power consumption in embedded processor applications is to improve the usage of the memory hierarchy. In this paper, a temporal and spatial locality optimization framework for nested loops is proposed, driven by parameterized cost functions. The considered loops can be imperfectly nested. New data layouts are propagated through the connected references and through the loop nests as constraints for optimizing the next connected reference in the same nest or in other nests. Unlike many existing methods, special attention is paid to TLB (Translation Lookaside Buffer) effectiveness, since TLB misses can cost tens to hundreds of processor cycles. Our approach considers only active data, that is, array elements that are actually accessed by a loop, in order to prevent useless memory loads and to take advantage of storage compression and temporal locality. Moreover, the same data transformation is not necessarily applied to a whole array. Depending on the referenced data subsets, the transformation can result in different data layouts for the same array. This can significantly improve performance, since a priori incompatible references can be optimized simultaneously. Finally, the process considers not only the innermost loop level but all levels. Hence, large strides when control returns to the enclosing loop are avoided in several cases, and better optimization is provided when the innermost loop has a small index range.
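The abstract states its ideas at a high level. The following is a minimal, hand-worked sketch in C of two of them: storing only the active data, and laying those data out so that accesses stay contiguous at every loop level. The loop nest and all function names (`rank_ij`, `packed_size`, `pack_active`, `sum_original`, `sum_packed`) are hypothetical and do not come from the paper. The paper derives its layouts automatically from parameterized cost functions. This sketch only hand-derives one compressed layout for one triangular nest and does not model cache or TLB behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration (not the paper's algorithm): the nest
 *
 *     for (i = 0; i < n; i++)
 *         for (j = i; j < n; j++)
 *             s += A[j][i];
 *
 * reads only the "active" lower triangle of a row-major n x n array A,
 * and its innermost loop walks down a column with a stride of n elements. */

/* Number of iterations that precede (i, j) in loop order, 0 <= i <= j < n.
 * Outer iterations 0..i-1 contribute i*(2n - i + 1)/2 elements, and the
 * current one contributes j - i. Used as the new address of A[j][i]. */
static size_t rank_ij(size_t n, size_t i, size_t j) {
    assert(i <= j && j < n);
    return i * (2 * n - i + 1) / 2 + (j - i);
}

/* Size of the compressed array: only the n(n+1)/2 active elements. */
static size_t packed_size(size_t n) { return n * (n + 1) / 2; }

/* Copy the active elements of row-major a (n x n) into packed, once. */
static void pack_active(size_t n, const double *a, double *packed) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            packed[rank_ij(n, i, j)] = a[j * n + i];
}

/* Original nest: stride-n accesses in the innermost loop. */
static double sum_original(size_t n, const double *a) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            s += a[j * n + i];
    return s;
}

/* Transformed nest: unit-stride accesses, including when control
 * returns from the inner loop to the outer loop. */
static double sum_packed(size_t n, const double *packed) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            s += packed[rank_ij(n, i, j)];
    return s;
}
```

With n = 4 the packed array holds 10 of the 16 elements. `rank_ij(4, i, j)` takes the values 0, 1, ..., 9 in loop order, so the stride-4 column walk becomes a unit-stride walk, including at the transition from i to i + 1.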

Author information

Authors and Affiliations

ICPS/LSIIT, Université Louis Pasteur, Strasbourg, Pôle API, Bd Sébastien Brant, F-67400, Illkirch, France
Vincent Loechner, Benoît Meister & Philippe Clauss

About this article

Cite this article

Loechner, V., Meister, B. & Clauss, P. Precise Data Locality Optimization of Nested Loops. The Journal of Supercomputing 21, 37–76 (2002). https://doi.org/10.1023/A:1013535431127

Issue Date: January 2002
DOI: https://doi.org/10.1023/A:1013535431127