The Journal of Supercomputing

, Volume 61, Issue 3, pp 935–965 | Cite as

Data locality optimization of interference graphs based on polyhedral computations

Article

Abstract

In achieving high performance on modern architectures it is critical to make effective use of the memory hierarchy. There are compiler-directed locality enhancement techniques that allow the transformation of program to achieve a higher locality: loop transformations, which are constrained by data dependences and data layout transformations, which have a global impact on the program locality. Due to these drawbacks, there must be a unification of the two techniques to achieve the benefits of both. In this paper, a novel unification of these techniques is presented. Using a model based on parameterized polyhedra and introducing new concepts, we propose a data locality optimization algorithm.

In comparison with the other approaches, the technique proposed is capable of solving more conflicts and optimizing more references, a subtle way is proposed to optimize incompatible references to the same array, in the same loop, and also references in a cycle in the interference graph. Using parameterized cost functions, our technique estimates the importance of each sub-graph and optimizes data locality. Our experimental results show a significant improvement over the prior approaches.

Keywords

Locality optimization Scanning constraint Spatial and temporal locality Conflict resolving Eigenvalues and eigenspaces Parameterized polyhedra 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Kandemir M, Choudhary A, Ramanujam J, Banerjee P (1999) A matrix-based approach to global locality optimization. J Parallel Distrib Comput 58(2):190–235 CrossRefGoogle Scholar
  2. 2.
    Loechner V, Meister B, Clauss P (2002) Precise data locality optimization of nested loops. J Supercomput 21(1):37–76 MATHCrossRefGoogle Scholar
  3. 3.
    Allen R, Kennedy K (2001) Optimizing compilers for modern architectures. Kaufmann, San Mateo Google Scholar
  4. 4.
    Bastoul C, Feautrier P (2003) Improving data locality by chunking. In: CC’12 int conf on compiler construction. Lecture notes in computer science, vol 2622, pp 320–335 CrossRefGoogle Scholar
  5. 5.
    Bik A, Knijnenburg P, Wijshoff H (1994) Reshaping access patterns for generating sparse codes. In: Proc of the 7th int workshop on languages and compilers for parallel computing, NY, USA, pp 406–422 Google Scholar
  6. 6.
    Coleman S, McKinley K (1995) Tile size selection using cache organization and data layout. In: Proc of the ACM SIGPLAN conf on programming language design and implementation (PLDI ’95), La Jolla, California, USA, pp 279–290 CrossRefGoogle Scholar
  7. 7.
    Li W (1993) Compiling for NUMA parallel machines. PhD thesis, Computer Science Department, Cornell University, NY Google Scholar
  8. 8.
    Manjikian N, Abdelrahman T (1995) Fusion of loops for parallelism and locality. In: Proc of the 24th int conf on parallel processing (ICPP’95), Oconomowoc, Wisconsin, vol II, pp 19–28 Google Scholar
  9. 9.
    McKinley K, Carr S, Tseng C (1996) Improving data locality with loop transformations. ACM Trans Program Lang Syst, 18(4):424–453 CrossRefGoogle Scholar
  10. 10.
    Wolf M, Lam M (1991) A data locality optimizing algorithm. In: Proc of the SIGPLAN ’91 conf on programming language design and implementation, Toronto, Ontario, pp 30–44 Google Scholar
  11. 11.
    Wolfe M (1989) More iteration space tiling. In: Proc of supercomputing ’89, Reno, Nevada, pp 655–664 Google Scholar
  12. 12.
    Kandemir M, Choudhary A, Shenoy N, Banerjee P, Ramanujam J (1998) A hyperplane based approach for optimizing spatial locality in loop nests. In: Proc of the 1998 ACM int conf on supercomputing (ICS’98), Melbourne, Australia, pp 69–76 Google Scholar
  13. 13.
    Kandemir M, Choudhary A, Shenoy N, Banerjee P, Ramanujam J (1999) A linear algebra framework for automatic determination of optimal data layouts. IEEE Trans Parallel Distrib Syst 10(2):115–135 CrossRefGoogle Scholar
  14. 14.
    O’Boyle M, Knijnenburg P (1999) Non-singular data transformations: definition, validity, and applications. Int J Parallel Program 27(1):131–159 CrossRefGoogle Scholar
  15. 15.
    Rivera G, Tseng C (1998) Data transformations for eliminating conflict misses. In: The ACM SIGPLAN conf on programming language design and implementation (PLDI’98), Montreal, Canada, pp 38–49 Google Scholar
  16. 16.
    Kandemir M (2004) Improving whole-program locality using intra-procedural and inter-procedural transformations. J Parallel Distrib Comput 65(7):564–582 Google Scholar
  17. 17.
    Kandemir M, Choudhary A, Ramanujam J, Banerjee P (1998) Improving locality using loop and data transformations in an integrated framework. In: Proc of the 31st int symp on microarchitecture (MICRO-31), Dallas, Texas, pp 285–296 Google Scholar
  18. 18.
    O’Boyle M, Knijnenburg P (2002) Integrating loop and data transformations for global optimization. J Parallel Distrib Comput 62(4):563–590 MATHCrossRefGoogle Scholar
  19. 19.
    Kandemir M, Choudhary A, Ramanujam J, Banerjee P (1999) A graph based framework to detect optimal memory layouts for improving data locality. In: Proc of the 1999 int parallel processing symp (IPPS’99), San Juan, Puerto Rico, pp 738–743 Google Scholar
  20. 20.
    Blume W, Eigenmann R (1994) An overview of symbolic analysis techniques needed for the effective parallelization of the PERFECT benchmarks. In: Proc of the 1994 int conf on parallel processing (ICPP’94), North Carolina State University, vol II, pp 233–238 CrossRefGoogle Scholar
  21. 21.
    Loechner V, Meister B, Clauss P (2001) Data sequence locality: a generalization of temporal locality. In: Proc of the 7th int Euro-Par conf Manchester on parallel processing, UK, pp 262–272 Google Scholar
  22. 22.
    Loechner V, Wilde D (1997) Parameterized polyhedra and their vertices. Int J Parallel Program, 25(6):525–549 CrossRefGoogle Scholar
  23. 23.
    Anderson J, Lam M (1993) Global optimizations for parallelism and locality on scalable parallel machines. In: Proc SIGPLAN conf on programming language design and implementation (PLDI’93), Albuquerque, New Mexico, pp 112–125 Google Scholar
  24. 24.
    Wilde D (1993) A library for doing polyhedral operations. Technical report PI 785, IRISA, Rennes, France Google Scholar
  25. 25.
    Bastoul C, Cohen A, Girbal A, Sharma S, Temam O (2003) Putting polyhedral loop transformations to work. In: Proc of the workshop on languages and compilers for parallel computing (LCPC’03), Texas, USA. Lecture notes in computer science, vol 2558, pp 23–30 Google Scholar
  26. 26.
    Feautrier P (1996) Automatic parallelization in the polytope model. In: The data parallel programming model. Lecture notes in computer science, vol 1132, pp 79–103 CrossRefGoogle Scholar
  27. 27.
    Ramanujam J (1992) A linear algebraic view of loop transformations and their interaction. In: Sorensen D (ed) Proc of the 5th SIAM conf on parallel processing for scientific computing. SIAM, Philadelphia, pp 543–548 Google Scholar
  28. 28.
    Schrijver A (1986) Theory of linear and integer programming. Wiley, New York MATHGoogle Scholar
  29. 29.
    Ancourt C, Irigoin F (1991) Scanning polyhedra with DO loops. In: Proc of the 3rd ACM SIGPLAN symp principle and practice of parallel programming, Williamsburg, USA, pp 39–50 Google Scholar
  30. 30.
    Bastoul C (2002) Generating loops for scanning polyhedra—CLooG user’s guide. Technical report 2002/23, PRiSM, Versailles University, France Google Scholar
  31. 31.
    Le Verge H, Van Dongen V, Wilde D (1994) Loop nest synthesis using the polyhedral library. Technical report 830, IRISA, Rennes, France Google Scholar
  32. 32.
    Rajopadhye S, Quilleré F, Wilde D (2000) Generation of efficient nested loops from polyhedra. Int J Parallel Program, 28(5):469–498 CrossRefGoogle Scholar
  33. 33.
    Wolfe M (1996) High performance compilers for parallel computing. Addison-Wesley, Reading MATHGoogle Scholar
  34. 34.
    Sam SV (2009) A bijective proof for a theorem of Ehrhart. Am Math Mon 116(8):688–701 MathSciNetMATHCrossRefGoogle Scholar
  35. 35.
    Clauss P (1996) Counting solutions to linear and nonlinear constraints through Ehrhart polynomials: applications to analyze and transform scientific programs. Research report ICPS 96-03, 10th ACM int conf on Supercomputing (ICS’96), Philadelphia, Pennsylvania, USA, pp 278–285 Google Scholar
  36. 36.
    Ancourt C, Irigoin F, Yang Y (1995) Minimal data dependence abstractions for loop transformations. Int J Parallel Program, 23(4):359–388 CrossRefGoogle Scholar
  37. 37.
    Kandemir M (2004) Impact of data transformations on memory bank locality. In: Proc of the design, automation and test in Europe conf and exhibition (DATE’ 04), Paris, France, vol 1, pp 10506–10511 Google Scholar
  38. 38.
    Gilbert W (1993) Bricklaying and the Hermite normal form. Am Math Mon 100(3):242–245 MATHCrossRefGoogle Scholar
  39. 39.
    Zhan X (2006) Completion of a partial integral matrix to a unimodular matrix. Linear Algebra Appl 414(1):373–377 MathSciNetMATHCrossRefGoogle Scholar
  40. 40.
    Mohar B, Poljak S (1993) Eigenvalues in combinatorial optimization: combinatorial and graph-theoretical problems in linear algebra. IMA Vol Math Appl 50:107–151 MathSciNetCrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. 1.School of Computer EngineeringIran University of Science and TechnologyTehranIran

Personalised recommendations