A Comparison of Compiler Tiling Algorithms

  • Gabriel Rivera
  • Chau-Wen Tseng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1575)

Abstract

Linear algebra codes contain data locality that can be exploited by tiling multiple loop nests. Several approaches to tiling have been suggested for avoiding conflict misses in low-associativity caches. We propose a new technique based on intra-variable padding and compare its performance with existing techniques. Results show that padding improves the performance of matrix multiply by over 100% in some cases across a range of matrix sizes. Comparing the efficacy of different tiling algorithms, we find that rectangular tiles are slightly more efficient than square tiles. Overall, tiling improves performance by 0% to 250%. Copying tiles at run time proves to be quite effective.
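As a rough illustration of the two ideas the abstract combines, the sketch below tiles a matrix multiply over a column-major array whose leading dimension has been padded. It is not the authors' algorithm: the function names, the pad amount, and the tile shape are all assumptions chosen for illustration, and pure Python will not show the cache effects the paper measures. It only shows how a rectangular tile of A is reused across every column of C, and how intra-variable padding changes the column stride without changing the result.

```python
def padded_matrix(rows, n, pad):
    """Store an n x n matrix column-major in a flat list.

    The leading dimension is n + pad, so consecutive columns start
    n + pad elements apart (hypothetical stand-in for intra-variable padding).
    """
    ld = n + pad
    flat = [0.0] * (ld * n)
    for j in range(n):
        for i in range(n):
            flat[j * ld + i] = rows[i][j]
    return flat, ld


def unpad(flat, n, ld):
    """Recover the n x n matrix as a list of rows, dropping the padding."""
    return [[flat[j * ld + i] for j in range(n)] for i in range(n)]


def tiled_matmul(a, b, n, ld, tile_h, tile_w):
    """Compute C = A * B, reusing a tile_h x tile_w tile of A for every column j.

    tile_h and tile_w may differ, giving a rectangular tile.
    """
    c = [0.0] * (ld * n)
    for kk in range(0, n, tile_w):          # tile columns of A
        k_end = min(kk + tile_w, n)
        for ii in range(0, n, tile_h):      # tile rows of A
            i_end = min(ii + tile_h, n)
            for j in range(n):              # the A tile is reused for each j
                for k in range(kk, k_end):
                    b_kj = b[j * ld + k]
                    for i in range(ii, i_end):
                        c[j * ld + i] += a[k * ld + i] * b_kj
    return c
```

In a compiled language the pad and the tile dimensions would be chosen from the cache size, line size, and column size so that the columns of one tile do not map to the same cache sets; here they are simply parameters.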

Keywords

Matrix Size, Cache Line, Column Size, Tile Size, Program Language Design
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Gabriel Rivera (1)
  • Chau-Wen Tseng (1)
  1. Department of Computer Science, University of Maryland, College Park, USA
