Part of the book series: The Springer International Series in Engineering and Computer Science ((SECS,volume 613))

Abstract

Tile-size selection is known to be a complex problem. This paper develops a new selection algorithm targeting relaxation codes. Unlike previous algorithms, the new algorithm considers the effect of loop skewing, which is necessary to tile such codes. It also estimates loop overhead and incorporates it into the execution cost model, which turns out to be critical to the decision between tiling a single loop level and tiling two loop levels. Our preliminary experimental results show that these previously ignored issues have a significant impact on the execution time of tiled loops in relaxation codes. In our experiments, we measured the cache miss rate and the execution time of five benchmark programs on a single processor and compared our algorithm with previous algorithms. Our algorithm achieves an average speedup of 1.27 to 1.63 over all the other algorithms.
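To make the role of loop skewing concrete, here is a minimal sketch of skewed tiling applied to a one-dimensional in-place relaxation sweep. It is an illustration only, not the selection algorithm of the chapter: the function names, the three-point averaging kernel, and the fixed `tile` parameter are all assumptions chosen for the example. In the original `(t, i)` iteration space the sweep has a dependence of distance `(1, -1)`, so tiling the `i` loop directly would reorder dependent iterations. Skewing with `j = i + t` turns every dependence distance non-negative, after which a single loop level (`j`) can be tiled.

```python
def relax(a, steps):
    """Reference version: sweep the interior points in place `steps` times."""
    a = list(a)
    n = len(a)
    for t in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def relax_skewed_tiled(a, steps, tile):
    """Same computation after skewing (j = i + t) and tiling the j loop.

    `tile` is an arbitrary tile width supplied by the caller; choosing it
    well is the problem the chapter addresses and is not attempted here.
    """
    a = list(a)
    n = len(a)
    # Skewed index j = i + t ranges over 1 .. n + steps - 3 (inclusive).
    for jj in range(1, n + steps - 2, tile):      # tile-controlling loop
        for t in range(steps):                    # time steps within the tile
            lo = max(jj, t + 1)                   # keep i = j - t >= 1
            hi = min(jj + tile, t + n - 1)        # keep i = j - t <= n - 2
            for j in range(lo, hi):
                i = j - t                         # undo the skew for indexing
                a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a
```

Because the skewed order respects every dependence, each update reads exactly the same values as in the reference version, so the two functions return identical lists for any positive `tile`. The `max`/`min` bounds in the inner loops are part of the loop overhead that the abstract says must enter the cost model.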

Copyright information

© 2001 Springer Science+Business Media New York

About this chapter

Cite this chapter

Song, Y., Li, Z. (2001). Impact of Tile-Size Selection for Skewed Tiling. In: Lee, G., Yew, PC. (eds) Interaction between Compilers and Computer Architectures. The Springer International Series in Engineering and Computer Science, vol 613. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3337-2_3

  • DOI: https://doi.org/10.1007/978-1-4757-3337-2_3

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-4896-0

  • Online ISBN: 978-1-4757-3337-2

  • eBook Packages: Springer Book Archive
