Impact of Tile-Size Selection for Skewed Tiling

Song, Yonghong; Li, Zhiyuan

doi:10.1007/978-1-4757-3337-2_3

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 613)

Abstract

Tile-size selection is known to be a complex problem. This paper develops a new selection algorithm targeting relaxation codes. Unlike previous algorithms, the new algorithm considers the effect of loop skewing, which is necessary to tile such codes. It also estimates loop overhead and incorporates it into the execution cost model, which turns out to be critical to the decision between tiling a single loop level and tiling two loop levels. Our preliminary experimental results show that these previously ignored issues have a significant impact on the execution time of tiled loops in relaxation codes. In our experiments, we measured the cache miss rate and the execution time of five benchmark programs on a single processor and compared our algorithm with previous algorithms. Our algorithm achieves an average speedup of 1.27 to 1.63 over all the other algorithms.
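To make the role of loop skewing concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a hypothetical in-place 1-D three-point relaxation; the function names, the stencil, and the `tile` parameter are illustrative choices that do not come from the chapter. In the original (t, i) iteration space the dependence distances are (0, 1), (1, 0), and (1, -1); the negative component makes rectangular tiling along i illegal. Skewing with j = i + t turns the distances into (0, 1), (1, 0), and (1, 1), all non-negative, so the j loop can be tiled. Only a single loop level is tiled here, and no tile-size selection is attempted.

```python
def relax(a, steps):
    """Reference version: in-place 1-D relaxation, untiled."""
    a = list(a)
    n = len(a)
    for t in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def relax_skewed_tiled(a, steps, tile):
    """Same computation after skewing (j = i + t) and tiling the j loop.

    `tile` is a hypothetical tile width along the skewed j axis; this
    sketch does not choose it from any cache or overhead model.
    """
    a = list(a)
    n = len(a)
    # After skewing, j covers [1, (n - 2) + (steps - 1)].
    j_lo, j_hi = 1, (n - 2) + (steps - 1)
    for jj in range(j_lo, j_hi + 1, tile):       # loop over tiles of j
        for t in range(steps):                   # time steps inside a tile
            start = max(jj, t + 1)               # clip tile to valid j for this t
            stop = min(jj + tile - 1, t + n - 2)
            for j in range(start, stop + 1):
                i = j - t                        # undo the skew to index the array
                a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


if __name__ == "__main__":
    data = [float((x * x) % 7) for x in range(20)]
    print(relax_skewed_tiled(data, 5, 4) == relax(data, 5))
```

Because every update reads the same operand values in both versions, the two functions should produce identical results for any positive `tile`. The extra `max`/`min` clipping in the inner loop bounds is a small instance of the loop overhead that, according to the abstract, the cost model has to account for.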

Author information

Authors and Affiliations

Purdue University, USA
Yonghong Song & Zhiyuan Li

Editor information

Editors and Affiliations

Iowa State University, 50011, Ames, IA, USA
Gyungho Lee
University of Minnesota, 55455, Minneapolis, MN, USA
Pen-Chung Yew

About this chapter

Cite this chapter

Song, Y., Li, Z. (2001). Impact of Tile-Size Selection for Skewed Tiling. In: Lee, G., Yew, PC. (eds) Interaction between Compilers and Computer Architectures. The Springer International Series in Engineering and Computer Science, vol 613. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3337-2_3

DOI: https://doi.org/10.1007/978-1-4757-3337-2_3
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4896-0
Online ISBN: 978-1-4757-3337-2
eBook Packages: Springer Book Archive
