Improving cache performance through tiling and data alignment

  • Systems and Applications
  • Conference paper
Solving Irregularly Structured Problems in Parallel (IRREGULAR 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1253))

Abstract

We address the problem of improving the data cache performance of numerical applications, specifically those with blocked (or tiled) loops. We present DAT, a data alignment technique that uses array padding to improve program performance by minimizing cache conflict misses. We describe algorithms for selecting tile sizes that maximize data cache utilization and for computing pad sizes that eliminate self-interference conflicts in the chosen tile. We also generalize the technique to handle applications with several tiled arrays. Experimental results comparing our technique with previously published approaches on machines with different cache configurations show consistently good performance on several benchmark programs across a variety of problem sizes.

This work was partially supported by grants from NSF (CDA-9422095) and ONR (N00014-93-1-1348).
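
To make the notion of self-interference in a tile concrete, the following is a minimal sketch and not the DAT algorithm itself, which is not reproduced on this page. It assumes a direct-mapped cache whose lines hold a single array element and a row-major array that starts at cache slot 0. The function names `tile_conflicts` and `smallest_pad` are hypothetical, and the brute-force search stands in for whatever pad computation the paper actually uses.

```python
def tile_conflicts(row_len, tile_rows, tile_cols, cache_elems):
    """Count tile elements that land on a cache slot already used by
    another element of the same tile (self-interference).

    Assumes a direct-mapped cache with one-element lines and a
    row-major array of row length row_len starting at slot 0.
    """
    used = set()
    conflicts = 0
    for i in range(tile_rows):
        for j in range(tile_cols):
            slot = (i * row_len + j) % cache_elems
            if slot in used:
                conflicts += 1
            used.add(slot)
    return conflicts


def smallest_pad(n_cols, tile_rows, tile_cols, cache_elems):
    """Brute-force search for the smallest number of padding elements
    per row that makes the tile conflict-free. Returns None if no pad
    below cache_elems works (for example, the tile exceeds the cache).
    """
    for pad in range(cache_elems):
        if tile_conflicts(n_cols + pad, tile_rows, tile_cols, cache_elems) == 0:
            return pad
    return None


# A 32x32 tile exactly fills a 1024-element cache, but with 1024-element
# rows every tile row maps to the same 32 slots.
print(tile_conflicts(1024, 32, 32, 1024))  # 992
print(smallest_pad(1024, 32, 32, 1024))    # 32
```

In this toy setting, padding each row by 32 elements shifts successive tile rows into disjoint groups of cache slots, so the whole tile stays resident. A real cache has multi-element lines and possibly associativity, both of which this sketch ignores.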

Editor information

Gianfranco Bilardi, Afonso Ferreira, Reinhard Lüling, José Rolim

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Panda, P.R., Nakamura, H., Dutt, N.D., Nicolau, A. (1997). Improving cache performance through tiling and data alignment. In: Bilardi, G., Ferreira, A., Lüling, R., Rolim, J. (eds) Solving Irregularly Structured Problems in Parallel. IRREGULAR 1997. Lecture Notes in Computer Science, vol 1253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63138-0_16

  • DOI: https://doi.org/10.1007/3-540-63138-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63138-5

  • Online ISBN: 978-3-540-69157-0

  • eBook Packages: Springer Book Archive
