On estimating and enhancing cache effectiveness

  • J. Ferrante
  • V. Sarkar
  • W. Thrash
VIII. Cache Memory Issues
Part of the Lecture Notes in Computer Science book series (LNCS, volume 589)

Abstract

In this paper, we consider automatic analysis of a program's cache usage to achieve greater cache effectiveness. We show how to estimate efficiently the number of distinct cache lines used by a given loop in a nest of loops. Given this estimate of the number of cache lines needed, we can estimate the number of cache misses for a nest of loops. Our estimates can be used to guide program transformations such as loop interchange to achieve greater cache effectiveness. We present simulation results that show our estimates are reasonable for simple cases such as matrix multiply. We analyze the array sizes for which our estimates differ from our simulation results, and provide recommendations on how to handle such arrays in practice.
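The quantity the abstract refers to, the number of distinct cache lines touched by the references in a loop nest, can be measured by brute force for small problem sizes. The sketch below does this for a square matrix multiply; the line size, element size, row-major layout, and function names are illustrative assumptions, not the paper's analytical formulation.

```python
# Brute-force count of distinct cache lines touched by the three array
# references in the matrix-multiply loop nest C[i][j] += A[i][k] * B[k][j],
# assuming row-major layout and disjoint arrays. Illustrative baseline only.

LINE_SIZE = 64   # bytes per cache line (assumed)
ELEM_SIZE = 8    # bytes per double-precision element (assumed)

def lines_touched(addresses, line_size=LINE_SIZE):
    """Number of distinct cache lines covering a set of byte addresses."""
    return len({a // line_size for a in addresses})

def matmul_footprint(n):
    """Distinct lines used by an n x n matrix multiply."""
    def addr(base, row, col):
        return base + (row * n + col) * ELEM_SIZE
    # Disjoint base addresses so the three arrays never share a line.
    a_base = 0
    b_base = n * n * ELEM_SIZE
    c_base = 2 * n * n * ELEM_SIZE
    touched = set()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                touched.add(addr(c_base, i, j))
                touched.add(addr(a_base, i, k))
                touched.add(addr(b_base, k, j))
    return lines_touched(touched)

# Each n x n array of 8-byte elements occupies n*n*8/64 lines when aligned;
# with n = 16 that is 32 lines per array, 96 in total.
print(matmul_footprint(16))  # → 96
```

An analytical estimate of this count, as opposed to enumeration, is what allows a compiler to compare candidate loop orders (e.g., before and after loop interchange) without running the program.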



Copyright information

© Springer-Verlag 1992

Authors and Affiliations

  • J. Ferrante (1)
  • V. Sarkar (2)
  • W. Thrash (3)
  1. IBM Research Division, T. J. Watson Research Center, Yorktown Heights
  2. IBM Palo Alto Scientific Center, Palo Alto
  3. Department of Computer Science and Engineering, FR-35, University of Washington, Seattle
