Modulo scheduling with cache reuse information

  • Chen Ding
  • Steve Carr
  • Phil Sweany
Workshop 17: Instruction-Level Parallelism
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1300)

Abstract

Software pipelining for instruction-level parallel computers with non-blocking caches usually assigns memory access latency by assuming either that all accesses are cache hits or that all are cache misses. We contend that setting memory latencies by cache reuse analysis leads to better software pipelining than either an all-hit or an all-miss assumption. Using a simple cache-reuse model, our software pipelining optimization achieved a 10% improvement in execution performance over the all-cache-hit assumption and used 18% fewer registers than required by the all-cache-miss assumption. We conclude that software pipelining for architectures with non-blocking caches should incorporate a memory-reuse model.
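The reuse-based latency assignment described in the abstract can be sketched as follows. This is an illustrative assumption, not the authors' actual model: the constants, the `assign_latency` function, and the amortization rule are hypothetical, showing only the general idea of classifying each loop memory reference by its reuse pattern and handing the scheduler a latency between the all-hit and all-miss extremes.

```python
# Hypothetical sketch of reuse-based latency assignment (not the paper's
# implementation). Each memory reference in a loop is classified by its
# per-iteration stride, and the modulo scheduler would then use the
# resulting latency instead of assuming all hits or all misses.

CACHE_LINE = 32      # bytes per cache line (assumed)
HIT_LATENCY = 1      # cycles for a cache hit (assumed)
MISS_LATENCY = 20    # cycles for a cache miss (assumed)

def assign_latency(stride_bytes):
    """Return an assumed latency for a reference with the given stride.

    stride_bytes == 0 models self-temporal reuse (same location every
    iteration), so the reference is treated as a hit. A stride smaller
    than the line size models self-spatial reuse: only the first access
    to each cache line misses, so the miss cost is amortized over the
    accesses that share the line. Larger strides get no reuse credit.
    """
    if stride_bytes == 0:
        return HIT_LATENCY
    if stride_bytes < CACHE_LINE:
        accesses_per_line = CACHE_LINE // stride_bytes
        return (MISS_LATENCY + (accesses_per_line - 1) * HIT_LATENCY) / accesses_per_line
    return MISS_LATENCY

# Example: a unit-stride walk over 8-byte doubles reuses each 32-byte
# line four times, while a 64-byte stride touches a fresh line each time.
print(assign_latency(8))    # amortized latency, mostly hits
print(assign_latency(64))   # every access misses
```

Under this kind of model the scheduler neither over-allocates registers to cover a miss latency that rarely occurs (the all-miss assumption) nor stalls on latencies it under-estimated (the all-hit assumption).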

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Chen Ding, Dept. of Computer Science, Rice University, Houston
  • Steve Carr, Dept. of Computer Science, Michigan Technological University, Houghton
  • Phil Sweany, Dept. of Computer Science, Michigan Technological University, Houghton