International Journal of Parallel Programming, Volume 33, Issue 5, pp 529–559

Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References

  • Onur Mutlu
  • Hyesoon Kim
  • David N. Armstrong
  • Yale N. Patt

Abstract

High-performance processors employ aggressive branch prediction and prefetching techniques to increase performance. Speculative memory references caused by these techniques sometimes bring data into the caches that are not needed by correct execution. This paper proposes the use of the first-level caches as filters that predict the usefulness of speculative memory references. With the proposed technique, speculative memory references bring data only into the first-level caches rather than all levels in the cache hierarchy. The processor monitors the use of the cache blocks in the first-level caches and decides which blocks to keep in the cache hierarchy based on the usefulness of cache blocks. It is shown that a simple implementation of this technique usually outperforms inclusive and exclusive baseline cache hierarchies commonly used by today’s processors and results in IPC performance improvements of up to 10% on the SPEC CPU2000 integer benchmarks.
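
As a rough illustration of the filtering idea described in the abstract, the following toy model tracks whether a speculatively fetched block is ever used by a non-speculative access while it sits in the first-level cache, and promotes it to the second level only if it was. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the class name `FilteringHierarchy`, the two-level LRU organization, the cache sizes, and the rule "decide at L1 eviction time" are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class FilteringHierarchy:
    """Toy two-level model (hypothetical): the L1 filters speculative fills."""

    def __init__(self, l1_blocks=2, l2_blocks=4):
        # L1 maps block address -> [speculative_fill, used]; order is LRU -> MRU.
        self.l1 = OrderedDict()
        # L2 only tracks which blocks are present; order is LRU -> MRU.
        self.l2 = OrderedDict()
        self.l1_blocks = l1_blocks
        self.l2_blocks = l2_blocks

    def _insert_l2(self, addr):
        self.l2[addr] = None
        self.l2.move_to_end(addr)
        if len(self.l2) > self.l2_blocks:
            self.l2.popitem(last=False)  # evict the LRU block from L2

    def _evict_l1(self):
        addr, (speculative, used) = self.l1.popitem(last=False)
        # Assumed filtering rule: a speculatively fetched block that was never
        # touched by a non-speculative access is dropped instead of kept in L2.
        if speculative and not used:
            return
        self._insert_l2(addr)

    def access(self, addr, speculative=False):
        if addr in self.l1:
            if not speculative:
                self.l1[addr][1] = True  # the block proved useful
            self.l1.move_to_end(addr)
            return "l1_hit"
        if addr in self.l2:
            result = "l2_hit"
            if not speculative:
                self.l2.move_to_end(addr)
        else:
            result = "miss"
            if not speculative:
                self._insert_l2(addr)  # demand misses fill both levels
        # Speculative misses fill only the L1.
        self.l1[addr] = [speculative, not speculative]
        if len(self.l1) > self.l1_blocks:
            self._evict_l1()
        return result


h = FilteringHierarchy(l1_blocks=2, l2_blocks=4)
h.access(0xA, speculative=True)  # speculative fill, never used afterwards
h.access(0xB, speculative=True)  # speculative fill ...
h.access(0xB)                    # ... later used by a non-speculative access
h.access(0xC)                    # demand miss; pushes 0xA out of the L1
h.access(0xD)                    # demand miss; pushes 0xB out of the L1
print(0xA in h.l2, 0xB in h.l2)
```

In this sketch the unused speculative block `0xA` never reaches the second level, while `0xB`, which was used before eviction, is retained there. A real design would also have to handle write-backs, coherence, and the timing of the usefulness decision, none of which is modeled here.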

Keywords

Caches, cache pollution, cache filtering, speculative memory references, runahead execution

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • Onur Mutlu (1)
  • Hyesoon Kim (1)
  • David N. Armstrong (1)
  • Yale N. Patt (1)

  1. Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, USA
