Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References

Mutlu, Onur; Kim, Hyesoon; Armstrong, David N.; Patt, Yale N.

doi:10.1007/s10766-005-7304-x

Published: October 2005

Volume 33, pages 529–559 (2005)

International Journal of Parallel Programming

Abstract

High-performance processors employ aggressive branch prediction and prefetching techniques to increase performance. Speculative memory references caused by these techniques sometimes bring data into the caches that are not needed by correct execution. This paper proposes the use of the first-level caches as filters that predict the usefulness of speculative memory references. With the proposed technique, speculative memory references bring data only into the first-level caches rather than all levels in the cache hierarchy. The processor monitors the use of the cache blocks in the first-level caches and decides which blocks to keep in the cache hierarchy based on the usefulness of cache blocks. It is shown that a simple implementation of this technique usually outperforms inclusive and exclusive baseline cache hierarchies commonly used by today’s processors and results in IPC performance improvements of up to 10% on the SPEC CPU2000 integer benchmarks.
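
The abstract describes the mechanism only in outline, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a two-level hierarchy of fully associative LRU caches, a per-block `spec` flag (brought in by a speculative reference) and `used` flag (later touched by a demand access), and a policy in which a speculatively fetched block is copied into the second level only if it was used before leaving the first level. The class names, flags, and eviction-time decision point are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache: block address -> metadata dict."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, addr):
        """Return metadata and refresh recency on a hit, or None on a miss."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return self.blocks[addr]
        return None

    def insert(self, addr, meta):
        """Insert a block; return the evicted (addr, meta) pair, or None."""
        victim = None
        if addr not in self.blocks and len(self.blocks) >= self.capacity:
            victim = self.blocks.popitem(last=False)  # least recently used
        self.blocks[addr] = meta
        self.blocks.move_to_end(addr)
        return victim


class FilteredHierarchy:
    """Assumed policy: speculative fills go to L1 only; L1 acts as the filter."""

    def __init__(self, l1_blocks, l2_blocks):
        self.l1 = LRUCache(l1_blocks)
        self.l2 = LRUCache(l2_blocks)

    def access(self, addr, speculative=False):
        """Return which level served the access: 'L1', 'L2', or 'MEM'."""
        meta = self.l1.lookup(addr)
        if meta is not None:
            if not speculative:
                meta["used"] = True  # a demand access proves the block useful
            return "L1"
        in_l2 = self.l2.lookup(addr) is not None
        if not in_l2 and not speculative:
            self.l2.insert(addr, {})  # demand misses fill L2 as usual
        self._fill_l1(addr, speculative)
        return "L2" if in_l2 else "MEM"

    def _fill_l1(self, addr, speculative):
        victim = self.l1.insert(addr, {"spec": speculative, "used": not speculative})
        if victim is not None:
            v_addr, v_meta = victim
            # Keep a speculative block in the hierarchy only if it was used.
            if v_meta["spec"] and v_meta["used"]:
                self.l2.insert(v_addr, {})


if __name__ == "__main__":
    h = FilteredHierarchy(l1_blocks=2, l2_blocks=4)
    h.access(0xA, speculative=True)  # speculative, never used afterwards
    h.access(0xB, speculative=True)  # speculative, used by the next access
    h.access(0xB)
    h.access(0xC)                    # evicts 0xA from L1: dropped
    h.access(0xD)                    # evicts 0xB from L1: promoted to L2
    print(0xA in h.l2.blocks, 0xB in h.l2.blocks)
```

In this toy run the unused speculative block never reaches the second level, while the one that a demand access touched is retained there. A real design would also need to handle set associativity, writebacks, and the timing of the keep-or-drop decision, none of which the abstract specifies.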

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, 78712, USA
Onur Mutlu, Hyesoon Kim, David N. Armstrong & Yale N. Patt

Corresponding author

Correspondence to Onur Mutlu.

About this article

Cite this article

Mutlu, O., Kim, H., Armstrong, D.N. et al. Using the First-Level Caches as Filters to Reduce the Pollution Caused by Speculative Memory References. Int J Parallel Prog 33, 529–559 (2005). https://doi.org/10.1007/s10766-005-7304-x

Issue Date: October 2005
DOI: https://doi.org/10.1007/s10766-005-7304-x
